Language Identification (ID)
Rosoka Software provides language identification for every language it supports. Rather than assigning a single label, the software produces a probabilistic score for each language present in a document, which makes multilingual or code-switched documents straightforward to process.
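The source does not describe how Rosoka computes its scores. The sketch below is therefore only a generic illustration of what "probabilistic scoring of languages" can look like. It uses a character-trigram naive Bayes model with add-one smoothing and normalizes the per-language log-likelihoods into probabilities under a uniform prior. The sample texts, language codes, and function names (`trigrams`, `build_profiles`, `score_languages`) are hypothetical. A real system would be trained on much larger corpora.

```python
import math
from collections import Counter

# Hypothetical toy training samples; real profiles need large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
    "la": "vulpes celer fuscus super canem pigrum salit et felem",
}


def trigrams(text):
    """Split lowercased text (padded with spaces) into character trigrams."""
    padded = f"  {text.lower()}  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_profiles(samples):
    """Count trigram frequencies for each language's sample text."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def score_languages(text, profiles):
    """Return a probability per language, assuming a uniform prior."""
    vocab = set()
    for counts in profiles.values():
        vocab.update(counts)
    grams = trigrams(text)
    log_likes = {}
    for lang, counts in profiles.items():
        # Add-one smoothing; the extra +1 reserves mass for unseen trigrams.
        denom = sum(counts.values()) + len(vocab) + 1
        log_likes[lang] = sum(math.log((counts[g] + 1) / denom) for g in grams)
    # Softmax over log-likelihoods (shifted by the max for numerical stability).
    peak = max(log_likes.values())
    exps = {lang: math.exp(ll - peak) for lang, ll in log_likes.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


if __name__ == "__main__":
    profiles = build_profiles(SAMPLES)
    print(score_languages("the dog and the cat", profiles))
```

Because the output is a distribution and not a single label, one could apply the same scoring to individual sentences or segments to flag possible code-switching. This is an assumption about usage and is not a documented Rosoka feature.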
One Rosoka Software customer measures the weekly volume of each language in Twitter streams and has found that certain news events cause volume spikes. For example, Latin went from being nearly absent on Twitter to one of the top ten languages during the week the Pope joined Twitter. Similarly, major sporting events, well-attended concerts, and internationally publicized news events show a steady rise in the volume of specific languages before and during the event, followed by a decline afterwards.