newsmap is a dictionary-based semi-supervised model for geographical document classification. The core of the package is not the machine-learning algorithm but the multilingual seed dictionaries created by me and other contributors in English, German, French, Spanish, Japanese, Russian and Chinese. We recently added Chinese (traditional and simplified) and French dictionaries, and submitted the package to CRAN.
Native speakers of these languages account for 30% of the world population, which is actually much smaller than I thought. Creating Arabic, Hindi and Portuguese dictionaries would increase the population coverage by 12%, but there is still a long way to go!