Redefining word boundaries by collocation analysis

Quanteda’s tokenizer can segment Japanese and Chinese texts thanks to stringi, but the results are not always good, because the underlying library, ICU, recognizes only a limited number of words. For example, the Japanese text “ニューヨークのケネディ国際空港” can be translated as “Kennedy International Airport (ケネディ国際空港) in (の) New York (ニューヨーク)”. Quanteda’s tokenizer (the tokens function) segments this into […]

Newsmap paper in Digital Journalism

My paper on geographical news classification has finally been published in Digital Journalism, a sister journal of Journalism Studies. In this paper, I not only evaluate Newsmap’s classification accuracy but also compare it with other tools such as Open Calais and Geoparser.io. This paper presents the results of an evaluation of three different types of geographical news […]
