I presented my PhD thesis, titled “Measuring News Bias in Complex Media Systems: A New Approach to Big Media Analysis”, at a departmental event on 9 June.
Workshops on Japanese text analysis using quanteda
I presented how to analyze Japanese texts using quanteda in half-day workshops at Waseda University (22 May) and Kobe University (2 June), organized by Mikihito Tanaka (Waseda) and Atsushi Tago (Kobe). The materials for these workshops are available on GitHub as Introduction to Japanese Text Analysis (IJTA).
Introduction to Japanese Text Analysis with quanteda
I held a workshop on quanteda at Waseda University. The materials have been published under the title Introduction to Japanese Text Analysis with R, and I will expand their content little by little. I plan to actively hold more workshops on Japanese text analysis, so please contact me if you are interested.
Upcoming presentation at Waseda University
I have been invited to present a new approach to comparative text analysis at a research seminar at Waseda University (Tokyo) on the 17th. My talk is titled Data-driven approach to bilingual text analysis: representation of US foreign policy in Japanese and British newspapers in 1985-2016. Kohei Watanabe will present a new approach to text analysis of […]
Redefining word boundaries by collocation analysis
Quanteda’s tokenizer can segment Japanese and Chinese texts thanks to stringi, but its results are not always good, because the underlying ICU library recognizes only a limited number of words. For example, the Japanese phrase “ニューヨークのケネディ国際空港” translates to “Kennedy International Airport (ケネディ国際空港) in (の) New York (ニューヨーク)”. Quanteda’s tokenizer (the tokens function) segments this into […]
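A rough sketch of the idea behind the post (not the post’s exact code): tokenize with the ICU-based tokenizer, identify statistically associated token sequences with quanteda’s collocation statistics, and merge them back into single tokens. The exact segmentation noted in the comments is illustrative, since the output depends on the installed ICU version, and in newer quanteda releases textstat_collocations() lives in the quanteda.textstats package.

library(quanteda)

## Tokenize a Japanese phrase with the ICU-based tokenizer.
## ICU may split it into short fragments such as
## "ニューヨーク" "の" "ケネディ" "国際" "空港" (version-dependent).
txt <- "ニューヨークのケネディ国際空港"
toks <- tokens(txt)

## On a larger corpus, sequences that co-occur unusually often
## (e.g. "国際" followed by "空港") can be detected as collocations
## and recombined into single tokens, effectively redefining the
## word boundaries that ICU imposed.
col <- textstat_collocations(toks, size = 2, min_count = 2)
toks2 <- tokens_compound(toks, col)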
Analyzing Asian texts in R on English Windows machines
R generally handles Unicode well, and we do not see garbled texts as long as we use the stringi package. But there are some known bugs. The worst is probably the bug that has been discussed in the online community: on Windows, R prints character vectors properly, but not character vectors inside a data.frame: > sessionInfo() […]
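A minimal illustration of the behaviour described above, assuming an English-locale Windows session (the exact output varies with the R version and locale settings):

## A character vector prints correctly on its own...
x <- "日本語"
print(x)
## [1] "日本語"

## ...but the same string inside a data.frame may be shown as
## Unicode escapes on English Windows:
df <- data.frame(x = x, stringsAsFactors = FALSE)
print(df)
##                           x
## 1 <U+65E5><U+672C><U+8A9E>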
R and Python text analysis packages performance comparison
Like many other people, I started text analysis in Python, because R was notoriously slow. Python looked like a perfect language for text analysis, and I did a lot of work during my PhD using gensim with home-grown tools. I loved gensim’s LSA, which quickly and consistently decomposes very large document-feature matrices. However, I faced […]
Paper on how to measure news bias by quantitative text analysis
My paper titled Measuring news bias: Russia’s official news agency ITAR-TASS’s coverage of the Ukraine crisis has been published in the European Journal of Communication. In this piece, I used quantitative text analysis techniques to estimate how much ITAR-TASS’s coverage of the Ukraine crisis was biased by the influence of the Russian government: Objectivity in […]
Newsmap paper in Digital Journalism
My paper on geographical news classification has finally been published in Digital Journalism, a sister journal of Journalism Studies. In this paper, I not only evaluate Newsmap’s classification accuracy but also compare it with other tools such as Open Calais and Geoparser.io. This paper presents the results of an evaluation of three different types of geographical news […]
New paper on Russia’s international propaganda during the Ukraine crisis
My paper on Russia’s international propaganda during the Ukraine crisis, The spread of the Kremlin’s narratives by a western news agency during the Ukraine crisis, has been published in the Journal of International Communication. This is very timely, because people are talking about the spread of “fake news”! The description of the Ukraine crisis as an ‘information […]