Like many other people, I started text analysis in Python, because R was notoriously slow. Python looked like a perfect language for text analysis, and I did a lot of work during my PhD using gensim along with home-grown tools. I loved gensim’s LSA, which quickly and consistently decomposes very large document-feature matrices. However, I faced […]
Paper on how to measure news bias by quantitative text analysis
My paper titled Measuring news bias: Russia’s official news agency ITAR-TASS’s coverage of the Ukraine crisis has been published in the European Journal of Communication. In this piece, I used quantitative text analysis techniques to estimate how much the Russian government’s influence biased ITAR-TASS’s coverage of the Ukraine crisis: Objectivity in […]
Newsmap paper in Digital Journalism
My paper on geographical news classification has finally been published in Digital Journalism, a sister journal of Journalism Studies. In this paper, I not only evaluate Newsmap’s classification accuracy but also compare it with other tools such as Open Calais and Geoparser.io. This paper presents the results of an evaluation of three different types of geographical news […]
New paper on Russia’s international propaganda during the Ukraine crisis
My paper on Russia’s international propaganda during the Ukraine crisis, The spread of the Kremlin’s narratives by a western news agency during the Ukraine crisis, has been published in the Journal of International Communication. This is very timely, because people are talking about the spread of “fake news”! The description of the Ukraine crisis as an ‘information […]
Handling multi-word features in R
Multi-word verbs (e.g. “set out”, “agree on” and “take off”) or names (e.g. “United Kingdom” and “New York”) are very important features of texts, but it is often difficult to keep them intact in bag-of-words text analysis, because tokenizers usually break up strings by spaces. You can preprocess texts to concatenate multi-word features with underscores like […]
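As an illustration of that preprocessing idea, here is a minimal sketch in Python (the post itself is about R; the phrase list and function name are hypothetical, chosen only for demonstration): known multi-word expressions are joined with underscores before tokenization, so they survive as single bag-of-words features.

```python
import re

def compound(text, phrases):
    """Join each known multi-word phrase with underscores so that a
    whitespace-based tokenizer keeps it as a single feature."""
    # Replace longer phrases first so shorter ones cannot split them.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
        text = pattern.sub(phrase.replace(" ", "_"), text)
    return text

# Hypothetical phrase list for illustration.
phrases = ["United Kingdom", "New York", "set out"]
text = "He set out from New York to the United Kingdom."

# Tokenize on word characters; underscores count as word characters,
# so the compounded phrases remain single tokens.
tokens = re.findall(r"\w+", compound(text, phrases))
```

After this step, tokens such as "New_York" and "set_out" pass through a bag-of-words pipeline as single features instead of being split at spaces.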
Segmentation of Japanese or Chinese texts by stringi
Use of a POS tagger such as Mecab or Chasen is considered necessary for segmentation of Japanese texts, because words are not separated by spaces as in European languages, but I recently learned that this is not always the case. When I was testing quanteda’s tokenization function, I passed a Japanese text to it without much expectation, but the […]
Segmentation of Japanese and Chinese texts by stringi
Many people believe that morphological analysis with tools such as Mecab or Chasen is indispensable for segmenting Japanese text, but this is not always the case. I learned this while examining quanteda’s tokenization function: when I passed a Japanese text to it, the function split the text into words as cleanly as Mecab does.

> txt_jp
> quanteda::tokens(txt_jp)
tokens from 1 document.
Component 1 :
 [1] “政治” “と” “は” “社会” “に対して” “全体” “的” “な”
 [9] “影響” “を” “及” “ぼ” “し” “、” “社会” “で”
[17] “生きる” “ひとりひとり” “の” “人” “の” “人生” “に” “も”
[25] “様々” “な” “影響” “を” “及ぼす” “複雑” “な” “領域”
[33] “で” “ある” “。”

quanteda has no morphological analyzer, so it was a surprise that its tokenization function also segmented Chinese text cleanly.

> txt_cn […]
Visualizing media representation of the world
I uploaded an image visualizing foreign news coverage early this year, but I found that the image is very difficult to interpret, because both large positive and large negative values are important in SVD. Large positive values can result from intense media attention, but what do large negative values mean? A solution to this problem […]
Visualizing foreign news coverage
The challenge in international news research is identifying patterns in foreign news reporting, which covers thousands of events in hundreds of countries, but visualization seems to be useful. This chart summarizes foreign news coverage by the New York Times between 2012 and 2014 with heatmaps, where rows and columns respectively represent the most frequent countries […]
Best paper award at ICA methodology pre-conference
I presented my paper on geographical classification at the methodology pre-conference at ICA in Fukuoka, Japan. The pre-conference has historical significance as the first methodology group meeting at a major international conference in media and communication studies. There were many interesting presentations, but, to my great surprise, I won a Best Paper Award from […]