Like many other people, I started text analysis in Python, because R was notoriously slow. Python looked like a perfect language for text analysis, and I did a lot of work during my PhD using gensim along with home-grown tools. I loved gensim’s LSA, which quickly and consistently decomposes very large document-feature matrices. However, I faced […]
Paper on how to measure news bias by quantitative text analysis
My paper titled Measuring news bias: Russia’s official news agency ITAR-TASS’s coverage of the Ukraine crisis has been published in the European Journal of Communication. In this piece, I used quantitative text analysis techniques to estimate how much the Russian government’s influence biased ITAR-TASS’s coverage of the Ukraine crisis: Objectivity in […]
Newsmap paper in Digital Journalism
My paper on geographical news classification has finally been published in Digital Journalism, a sister journal of Journalism Studies. In this paper, I not only evaluate Newsmap’s classification accuracy but also compare it with other tools such as Open Calais and Geoparser.io. This paper presents the results of an evaluation of three different types of geographical news […]
New paper on Russia’s international propaganda during the Ukraine crisis
My paper on Russia’s international propaganda during the Ukraine crisis, The spread of the Kremlin’s narratives by a western news agency during the Ukraine crisis, has been published in the Journal of International Communication. This is very timely, because people are talking about the spread of “fake news”! The description of the Ukraine crisis as an ‘information […]
Handling multi-word features in R
Multi-word verbs (e.g. “set out”, “agree on” and “take off”) or names (e.g. “United Kingdom” and “New York”) are very important features of texts, but it is often difficult to keep them intact in bag-of-words text analysis, because tokenizers usually break up strings by spaces. You can preprocess texts to concatenate multi-word features with underscores like […]
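As an illustration of that preprocessing idea, here is a minimal sketch in Python (the post itself is about R; the phrase list and function name are hypothetical, chosen only for demonstration): known multi-word expressions are joined with underscores before tokenization, so they survive as single bag-of-words features.

```python
import re

def compound(text, phrases):
    """Join each known multi-word phrase with underscores so that a
    whitespace-based tokenizer keeps it as a single feature."""
    # Replace longer phrases first so shorter ones cannot split them.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
        text = pattern.sub(phrase.replace(" ", "_"), text)
    return text

# Hypothetical phrase list for illustration.
phrases = ["United Kingdom", "New York", "set out"]
text = "He set out from New York to the United Kingdom."

# Tokenize on word characters; underscores count as word characters,
# so the compounded phrases remain single tokens.
tokens = re.findall(r"\w+", compound(text, phrases))
```

After this step, tokens such as "New_York" and "set_out" pass through a bag-of-words pipeline as single features instead of being split at spaces.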
Segmentation of Japanese or Chinese texts by stringi
Use of a POS tagger such as Mecab or Chasen is considered necessary for segmentation of Japanese texts, because words are not separated by spaces as in European languages, but I recently learned that this is not always the case. When I was testing quanteda’s tokenization function, I passed a Japanese text to it without much expectation, but the […]
Segmentation of Japanese and Chinese texts by stringi
Many people believe that morphological analysis with tools such as Mecab or Chasen is indispensable for segmenting Japanese text, but this is not always the case. I learned this while examining quanteda’s tokenization function: when I passed a Japanese text to it, the function split the text into words as cleanly as Mecab does.

> txt_jp
> quanteda::tokens(txt_jp)
tokens from 1 document.
Component 1 :
 [1] “政治” “と” “は” “社会” “に対して” “全体” “的” “な”
 [9] “影響” “を” “及” “ぼ” “し” “、” “社会” “で”
[17] “生きる” “ひとりひとり” “の” “人” “の” “人生” “に” “も”
[25] “様々” “な” “影響” “を” “及ぼす” “複雑” “な” “領域”
[33] “で” “ある” “。”

quanteda has no morphological analyzer, so it was a surprise that its tokenization function also segmented Chinese text cleanly.

> txt_cn […]
Visualizing media representation of the world
I uploaded an image visualizing foreign news coverage early this year, but I found that the image is very difficult to interpret, because both large positive and large negative values are important in SVD. Large positive values can result from intense media attention, but what do large negative values mean? A solution to this problem […]
Visualizing foreign news coverage
The challenge in international news research is identifying patterns in foreign news reporting, which covers thousands of events in hundreds of countries, but visualization seems to be useful. This chart summarizes foreign news coverage by the New York Times between 2012 and 2014 with heatmaps, where rows and columns respectively represent the most frequent countries […]
Best paper award at ICA methodology pre-conference
I presented my paper on geographical classification at the methodology pre-conference at ICA in Fukuoka, Japan. The pre-conference has historical significance as the first methodology group meeting at a major international conference in media and communication studies. There were many interesting presentations, but, to my great surprise, I won a Best Paper Award from […]