Use of a POS tagger such as MeCab or ChaSen is considered necessary for segmenting Japanese texts, because words are not separated by spaces as they are in European languages, but I recently learned that this is not always the case. When I was testing quanteda's tokenization function, I passed a Japanese text to it without much expectation, but the […]
Word segmentation of Japanese and Chinese texts with stringi
Many people believe that morphological analysis with a tool such as MeCab or ChaSen is indispensable for word segmentation of Japanese text, but this does not seem to be necessarily the case. I discovered this while examining quanteda's tokenization function: when I passed a Japanese text to the function, it was split cleanly into words, just as MeCab would do.

> txt_jp
> quanteda::tokens(txt_jp)
tokens from 1 document.
Component 1 :
 [1] "政治"         "と"           "は"           "社会"         "に対して"     "全体"         "的"           "な"
 [9] "影響"         "を"           "及"           "ぼ"           "し"           "、"           "社会"         "で"
[17] "生きる"       "ひとりひとり" "の"           "人"           "の"           "人生"         "に"           "も"
[25] "様々"         "な"           "影響"         "を"           "及ぼす"       "複雑"         "な"           "領域"
[33] "で"           "ある"         "。"

quanteda has no morphological analysis functionality, so it was a surprise that its tokenization function also segmented Chinese text cleanly.

> txt_cn […]
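As a minimal sketch of what is happening under the hood, the same dictionary-based segmentation can be invoked directly through stringi's ICU boundary analysis, which quanteda builds on. This assumes the stringi package is installed; the sample sentence is adapted from the output above.

```r
# Minimal sketch: ICU word-boundary analysis in stringi segments Japanese
# (and Chinese) text without a morphological analyzer such as MeCab.
library(stringi)

txt <- "政治とは社会に対して全体的な影響を及ぼす複雑な領域である。"

# type = "word" selects the ICU word BreakIterator, whose built-in
# dictionaries handle unspaced scripts like Japanese and Chinese.
words <- stri_split_boundaries(txt, type = "word", skip_word_none = TRUE)
print(words[[1]])
```

Setting `skip_word_none = TRUE` drops punctuation and whitespace segments, leaving only word-like tokens, which is close to what a tokenizer would return.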
Visualizing media representation of the world
I uploaded an image visualizing foreign news coverage early this year, but I found that the image is very difficult to interpret, because both large positive and large negative values are important in SVD. Large positive values can result from intense media attention, but what do large negative values mean? A solution to this problem […]
ITAR-TASS’s coverage of the annexation of Crimea
My main research interest is the estimation of media bias using text analysis techniques. I did a very crude analysis of ITAR-TASS’s coverage of the Ukraine crisis two years ago, but it is time to redo everything with more sophisticated tools. I created positive-negative dictionaries for democracy and sovereignty, and applied them to see how […]