Kohei Watanabe

Text analysis · July 26, 2022

Computing robust polarity scores using LSS

One of the advantages of Latent Semantic Scaling (LSS) is that it can compute polarity scores of very short documents. It achieves this by assigning polarity scores to all the words in the corpus and then computing polarity scores of documents as the sum of polarity scores of words weighted by their frequency. However, this […]
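The scoring step described in this excerpt can be sketched in a few lines. This is a minimal illustration and not the LSX implementation: the function name `document_polarity`, the word scores, and the example document are all hypothetical, and dividing by the number of scored tokens (so that document length does not inflate the score) is an assumption rather than something stated in the excerpt.

```python
from collections import Counter


def document_polarity(tokens, word_scores):
    """Frequency-weighted mean polarity of the scored words in one document.

    Hypothetical helper for illustration; not part of LSX or quanteda.
    """
    # Count only the tokens that have a polarity score.
    counts = Counter(t for t in tokens if t in word_scores)
    total = sum(counts.values())
    if total == 0:
        return None  # the document contains no scored word
    # Sum of word polarity scores weighted by their frequency.
    weighted_sum = sum(word_scores[w] * n for w, n in counts.items())
    # Assumption: normalize by the number of scored tokens.
    return weighted_sum / total


# Word scores invented for illustration only.
word_scores = {"attack": 1.0, "threat": 0.5, "dialogue": -0.5, "trade": -1.0}
doc = ["threat", "of", "attack", "threat"]
score = document_polarity(doc, word_scores)
```

Because every word in the corpus carries a score, even a document with only two or three scored tokens receives a value, although an estimate built from so few words is naturally noisy.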

Japanese, Text analysis · July 6, 2022 · July 7, 2022

Newspaper coverage of security issues and prime ministerial approval under conservative governments

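Last month, a paper titled Discursive diversion: Manipulation of nuclear threats by the conservative leaders in Japan and Israel was published. The study is a joint project by Japanese and Israeli researchers that began in 2019 with support from the Japan Society for the Promotion of Science. It compared liberal and conservative newspapers in both countries from 2009 to 2018 and analyzed how strongly news articles emphasized the threat of North Korean or Iranian nuclear weapons ahead of the passage of legislation and general elections. Throughout this period, conservative governments remained in power in both Japan (the Abe government) and Israel (the Netanyahu government).

For the content analysis of news articles, we used Latent Semantic Scaling, a semi-supervised machine learning algorithm. We chose [危険, 敵意, 壊滅, 危害, 衝突, 攻撃] (danger, hostility, devastation, harm, conflict, attack) as seed words for threat and [対話, 支持, 機会, 交渉, 成功, 貿易] (dialogue, support, opportunity, negotiation, success, trade) as seed words for safety. We chose similar seed words for the analysis of Hebrew articles. The first figure shows how words in the corpus were weighted according to their semantic distance from the seed words: words that imply an imminent use of force received positive values, while words that imply nuclear weapons are still under development received negative values.

In this study, we focused on Japan's security system reforms and statistically analyzed how much liberal and conservative newspapers changed their emphasis on the North Korean threat during the 60 days leading up to the passage of the Act on the Protection of Specially Designated Secrets (L1), the approval of collective self-defense (L2), the security-related legislation (L3), and the offence of preparing for terrorism and other crimes (L4). The third figure shows that Yomiuri emphasized the threat more than Asahi only in the case of the security-related bills (L3). In Israel, conservative newspapers were shown to have emphasized the Iranian threat before the 2015 general election, in which Netanyahu struggled.

The results of the statistical analysis make explicit the close relationship between the Abe government and conservative media that had long been pointed out, and we believe they demonstrate that quantitative text analysis of newspapers using LSS is effective in the field of political communication. In Israel, collusion between the owner of a conservative newspaper and Netanyahu actually came to light, and both were found guilty.

In parallel with the newspaper content analysis, we also conducted psychological experiments in both countries. They showed that approval of the leader was significantly higher when participants read news articles that emphasized the threat than when they read articles that did not. This experiment was published as a paper titled Could Leaders Deflect from Political Scandals? Cross-National Experiments on Diversionary Action in Israel and Japan.

Taken together, the findings suggest that rally-around-the-flag phenomena, in which approval of a leader rises during armed conflict, can occur without any actual use of force, merely through manipulation of the mass media, and that conservative politicians tend to emphasize security threats for their own political gain.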

Text analysis · June 15, 2022 · June 16, 2022

Good and bad methods to extract context words

Many questions on quanteda’s kwic() function have been posted to Stack Overflow. This shows how much people like this human-friendly function, but also how confused they are about it: the function was created only for manual inspection of tokens objects, not for extracting context words in preprocessing. If you apply kwic() to your […]

Publication, Text analysis · June 3, 2022 · April 4, 2023

New research on the effect of nuclear threats in news on leader popularity

Since 2019, my most important research project has been about media coverage of North Korea and Iran’s nuclear threats and its political implications. I started this project in 2019 with Elad Segev (Tel Aviv) and Atsushi Tago (Waseda), supported by a Japanese funding agency. We have analyzed how Japanese (Asahi and Yomiuri) and […]

Publication, Text analysis · April 20, 2022 · January 23, 2023

Replication data for The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats

I have received emails from readers of my paper, The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats, which appeared in International Studies Quarterly (ISQ) last year. It is great that my paper is still attracting attention, but they said they cannot find the replication dataset… There is a page […]

Text analysis · February 11, 2022

Presentation at the Language of Polarization conference

I presented a paper with Oul Han titled Vertical and Horizontal Polarization Comparative Analysis of German and British Newspapers’ Coverage of Refugees from 2006 to 2018 at the Language of Polarization conference last Saturday (slides). Lena March started organizing this conference a few years ago and she made it happen in the online […]

Text analysis · November 17, 2021 · November 25, 2021

Replicating analysis with quanteda on multi-core systems

It is not always easy to write R scripts that always produce the same results. It is even more so when we analyse textual data that requires extensive preprocessing. One of our goals in developing quanteda was ensuring replicability of text analysis by making data preprocessing explicit and transparent. However, our package still produces different […]

Text analysis · September 25, 2021 · September 29, 2021

Analysis of financial texts using R

I tend to use political texts in my examples because of my academic background, but quanteda and its associated packages can be used more broadly. There is a growing interest in the analysis of textual data using NLP tools in the financial industry. I have been working for Lazard Asset Management as a data science consultant […]

Text analysis · September 19, 2021

Automatically weight seed words for LSS

I updated LSX on CRAN yesterday, changing the version number from 1.0.0 to 1.0.2. The jump is a subtle indication of my excitement about a new feature that makes LSS more reliable: with textmodel_lss(auto_weight = TRUE), it automatically optimizes the weights given to user-provided seed words. I fitted two LSS models on a […]

Text analysis · May 21, 2021 · December 12, 2024

New paper on historical geopolitical threats to the US

I am very happy that our paper, The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats, has finally appeared in International Studies Quarterly. Peter and I created the Geopolitical Threat Index (GTI) that covers more than 150 years (1861 to 2017) through a computational analysis of the New York Times summaries in […]
