I have received emails from readers of my paper, The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats, which appeared in International Studies Quarterly (ISQ) last year. It is great that my paper is still attracting attention, but they said they cannot find the replication dataset… There is a page […]
Presentation at the Language of Polarization conference
I presented a paper with Oul Han titled Vertical and Horizontal Polarization Comparative Analysis of German and British Newspapers’ Coverage of Refugees from 2006 to 2018 at the Language of Polarization conference last Saturday (slides). Lena March started organizing this conference a few years ago and she made this happen in the online […]
Replicating analysis with quanteda on multi-core systems
It is not always easy to write R scripts that produce the same results every time. It is even more so when we analyse textual data that requires extensive preprocessing. One of our goals in developing quanteda was ensuring replicability of text analysis by making data preprocessing explicit and transparent. However, our package still produces different […]
Analysis of financial texts using R
I tend to use political texts in my examples because of my academic background, but quanteda and its associated packages can be used more broadly. There is a growing interest in the analysis of textual data using NLP tools in the financial industry. I have been working for Lazard Asset Management as a data science consultant […]
Automatically weight seed words for LSS
I updated LSX on CRAN yesterday, changing the version number from 1.0.0 to 1.0.2. The jump is a subtle indication of my excitement about a new function that would make LSS more reliable: with textmodel_lss(auto_weight = TRUE), it automatically optimizes the weights given to user-provided seed words. I fitted two LSS models on a […]
New paper on historical geopolitical threats to the US
I am very happy that our paper, The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats, has finally appeared in International Studies Quarterly. Peter and I created the Geopolitical Threat Index (GTI) that covers more than 150 years (1861 to 2017) through a computational analysis of the New York Times summaries in […]
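Quantitative text analysis tutorial at the Japanese Economic Association
I recently gave a tutorial on quantitative text analysis at the spring meeting of the Japanese Economic Association. Using job postings from Hello Work (the public employment service) in Tottori Prefecture, provided by the session chair, Keisuke Kawata (川田恵介), I explained how to preprocess Japanese documents and how to use frequency analysis, collocation analysis, dictionary analysis, and machine learning. This time I emphasized that tokenizing Japanese text with collocation analysis and using a semi-supervised topic model (Seeded-LDA) can substantially improve the results of the analysis. If you are interested, please see the slides and files used in the lecture.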
Preprint on nuclear threats using LSS
I have been leading a project with Elad Segev (Tel Aviv University) and Atsushi Tago (Waseda University) on implications of security threats for domestic politics. Since the project began in 2019, we have completed a content analysis of newspapers and a simultaneous survey experiment in both Japan and Israel. One of the goals […]
New report on the Kremlin’s influence on Twitter
My co-authored report on Russia’s influence on Twitter during the 2020 US presidential election has been published by Free Russia Foundation. Maria Snegovaya and I conducted a representative online survey of Americans during the election campaign along with quantitative content analysis of their Twitter posts over a year. We aimed to reveal the relationship between […]
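Lecture at the Musashi University Data Science Institute
I recently gave a lecture titled "Measuring 150 Years of Geopolitical Threats through Quantitative Text Analysis of the NYT" at the Musashi University Data Science Institute. According to the organizers, about 30 people attended in person and about 70 online. I hope the talk conveyed the potential of quantitative text analysis and that more people in Japan will apply the method in research and practice. I plan to stay in Japan and continue my research for the time being, so universities and companies that would like to host a hands-on workshop on quantitative text analysis using Quanteda Tutorials are welcome to contact me. Update, 23 December 2020: a recording of the lecture has been published on YouTube.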