My co-authored report on Russia’s influence on Twitter during the 2020 US presidential election has been published by Free Russia Foundation. Maria Snegovaya and I conducted a representative online survey of Americans during the election campaign, along with a quantitative content analysis of their Twitter posts over a year. We aimed to reveal the relationship between […]
Lecture at the Musashi University Institute of Data Science
The other day, I gave a lecture at the Musashi University Institute of Data Science titled “Measuring 150 Years of Geopolitical Threats through Quantitative Text Analysis of the NYT”. According to the organizers, about 30 people attended in person and about 70 online. The talk reaffirmed the potential of quantitative text analysis, and I hope more people in Japan will apply the method in their research and professional work. I plan to stay in Japan and continue my research for a while, so please contact me if your university or company would like to host a hands-on workshop on quantitative text analysis based on Quanteda Tutorials. Updated 23 December 2020: a recording of the lecture is now available on YouTube.
Flexible sentiment analysis of Japanese documents with word embeddings
My paper, Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages, was recently published in Communication Methods and Measures. In the paper, I show that word embeddings make quantitative text analysis possible in Japanese, a language with few ready-to-use keyword dictionaries, just as in English. Using a technique called LSS, the paper extracts politics-related words from newspaper articles and weights them by their proximity to sentiment seed words. The positive seed words were 「絶好、美麗、秀逸、卓越、優雅、絶賛、善良」 and the negative seed words 「粗悪、醜悪、稚拙、非礼、貧相、酷評、悪徳」. As the figure shows, the resulting weights are intuitively plausible: words such as 「絶好、人類、民主化、安定、立国」 come out positive, while words such as 「私利私欲、暴力団、脱税事件、不透明、流用」 come out negative. Weighting documents by these sentiment-weighted words enables political sentiment analysis even without a suitable sentiment dictionary. With LSS, changing the set of words to be weighted allows sentiment analysis of topics other than politics, and changing the seed words allows analysis on more specialized scales such as threat perception or mental states. The Japanese text processing and analysis in this paper rely only on the quanteda and LSX R packages and are straightforward, so please give them a try. The R scripts to reproduce the analysis can be downloaded from Harvard Dataverse.
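The workflow described above can be sketched in a few lines of R. This is a minimal illustration, not the paper’s actual analysis: the toy corpus below is invented so that the seed words occur in it, the number of SVD dimensions `k` is set far smaller than a real application would use, and Japanese segmentation depends on the ICU dictionary, so a realistically large corpus is needed for meaningful scores.

```r
# Minimal LSS sketch, assuming the quanteda and LSX packages are installed.
library(quanteda)
library(LSX)

# Toy documents standing in for a corpus of newspaper articles
corp <- corpus(c("絶好の機会で善良な市民が民主化を絶賛した。",
                 "粗悪な手口の脱税事件は醜悪で不透明だ。",
                 "卓越した経済の安定が立国を支える。",
                 "悪徳業者の稚拙な弁明は酷評された。",
                 "優雅で美麗な文化が人類を豊かにする。",
                 "暴力団による私利私欲の流用は非礼だ。"))
toks <- tokens(corp, remove_punct = TRUE)  # ICU handles Japanese segmentation
dfmat <- dfm(toks)

# Seed words from the paper: positive terms score +1, negative terms -1
seed <- c("絶好" = 1, "美麗" = 1, "秀逸" = 1, "卓越" = 1,
          "優雅" = 1, "絶賛" = 1, "善良" = 1,
          "粗悪" = -1, "醜悪" = -1, "稚拙" = -1, "非礼" = -1,
          "貧相" = -1, "酷評" = -1, "悪徳" = -1)

# Fit the LSS model (k kept tiny for this toy corpus) and
# score each document on the resulting sentiment scale
lss <- textmodel_lss(dfmat, seeds = seed, k = 5)
pred <- predict(lss, newdata = dfmat)
```

In a real analysis, the weighted terms would also be restricted to a subject of interest (for example, politics-related words selected by frequency near the word 政治), which is what makes the resulting scale topic-specific.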
LSX package upgraded as its paper is published in CMM
I am pleased to tell you that my paper, Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages, was published in Communication Methods and Measures a few days ago. This paper explains the Latent Semantic Scaling technique, which is implemented in the LSX package available on CRAN, taking sentiment analysis […]
Study political and economic changes with semisupervised text analysis methods
Earlier this year, I published my first paper on semisupervised methods (Newsmap and seeded LDA) in Social Science Computer Review. My second paper on a semisupervised method (Latent Semantic Scaling) appeared in Communication Methods and Measures a few days ago. I wrote these research articles and developed software packages as part of my effort […]
Uploaded two new semisupervised models to CRAN
This summer, I submitted two packages for quantitative text analysis to CRAN: seededlda and LSX. These packages have been available in my GitHub repositories, but I thought it was time to make them more readily available to promote semisupervised machine learning techniques. seededlda is a package that implements seeded LDA using the GibbsLDA++ library. […]
Quanteda and semisupervised models
My co-developers and I received the 2020 Statistical Software Award from the Society for Political Methodology for quanteda’s contribution to research. The package has established a reputation as a user-friendly and highly efficient R package for quantitative text analysis in the political science community. I also know that there are many users of the package in other […]
Improved tokenization of hashtags in Asian languages
Quanteda can tokenize Asian texts thanks to the ICU library’s boundary detection mechanism, but this causes problems when we analyze social media posts that contain hashtags in Chinese or Japanese. For example, a hashtag “#英国首相仍在ICU但未使用呼吸机#” in a post about the British prime minister is completely destroyed by quanteda’s current tokenizer. Although we can correct tokenization […]
New paper on Latent Semantic Scaling
I developed Latent Semantic Scaling (LSS) to perform sentiment analysis of news articles about the Ukraine crisis during my PhD project in London. LSS requires only a small set of polarity words, called “seed words”, to perform large-scale document scaling on a specific subject, because it automatically identifies synonyms of the seed words by latent semantic […]
New stopwords collection for European and Asian languages
In quantitative text analysis, it is common to remove grammatical elements using the stopword lists defined in Snowball, but Snowball does not contain stopwords for Asian languages. The lack of a stopwords collection covering both European and Asian languages has made cross-lingual analysis difficult. To solve this problem, my collaborators and I created a new stopwords collection, called […]