Kohei

Text analysis | September 19, 2021

Automatically weight seed words for LSS

I updated LSX on CRAN yesterday, changing the version number from 1.0.0 to 1.0.2. The jump is a subtle indication of my excitement about a new function that makes LSS more reliable: with textmodel_lss(auto_weight = TRUE), it automatically optimizes the weights given to user-provided seed words. I fitted two LSS models on a […]

Text analysis | May 21, 2021 (updated December 12, 2024)

New paper on historical geopolitical threats to the US

I am very happy that our paper, The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats, has finally appeared in International Studies Quarterly. Peter and I created the Geopolitical Threat Index (GTI) that covers more than 150 years (1861 to 2017) through a computational analysis of the New York Times summaries in […]

Event, Japanese, Text analysis | May 16, 2021

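Quantitative text analysis tutorial at the Japanese Economic Association

The other day, I gave a tutorial on quantitative text analysis at the spring meeting of the Japanese Economic Association. Using job postings from Hello Work (the public employment service) in Tottori Prefecture, provided by the session chair Keisuke Kawata, I explained how to preprocess Japanese documents and how to use frequency analysis, co-occurrence analysis, dictionary analysis, and machine learning. This time, I emphasized that tokenizing Japanese with co-occurrence analysis and using a semisupervised topic model (Seeded-LDA) can greatly improve the results. If you are interested, please see the slides and files used in the lecture.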

Japanese, Publication, Text analysis | April 6, 2021 (updated June 3, 2022)

Preprint on nuclear threats using LSS

I have been leading a project with Elad Segev (Tel Aviv University) and Atsushi Tago (Waseda University) on implications of security threats for domestic politics. We have completed a content analysis of newspapers and a simultaneous survey experiment in both Japan and Israel since the beginning of the project in 2019. One of the goals […]

Event, Publication | February 12, 2021

New report on the Kremlin’s influence on Twitter

My co-authored report on Russia’s influence on Twitter during the 2020 US presidential election has been published by Free Russia Foundation. Maria Snegovaya and I conducted a representative online survey of Americans during the election campaign, along with a quantitative content analysis of their Twitter posts over a year. We aimed to reveal the relationship between […]

Text analysis | November 8, 2020 (updated December 23, 2020)

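Talk at the Data Science Institute at Musashi University

The other day, I gave a talk titled "Measuring 150 years of geopolitical threats through quantitative text analysis of the New York Times" at the Data Science Institute at Musashi University. According to the organizers, about 30 people attended in person and about 70 online. I hope the talk conveyed the potential of quantitative text analysis and that more people in Japan will apply the method in research and practice. I plan to stay in Japan and continue my research for the time being, so universities and companies that would like to host a practical workshop on quantitative text analysis using Quanteda Tutorials should get in touch.

Update, December 23, 2020: a recording of the talk has been published on YouTube.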

Japanese, Text analysis | November 4, 2020

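Flexible sentiment analysis of Japanese documents using word embeddings

The other day, my paper, Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages, was published in Communication Methods and Measures. The paper shows that word embeddings make quantitative text analysis possible in Japanese, where few ready-made keyword dictionaries exist, just as in English. Using a technique called LSS, I extracted words related to politics from newspaper articles and weighted them by their distance to sentiment seed words. The positive seed words are 「絶好、美麗、秀逸、卓越、優雅、絶賛、善良」 and the negative seed words are 「粗悪、醜悪、稚拙、非礼、貧相、酷評、悪徳」. As the figure shows, words such as 「絶好、人類、民主化、安定、立国」 came out as positive and words such as 「私利私欲、暴力団、脱税事件、不透明、流用」 as negative, which is intuitively convincing. Weighting documents with these sentiment-weighted words allows political sentiment analysis even without a suitable sentiment dictionary.

With LSS, changing the words to be weighted allows sentiment analysis on many subjects other than politics. Changing the seed words, in turn, allows analysis on more specific scales such as threat perception or mental state. The processing and analysis of Japanese documents in this paper use only the R packages quanteda and LSX and are simple, so please give it a try. The R script that reproduces the analysis can be downloaded from Harvard Dataverse.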
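The weighting idea described above (scoring words by their closeness to positive and negative seed words, then scoring documents by the words they contain) can be sketched in a few lines. This is a toy illustration in plain Python, not the LSX API: the two-dimensional vectors, the English seed words, and the function names `polarity` and `score_document` are all hypothetical, and cosine similarity is assumed as the closeness measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word vectors; a real model would estimate them from a corpus.
vectors = {
    "excellent": [1.0, 0.1],
    "poor": [-1.0, 0.2],
    "stability": [0.8, 0.5],
    "violence": [-0.7, 0.6],
}

# Hypothetical seed words with their polarity (+1 positive, -1 negative).
seeds = {"excellent": 1.0, "poor": -1.0}

def polarity(word):
    """Average similarity to the seed words, signed by seed polarity."""
    signed = [cosine(vectors[s], vectors[word]) * p for s, p in seeds.items()]
    return sum(signed) / len(signed)

def score_document(tokens, weights):
    """Mean weight of the tokens that have a weight; None if none do."""
    known = [t for t in tokens if t in weights]
    if not known:
        return None
    return sum(weights[t] for t in known) / len(known)

# Weight the non-seed words, then score a tiny tokenized document.
weights = {w: polarity(w) for w in ("stability", "violence")}
doc_score = score_document(["stability", "stability", "violence", "the"], weights)
```

In this sketch, swapping the entries of `seeds` changes the scale being measured, while changing which words receive weights changes the subject matter, mirroring the two kinds of flexibility the post describes.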

Programming, Text analysis | November 3, 2020

LSX package upgraded as the paper published in CMM

I am pleased to tell you that my paper, Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages, was published in Communication Methods and Measures a few days ago. This paper explains the Latent Semantic Scaling technique, which is implemented in the LSX package available on CRAN, taking sentiment analysis […]

Text analysis | November 2, 2020

Study political and economic changes with semisupervised text analysis methods

Earlier this year, I published my first paper on semisupervised methods (Newsmap and seeded LDA) in Social Science Computer Review. My second paper, on another semisupervised method (Latent Semantic Scaling), appeared in Communication Methods and Measures a few days ago. I wrote these research articles and developed software packages as part of my effort […]

Programming, Text analysis | September 14, 2020 (updated September 15, 2020)

Uploaded two new semisupervised models to CRAN

This summer, I submitted two packages for quantitative text analysis to CRAN: seededlda and LSX. These packages have been available in my GitHub repositories, but I thought it was time to make them more readily available to promote semisupervised machine learning techniques. seededlda is a package that implements seeded-LDA using the GibbsLDA++ library. […]
