New papers on distributed LDA for sentence-level topic classification

I have been studying and developing an LDA algorithm for classification of sentences since 2022. Sentence-level topic classification allows us to analyze the association between topics and other properties, such as sentiment, within documents. Also, sentence-level analysis has become more common in text analysis in general in recent years, thanks to highly capable transformer models. My […]

Measuring emotional distress during COVID through words and emojis on Twitter

My co-authored article on public mental health recently appeared in the Journal of Medical Internet Research. In this study, we combined survey research and social media analysis to infer Japanese people’s mental health during the COVID pandemic. The methodological novelty of this study is that (1) we collected individual characteristics (age, gender, occupation, income […]

Encyclopedia entries on text analysis from fresh perspectives

The Elgar Encyclopedia of Technology and Politics was published earlier this month. Andrea Ceron, the editor, compiled entries by many young political scientists to make the volume full of fresh perspectives. I have contributed to it by writing an entry on “text as data” (preprint) with an emphasis on the “string-of-words” approach that would improve […]

Replication data for The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats

I have received emails from readers of my paper, The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats, which appeared in International Studies Quarterly (ISQ) last year. It is great that my paper is still attracting attention, but they said they cannot find the replication dataset… There is a page […]

New research paper on how to choose seed words for semi-supervised models

In recent years, I have been developing and applying semi-supervised models, such as seeded-LDA, Newsmap and LSS, for classification and document scaling, aiming to broaden the scope of quantitative text analysis. These models are very cost-efficient because they only require a small set of “seed words” to learn categories or dimensions of interest. However, […]
