Text analysis – Page 6 – Kohei Watanabe

Text analysis | January 29, 2019 | December 22, 2019

Measuring America’s historical threat perception

Last year, I wrote that the NYT API is a great source for historical analysis. Since then I have been working on a project with my colleagues at the LSE to create a historical index of America’s perceived threat. The project is coming to fruition, so I presented the latest results at the Waseda Data […]

Event, Text analysis | January 19, 2019 | May 17, 2019

POLTEXT is coming to Tokyo

I am organizing the POLTEXT symposium in Tokyo on 14-15 September 2019. I participated in the conference in 2016 (Croatia) as a presenter and in 2018 (Hungary) as a tutorial instructor, and learnt a lot from other participants. Now it is my turn to offer such an opportunity to people from inside and outside […]

Programming, Text analysis | October 25, 2018 | January 19, 2020

Computing document similarity in a large corpus

Since early this year, many people have asked me how to compute document (or feature) similarity in a large corpus. They said their functions stop because of a lack of space in RAM:
Error in .local(x, y, …) : Cholmod error ‘problem too large’ at file ../Core/cholmod_sparse.c, line 92
This happened in our textstat_simil(margin = "documents") […]

Japanese, Text analysis | October 6, 2018 | December 22, 2019

Quantitative text analysis of Japanese

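To get more Japanese researchers interested in quantitative text analysis, I wrote a paper titled "Quantitative Analysis of Japanese" (『日本語の量的分析』) with Amy Catalinac of New York University. So far, I have received positive reactions from many people on Twitter. This paper discusses the use, for the Japanese language, of a method called quantitative text analysis, which has recently become popular among political scientists in Europe and North America. First, we describe the background against which quantitative text analysis emerged and explain how it is used in Western political science. Next, so that readers can use quantitative text analysis in their own research, we explain the workflow concretely while noting points that need attention when analyzing Japanese. Finally, after introducing the statistical models used in the West, we show through research examples that they can also be applied to Japanese documents. This paper argues that recent technical and methodological developments have made quantitative text analysis of Japanese fully feasible, but it also notes that institutional problems, such as the development of data resources, must be addressed before the method can spread widely in Japanese political science.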

Text analysis | September 3, 2018 | December 22, 2019

Newsmap is available on CRAN

I am happy to announce that our R package for semi-supervised document classification, newsmap, is available on CRAN. The package uses simple algorithms but comes with well-maintained geographical seed dictionaries in English, German, Spanish, Russian and Japanese. It was originally created for geographical classification of news articles, but it can also […]

Event, Text analysis | August 24, 2018 | December 22, 2019

Presentation at ECPR Hamburg

I presented my latest study on Sputnik News at ECPR Hamburg. This study shows that Russia is using conspiracy theories in Sputnik News articles to promote anti-establishment sentiment in the United States and Britain. The paper and slides are available.

Event, Text analysis | June 30, 2018 | February 10, 2021

Obstacles to Asian-language text analysis

In a presentation titled Internationalizing Text Analysis at a workshop on 27 June at Waseda University, Oul Han and I discussed what is obstructing the adoption of quantitative text analysis techniques in Japan and Korea. Our question is why only a few people do quantitative analysis of Japanese and Korean texts, despite it is […]

Text analysis | June 24, 2018

New page on how to handle Japanese texts

We have added a new page to the Quanteda Tutorials website on special handling of Japanese texts. This page will be used in Quantitative Political Methodology at Kobe University next week. It summarizes my posts about Japanese text analysis on this blog. We are planning to add pages about other languages.

Event, Text analysis | June 6, 2018 | June 24, 2018

Presentation at BEAMS workshop

I presented a technique for longitudinal analysis of media content at the BEAMS (Behavioral and Experimental Analyses in Macro-finance) workshop at Waseda University.

Event, Text analysis | May 9, 2018 | June 24, 2018

Quantitative text analysis workshop at PolText 2018

I was invited to deliver a workshop on quantitative text analysis at the PolText Incubator Workshop at the Hungarian Academy of Sciences on 9 May 2018. Workshop materials are available in my GitHub repo.
