Kohei Watanabe

UncategorizedJune 7, 2015December 22, 2019

Human-coded test data for geographical classification

Early this year, I crated a sizable human-coded test data for my news classifier using the Prolific Academic service, and the data set is now ready for download. The data is comprised of 5,000 news summaries collected from RSS feeds of the New York Times, The Times (UK), The Australian, Times of India, and Daily […]

UncategorizedMarch 20, 2015December 22, 2019

Geographical dictionary making technique

My new draft paper Newsmap: Dictionary expansion technique for geographical classification of very short longitudinal texts explains how to create a large geographical dictionary for text classification. Its algorithm is an updated version of the International Newsmap, and it is simpler and more statistically grounded. As I am arguing in the paper, this technique could […]

UncategorizedFebruary 22, 2015December 22, 2019

International news coding instruction

It was already four years ago when I created my Newsmap. It is time to update the whole system: fully rewritten in Python and developing a new classification algorithm. This is why I generated a 5,000 human-coded international news stories using the Prolific Academic. Thanks to the crowed-sourcing services, recruiting is no longer a problem, […]

UncategorizedFebruary 13, 2015February 22, 2015

Crowd-coding of international news by Prolific Academic

I recently created a sizable human-coded dataset (5,000 items) of international news using the Prolific Academic service. The Prolific Academic is an Oxford-based academic alternative to the Amazon Mechanical Turk. The advantage of using this services is that researchers only have to compensate for work that they approve. The potential drawback is its relatively high […]

UncategorizedJanuary 17, 2015December 22, 2019

Terrorism Dictionary 2014

After seeing mass media’s strong response to the extremists’ attack against Charlie Hebdo, I started thinking what can I do for this increasingly important topic? One simple work is making a dictionary containing keywords related to terrorism, so the Terrorism Dictionary 2014 is created. This dictionary is made from newswires submitted by the Associated Press […]

UncategorizedJanuary 12, 2015December 22, 2019

Left-right policy position dictionary

The Latent Semantic Scaling (LSS) not only works well with positive-negative sentiment but with left-right position on economic policy. The seed words for this dimension are {deficit, austerity, unstable, recession, inflation, currency, workforce} for the light and {poor, poverty, free, benefits, prices, money, workers} for the left. Left-right policy position dictionary was created from UK […]

UncategorizedJanuary 12, 2015December 22, 2019

Immigration dictionary

This is probably the final version of my immigration dictionary. This text analysis dictionary was created using technique called the Latent Semantic Scaling, which is based on the Latent Semantic Analysis, from British newspaper corpus. The result of the automated content analysis by this dictionary is strongly corresponds to manual coding by Amazon’s Mechanical Turks […]

UncategorizedDecember 3, 2014December 22, 2019

Text analysis dictionary on psychology

My automated dictionary creation project is making good progress, and I created a psychology dictionary from a large corpus of UK news on psychology from 1990 to 2011. Scores given to each entry word is interpreted as strength of association to psychology, and the list can be truncated based on the scores. The words are […]

ProgramingOctober 18, 2014May 11, 2020

Factor analysis in R and Python

Python has a number of statistical modules that allows us to perform analysis without R, but it is always good idea to compare the outputs of different implementations. I performed factor analysis using Scikit-learn module of Python for my dictionary creation system, but the outputs were completely different from that of R’s factanal function just […]

UncategorizedOctober 7, 2014December 22, 2019

Testing immigration dictionary

After making some changes in my automated dictionary creation system, I ran a test to validate the word choice for the new immigration dictionary. Latest version contains fewer intuitively negative words with positive scores, unlike the original version. The test was performed by comparing the computer content-analysis with human coding of the 2010 UK manifestos. […]

Develop efficient custom functions using quanteda v4.0 – Kohei Watanabe on New tokens object in quanteda v4.0April 16, 2024
[…] most important change in quanteda v4.0 is the creation of the external pointer-based tokens object, called tokens_xptr, that allows…
Setting fonts to plot Chinese polarity words in LSS – Kohei Watanabe on New paper on historical geopolitical threats to the USFebruary 19, 2024
[…] models are measuring to others. I am using this function myself in my project on construction of a geopolitical…
New paper on semantic temporality analysis – Kohei Watanabe on New paper on Latent Semantic ScalingAugust 29, 2023
[…] on temporal orientation of texts appeared in Research & Politics. In this study we applied latent semantic scaling (LSS)…
Kohei on Tutorial websites on LSS and Seeded LDAAugust 26, 2023
Please use base R's set.seed() before running the command.
Marli Fernandes on Tutorial websites on LSS and Seeded LDAAugust 24, 2023
I am currently using the seededlda package. I am using the following code: slda <- textmodel_seededlda(dfmt, dict, residual = 2)…