Applying techniques developed for English-language texts to other languages is not easy, but I managed to adapt my LSS system to the Russian language for a project on Russian media framing of street protests. In the project, I am responsible for data collection and analysis of Russian-language news collected from state-controlled […]
Countries with state-owned news agencies
It is little recognized, even among students of mass media, that the international news system is a network of national or regional news agencies, and that many of those are state-owned. Fully commercial agencies like Reuters are very rare, and even international news agencies, such as AFP, are often subsidized by the government. In […]
ITAR-TASS’s coverage of annexation of Crimea
My main research interest is the estimation of media bias using text analysis techniques. I did a very crude analysis of ITAR-TASS’s coverage of the Ukraine crisis two years ago, but it is time to redo everything with more sophisticated tools. I created positive-negative dictionaries for democracy and sovereignty, and applied them to see how […]
Russia’s foreign policy priority
Methodological papers are tasteless and boring without nice examples. As an example application of my Newsmap, I downloaded all the news stories published by the ITAR-TASS news agency from 2009 to 2014 in both English and Russian. From a public diplomacy point of view, I was interested in which countries are receiving the most coverage in […]
Sentence segmentation
I believe that the sentence is the optimal unit of sentiment analysis, but splitting whole news articles into sentences is often tricky because there are a lot of quotations in news. If we simply chop up texts based on punctuation, quoted text gets split across different sentences. This code is meant to avoid such […]
Nexis news importer updated
I posted the code for the Nexis importer last year, but it turned out that the HTML format of the database service is less consistent than I thought, so I changed the logic. The new version depends less on the structure of the HTML files and more on the format of the content. library(XML) #might need […]
The Latent Semantic Scaling
I have previously posted document scaling results on different dimensions, such as political left-right and immigration positive-negative, on this blog, but I did not explain the details of the technique, called Latent Semantic Scaling. LSS is a type of lexicon expansion technique based on Latent Semantic Analysis. Please have a look at […]
Human-coded test data for geographical classification
Early this year, I created a sizable human-coded test data set for my news classifier using the Prolific Academic service, and the data set is now ready for download. The data comprises 5,000 news summaries collected from RSS feeds of the New York Times, The Times (UK), The Australian, Times of India, and Daily […]
Geographical dictionary making technique
My new draft paper, Newsmap: Dictionary expansion technique for geographical classification of very short longitudinal texts, explains how to create a large geographical dictionary for text classification. Its algorithm is an updated version of the International Newsmap, and it is simpler and more statistically grounded. As I argue in the paper, this technique could […]
International news coding instruction
It has already been four years since I created my Newsmap. It is time to update the whole system: a full rewrite in Python and a new classification algorithm. This is why I generated a set of 5,000 human-coded international news stories using Prolific Academic. Thanks to crowdsourcing services, recruiting is no longer a problem, […]