My PhD thesis, titled Measuring bias in international news: a large-scale analysis of news agency coverage of the Ukraine crisis, has been archived electronically in the LSE Library and is now publicly available. This thesis is a compilation of research papers, three of which have already been published, but its grand conclusion is more than a […]
Quanteda Tutorials
We launched the Quanteda Tutorials website for the workshop Introduction to Quantitative Text Analysis using Quanteda, held at the WZB Berlin Social Science Center on 31st January. The website is still a work in progress, but it already covers all the important Quanteda functions.
Release of Quanteda version 1.0
We announced the release of quanteda version 1.0 at the London R meeting on Tuesday. I thank all the organizers and the 150+ participants. In the talk, I presented the performance comparison with R and Python packages, but I actually compared the performance with its earlier CRAN versions to show how the package evolved to […]
A new paper on Russian media’s coverage of protests in Ukraine
A paper ‘Russian Spring’ or ‘Spring Betrayal’? The Media as a Mirror of Putin’s Evolving Strategy in Ukraine that I co-authored with Tomila Lankina as part of the British Academy-funded project appeared in Europe-Asia Studies. We analyse Russian state media’s framing of the Euromaidan protests using a novel Russian-language electronic content-analysis dictionary and method that […]
Historical analysis of NYT using web API
We usually use commercial databases such as Nexis to download past news stories, but you should use the New York Times APIs if you want to do historical analysis of news content. We can search NYT news articles dating back to 1851 through the API, and it is free for anyone! We can only download meta-data, […]
What is the best SVD engine for LSA in R?
I use latent semantic analysis (LSA) to extract synonyms from a large corpus of news articles. I was very happy with Gensim’s LSA function, but I was not sure how to do LSA in R as well as in Python. There is an R package called lsa, but it is unsuitable for large matrices, because […]
Workshop on text analysis at ESSCE in London
Ken Benoit and I will convene a half-day workshop at the European Symposium Series on Societal Challenges in Computational Social Science in November. We will explain how quanteda can be applied to different types of textual data in dictionary-based analysis. The material for the workshop contains the most up-to-date examples.
Applying LIWC dictionary to a large dataset
LIWC is a popular text analysis package developed and maintained by Pennebaker et al. The latest version of the LIWC dictionary was released in 2015. This dictionary seems more appropriate than classic dictionaries such as the General Inquirer dictionaries for analysis of contemporary materials, because our vocabulary changes over the years. However, LIWC did not work […]
Presentation on my PhD thesis at departmental event
I presented my PhD thesis, titled “Measuring News Bias in Complex Media Systems: A New Approach to Big Media Analysis”, at a departmental event on 9th June.
Workshops on Japanese text analysis using quanteda
I presented how to analyze Japanese texts using quanteda in half-day workshops at Waseda University (22 May) and Kobe University (2 June), organized by Mikihito Tanaka (Waseda) and Atsushi Tago (Kobe). Materials for these workshops are available on GitHub as Introduction to Japanese Text Analysis (IJTA).