I use latent semantic analysis (LSA) to extract synonyms from a large corpus of news articles. I was very happy with Gensim’s LSA function, but I was not sure how to do LSA in R as well as in Python. There is an R package called lsa, but it is unsuitable for large matrices, because […]
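As a minimal sketch of the idea behind LSA (not Gensim's or the lsa package's actual API, and with a made-up toy matrix), a term-document matrix is decomposed by truncated SVD, and terms that occur in similar documents end up close together in the latent space even when they never co-occur:

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# "car" and "automobile" never co-occur, but appear in similar documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [2, 0, 1, 0],  # car
    [0, 2, 1, 0],  # automobile
    [1, 1, 2, 0],  # engine
    [0, 0, 0, 2],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)

# Truncated SVD: keep only k latent dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]  # term representations in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Near-synonyms are close in the latent space; unrelated terms are not.
sim_car_auto = cosine(word_vectors[0], word_vectors[1])
sim_car_flower = cosine(word_vectors[0], word_vectors[3])
print(sim_car_auto, sim_car_flower)
```

With this toy matrix, "car" and "automobile" get near-identical latent vectors while "car" and "flower" are orthogonal, which is exactly the property used to harvest synonym candidates from a real corpus.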
Workshop on text analysis at ESSCE in London
Ken Benoit and I will convene a half-day workshop at the European Symposium Series on Societal Challenges in Computational Social Science in London in November. We will explain how quanteda can be applied to different types of textual data in dictionary-based analysis. The workshop materials contain the latest examples.
Applying LIWC dictionary to a large dataset
LIWC is a popular text analysis package developed and maintained by Pennebaker et al. The latest version of the LIWC dictionary was released in 2015. This dictionary seems more appropriate than classic dictionaries such as the General Inquirer dictionaries for the analysis of contemporary materials, because our vocabulary changes over the years. However, LIWC did not work […]
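The LIWC dictionary content itself is proprietary, but the general mechanics of dictionary-based analysis can be sketched as follows (the categories and entries here are invented examples, not actual LIWC entries; the glob-style `*` suffix wildcard mirrors the LIWC dictionary format):

```python
import re
from collections import Counter

# Hypothetical dictionary: category -> glob-style patterns,
# where "*" matches any suffix (as in LIWC-format dictionaries).
dictionary = {
    "posemo": ["happy", "good", "love*"],
    "negemo": ["sad", "bad", "hate*"],
}

def compile_patterns(patterns):
    # Turn glob-style entries into one anchored regex per category.
    parts = [re.escape(p).replace(r"\*", r"\w*") for p in patterns]
    return re.compile(r"^(?:" + "|".join(parts) + r")$")

compiled = {cat: compile_patterns(pats) for cat, pats in dictionary.items()}

def count_categories(text):
    # Tokenise crudely, then count tokens matching each category.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for cat, rx in compiled.items():
            if rx.match(tok):
                counts[cat] += 1
    return counts

print(count_categories("I love this good weather, but I hated yesterday"))
# "love" and "good" count as posemo; "hated" matches hate* -> negemo
```

Pre-compiling the patterns as above is what makes this approach scale to a large dataset: each token is matched against one regex per category rather than against every dictionary entry.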
Presentation on my PhD thesis at departmental event
I presented my PhD thesis, titled “Measuring News Bias in Complex Media Systems: A New Approach to Big Media Analysis”, at a departmental event on 9 June.
Workshops on Japanese text analysis using quanteda
I presented how to analyze Japanese texts using quanteda in half-day workshops at Waseda University (22 May) and Kobe University (2 June), organized by Mikihito Tanaka (Waseda) and Atsushi Tago (Kobe). Materials for these workshops are available on GitHub as Introduction to Japanese Text Analysis (IJTA).
Introduction to Japanese text analysis with quanteda
I held a workshop on quanteda at Waseda University. The materials have been published under the title Introduction to Japanese Text Analysis in R, and I will gradually expand their content. I am planning to actively organize more workshops on Japanese text analysis, so please contact me if you are interested.
Presentation on multilingual text analysis at Waseda University
At a Graduate School of Political Science seminar at Waseda University, I gave a presentation titled “A Data-Driven Approach to Bilingual Analysis: Representation of US Foreign Policy in Japanese and British Newspapers over 30 Years”. The presentation, part of a research project on American politics and foreign policy, concerned how to apply the same quantitative text analysis methods to documents in different languages (English and Japanese). Some of the methods presented at this seminar will be explained in more concrete terms at the workshop on quantitative analysis of Japanese texts to be held from 3pm on 22 May.
Upcoming presentation at Waseda University
I have been invited to present a new approach to comparative text analysis at a research seminar at Waseda University (Tokyo) on the 17th. My talk is titled “A Data-Driven Approach to Bilingual Text Analysis: Representation of US Foreign Policy in Japanese and British Newspapers, 1985–2016”. Kohei Watanabe will present a new approach to text analysis of […]
Redefining word boundaries by collocation analysis
Quanteda’s tokenizer can segment Japanese and Chinese texts thanks to stringi, but its results are not always good, because the underlying library, ICU, recognizes only a limited number of words. For example, the Japanese text “ニューヨークのケネディ国際空港” translates as “Kennedy International Airport (ケネディ国際空港) in (の) New York (ニューヨーク)”. Quanteda’s tokenizer (the tokens function) segments this into […]
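The idea of redefining word boundaries by collocation analysis can be sketched language-agnostically (this is not quanteda's implementation; the toy corpus and the simple frequency threshold are assumptions): adjacent tokens that co-occur frequently are merged into a single token, repeating until no pair is frequent enough.

```python
from collections import Counter

# Toy tokenised corpus: the tokeniser has split "ケネディ国際空港"
# ("Kennedy International Airport") into three tokens. The corpus and
# the frequency threshold below are illustrative assumptions.
docs = [
    ["ニューヨーク", "の", "ケネディ", "国際", "空港"],
    ["ケネディ", "国際", "空港", "に", "到着"],
    ["成田", "国際", "空港", "を", "出発"],
]

def merge_pair(doc, pair):
    """Replace every adjacent occurrence of `pair` with a single token."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) == pair:
            out.append(doc[i] + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out

# Repeatedly merge the most frequent adjacent pair until none
# reaches the threshold.
min_count = 2
while True:
    bigrams = Counter(p for doc in docs for p in zip(doc, doc[1:]))
    if not bigrams:
        break
    pair, n = bigrams.most_common(1)[0]
    if n < min_count:
        break
    docs = [merge_pair(doc, pair) for doc in docs]

print(docs[0])  # ['ニューヨーク', 'の', 'ケネディ国際空港']
```

Merging iteratively (most frequent pair first, then recounting) is what allows multi-token compounds like ケネディ国際空港 to be recovered: 国際+空港 is merged first, then ケネディ+国際空港.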
Analyzing Asian texts in R on English Windows machines
R is generally good with Unicode, and we do not see garbled text as long as we use the stringi package. But there are some known bugs. The worst is probably the bug that has been discussed in the online community: on Windows, R prints character vectors properly, but not character vectors inside a data.frame: > sessionInfo() […]