New paper on Latent Semantic Scaling

Kohei, April 8, 2020 (updated November 21, 2020)

I developed Latent Semantic Scaling (LSS) to perform sentiment analysis of news articles about the Ukraine crisis in my PhD project in London. LSS requires only a small set of polarity words, called “seed words”, to perform large-scale document scaling on a specific subject, because it automatically identifies synonyms of the seed words using latent semantic analysis (LSA), a seminal word-embedding technique.
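
To make the idea concrete, here is a minimal Python sketch of the two steps described above. It rests on my own simplifying assumptions and is not taken from the LSX package: LSA is done with a plain truncated SVD (numpy.linalg.svd), a word's polarity is its seed-weighted cosine similarity to the seed words, and a document's score is the frequency-weighted mean of its words' polarity scores. The function name lss_sketch, the number of dimensions k and the toy counts are hypothetical.

```python
import numpy as np


def lss_sketch(dfm, vocab, seeds, k=2):
    """Toy Latent Semantic Scaling (simplified, hypothetical helper).

    dfm   : documents x features count matrix
    vocab : feature names, one per column of dfm
    seeds : dict mapping seed word -> polarity (+1 or -1 only)
    k     : number of latent dimensions kept from the SVD (assumed)
    """
    dfm = np.asarray(dfm, dtype=float)
    # LSA: truncated SVD of the document-feature matrix.
    _, s, vt = np.linalg.svd(dfm, full_matrices=False)
    word_vecs = vt[:k].T * s[:k]  # one k-dimensional vector per word
    # Unit-length rows, so dot products become cosine similarities.
    norms = np.linalg.norm(word_vecs, axis=1, keepdims=True)
    word_vecs = word_vecs / np.where(norms == 0, 1, norms)

    index = {w: i for i, w in enumerate(vocab)}
    n_pos = sum(1 for p in seeds.values() if p > 0)
    n_neg = sum(1 for p in seeds.values() if p < 0)

    # Word polarity: weighted sum of cosine similarities to each seed,
    # with each side's seed weights summing to 1 in absolute value.
    polarity = np.zeros(len(vocab))
    for word, p in seeds.items():
        weight = 1 / n_pos if p > 0 else -1 / n_neg
        polarity += weight * (word_vecs @ word_vecs[index[word]])

    # Document score: frequency-weighted mean of word polarity scores.
    totals = dfm.sum(axis=1)
    doc_scores = (dfm @ polarity) / np.where(totals == 0, 1, totals)
    return dict(zip(vocab, polarity)), doc_scores


vocab = ["good", "great", "bad", "awful"]
dfm = [[2, 1, 0, 0],
       [1, 2, 0, 0],
       [0, 0, 2, 1],
       [0, 0, 1, 2]]
words, docs = lss_sketch(dfm, vocab, {"good": 1, "bad": -1})
print(words)
print(docs)
```

In this toy matrix, “great” and “awful” are not seed words, yet they receive scores close to +1 and -1 because they share latent dimensions with “good” and “bad”. This is the synonym-finding step that lets a handful of seed words cover a much larger vocabulary.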

When I was working in Tokyo, I found that LSS is very useful for quantitative text analysis in non-English languages, because many languages lack content-analysis dictionaries suitable for social science research. In fact, many of my colleagues have used LSS to study Asian countries such as Japan, China, Iraq and the Philippines. Using tailor-made seed words, they measured various dimensions, such as hawkish vs. dovish, pro- vs. anti-regime, sectarian vs. conciliatory, and conflict vs. peace.

Last year, I started teaching LSS as one of the semisupervised methods for quantitative text analysis in data science courses at Innsbruck and at the ECPR Method Summer School in Budapest. My tutorial on semisupervised techniques was also scheduled as part of the COMPTEXT conference. Although both the summer school and the conference were canceled due to the coronavirus pandemic in Europe, I felt it was time for me to write about LSS for its users.

In my working paper, titled Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages, I explain LSS using two examples: sentiment analysis of English economic news and of Japanese political news. I also introduce a new diagnostic measure for determining the near-optimal size of the word vectors in LSA. I hope this paper will help users of LSS understand the method better.

Update on 22/11/2020: I changed the link to point to the published version of the paper in Communication Methods and Measures.


8 thoughts on “New paper on Latent Semantic Scaling”

Sébastien says:

May 27, 2020 at 12:44 pm

Dear Kohei,
Thank you for the paper, which I read with great interest. I want to use LSS for an analysis of interest group position papers. That is typically one area where both supervised and unsupervised scaling techniques seem to fail.
I already listed a number of candidate seed words, but I have two questions:
1) Do I absolutely need to have the same number of positive and negative seed words? (Right now, I have roughly twice as many negative words as positive words);
2) Is there a statistical indicator to assess the “quality” of a seed word?

  Kohei says:
  
  May 27, 2020 at 3:46 pm
  
  You can have different numbers of positive and negative seed words, because the weights are adjusted automatically so that the total weight of each side is 1.0; with twice as many negative words as positive words, each negative word is weighted half as much as each positive word. I am developing a statistical indicator, but it is not ready to be used yet. My suggestion is therefore to check the seed words one by one, using coef() to see whether each of them gives high weights to the words it should. In testing, you can use a single word (without a polarity score) as a seed word.
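
The automatic weighting described in this reply is simple arithmetic. The Python sketch below illustrates it and is not the LSX code; weight_seeds and the example seed words are hypothetical. Each side receives a total weight of 1.0, split evenly among its seed words.

```python
def weight_seeds(seeds):
    """Split a total weight of 1.0 evenly within each polarity side.

    seeds: dict mapping seed word -> polarity (+1 or -1).
    Returns a dict mapping seed word -> signed weight.
    """
    if any(p not in (1, -1) for p in seeds.values()):
        raise ValueError("polarity must be +1 or -1")
    n_pos = sum(1 for p in seeds.values() if p > 0)
    n_neg = sum(1 for p in seeds.values() if p < 0)
    return {w: (1 / n_pos if p > 0 else -1 / n_neg) for w, p in seeds.items()}


# Hypothetical seed set with twice as many negative as positive words.
weights = weight_seeds({"fair": 1, "open": 1,
                        "unfair": -1, "closed": -1,
                        "biased": -1, "corrupt": -1})
print(weights)
```

Here each positive word gets 0.5 and each negative word gets -0.25, so a negative word carries half the weight of a positive word while both sides still total 1.0 in absolute value.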
  
Sébastien says:

June 4, 2020 at 8:39 am

Thanks Kohei!
One additional question: can I use multi-word expressions as seed words? E.g. “fair competit*”.

  Kohei says:
  
  June 4, 2020 at 12:54 pm
  
  Yes, but do not forget to form the n-grams with tokens_compound(); otherwise, they will be lost in the document-feature matrix.
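
Why compounding matters can be illustrated outside R with a small Python sketch. This is my own simplified analogue and not quanteda's implementation: adjacent tokens that match a multi-word glob pattern such as “fair competit*” are joined with an underscore (the concatenator is an assumption here), so the phrase survives as a single feature when the document-feature matrix is built. The function name compound is hypothetical.

```python
from fnmatch import fnmatchcase


def compound(tokens, pattern, concatenator="_"):
    """Join runs of tokens that match a multi-word glob pattern.

    tokens:  list of str (one tokenised document)
    pattern: whitespace-separated glob pattern, e.g. "fair competit*"
    """
    parts = pattern.split()
    n = len(parts)
    out, i = [], 0
    while i < len(tokens):
        window = tokens[i:i + n]
        # Every token in the window must match its part of the pattern.
        if len(window) == n and all(fnmatchcase(t, p) for t, p in zip(window, parts)):
            out.append(concatenator.join(window))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


doc = "we support fair competition in open markets".split()
print(compound(doc, "fair competit*"))
```

Without this step, “fair” and “competition” would be counted as separate features, and no single column of the document-feature matrix would correspond to the phrase.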
  
Pingback: Uploaded two new semisupervised models to CRAN – Kohei Watanabe
Pingback: The Latent Semantic Scaling – Kohei Watanabe
Pingback: Measuring emotional distress during COVID through words and emojis on Twitter – Kohei Watanabe
Pingback: New paper on semantic temporality analysis – Kohei Watanabe
