New papers on distributed LDA for sentence-level topic classification

I have been studying and developing an LDA algorithm for classifying sentences since 2022. Sentence-level topic classification allows us to analyze the association between topics and other properties, such as sentiment, within documents. Sentence-level analysis has also become more common in text analysis in general in recent years, thanks to highly capable transformer models.

My research was published in Social Science Computer Review as a paper, Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences. In this co-authored paper, we applied Seeded Sequential LDA to classify sentences from speeches delivered by delegates at United Nations General Assembly meetings and presented an example of topic-specific sentiment analysis. The plot shows sentiment on security and development among countries from five world regions during the post-Cold War period. We measured sentiment using the Lexicoder Sentiment Dictionary for simplicity, but more sophisticated methods can be combined with topic analysis.
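To make the idea of topic-specific sentiment analysis concrete, here is a minimal sketch in Python. It assumes each sentence has already been assigned a topic and that positive and negative dictionary matches have been counted per sentence. The function name, the input format, and the smoothed log-ratio score are illustrative assumptions, not the procedure used in the paper.

```python
import math
from collections import defaultdict


def topic_sentiment(sentences):
    """Aggregate dictionary-based sentiment by topic.

    sentences: iterable of (topic, n_positive, n_negative) tuples,
    one per classified sentence (hypothetical input format).
    Returns {topic: smoothed log ratio of positive to negative matches}.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for topic, n_pos, n_neg in sentences:
        pos[topic] += n_pos
        neg[topic] += n_neg
    # Assumed scoring rule: adding 0.5 to both counts avoids log(0)
    # when a topic has no positive or no negative matches.
    return {t: math.log((pos[t] + 0.5) / (neg[t] + 0.5)) for t in pos}


# Made-up counts for three classified sentences.
scores = topic_sentiment([
    ("security", 1, 3),
    ("development", 4, 1),
    ("security", 0, 2),
])
```

A score above zero means positive words outnumber negative words in the sentences assigned to that topic. Grouping by region or year as well as topic would only require a longer key.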

Through the above research, I found that analyzing increasingly large corpora with LDA takes too long. To make topic modeling faster, I implemented algorithms for parallel computing and convergence detection in my seededlda package (v1.0). I tested the algorithms by running them to identify 100 topics in a corpus of 10,000 news articles and described the results in a working paper, Speed Up Topic Modeling: Distributed Computing and Convergence Detection for LDA.

The next plot shows that execution time fell sharply from 1 processor to 8 processors in both the sequential models (sequential = TRUE) and the non-sequential models (sequential = FALSE) thanks to parallel computing. When iterative Gibbs sampling is terminated on convergence (auto_iter = TRUE), execution time becomes another 50% shorter in the non-sequential models.
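The idea of stopping Gibbs sampling on convergence can be sketched in Python as follows. This is a single-process collapsed Gibbs sampler for plain LDA, so it does not show the parallel or sequential parts. The stopping rule (stop when the per-token log-likelihood improves by less than tol between checks) and all names and defaults are assumptions for illustration, not the criterion implemented in seededlda.

```python
import math
import random


def fit_lda(docs, k, vocab_size, alpha=0.5, beta=0.1,
            max_iter=200, check_every=10, tol=1e-3, seed=0):
    """Collapsed Gibbs sampling for LDA with a simple early-stopping rule.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, iterations_run).
    """
    rng = random.Random(seed)
    n_dk = [[0] * k for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(k)]   # word counts per topic
    n_k = [0] * k                                 # total tokens per topic
    z = []                                        # topic of every token

    # Assign a random topic to every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            n_dk[d][t] += 1
            n_kw[t][w] += 1
            n_k[t] += 1

    def log_likelihood():
        # Mean log-probability per token under the current count estimates.
        total, n_tokens = 0.0, 0
        for d, doc in enumerate(docs):
            denom_d = len(doc) + k * alpha
            for w in doc:
                p = sum((n_dk[d][t] + alpha) / denom_d
                        * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                        for t in range(k))
                total += math.log(p)
                n_tokens += 1
        return total / max(n_tokens, 1)

    previous = log_likelihood()
    for it in range(1, max_iter + 1):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token from the counts before resampling it.
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1
                # Full conditional distribution over topics for this token.
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + vocab_size * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
        # Assumed rule: stop once the fit no longer improves by at least tol.
        if it % check_every == 0:
            current = log_likelihood()
            if current - previous < tol:
                return n_dk, n_kw, it
            previous = current
    return n_dk, n_kw, max_iter


# Tiny made-up corpus: four documents over a six-word vocabulary.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0], [3, 5, 4, 4]]
doc_topics, topic_words, n_iter = fit_lda(docs, k=2, vocab_size=6)
```

The check runs only every check_every iterations because computing the likelihood costs about as much as one sampling pass. The time saved comes from the iterations skipped after the model stops improving.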

Faster sequential LDA algorithms are very useful not only for topic classification at the sentence level but also for creating sentence vectors that summarize their content. Once sentence vectors are created, other machine learning methods can easily be applied to the textual data.

Kohei
