I have written about my packages in different places, including in my blog posts, but I decided to explain how to use them on dedicated websites about Latent Semantic Scaling and Seeded LDA. I thought this was necessary because the methodology behind these packages is becoming more established, with new functions that I added to the packages to solve real-world research problems. The latest addition is Distributed LDA, which hugely speeds up topic modeling.
Yet I will continue writing about them on my blog, because small tips and experimental features fit better here. Please keep checking my blog.
I am currently using the seededlda package. I am using the following code:
slda <- textmodel_seededlda(dfmt, dict, residual = 2)
terms(slda)
Every time I run this, the top words are not exactly the same. Is there a way to ensure that the results are identical on every run, to guarantee replicability?
The gensim library has a ‘random_state=1’ option for this purpose (https://stackoverflow.com/questions/15067734/lda-model-generates-different-topics-everytime-i-train-on-the-same-corpus/15069580#15069580).
Please use base R’s set.seed() before running the command. The Gibbs sampler in textmodel_seededlda() draws from R’s random number generator, so fixing the seed makes the runs reproducible.
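For example, a minimal sketch using the code from the question (the seed value 1234 is arbitrary; any fixed number works, as long as you use the same one every run):

set.seed(1234)  # fix R's RNG state before fitting
slda <- textmodel_seededlda(dfmt, dict, residual = 2)
terms(slda)  # top words should now be identical across runs

Note that the seed must be set immediately before the call each time; any other code that consumes random numbers in between will change the sampler's starting state.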