My co-authored paper on temporal orientation of texts appeared in Research & Politics. In this study, we applied latent semantic scaling (LSS) to a corpus of English and German texts to automatically identify features related to the future or the past. With only a set of common verbs as seed words, the algorithm could classify sentences […]
Measuring emotional distress during COVID through words and emojis on Twitter
My co-authored article on public mental health has appeared recently in the Journal of Medical Internet Research. In this study, we combined survey research and social media analysis to infer Japanese people’s mental health during the COVID pandemic. The methodological novelty of this study is that (1) we collected individual characteristics (age, gender, occupation, income […]
Improved tokenization of hashtags in Asian languages
Quanteda can tokenize Asian texts thanks to the ICU library’s boundary detection mechanism, but this causes problems when we analyze social media posts that contain hashtags in Chinese or Japanese. For example, the hashtag “#英国首相仍在ICU但未使用呼吸机#” in a post about the British prime minister is completely destroyed by quanteda’s current tokenizer. Although we can correct tokenization […]