An article I co-authored on public mental health was recently published in the Journal of Medical Internet Research. In this study, we combined survey research and social media analysis to infer Japanese people’s mental health during the COVID pandemic. The methodological novelty of this study is that we (1) collected individual characteristics of the participants (age, gender, occupation, income, etc.) and (2) correlated those characteristics with the mental distress levels measured in their tweets.
We applied Latent Semantic Scaling (LSS) to a corpus of tweets with seed words about good (楽しみ, 絶好調, 喜ぶ, 笑う, 嬉しい, 幸せ, のんびり, 元気) and bad (悩み, 心配, 嘆く, 泣く, 悔しい, 困る, しんどい, 苦痛) mental states. With the seed words, LSS accurately identified words and emojis that are associated with emotional distress. In the plot below, “polarity” is the distress level that each word or emoji indicates.
Our analysis revealed, for example, that women’s distress levels increased more than men’s when Japanese schools were closed due to the COVID pandemic (the blue lines). This is understandable because mothers often bear greater responsibility for taking care of children in Japanese families.
While pre-trained models are becoming increasingly popular among data scientists, I hope that our study demonstrates the possibility of novel text analysis with word embedding models trained on an original corpus. With the LSX package, you only need a corpus of tweets and seed words to perform a similar analysis.
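To give a feel for how seed words turn into polarity scores, here is a minimal Python sketch of the general idea behind LSS: each word is scored by its average cosine similarity to the seed words, weighted by the seeds' polarity, and a text is scored by averaging the scores of its words. This is not the LSX implementation (LSX is an R package that trains the embedding on your corpus). The two-dimensional vectors are made up for illustration, the function names are hypothetical, and the choice to give distress seeds a weight of +1 is my assumption so that higher polarity means more distress.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_polarity(vectors, seeds):
    """Score every word by its mean seed-weighted similarity to the seed words."""
    return {
        word: sum(cosine(vec, vectors[s]) * w for s, w in seeds.items()) / len(seeds)
        for word, vec in vectors.items()
    }

def document_polarity(tokens, polarity):
    """Average the polarity of the known words in a text; None if none are known."""
    known = [polarity[t] for t in tokens if t in polarity]
    return sum(known) / len(known) if known else None

# Toy embedding (made-up numbers); a real analysis would train vectors on the tweet corpus.
vectors = {
    "心配": [1.0, 0.1], "泣く": [0.9, 0.2],    # distress seeds
    "嬉しい": [0.1, 1.0], "笑う": [0.2, 0.9],  # positive seeds
    "不安": [0.8, 0.3], "楽しい": [0.2, 0.8],  # non-seed words to be scored
}
# Assumption: +1 marks distress, -1 marks a good mental state.
seeds = {"心配": 1, "泣く": 1, "嬉しい": -1, "笑う": -1}

polarity = word_polarity(vectors, seeds)
tweet_score = document_polarity(["不安", "泣く"], polarity)
```

In this toy setup, 不安 ("anxiety") lands on the distress side and 楽しい ("fun") on the opposite side, simply because of where their vectors sit relative to the seeds.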