Use pre-trained GloVe word vectors in LSS

I made it possible to use pre-trained word vectors for Latent Semantic Scaling (LSS) in version 0.9 of the LSX package, but I don’t think I ever explained how to do it. It is easy with the as.textmodel_lss() function, but you need to load the word vectors into R as a dense matrix beforehand.

For example, pre-trained GloVe word vectors are provided as a text file in which values are separated by a single space, without quotes. To load such a file with the read.table() function, you have to set several arguments correctly. You also have to transpose the loaded matrix so that the word vectors are stored along its columns.

library(LSX)
# tokens may contain quotes or "#", so disable quoting and comment handling
mt <- read.table("glove.6B/glove.6B.200d.txt", quote = "", sep = " ", fill = FALSE,
                 comment.char = "", row.names = 1, fileEncoding = "UTF-8")
colnames(mt) <- NULL
mt <- mt[stringi::stri_detect_regex(rownames(mt), "[a-zA-Z]"),] # exclude numbers and punctuation
mt <- t(mt) # transpose (also converts the data frame to a dense matrix)

seed <- as.seedwords(data_dictionary_sentiment) # generic sentiment seed words
lss <- as.textmodel_lss(mt, seed) # create LSS object
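
To see why those read.table() arguments matter, here is a small sketch that writes a toy file in the same layout as the GloVe text files and loads it the same way. The tokens and values are made up for illustration, and base R's grepl() stands in for stringi::stri_detect_regex() so that the snippet has no dependencies.

```r
# A tiny file in the GloVe text layout: a token followed by space-separated values.
# The tokens '"' and '#' would be treated as a quote and a comment by default.
lines <- c('good 0.1 0.2 0.3',
           'bad -0.1 -0.2 -0.3',
           '" 0.5 0.5 0.5',
           '# 0.4 0.4 0.4',
           '2014 0.7 0.8 0.9')
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

toy <- read.table(path, quote = "", sep = " ", fill = FALSE,
                  comment.char = "", row.names = 1, fileEncoding = "UTF-8")
n_read <- nrow(toy)                               # all five tokens are read as rows
toy <- toy[grepl("[a-zA-Z]", rownames(toy)), ]    # keep tokens that contain a letter
toy <- t(as.matrix(toy))                          # words now index the columns
rownames(toy) <- NULL                             # drop the V2, V3, ... labels

print(dim(toy))
print(colnames(toy))
```

After the transpose, each column is one word vector and the column names are the words, which is the shape that the real matrix has before it is passed to as.textmodel_lss().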

Once an LSS object is created, you can check the polarity of words using coef(). Since the word vectors were trained on a large corpus of Wikipedia and Gigaword texts (6 billion tokens), the vocabulary is diverse and the estimated polarity scores are very intuitive.

> head(coef(lss), 20)
  excellent  excellence  impressive   versatile versatility     rapport      superb     optimum 
  0.2401436   0.1777782   0.1712947   0.1697045   0.1696126   0.1687611   0.1648426   0.1575858 
 reasonably    achieved   wonderful    enjoying    terrific       enjoy     achieve     enjoyed 
  0.1561745   0.1561181   0.1557793   0.1547977   0.1537846   0.1533817   0.1524274   0.1523841 
    elegant world-class sustainable   confident 
  0.1517054   0.1508276   0.1502961   0.1481712 

> tail(coef(lss), 20)
      racist       doings   needlessly misogynistic   infighting       blamed       sexist     plaguing 
  -0.1938094   -0.1939329   -0.1946107   -0.1954958   -0.1967688   -0.1969247   -0.1980838   -0.1992275 
     hateful     shameful    heartless stereotyping      brutish         vile         ugly exacerbating 
  -0.2031186   -0.2033597   -0.2043213   -0.2052691   -0.2076061   -0.2087724   -0.2130335   -0.2169766 
     vicious      blaming  exacerbated        nasty 
  -0.2196787   -0.2196897   -0.2216914   -0.2366194
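
If you wonder where scores like these come from, the general idea of LSS is that a word's polarity is the seed-weighted average of its cosine similarity to the seed words. The sketch below illustrates that idea in base R with made-up two-dimensional vectors and two hypothetical seed words; it is only an illustration of the principle, and the actual computation inside LSX may differ in its details.

```r
# Toy word vectors stored along columns, as in the matrix above (values are made up)
vec <- cbind(good  = c( 1.0, 0.2),
             nice  = c( 0.9, 0.3),
             bad   = c(-1.0, 0.1),
             awful = c(-0.8, 0.4))
seed <- c(good = 1, bad = -1)  # hypothetical seed words with polarity weights

# Scale every column to unit length so that dot products are cosine similarities
unit <- sweep(vec, 2, sqrt(colSums(vec^2)), "/")
sim <- t(unit) %*% unit[, names(seed), drop = FALSE]  # words x seeds

# Polarity: seed-weighted average of the similarities
score <- drop(sim %*% seed) / length(seed)
print(sort(score, decreasing = TRUE))
```

Words that point in the same direction as the positive seed get positive scores, and words close to the negative seed get negative ones, which is the pattern visible in the head and tail of coef(lss).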