Python has a number of statistical modules that allow us to perform analysis without R, but it is always a good idea to compare the outputs of different implementations. I performed factor analysis with Python’s scikit-learn module for my dictionary creation system, but the outputs were completely different from those of R’s factanal function, just as someone described in a post on Stack Overflow. After long hours, I finally found the cause: I hadn’t normalized the data for scikit-learn. factanal normalizes the data automatically, but scikit-learn doesn’t. The right way to perform factor analysis is this:
from sklearn import decomposition, preprocessing

data_normal = preprocessing.scale(data)  # Normalization (zero mean, unit variance)
fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)
print(fa.components_)  # Factor loadings
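To see the effect without the original data, here is a self-contained sketch using synthetic data (the data-generating step is illustrative, not my dictionary-creation data): variables are given deliberately different scales, so the loadings fitted on the raw matrix are dominated by those scales, while the loadings fitted on the standardized matrix are the ones comparable across implementations.

import numpy as np
from sklearn import decomposition, preprocessing

rng = np.random.RandomState(0)
latent = rng.normal(size=(500, 1))  # one common factor
# Seven observed variables with very different scales plus noise
data = latent * rng.uniform(1, 10, size=7) + rng.normal(size=(500, 7))

fa_raw = decomposition.FactorAnalysis(n_components=1).fit(data)
fa_std = decomposition.FactorAnalysis(n_components=1).fit(preprocessing.scale(data))

print(fa_raw.components_)  # loadings driven by each variable's scale
print(fa_std.components_)  # loadings on standardized data, comparable to factanal's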
If we do it like this, the factor loadings estimated by scikit-learn become very close to R’s estimates:
Variable  Python (Scikit-learn)  R (factanal)
1         0.24705429             0.285719656390773
2         0.56100678             0.633553717909623
3         0.48559474             0.493731965398187
4         0.54208185             0.527418210503982
5         0.50989289             0.487150249901473
6         0.33028625             0.312724093202758
7         0.38651951             0.378827084637606
Update on 11/5/2020: a new Python package for factor analysis has been released. I haven’t tried it yet, but it looks great. Please see the comment below.
Good one! I agree that you can learn a lot by comparing different implementations, and it sometimes takes some time to understand why they differ. I have also struggled with factor analysis in Python as I was used to factanal, and hence created a Python package that wraps the R factanal function so that you can call it from Python with a pandas data frame, as sketched below.
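A minimal sketch of such a call, assuming R and rpy2 are installed (the argument names here are assumptions based on the package description; see the PyPI page below for the exact signature):

import numpy as np
import pandas as pd
from factanal.wrapper import factanal  # import path as published on PyPI

# Illustrative data: any numeric pandas DataFrame works here.
df = pd.DataFrame(np.random.normal(size=(100, 7)),
                  columns=[f'v{i}' for i in range(1, 8)])

# The keyword arguments mirror R's factanal(); names are assumptions,
# so consult the package documentation for specifics.
result = factanal(df, factors=1, rotation='promax', verbose=True)
print(result)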
For everyone interested, you can find more information here: https://pypi.org/project/factanal/