Factor Analysis
MARTIN SEBERA
MAGDEBURG, 15. 12. 2023

Martin Sebera
Faculty of Sports Studies, Masaryk University, Brno, Czech Republic
Interests: mathematics, statistics, programming, artificial intelligence, esports
https://www.muni.cz/en/people/55084-martin-sebera
Email: sebera@fsps.muni.cz

Lecture schedule – Factor Analysis
What is it used for
Requirements
Procedure
Weaknesses
Conclusion
Example - Decathlon
- sw TIBCO Statistica 14
- sw IBM SPSS 28

When was it created and who discovered it?
1. Factor analysis originated in the field of psychology.
2. Its founder is considered to be Charles Spearman.
3. In a 1904 article on the nature of intelligence, he proposed the hypothesis that a common factor of "general intellectual ability" exists and causes the correlations between the results of various intelligence tests.

What is it used for
1. To identify groups of variables that are interrelated and can be represented by a smaller number of factors or latent variables.
2. Dimensionality reduction: Factor analysis simplifies data by reducing many measured variables to a smaller number of factors.
3. Structure identification: It helps identify hidden structure in a data set, for example groups of variables that may represent a common concept or factor.
4. Data exploration: It is useful when researchers are looking for patterns or relationships in complex datasets.

Requirements
1. Sample size: at least 5-10 observations for each variable, and ideally at least 100 observations in total. Larger samples provide more robust and stable results.
2. Linear relationships: Factor analysis assumes linear relationships between variables.
3. Normal distribution of data: Although factor analysis can be performed on data that are not normally distributed, normality increases the reliability of the results.
4. Homogeneity of the sample: The data should come from a homogeneous group or population so that the results of the factor analysis are relevant and interpretable for that population.
5. No or minimal missing values.

Procedure 1/2
Data preparation:
◦ Checking the requirements: sample size, linear relationships between variables, normal distribution, etc.
◦ Data cleaning, including addressing missing values and removing outliers.
Method selection - Deciding whether to use exploratory factor analysis (EFA) or confirmatory factor analysis (CFA). EFA is used to discover potential structures, while CFA is used to test hypotheses about the structure.
Calculation of the correlation matrix - Creating a correlation matrix of the variables. This matrix provides the basis for factor analysis.
Choosing an extraction method - Principal Axis Factoring or maximum likelihood.

Procedure 2/2
Selection of the number of factors to be extracted - This can be done using the Kaiser criterion (eigenvalues > 1), the scree test, or theoretical considerations.
Factor rotation - Rotation makes the factors easier to interpret. Orthogonal rotation (e.g. Varimax) maintains factor independence.
Interpretation of factors - Each factor is interpreted on the basis of the variables that have high loadings on it. Interpretation depends on the research context.
Assessment of model fit and reliability - In CFA, model fit is evaluated using various measures such as RMSEA (Root Mean Square Error of Approximation), CFI (Comparative Fit Index), and others.

Important terms again - 1/3
The principal component method gives uncorrelated factors that are ordered by their variance, so that the first factor has the largest variance and the last the smallest. Factor analysis can be considered as its extension.
While principal component analysis tries to reduce the number of variables so that the variance of the original variables is explained as well as possible, factor analysis tries to explain the correlations between the original variables as well as possible.

Important terms again - 2/3
Factor rotation
1. There are infinitely many factor solutions.
2. The factors are transformed (rotated) so that we can interpret them as well as possible.
3. Practice has shown that the factors that are easiest to interpret are those whose factor loadings take on values close to either one or zero.

Important terms again - 3/3
Interpretation of factors
1. We describe a factor by what it has in common, in terms of content, with the variables that have high factor loadings on it.
2. When interpreting the factors, one must be careful and consider whether the name given to a factor really corresponds to something that exists.
3. If there is no logical explanation for a factor, we cannot use factor analysis.

Weaknesses
1. Complexity and subjectivity: The interpretation of factors can often be subjective and depends on the researcher's decisions (e.g. choice of the number of factors, rotation).
2. Assumptions about the data: The method assumes linear relationships between variables and a normal distribution, which do not always hold for real data.
3. Dependence on sample size: A large enough sample is needed for reliable results.
4. Limitation to linear relationships: Factor analysis cannot effectively handle non-linear relationships between variables.
5. Unclear meaning of factors: Identified factors may not always have a clear or intuitive meaning and may require further research to be fully understood.

Conclusion
It is always important to remember that
1. no statistical technique is all-powerful,
2. it is necessary to evaluate the appropriateness of the method in relation to the data and objectives of your research,
3. if you are unsure, it may be helpful to consult an expert in statistics or research methodology.
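The procedure described in this lecture (correlation matrix, extraction by principal components, Kaiser criterion, Varimax rotation) can be sketched in Python with NumPy. This is only a minimal sketch on simulated data, not what Statistica or SPSS do internally. The synthetic data set, the two latent factors, and the helper function `varimax` are assumptions made for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally using the raw Varimax criterion."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Matrix whose SVD yields the next orthogonal rotation.
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop when the criterion no longer improves noticeably.
        if criterion != 0.0 and new_criterion / criterion < 1.0 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical data: 200 observations of 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
n_obs, n_vars = 200, 6
latent = rng.normal(size=(n_obs, 2))
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
    [0.0, 0.9], [0.0, 0.8], [0.0, 0.7],
])
data = latent @ true_loadings.T + rng.normal(scale=0.5, size=(n_obs, n_vars))

# Step 1: correlation matrix of the observed variables.
corr = np.corrcoef(data, rowvar=False)

# Step 2: eigenvalues and eigenvectors, sorted from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 3: Kaiser criterion, keep components with eigenvalue > 1.
n_factors = int((eigenvalues > 1.0).sum())
explained = eigenvalues / eigenvalues.sum()

# Step 4: unrotated loadings = eigenvector * sqrt(eigenvalue).
loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])

# Step 5: Varimax rotation for easier interpretation.
rotated = varimax(loadings)

print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors kept (Kaiser):", n_factors)
print("Cumulative explained variance:", np.round(explained.cumsum(), 2))
print("Rotated loadings:")
print(np.round(rotated, 2))
```

The rotated loadings show which variables belong to which factor. An orthogonal rotation changes only how the explained variance is distributed among the factors; the communality of each variable stays the same.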
Example - Decathlon results from the Athens Olympics 2004
- sw TIBCO Statistica 14
- sw IBM SPSS 28

1. Objective of the analysis - We want to know whether it is possible to identify the factors on which the results in the individual disciplines depend, and which of these factors are most important for victory.
2. Data standardization - Not necessary. Factor analysis, as a method based on the correlation matrix, does not depend on the scale of the input values.
3. Factor estimation method - The method of principal components.
4. Eigenvalues and the number of factors - There are three eigenvalues > 1, and the factors/components corresponding to them describe roughly 70% of the variability of the original variables. A fourth factor could be considered; it would increase the percentage of explained variance to 78%.
5. Factor loadings, rotation and interpretation of factors - We try to achieve that each factor is correlated only with a certain group of variables and that its correlations with the other variables are close to zero. The goal is to find meaningful factors. We rotate using the Varimax method.
6. Interpretation
F1 Speed factor. The first factor is clearly related to the results of the short sprints and the long jump: the better the result, the higher the value of the factor.
F2 "Trunk" strength (abdominal, back, core). The strongest correlations of the second factor are with all "throwing" events and the high jump.
F3. The third and last factor is clearly correlated with longer-distance running, which shows that this discipline is the most different from the others.

sw IBM SPSS 28

Basic characteristics of observed variables
Descriptive Statistics (times in seconds, distances and heights in metres)

Variable          Mean     Std. Deviation   Analysis N
v100_m            10.92    0.23             28
Long_jump          7.27    0.34             28
Shot_put          14.63    0.86             28
High_jump          1.98    0.09             28
v400_m            49.61    1.27             28
v110_m_hurdles    14.55    0.44             28
Discus_Throw      44.38    3.30             28
Pole_vault         4.73    0.29             28
Javelin           58.95    4.98             28
v1500_m          277.54   11.32             28

Conditions of use of factor analysis
The correlation coefficients are high, and we see both positive and negative correlations (the conditional formatting tool in Excel helps to highlight them).
The Kaiser-Meyer-Olkin measure takes on a value of 0.58. This exceeds the commonly used minimum of 0.5, so factor analysis can be used, although the value is low rather than high.
Bartlett's test of sphericity:
Test criterion value = 112.179
Number of degrees of freedom = 45
Significance (= observed significance level) < 0.001
The observed significance level is < 0.001, so we reject H0: the correlation matrix is an identity matrix (the correlation coefficients off the diagonal are zero). Thus, the basic assumption for the use of factor analysis is fulfilled.
All KMO values for the individual observed variables are satisfactory, i.e. greater than 0.5.

Factor extraction - Principal component method
The results are shown in absolute, percentage, and cumulative percentage form.

Rotated Component Matrix (Principal Component Analysis)
Finally the result! We can now try to interpret the factors. Loadings with an absolute value below 0.4 are suppressed.

Transfer to sports training
1. What to train (it is not possible to train all 10 disciplines at the same time) to be the best decathlete?
2. You cannot always generalize, but the answer is the 400 m run and the shot put spin!
3. Why? This is what Mr. Váňa said, the coach of the Czech decathletes Roman Šebrle (gold and silver from the Olympics) and Tomáš Dvořák (3x world champion, bronze from the Olympics). Váňa bet on the speed of execution of the individual disciplines.

And what should we do if we are looking for relationships but the conditions for classic statistical methods, such as linear regression or factor analysis, are not met?

Thank you for your attention