Understanding Statistics

Outline
• First: regression, correlations, etc.
• Then: scaling

Regression
• y = c + b·x + e
• y = output (dependent variable), such as income or support for welfare
• x = input (independent variable), such as years of education
• c = constant
• b = coefficient (for example, one additional year of education increases income by 1,000 crowns)
• e = error

Significance
• p = probability of seeing a relationship at least this strong if there were no real relationship
• p < .05
• |t| > 1.96 (the equivalent cutoff in large samples)
• Socially constructed norm
• Would you accept a 10% risk of being wrong if you bought a stock for one crown and expected a profit of one million?
• Would you accept a 1% risk that pressing a button could end the world?

[Figures: Case 1, Case 2]

Factors influencing significance
• Size of the sample
• Size of the error

Strength of a variable
• Coefficient
• Standardized coefficient (normally -1 ≤ b ≤ 1)
• Correlation (similar to a standardized coefficient, but used with only two variables and no constant)
• r = correlation coefficient (how strongly the two variables vary together)
• r² = explained variance (the share of the dependent variable's variance that is explained)

Multiple Regression
• Usually several variables influence the dependent variable
• Example: income is influenced by years of education and gender
• What does it mean if years of education has a coefficient of 1,000?
• What does it mean if gender has a coefficient of 5,000?
• Which variable explains income better?
• Dummy variables take the value 0 or 1

Control variables
• We want to check whether our variable really explains the result or whether some other underlying variable is at work
• Example: we might want to check whether the labor market really discriminates against women who have the same jobs as men, or whether the problem is that women choose different types of jobs
• So we can add "working in the private sector" as a control variable, since women are more likely to work in the public sector
• If gender is still significant, it means that even women who work in the private sector receive lower salaries than men
• If gender is no longer significant, it means that the real problem is either that women choose to work in the public sector (but why do they choose this?)
• or that women cannot get jobs as easily in the private sector (again, the question is why?)

Comparing Models
• R-squared = the amount of variance that a model explains
• The higher the R-squared, the better
• But a model with 1,000 variables will normally explain more variance than a model with one. What is the problem?
• We want models to be as "parsimonious" as possible
• Adjusted R-squared also takes the number of variables into account
• We also want all the variables to be statistically significant at the 5% level

Scaling
• Often we cannot find one question that measures exactly what we want to measure
• But a collection of questions can measure it more clearly
• An example is support for welfare
• We also get a more exact answer if we can create a scale from 1-50 than from 1-4
• We can also use simpler statistical methods if we have a "continuous" dependent variable
• If the dependent variable is 0-1, we need to use methods such as "logit" and "probit"
• If it is 0-4, we should use methods such as "ordinal logit" or "ordinal probit"

One Dimensional Scales
• Cronbach's alpha
• Tests whether all the questions are consistent
• We might think a group of questions belong together, but the respondents could interpret them differently
• Cronbach's alpha expects all the questions to measure approximately the same thing

Mokken Scale
• Is not used much, but it is also valuable
• It allows us to test the consistency of questions that we can rank
• For example: do you support a woman's right to an abortion any time she wants it?
• Do you support the right for women to have an abortion if they are too poor to take care of the child?
• Do you support the right for women to have an abortion if they were raped?
• Do you support the right for women to have an abortion if their own life is threatened?
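
The two consistency checks above can be sketched in Python using only the standard library. The function names and all response data are hypothetical, invented for illustration. The first function computes Cronbach's alpha from its usual formula. The second only counts Guttman errors (a harder question answered "yes" while an easier one is answered "no"); this is a building block of a Mokken analysis, not the full procedure.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                  # number of questions
    items = list(zip(*responses))          # one tuple of scores per question
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


def guttman_errors(pattern):
    """Count pairs where a harder item is endorsed (1) but an easier one is not (0).

    `pattern` must be ordered from the hardest question to the easiest.
    """
    return sum(
        1
        for i in range(len(pattern))
        for j in range(i + 1, len(pattern))
        if pattern[i] == 1 and pattern[j] == 0
    )


# Hypothetical data: five respondents, three welfare questions scored 1-5.
welfare_answers = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(welfare_answers)
print(f"Cronbach's alpha: {alpha:.2f}")

# Hypothetical abortion answers, ordered: any time, poverty, rape, life threatened.
consistent = [0, 0, 1, 1]      # supports only the two easiest cases
inconsistent = [1, 0, 1, 1]    # supports "any time" but not "too poor"
print(guttman_errors(consistent), guttman_errors(inconsistent))
```

An alpha close to 1 suggests the questions move together, and a respondent with zero Guttman errors answered in a way that fits the assumed ranking of the questions.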

Multidimensional scaling
• Factor analysis
• There can be several dimensions to an issue
• For example: "support for big government"
• One dimension could be support for welfare programs
• Another dimension could be support for the military and police
• A third dimension could be support for education
• We use statistical programs to see which questions scale well together
• The most common method is called "principal components analysis"
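
To make "which questions scale well together" more concrete, the sketch below extracts the two largest principal components from the correlation matrix of a few survey answers. It is written in plain Python, using power iteration with deflation so that it runs without extra packages; statistical programs use more robust eigenvalue routines. All function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlations between questions (columns) across respondents (rows).

    Assumes every question varies; a constant column would divide by zero.
    """
    n = len(rows)
    z_scores = []
    for col in zip(*rows):
        m, s = mean(col), pstdev(col)
        z_scores.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z_scores]
            for zi in z_scores]


def principal_components(matrix, count=2, iterations=500):
    """Return `count` (eigenvalue, unit vector) pairs, largest eigenvalue first."""
    k = len(matrix)
    work = [row[:] for row in matrix]          # copy, so the input is untouched
    components = []
    for _component in range(count):
        vec = [float(i + 1) for i in range(k)]  # arbitrary non-uniform start
        for _step in range(iterations):
            new = [sum(work[i][j] * vec[j] for j in range(k)) for i in range(k)]
            norm = sum(v * v for v in new) ** 0.5
            vec = [v / norm for v in new]
        eigenvalue = sum(
            vec[i] * work[i][j] * vec[j] for i in range(k) for j in range(k)
        )
        components.append((eigenvalue, vec))
        # Deflation: remove the component just found before searching for the next.
        for i in range(k):
            for j in range(k):
                work[i][j] -= eigenvalue * vec[i] * vec[j]
    return components


# Hypothetical answers (1-5): two welfare questions, then two military/police questions.
answers = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [5, 5, 5, 4],
    [1, 1, 2, 1],
]
for eigenvalue, loadings in principal_components(correlation_matrix(answers)):
    print(round(eigenvalue, 2), [round(x, 2) for x in loadings])
```

As a rough reading, questions whose entries in a component are large and share the same sign are candidates for one scale, while questions with opposite signs on that component are being separated from each other.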