Regression
- Y = output (dependent variable), such as income or support for welfare
- X = input (independent variable), such as years of education
- c = constant, does not change
- b = coefficient: how much Y changes when X increases by one unit (e.g., if one more year of education increases income by 1000 crowns, then b = 1000)
- e = error

Example
- Y = c + bX + e
- Y = 2000 + 3000X
- Income = $2000 + $3000 × (years of education)
- If somebody studies for 10 years, her expected income will be $2000 + $3000 × 10 = $32,000
- If she studies for 2 years, her expected income will be $2000 + $3000 × 2 = $8,000

New Example
- Y = c + bX + e
- Y = 10 + 200X
- Y = live births in Prague that year
- X = number of hours of Tom Cruise films shown on TV that year
- How many children will be born if they show 5 of his films and each film is 2 hours long?

The Answer
- Y = 10 + 200 × (5 × 2)
- = 10 + 200 × 10
- = 10 + 2000
- = 2010

Variables
- Variables vary (hence the name)
- Both income and education differ from person to person, so they vary
- Variation = how much they vary
- Explained variation (R-squared) = the share of the variation in the dependent variable that our model explains

Dependent and Independent Variables
- Y (income) is the dependent variable, because its value depends on another variable
- X (education) is an independent variable, because it exists independently of income and influences income
- A clearer case: gender influences income, but income does not influence gender

The constant
- Y = 2000 + 3000X
- This means that even somebody who has not studied at all would have some income (an expected $2000)
- Even people without an education can do some jobs, like cleaning toilets or emptying garbage
- (or becoming a politician?)
- Even if we do not have a job, we get money from the state, family or friends (or through stealing?)
- Education cannot explain everything: the income people are expected to have from these other sources, when education is zero, is captured by the constant

The Errors
- We expect some error in each specific case
- But on average the errors will equal 0
- That is, in some cases the actual income will be a little higher than expected, and in other cases a little lower
- So the cases cancel each other out

Significance
- p = the probability of finding a relationship at least this strong by chance if there were no real relationship
- The usual cutoff is p < .05
- In large samples this corresponds to |t| > 1.96
- The .05 cutoff is a socially constructed norm
- Would you accept a 10% risk of being wrong if you bought a stock for one crown and expected a profit of one million?
- Would you accept a 1% risk that pressing a button could end the world?

[Figures: Case 1 and Case 2]

Factors influencing significance
- Size of the sample (number of cases)
- Size of the error

Strength of a variable
- Coefficient
- Standardized coefficient (usually between -1 and 1)
- Correlation (r): similar to the standardized coefficient, but only used with two variables and no constant
- Variance = how much the dependent variable varies
- r² = explained variance (the share of the variance in the dependent variable that the model explains)

More on Coefficients
- The standardized coefficient makes it easier to compare the relative strength of two variables
- Example: income = 2000 + 1000 × education
- Or: income = 2000 + 5000 × gender
- Gender has a higher coefficient, but it can only be 0 or 1, while education can range from 1 to 25 years
- So if we put both coefficients on the same scale by standardizing them, they are easier to compare

Disadvantages of Standardizing
- Standardized coefficients are more difficult to interpret
- It is clearer to say that income increases by 1000 dollars for every year of education than to say that income increases by .7 standard deviations for every one-standard-deviation increase in education
- Note: you do not need to know exactly what a standard deviation is here; roughly, it measures how much values typically deviate from the expected (average) value

Straight line?
- Normally one should plot the dependent variable against the independent variable to see whether the relationship really forms a straight line
- Sometimes it could be curved
- Then we can use a log transformation
- But since we are working with attitudes, this is not necessary
- Instead we will use either ordinal regression or scaling, which we will discuss later

An example of why it only makes sense to use this kind of regression if we group questions together into a larger scale. Here I have created a chart for the bivariate regression where LESSREG is the dependent variable and SEX is the independent variable.
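Returning to the income and Tom Cruise examples near the start of these slides, the prediction arithmetic can be written as a small Python function. This is only a sketch: `predict` is a hypothetical helper name, and it leaves out the error term e because the errors are assumed to average to 0.

```python
def predict(constant: float, coefficient: float, x: float) -> float:
    """Expected value of Y for a given X, i.e. Y = c + bX (errors assumed to average to 0)."""
    return constant + coefficient * x


# Income example: Y = 2000 + 3000X, where X = years of education
income_10 = predict(2000, 3000, 10)  # 10 years of education
income_2 = predict(2000, 3000, 2)    # 2 years of education
income_0 = predict(2000, 3000, 0)    # no education: only the constant remains

# Births example: Y = 10 + 200X, where X = hours of Tom Cruise films on TV
hours = 5 * 2                        # 5 films, 2 hours each
births = predict(10, 200, hours)

print(income_10, income_2, income_0, births)  # prints: 32000 8000 2000 2010
```

The `income_0` line shows the point of the constant slide: with zero years of education, the expected income is just the constant.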
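To see where the constant, coefficient, errors, r² and t come from, here is a hand-rolled ordinary least squares sketch in Python on invented education and income data (not from these slides). It checks that the errors average to 0 and computes a two-sided p-value from the normal approximation, which is where the 1.96 cutoff comes from; with only 10 cases a t-distribution would be more exact, so treat the p-value as illustrative.

```python
import math
from statistics import mean

# Hypothetical data (invented for illustration): years of education and yearly income
education = [8, 10, 12, 12, 14, 16, 16, 18, 20, 22]
income = [25000, 30000, 36000, 33000, 42000, 48000, 51000, 55000, 62000, 66000]


def fit_line(xs, ys):
    """Ordinary least squares for Y = c + bX with one independent variable."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = y_bar - b * x_bar
    return c, b


c, b = fit_line(education, income)

# Errors: actual minus expected income; with a constant in the model they average to 0
errors = [y - (c + b * x) for x, y in zip(education, income)]
mean_error = mean(errors)

# r-squared: share of the variation in income that the line explains
y_bar = mean(income)
total_variation = sum((y - y_bar) ** 2 for y in income)
unexplained = sum(e ** 2 for e in errors)
r_squared = 1 - unexplained / total_variation

# t statistic for b, and a two-sided p-value from the normal approximation
n = len(education)
x_bar = mean(education)
se_b = math.sqrt(unexplained / (n - 2) / sum((x - x_bar) ** 2 for x in education))
t = b / se_b
p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
significant = p < 0.05

print(f"c={c:.0f} b={b:.0f} r2={r_squared:.3f} t={t:.1f} p={p:.2g}")
```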