ENV006 Statistical Thinking and Data Treatment

Faculty of Science
Autumn 2018

The course is not taught in Autumn 2018

Extent and Intensity
1/2. 3 credit(s) (fasci plus compl plus > 4). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. Mgr. Dominik Heger, Ph.D. (lecturer)
Mgr. Ján Krausko (seminar tutor)
Mgr. Ľubica Vetráková, Ph.D. (seminar tutor)
Guaranteed by
prof. RNDr. Jana Klánová, Ph.D.
Research Centre for Toxic Compounds in the Environment (RECETOX) - Chemistry Section - Faculty of Science
Contact Person: doc. Mgr. Dominik Heger, Ph.D.
Supplier department: Research Centre for Toxic Compounds in the Environment (RECETOX) - Chemistry Section - Faculty of Science
Course Enrolment Limitations
The course is also offered to students of fields other than those the course is directly associated with.
Fields of study the course is directly associated with
Course objectives
Statistics is the science of drawing conclusions from data. The aim of this introductory course is to learn how to think about data and to explain a few basic statistical concepts, all applied to practical examples, mostly from the natural sciences. The course consists of three parts: 1. Descriptive statistics. 2. Probability. 3. Inference. Each class requires weekly reading of the textbooks and completing the homework. At the end of the course, students should be able to appreciate the language of statistics, solve certain problems independently, and know where to look for more advanced methods. The course provides the minimum knowledge needed to understand terms such as the standard error of the mean, the coefficient of determination, regression, statistical testing, and others.
Syllabus
• 1. Statistics, data, variables
   1. Variable, data, statistics
   2. Categories of variables
   3. Depicting a categorical variable: bar graph
   4. Physical quantities and their notation
   5. Number of significant figures
   6. Accuracy, precision, and uncertainty
   7. Depicting a quantitative variable: stem-and-leaf plot
• 2. Measures of location and spread
   1. Depicting a quantitative variable: histogram, percentiles
   2. Frequency distribution: histogram, distribution function
   3. Measures of location: median, mode, mean
   4. Markov's inequality
   5. Root mean square
   6. Measures of spread: range, interquartile range, variance, standard deviation, confidence limits
• 3. Measures of spread
   1. Standard deviation (SD)
   2. Chebyshev's inequality
   3. Error propagation
• 4. Normal distribution
   1. Affine transformation
   2. Standard units
   3. Normal distribution
• 5. Relations between two variables
   1. Scatterplot and visual inspection of association
   2. Bivariate data, point of averages
   3. Post hoc ergo propter hoc fallacy
   4. Linear association: correlation
   5. Correlation coefficient
   6. Ecological correlation
• 6. Regression, regression analysis
   1. Point of averages
   2. Graph of averages
   3. Regression line
   4. Interpolation vs. extrapolation
   5. Vertical residual
   6. Regression diagnostics
   7. RMS error in regression
• 7. Probability 1: How to count without counting?
   1. What is randomness?
   2. Probability of one draw (addition rule, inclusion-exclusion formula, complement rule)
   3. Probability of multiple draws (multiplication rules with and without replacement)
   4. Probability distributions
   5. Binomial distribution and formula for random variable
   6. Hypergeometric distribution and formula for simple random samples
• 8. Probability 2: Large random samples
   1. What is randomness?
   2. Probability of one draw (addition rule, inclusion-exclusion formula, complement rule)
   3. Probability of multiple draws (multiplication rules with and without replacement)
   4. Probability distributions
   5. Binomial distribution and formula for random variable
   6. Hypergeometric distribution and formula for simple random samples
• 9. Probability 3: Central limit theorem
   1. Binomial distribution: expected value and standard error
   2. de Moivre-Laplace theorem
   3. Central limit theorem
   4. Sampling without replacement: the correction factor
• 10. Inference 1
   1. Population vs. sample; parameter vs. estimator (statistic)
   2. Mean squared error; unbiased estimator
   3. s*: bootstrap estimate of the SD of the box
   4. Sample standard deviation s
   5. Confidence interval
   6. Statistical tests
• 11. Inference 2
   1. One-sample z test: significance level, power of the test, P-value (the observed significance level)
   2. Two-tailed test
   3. One-sample t test
   4. Two-sample test: standard error of the difference, s_pooled
   5. Paired samples
   6. Multisample hypotheses: ANOVA, multiple comparisons (Tukey's honestly significant difference test), goodness of fit
   7. R^2: coefficient of determination
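Several of the syllabus items (correlation coefficient, point of averages, regression line, RMS error in regression, R^2) fit together in a few lines. The sketch below assumes the conventions of Freedman, Pisani, and Purves: the SD uses the divisor n, r is the average of the products of the two variables in standard units, the regression line passes through the point of averages with slope r * SD_y / SD_x, the RMS error of regression equals sqrt(1 - r^2) * SD_y, and R^2 = r^2 for a single predictor. Python, the helper names and the data are illustrative assumptions.

```python
import math


def _sd(values, mean):
    # SD with divisor n, as in Freedman, Pisani, and Purves.
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def regression_summary(xs, ys):
    """Return r, slope, intercept, RMS error and R^2 for y regressed on x.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n  # point of averages
    sd_x, sd_y = _sd(xs, mean_x), _sd(ys, mean_y)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has SD 0")
    # Correlation coefficient: average product of the values in standard units.
    r = sum((x - mean_x) / sd_x * (y - mean_y) / sd_y for x, y in zip(xs, ys)) / n
    # Regression line through the point of averages.
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    # RMS size of the vertical residuals; max() guards against rounding below 0.
    rms_error = math.sqrt(max(0.0, 1 - r ** 2)) * sd_y
    return {"r": r, "slope": slope, "intercept": intercept,
            "rms_error": rms_error, "r_squared": r ** 2}


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
summary = regression_summary(xs, ys)
residuals = [y - (summary["intercept"] + summary["slope"] * x) for x, y in zip(xs, ys)]
print(summary)
print(math.sqrt(sum(e ** 2 for e in residuals) / len(residuals)))
```

For this made-up data the slope is 0.6, the intercept 2.2 and R^2 = 0.6. The RMS of the residuals computed directly matches sqrt(1 - r^2) * SD_y = sqrt(0.48), about 0.693, which is a quick check of the shortcut formula.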
Literature
• https://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-1x-introduction-1138#.VBhMghaqIhY
• http://www.stat.berkeley.edu/~stark/SticiGui/
• Jerrold H. Zar: Biostatistical Analysis.
• Freedman, Pisani, and Purves: Statistics (4th edition). W. W. Norton, 2007.
Teaching methods
Lectures explain the problems verbally with the help of the blackboard and are sometimes supported by presentations. Every lecture includes a handful of practical examples that we try to solve together during class, mostly on the blackboard and sometimes with the help of software (Excel, Statistica, R, Maple). Each lecture ends with an overview and comprehension questions, which are discussed at the beginning of the next class. Students are asked to explain topics to their colleagues. Each week there is recommended reading in the online textbook, including a video and interactive Java exercises (http://www.stat.berkeley.edu/~stark/SticiGui/index.htm). There is also graded homework in the IS every week.
Assessment methods
Grading rules:
• Homework: 25 %
• Partial tests: 25 %
• Final test: 50 %
• Oral exam: if required
Grading scale: A > 90 %, B > 80 %, C > 70 %, D > 60 %, E > 50 %
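The final score is a weighted average of the three components. A minimal sketch, assuming each component is scored as a percentage from 0 to 100, that the thresholds are strict inequalities exactly as written, and that a total at or below 50 % fails (the fail grade written as F here, and the helper names, are assumptions; the optional oral exam is not modelled):

```python
# Weights from the grading rules; component scores are assumed to be 0-100 %.
WEIGHTS = {"homework": 0.25, "partial_tests": 0.25, "final_test": 0.50}

# Thresholds as written: each grade requires strictly more than its threshold.
THRESHOLDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (50, "E")]


def final_percentage(homework, partial_tests, final_test):
    """Weighted total in percent (hypothetical helper)."""
    return (WEIGHTS["homework"] * homework
            + WEIGHTS["partial_tests"] * partial_tests
            + WEIGHTS["final_test"] * final_test)


def letter_grade(percentage):
    """Map a total to a letter; 'F' at or below 50 % is an assumption."""
    for threshold, letter in THRESHOLDS:
        if percentage > threshold:
            return letter
    return "F"


total = final_percentage(homework=80, partial_tests=90, final_test=95)
print(total, letter_grade(total))
```

With 80 %, 90 % and 95 % the weighted total is exactly 90 %, which under the strict reading of "A > 90 %" gives B rather than A; this boundary case shows why the inequality convention matters.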
Language of instruction
English