FI:PV033 Scientific Data Processing - Course Information
PV033 Scientific Data Processing
Faculty of Informatics, Spring 2003
- Extent and Intensity
- 2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. RNDr. Vladimír Znojil, CSc. (lecturer)
- Guaranteed by
- prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. RNDr. Vladimír Znojil, CSc.
- Timetable
- Mon 14:00–16:50 B411
- Prerequisites (in Czech)
- ! P033 Scientific Data Processing
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- Fields of study / plans the course is directly associated with
- Informatics (programme FI, B-IN)
- Informatics (programme FI, M-IN)
- Upper Secondary School Teacher Training in Informatics (programme FI, M-IN)
- Upper Secondary School Teacher Training in Informatics (programme FI, M-SS)
- Information Technology (programme FI, B-IN)
- Course objectives
- The lectures focus on the use of statistical methods for processing large sets of scientific research data. They also include an introduction to commercial statistical software.
- Syllabus
- Data sets, objects and attributes; types of data: alternative (binary), categorical, quantitative. Elementary characteristics of data-gathering methods. Methods for describing data: histogram, mean, median, mode, alpha-quantiles. Frequency functions and frequency density. Application to one- and two-dimensional data sets.
- Elementary terms in probability theory. Discrete and continuous probability. Probability density and distribution functions. Stochastically independent and dependent events, conditional probability. Bayes' theorem.
- Elementary types of distribution functions: binomial, Poisson, normal and log-normal distributions. Their basic characteristics and applications. Some special types of distribution functions, restricted distributions.
- The law of large numbers, central limit theorems. Their importance for statistical evaluation and the assumptions that restrict their validity.
- Characteristics of distribution functions, moments and their characteristics, principles of testing various types of distributions. The role of the normal distribution in statistics.
- Interval estimation, separate and simultaneous confidence intervals.
- Hypothesis testing, types of tests, sequential tests. Type I and type II errors and their mutual relation. Parametric and non-parametric procedures. Some other modern approaches and a comparison of various methods.
- Frequent statistical calculations: correlation and regression, analysis of variance (ANOVA) in simple and complex situations. The least squares method (LSM), its advantages and disadvantages. Some interesting applications of LSM as a substitute for ANOVA.
- Comparison of means and deviations of experimental values, comparison of groups, Holm's method.
- Multidimensional data and methods for processing them: dimensionality reduction and exploratory methods of data analysis. Representability of data and problems of data distortion. Statistical models of data sets.
- Principal component analysis (PCA), method of "reciprocal averaging" (RA), detrended correspondence analysis (DCA).
- Factor analysis, its tasks and methods, searching for factors and basic types of factor rotation. Relations and problems in interpreting the results. The use of factor analysis.
- Cluster analysis: metrics of similarity spaces, the use of alternative and categorical data, "mixed data" and their metrics. Methods for evaluating cluster distance. Hierarchical methods for clustering "from the top" (divisive) and "from the bottom" (agglomerative), non-hierarchical methods for clustering. Advantages and disadvantages of the methods. The method of "two-way clustering". Applications of cluster analysis in ecology and biology.
- Discriminant analysis, selection of the parameter space. Posterior classification probabilities. The use of discriminant methods in biology and medicine.
- Heuristic methods of data analysis, GUHA methods. Their use and the risks of using them.
- Short review of what not to forget, and of what to use and when. Statistical software packages and their contents (Statgraph, BMDP, SPSS, SyStat, Statistica).
- Literature
- Lecture syllabi, documentation of statistical software packages.
- Language of instruction
- Czech
- Further Comments
- The course is taught annually.
- Permalink: https://is.muni.cz/course/fi/spring2003/PV033