PV033 Scientific Data Processing

Faculty of Informatics
Spring 2006
Extent and Intensity
2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Vladimír Znojil, CSc. (lecturer)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: doc. RNDr. Vladimír Znojil, CSc.
Timetable
Mon 13:00–15:50 B007
Prerequisites (in Czech)
! P033 Scientific Data Processing
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
there are 19 fields of study the course is directly associated with
Course objectives
The lectures focus on the use of statistical methods for processing large sets of scientific research data. They also include an introduction to commercial statistical software.
Syllabus
  • Data sets, objects and attributes, types of data: alternative (binary), categorical, quantitative. Basic characteristics of data collection methods. Methods for describing data: histogram, mean, median, mode, alpha-quantiles. Frequency functions and frequency density. Application to one- and two-dimensional data sets.
  • Basic concepts of probability theory. Discrete and continuous probability. Probability density and distribution functions. Stochastically independent and dependent events, conditional probability. Bayes' theorem.
  • Elementary types of distribution functions: binomial, Poisson, normal, and log-normal distributions. Their basic characteristics and applications. Some types of special distribution functions, restricted distributions.
  • The law of large numbers, central limit theorems. Their importance for statistical evaluation and the restrictive assumptions under which they hold.
  • Characteristics of distribution functions, moments and their properties, principles of testing various types of distributions. The role of the normal distribution in statistics.
  • Interval estimation; separate and simultaneous confidence intervals.
  • Hypothesis testing, types of tests, sequential tests. Type I and Type II errors and the relation between them. Parametric and non-parametric procedures. Some other modern approaches and a comparison of the various methods.
  • Common statistical calculations: correlation and regression, analysis of variance in simple and complex situations. The least squares method (LSM), its advantages and disadvantages. Some interesting applications of LSM as a substitute for ANOVA.
  • Comparison of means and deviations of experimental values, comparison of groups, Holm's method.
  • Multidimensional data and methods for processing them: dimensionality reduction and exploratory methods of data analysis. Representativeness of data and problems of data distortion. Statistical models of data sets.
  • Principal component analysis (PCA), method of "reciprocal averaging" (RA), detrended correspondence analysis (DCA).
  • Factor analysis, its tasks and methods, searching for factors, and basic types of factor rotation. Relations and problems in interpreting results. The use of factor analysis.
  • Cluster analysis: metrics of similarity spaces, the use of alternative and categorical data, "mixed data" and their metrics. Methods for evaluating cluster distance. Hierarchical methods for clustering "from the top" (divisive) and "from the bottom" (agglomerative), non-hierarchical methods for clustering. Advantages and disadvantages of the methods. The method of "two-way clustering". Applications of cluster analysis in ecology and biology.
  • Discriminant analysis, selection of the parametric space. Posterior classification probabilities. The use of discriminant methods in biology and medicine.
  • Heuristic methods of data analysis, GUHA methods. Their use and the risks involved.
  • Short review: what not to forget, and which methods to use when. Statistical software packages and their contents (Statgraph, BMDP, SPSS, SyStat, Statistica).
Literature
  • Lecture syllabi, documentation of statistical software packages.
Language of instruction
Czech
Further Comments
The course is taught annually.
The course is also listed under the following terms: Spring 2003, Spring 2004, Spring 2005.
  • Permalink: https://is.muni.cz/course/fi/spring2006/PV033