Bi7490 Advanced non-parametric methods

Faculty of Science
Spring 2011
Extent and Intensity
2/1/0. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium).
Teacher(s)
Mgr. Klára Komprdová, Ph.D. (seminar tutor)
prof. Ing. Jiří Holčík, CSc. (alternate examiner)
Guaranteed by
prof. RNDr. Ladislav Dušek, Ph.D.
RECETOX – Faculty of Science
Contact Person: prof. RNDr. Ladislav Dušek, Ph.D.
Timetable
Wed 17:00–19:50 F01B1/709
Prerequisites
Bi5040 Biostatistics - basic course && Bi8600 Multivariate Statistical Meth.
Knowledge of basic univariate exploratory statistical techniques, analysis of variance, and correlation analysis.
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
Course objectives
At the end of the course, students should be able to:
- critically evaluate a data set in terms of the distribution of the data
- use nonparametric classification and regression methods
- validate model outputs using different validation techniques
- compare results from different models
- use various software packages to build models (R-project, Matlab, Statistica)
- compare the advantages and disadvantages of different methods
Syllabus
  • Introduction to Nonparametric Methods

  • Basic concepts: process modeling, types of variables, classification models, classification vs. regression, parametric and nonparametric multivariate statistics (a comparison of the approaches), introduction to the software used (Statistica, R-project, MATLAB)

  • Decision trees I

  • tree topology, splitting criteria, stability of the tree, cross-validation, measurement of accuracy, tree pruning, surrogate variables, classification vs. regression trees, CART algorithm, advantages and disadvantages of decision trees

  • Decision trees II

  • other tree-building algorithms and related methods: Patient Rule Induction Method (PRIM), Chi-squared Automatic Interaction Detector (CHAID), Quick, Unbiased and Efficient Statistical Tree (QUEST), Hierarchical Mixture of Experts (HME), Multivariate Adaptive Regression Splines (MARS)

  • Random Forests I

  • extension of decision trees, creation and validation of forests, different types of forests: bagging, boosting, arcing

  • Random Forests II

  • measuring importance of variables, the effect of variables on the prediction, clustering, outlier detection, precision, prediction

  • Accuracy of models I

  • confusion matrix, definition of threshold-dependent and threshold-independent indexes, threshold-dependent indexes: Normalized Mutual Information (MI), Average Mutual Information (AMI), overall accuracy, Cohen's kappa, Tau index

  • Accuracy of models II

  • threshold-independent indexes, specificity vs. sensitivity, Receiver Operating Characteristic (ROC) curve, Area Under the ROC Curve (AUC), coefficient of determination (R²), explained deviance (D²), maximum overall accuracy (MXOA), maximum kappa (MXKp), mean cross entropy (MXE), mean absolute prediction error (MAPE)

  • Validation techniques I

  • validation, testing and training subsets, analytical methods for validation: Akaike's information criterion (AIC), Bayesian information criterion (BIC), Minimum description length (MDL), Structural risk minimization (SRM)

  • Validation techniques II

  • Monte Carlo methods, principles of resampling techniques: simple splitting, cross-validation, bootstrap, and jackknife

  • Real examples of using nonparametric models:

  • Predictive modeling of species occurrence, concentration of pollutants
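
As an illustration of the threshold-dependent indexes listed under Accuracy of models I, the following minimal Python sketch computes overall accuracy and Cohen's kappa from a confusion matrix. It is not part of the course materials: the function names and the 2 x 2 example matrix are hypothetical, and the layout (rows = observed classes, columns = predicted classes) is an assumption.

```python
# Minimal sketch: overall accuracy and Cohen's kappa from a confusion matrix.
# Assumption: cm[i][j] = number of cases of observed class i predicted as class j.
# Function names and the example matrix are hypothetical, for illustration only.

def overall_accuracy(cm):
    """Proportion of correctly classified cases (diagonal sum / total)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohens_kappa(cm):
    """Agreement between observed and predicted classes, corrected for chance."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Expected agreement if observed and predicted labels were independent.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical presence/absence example with 100 cases.
cm = [[40, 10],
      [5, 45]]
print(round(overall_accuracy(cm), 3))  # 0.85
print(round(cohens_kappa(cm), 3))      # 0.7
```

Here the expected agreement by chance is 0.5, so an overall accuracy of 0.85 corresponds to a kappa of 0.7. This is why kappa is often reported alongside overall accuracy when class frequencies are unbalanced.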

Literature
  • Lažanský et al.: Umělá inteligence I–IV.
  • Legendre P., Legendre L. (1998) Numerical Ecology (2nd ed.), Elsevier, Amsterdam.
  • Klaschka J., Kotrč E.: Klasifikační a regresní lesy, proceedings of the ROBUST 2004 conference.
  • Breiman L. et al. (1984) Classification and Regression Trees, Chapman and Hall.
  • Hastie T., Tibshirani R., Friedman J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer 2003.
  • Breiman L. (2001) Random forests. Machine Learning 45, pp. 5–32.
  • Breiman L. (1996) Bagging predictors. Machine Learning 24, pp. 123–140.
  • McCulloch C. E., Searle S. R. (2001) Generalized, Linear, and Mixed Models, John Wiley & Sons.
  • Manly B. F. J. (2007) Randomization, Bootstrap and Monte Carlo Methods in Biology (3rd ed.), Chapman & Hall, Boca Raton, 455 pp. ISBN 9781584885412.
  • Edgington E. S., Onghena P. (2007) Randomization Tests (4th ed.), Chapman & Hall/CRC, Boca Raton, 345 pp. ISBN 9781584885894.
Teaching methods
Teaching takes the form of lectures with PowerPoint presentations. Each lecture block is supplemented with a practical computer session in which the different approaches are tested in various software packages. Real examples from experimental biology, ecology, and chemistry are presented during these sessions, and students are asked to interpret the results of the practical examples. Each student develops a project on a selected topic during the semester.
Assessment methods
The final assessment (at the end of the semester) combines a written examination with an evaluation of the project.
Language of instruction
Czech
Further Comments
The course can also be completed outside the examination period.
The course is taught annually.
Teacher's information
http://www.cba.muni.cz/vyuka/
The course is also listed under the following terms: Spring 2008 - for the purpose of the accreditation, Spring 2011 - only for the accreditation, Autumn 2002, Autumn 2003, Autumn 2004, Spring 2006, Spring 2007, Spring 2008, Spring 2009, Spring 2010, Spring 2012, Spring 2012 - accreditation, Spring 2013, Autumn 2014, Autumn 2015, Autumn 2019, Autumn 2020, Autumn 2021.
  • Permalink: https://is.muni.cz/course/sci/spring2011/Bi7490