Bi7496 Modern regression and classification techniques in computational biology

Faculty of Science
Spring 2011 - only for the accreditation
Extent and Intensity
0/0. 4 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: z (credit).
Teacher(s)
prof. Michael Schimek, Ph.D. (lecturer)
RNDr. Tomáš Pavlík, Ph.D. (seminar tutor)
RNDr. Eva Gelnarová (seminar tutor)
Guaranteed by
prof. RNDr. Jiří Hřebíček, CSc.
RECETOX – Faculty of Science
Prerequisites
Students should be familiar with the basics of regression modelling. There will be an introduction to R at the beginning of the computer laboratory.
Course Enrolment Limitations
The course is offered to students of any study field.
Course objectives
The aim of this course is to introduce students to modern regression and classification methods and their extensions, which constitute a core part of multivariate statistics. The goals can be summarised as follows:
To demonstrate various new approaches that have been developed in the last twenty years and that are appropriate for the analysis of biodata.
To describe the assumptions of parametric models and how to check them.
To show how to control model flexibility when fitting models to quantitative observations.
To teach students how to perform predictive tasks, e.g. in risk estimation.
To demonstrate how to cope with the size and complexity of the data using special techniques that have been proposed recently.
Syllabus
  • Typical applications of regression and classification techniques in computational biology.
  • Typical data structures (errors, complexity, size, and dimensionality) in the modern biosciences.
  • The concept of regression model fitting.
  • The concept of statistical learning (prediction).
  • Curse of dimensionality and ill-posed problems (incl. the "n much smaller than p" problem).
  • Complexity control, regularization, and penalization.
  • The role of computing and algorithms.
  • Introduction to smoothing techniques (including k-nearest neighbors).
  • Generalized additive non- and semiparametric regression models.
  • Metric, distance, and similarity.
  • Regression and classification trees.
  • Linear classification methods and extensions.
  • Nonparametric classification methods.
  • Support vector machines as a statistical learning tool.
Literature
  • Hastie T., Tibshirani R., and Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2001.
Teaching methods
In the lectures (2 hours), selected statistical approaches and the associated computing concepts are introduced. In the computer laboratory (2 hours), applied data problems are discussed and analyzed with R procedures.
Assessment methods
Each student is asked to carry out a small case study independently as an exercise. The results should be summarized in a short written report in English and subsequently presented to the other students for discussion.
Language of instruction
English
Further Comments
The course is taught only once.
The course is also listed under the following terms: Autumn 2007 - for the purpose of the accreditation, Spring 2007, Autumn 2007, Autumn 2008, Autumn 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2012 - accreditation, Spring 2013, Spring 2014.