Bi7496 Modern regression and classification techniques in computational biology

Faculty of Science
Autumn 2008
Extent and intensity
0/0. 4 credits (plus extra credits for completion). Recommended type of completion: zk (examination). Other possible type of completion: z (credit).
Teachers
prof. Michael Schimek, Ph.D. (lecturer)
RNDr. Tomáš Pavlík, Ph.D. (seminar tutor)
RNDr. Eva Gelnarová (seminar tutor)
Guaranteed by
prof. RNDr. Jiří Hřebíček, CSc.
Faculty of Science
Course enrolment limitations
The course is open to students of any field of study.
Course objectives
Regression and classification methods based on the classical linear model and its extensions constitute a core part of multivariate statistics and play an important role in computational biology and medicine. Over the last twenty years, however, various new approaches have been developed that are more appropriate for the analysis of biodata. Most of them relax parametric model assumptions and add flexibility when the task is to fit models to quantitative observations. Others are designed for predictive tasks, e.g. risk estimation; these belong to the group of statistical learning procedures. An additional complication is the size and complexity of the data we wish to analyze. Special techniques have been proposed to handle huge data sets and problems in which the number of observations n is much smaller than the number of variables p (typical for genetic data). Modern regression and classification techniques rely heavily on efficient computing. We take advantage of the open source R statistics and graphics environment. In the lectures (2 hours), selected statistical approaches and appropriate computing concepts are introduced. In the computer laboratory (2 hours), applied data problems are discussed and analyzed with R procedures. Moreover, each student is required to carry out a small case study on her/his own as an exercise. The results should be summarized in a short written report in English. There will be an introduction to R at the beginning of the computer laboratory.
Syllabus
  • Typical applications of regression and classification techniques in computational biology.
  • Typical data structures (errors, complexity, size, and dimensionality) in the modern biosciences.
  • The concept of regression model fitting.
  • The concept of statistical learning (prediction).
  • Curse of dimensionality and ill-posed problems (incl. the problem of n much smaller than p).
  • Complexity control, regularization, and penalization.
  • The role of computing and algorithms.
  • Introduction to smoothing techniques (including k-Nearest-Neighbors).
  • Generalized additive non- and semiparametric regression models.
  • Metric, distance, and similarity.
  • Regression and classification trees.
  • Linear classification methods and extensions.
  • Nonparametric classification methods.
  • Support vector machines as a statistical learning tool.
Language of instruction
English
Further comments
The course is taught only once.
The course is also listed under the following terms: Autumn 2007 - accreditation, Spring 2011 - accreditation, Spring 2007, Autumn 2007, Autumn 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2012 - accreditation, Spring 2013, Spring 2014.