Bi7527 Data Analysis in R

Faculty of Science
Spring 2014
Extent and Intensity
2/0/0. 2 credit(s) (fasci plus compl plus > 4). Type of Completion: zk (examination).
Mgr. Eva Budinská, Ph.D. (lecturer)
RNDr. Ivana Ihnatová, Ph.D. (lecturer)
Guaranteed by
prof. RNDr. Ladislav Dušek, Ph.D.
RECETOX - Faculty of Science
Contact Person: Mgr. Eva Budinská, Ph.D.
Supplier department: RECETOX - Faculty of Science
Tue 8:00–9:50 A1/609 - IBA (A1, 6th floor, Kamenice 3)
Prerequisites
Bi5040 Biostatistics - basic course || Bi5045 Biostatistics for Comp. Biol.
Bi5040 Biostatistics – basic course, Bi8600 Multivariate Statistical Methods, Bi8660 Data Analysis on PC II. To complete the course, students need a basic working knowledge of R, knowledge of basic statistical methods at least to the extent of Bi5040 Biostatistics – basic course, and knowledge of multivariate statistical methods to the extent of Bi8600 Multivariate Statistical Methods.
Course Enrolment Limitations
The course is offered to students of any study field.
The capacity limit for the course is 30 student(s).
Current registration and enrolment status: enrolled: 0/30, only registered: 0/30, only registered with preference (fields directly associated with the programme): 0/30
Course objectives
After attending this course, the student:
Understands the syntax of the R language
Knows data structures in R
Knows the difference between a script and a function
Can create functions
Creates and uses scripts for running R commands in batch mode
Knows the syntax of basic loops and conditional statements (for, repeat, if, ...)
Can install R packages
Automatically creates objects with names defined by a variable
Writes automated scripts
Reduces the computational cost of algorithms by using less time-consuming functions (e.g. apply instead of a for loop)
Knows the options of connecting R with other programming languages (C, Python, Perl)
Loads and saves data files
Transforms matrices and other data tables
Can merge tables of different types
Effectively recodes variables
Performs hypothesis testing
Knows basic packages and functions for survival analysis and can apply them
Can perform univariate and multivariate linear regression
Knows functions for robust linear models
Applies different functions for data clustering
Can use selected functions for data classification (decision trees, SVM, ...)
Knows all the options for saving graphs
Knows and works with the basic graphical interface in R
Creates graphs using the lattice and grid packages
Can create and save graphs in an automated script
Creates complex colour graphs
Knows how to set up graph resolution and creates graphs of publication quality
Saves graphs in different formats
Can create an analysis plan and find and select the most suitable functions
Can create an easy-to-follow script and additional functions for a complex analysis of example data
Optimizes this script in terms of computational cost
Installs Bioconductor and understands its special data types
Knows the use of Bioconductor in bioinformatics analyses and performs basic analyses
Syllabus
  • 1. Introducing advanced R programming (Lectures 01-02)
  • 2. Fundamentals of optimal scripting (Lecture 03)
  • 3. Data pre-processing and transformations (Lectures 04-06)
  • 4. Basic R packages for statistics and data mining (Lectures 07-08)
  • 5. Graphical outputs in R (Lectures 09-11)
  • 6. A complex data analysis example (Lecture 12)
  • 7. Introduction to Bioconductor (Lecture 13)
Recommended literature
  • TORGO, Luís. Data mining with R: learning with case studies. Boca Raton: Chapman and Hall/CRC, 2011. xv, 289. ISBN 9781439810187.
  • MATLOFF, Norman S. The art of R programming: tour of statistical software design. San Francisco: No Starch Press, 2011. xxiii, 373. ISBN 1593273843.
  • GENTLEMAN, Robert. R programming for bioinformatics. Boca Raton: CRC Press, 2009. xii, 314. ISBN 9781420063677.
  • MURRELL, Paul. R graphics. Boca Raton, Fla.: Chapman & Hall/CRC, 2006. xix, 301. ISBN 158488486X.
  • Bioinformatics and computational biology solutions using R and Bioconductor. Edited by Robert Gentleman. New York: Springer, 2005. xix, 473. ISBN 0387251464.
Teaching methods
The lectures are combined with exercises. The basics and theory are explained first, followed by hands-on practicals with suitable examples. Students are given special homework exercises; the best solutions earn bonus points, which count toward the final mark. The number of students in the course must not exceed the number of available computers (including students' own notebooks). Students are encouraged to propose and discuss their own algorithmic solutions to particular problems.
Assessment methods
During the course, students will have the opportunity to solve special exercises, each worth 0.5 or 1 point. The final exam is a practical test in R consisting of 8 tasks. Students must hand in both their solutions (tables, figures, answers) and the corresponding R code. The maximum score in the test is 20 points. Study materials may be used during the test. A total of 11 points (from the special exercises and the test combined) is needed to pass. Grading scale: 20-19: A, 18-17: B, 16-15: C, 14-13: D, 12-11: E, 10-0: F.
Language of instruction
Further comments
The course is taught annually.
General note: The course is taught in blocks.
Information on course enrolment limitations: It is recommended to complete Bi8600, DSMBz01 and Bi3060 beforehand.
The course is also listed under the following terms: Spring 2011 - only for the accreditation, Autumn 2009, Spring 2011, Spring 2012, Spring 2012 - accreditation, Spring 2013, Spring 2015, Spring 2016, Spring 2017, Spring 2018, Spring 2019, Spring 2020.