# PřF:Bi7527 Data Analysis in R - Course Information

## Bi7527 Data Analysis in R

**Faculty of Science**

spring 2018

**Extent and Intensity**: 2/0/0. 2 credit(s) (fasci plus compl plus > 4). Type of Completion: zk (examination).

**Teacher(s)**:

- Mgr. Eva Budinská, Ph.D. (lecturer)
- RNDr. Ivana Ihnatová, Ph.D. (lecturer)
- Mgr. Barbora Zwinsová (lecturer)

**Guaranteed by**: prof. RNDr. Ladislav Dušek, Ph.D.

RECETOX - Faculty of Science

Contact Person: Mgr. Eva Budinská, Ph.D.

Supplier department: RECETOX - Faculty of Science

**Timetable**: Wed 10:00–11:50 A1/609 - IBA (A1, 6th floor, Kamenice 3)

**Prerequisites**: **Bi5040** Biostatistics - basic course || **Bi5045** Biostatistics for Comp. Biol.

Bi5040 Biostatistics - basic course, Bi8600 Multivariate Statistical Methods, Bi8660 Data Analysis on PC II. Completing the course requires a basic working knowledge of R, knowledge of basic statistical methods at least at the level of Bi5040 Biostatistics - basic course, and knowledge of multivariate statistical methods at the level of Bi8600 Multivariate Statistical Methods.

**Course Enrolment Limitations**: The course is offered to students of any study field.

The capacity limit for the course is 30 student(s).

Current registration and enrolment status: enrolled: **3**/30, only registered: **0**/30, only registered with preference (fields directly associated with the programme): **0**/30

**Course objectives**: After attending this course, the student:

- Understands the syntax of the R language
- Knows the data structures in R
- Knows the difference between a script and a function
- Can create functions
- Creates and uses scripts for batch execution of R commands
- Knows the syntax of basic loops and conditions (for, repeat, if...)
- Can install R packages
- Automatically creates objects with names defined by a variable
- Writes automated scripts
- Optimizes the computational burden of algorithms by using less time-consuming functions (e.g. apply instead of a for loop)
- Knows the options for connecting R with other programming languages (C, Python, Perl)
- Loads and saves data files
- Transforms matrices and other data tables
- Can merge tables of different types
- Effectively recodes variables
- Performs hypothesis testing
- Applies different functions for data clustering
- Knows the available options for saving graphs
- Knows and works with the basic graphical interface in R
- Creates graphs in lattice and grid
- Can create and save graphs in an automated script
- Creates complex colour graphs
- Knows how to set graph resolution and creates graphs of publication quality
- Saves graphs in different formats
- Can create an analysis plan and find and select the best functions
- Can create a simple-to-follow script and additional functions for complex data analysis of example data

- Will optimize this script in terms of computational burden

**Syllabus**:

- Lecture 1: Introduction to R (history of R; advantages and disadvantages of R; basics of R: setting the working directory, basic commands, operators, libraries; help; definition of an object in R)
- Lecture 2: Selection of projects
- Lectures 1-3: Objects in R (vector, matrix, data frame, list and others)
- Lectures 4-5: Loading and saving files, data pre-processing and transformation
- Lectures 6-7: R programming and fundamentals of optimal scripting
- Lectures 8-9: Graphical outputs in R (traditional graphics; Lattice (Trellis); Grid; ways of saving graphs)
- Lecture 10: Multivariate analysis, a complex analysis example
- Lecture 11: Creating new packages
- Lecture 12: Connecting R with C
- Lecture 13: Project evaluation

**Literature**

*recommended literature*

- TORGO, Luís. *Data mining with R: learning with case studies*. Boca Raton: Chapman and Hall/CRC, 2011. xv, 289. ISBN 9781439810187.
- MATLOFF, Norman S. *The art of R programming: a tour of statistical software design*. San Francisco: No Starch Press, 2011. xxiii, 373. ISBN 1593273843.
- GENTLEMAN, Robert. *R programming for bioinformatics*. Boca Raton: CRC Press, 2009. xii, 314. ISBN 9781420063677.
- MURRELL, Paul. *R graphics*. Boca Raton, Fla.: Chapman & Hall/CRC, 2006. xix, 301. ISBN 158488486X.
- *Bioinformatics and computational biology solutions using R and Bioconductor*. Edited by Robert Gentleman. New York: Springer, 2005. xix, 473. ISBN 0387251464.

**Teaching methods**: The lectures are combined with exercises. The basics and theory are explained first, followed by hands-on practicals with suitable examples. Students are given special homework exercises; the best solutions earn bonus points that count toward the final mark. The number of students in the course must not exceed the number of available computers (student notebooks included). Students are encouraged to propose and discuss their own algorithmic solutions to particular problems.

**Assessment methods**: During the course, students can get up to 5 points for solving special exercises.

At the last lecture, students can get up to 5 points for the project. The functionality and clarity of the script will be evaluated in the context of the project goals. The project is mandatory.

The final practical test is taken in R and consists of several tasks. Students must hand in both their solutions (tables, pictures, answers) and the corresponding R code. The test is worth a maximum of 20 points. Study materials may be used during the test. A total of 17.5 points (special exercises + project + test) is needed to pass, of which at least 3 points must come from the project.

Grading scale (total points): <17.5 F, ≤20 E, ≤22.5 D, ≤25 C, ≤27.5 B, ≤30 A

**Language of instruction**: Czech

**Further comments (probably available only in Czech)**: Study Materials

The course is taught annually.

General note: The course is taught in blocks.

Information on course enrolment limitations: It is recommended to have completed Bi8600, DSMBz01, and Bi3060.

- Permalink: https://is.muni.cz/course/sci/spring2018/Bi7527