Bi5444 Analysis of sequencing data

Faculty of Science
Autumn 2018
Extent and Intensity
2/1/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: z (credit).
Teacher(s)
Mgr. Eva Budinská, Ph.D. (lecturer)
doc. MUDr. Mgr. Marek Mráz, Ph.D. (lecturer)
Mgr. Jan Oppelt, Ph.D. (lecturer)
Hana Válková (assistant)
Supervisor
prof. RNDr. Ladislav Dušek, Ph.D.
Research Centre for Toxic Compounds in the Environment (RECETOX) - Chemistry Section - Faculty of Science
Contact Person: Mgr. Eva Budinská, Ph.D.
Supplier department: Research Centre for Toxic Compounds in the Environment (RECETOX) - Chemistry Section - Faculty of Science
Timetable
Mon 17. 9. to Fri 14. 12. Tue 9:00–10:50 A4-118, Tue 11:00–11:50 A4-118
Prerequisites
At least a basic knowledge of working in Linux, knowledge of molecular biology, and basic programming skills are expected. Knowing the basics of statistics and R is an advantage.
Course Enrolment Limitations
The course is offered to students of any study field.
Course objectives
At the end of the course, the student will:
- know the latest NGS methods (next- and third-generation sequencing), their use and the type of data they produce.
- be able to distinguish the type of method based on the data.
- know the basic scheme of data analysis.
- be able to work with Linux, Bash and R at a level sufficient for the analysis of NGS data.
- know how to select tools for data processing and apply them to real data.
- be able to analyze NGS data, from quality control through alignment to the detection of differentially expressed genes (in RNA-Seq), variants (CNVs and SNPs), genome assembly, etc.
Learning outcomes
At the end of the course, the student will:
- know the latest NGS methods (next- and third-generation sequencing), their use and the type of data they produce.
- be able to distinguish the type of method based on the data.
- know the basic scheme of data analysis.
- be able to work with Linux, Bash and R at a level sufficient for the analysis of NGS data.
- know how to select tools for data processing and apply them to real data.
- be able to analyze NGS data, from quality control through alignment to the detection of differentially expressed genes (in RNA-Seq), variants (CNVs and SNPs), genome assembly, etc.
Syllabus
  • 1. Introduction to NGS technologies: a brief introduction to biology, sequencing, history, NGS technologies and their applications, sample extraction, library preparation, basic glossary.
  • 2. The basic scheme of data analysis: what the data look like, definition of the general steps in NGS data analysis, differences depending on the application (e.g. variant calling vs. RNA-Seq), introduction of the projects.
  • 3. Student project assignment and introduction to software for data analysis: a brief introduction to working with Linux, Bash and R, data formats and the differences between them, on-line courses, discussion about projects.
  • 4. Quality control, data processing, specifications and start of work on projects: tools for quality control, Phred score, data pre-processing, examples on sample data.
  • 5. Alignment and post-processing: reference genome databases, annotations, the differences between them and their application, explanation of alignment algorithms, differences between spliced/non-spliced tools and their application, alignment quality control, alignment visualization.
  • 6. Theory for specific parts of the project analyses 1 (based on the students' selections).
  • 7. Theory for specific parts of the project analyses 2 (based on the students' selections).
  • 8. Theory for specific parts of the project analyses 3 (based on the students' selections).
  • 9. Theory for specific parts of the project analyses 4 (based on the students' selections).
  • 10. Projects processing/analysis, consultations.
  • 11. Projects processing/analysis and projects finalization, consultations.
  • 12. Presentation of the project results.
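Session 4 of the syllabus mentions the Phred score. As a minimal illustration (not taken from the course materials), the sketch below uses the standard definition Q = -10 * log10(P), where P is the probability that a base call is wrong, and decodes a FASTQ quality string. The function names are hypothetical, and the ASCII offset of 33 (Phred+33) is an assumption that holds for most current data but not for some older files.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to the base-call error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability P to a Phred quality score Q."""
    return -10 * math.log10(p)


def decode_quality_string(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality]


# Example quality string (made up for illustration).
scores = decode_quality_string("II5+!")
print(scores)
print([phred_to_error_prob(q) for q in scores])
```

A score of 20 therefore corresponds to an expected error rate of 1 in 100, and a score of 30 to 1 in 1000.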
Teaching methods
The course will combine theoretical lectures with practical exercises and demonstrations on sample data.
Several biological problems/projects covering various types of NGS data (RNA-Seq, WES, targeted genome sequencing, ChIP-Seq, paired-end, single-end, human DNA, plant DNA, ...) will be presented at the beginning of the semester. Students will be divided into groups, and each group will choose a project that it will analyze during the semester and use as model data for the methods presented. In the second half of the semester, students will present their interim results in lectures. Lectures will then be devoted to specific problems and methods for particular types of data analysis.
Finally, students will summarize their semester results in a presentation in English.
Assessment methods
For successful completion of the course, students must achieve at least 20 points.
A maximum of 20 points (50% of the total final assessment) is awarded for the project: up to 5 points for project presentations during the semester and up to 15 points for the elaboration of the project.
Students completing the course with an examination must take the final test, which consists of 10 questions worth 20 points in total. Points earned in this test make up the remaining 50% of the total final assessment.
Enrolment in the examination is conditional on submitting the project, for which the student must reach at least 10 points (project evaluation takes 5 days).
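The point rules above can be sketched as a short calculation for a student completing the course with an examination. This is only an illustration under stated assumptions: the function name and the returned keys are hypothetical, and it assumes that the 20-point pass threshold applies to the sum of project and test points and that test points count only when the student is admitted to the examination.

```python
def exam_assessment(presentation_pts: float, elaboration_pts: float, test_pts: float) -> dict:
    """Apply the stated point rules for completion by examination."""
    # Maximum points per component, as given in the assessment description.
    limits = {
        "presentation": (presentation_pts, 5),
        "elaboration": (elaboration_pts, 15),
        "test": (test_pts, 20),
    }
    for name, (value, maximum) in limits.items():
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} points must be between 0 and {maximum}")

    project_pts = presentation_pts + elaboration_pts
    # At least 10 project points are required to enrol in the examination.
    may_sit_exam = project_pts >= 10
    # Assumption: test points are counted only if the student may sit the exam.
    total = project_pts + (test_pts if may_sit_exam else 0)
    # Assumption: the 20-point threshold applies to project + test points.
    passed = may_sit_exam and total >= 20
    return {
        "project": project_pts,
        "may_sit_exam": may_sit_exam,
        "total": total,
        "passed": passed,
    }


print(exam_assessment(presentation_pts=4, elaboration_pts=10, test_pts=8))
```

Under these assumptions, a student with 14 project points and 8 test points reaches 22 of 40 points and passes, while a student with fewer than 10 project points cannot enrol in the examination regardless of the test.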
Language of instruction
English
Further comments (probably available only in Czech)
Study Materials
The course is taught annually.
The course is also listed under the following terms: Autumn 2015, Autumn 2016, Autumn 2017, Autumn 2019.
  • Permalink: https://is.muni.cz/course/sci/autumn2018/Bi5444