M7DataSP Advanced Data Science Practicum

Faculty of Science
Autumn 2023
Extent and Intensity
0/2/1. 3 credit(s) (příf plus uk k 1 zk 2 plus 1 > 4). Type of Completion: z (credit).
Teacher(s)
Mgr. Eva Maršálková (lecturer)
Mgr. Petr Šimeček, MSc., Ph.D. (lecturer)
Mgr. Denisa Šrámková (lecturer)
Guaranteed by
doc. PaedDr. RNDr. Stanislav Katina, Ph.D.
Department of Mathematics and Statistics – Departments – Faculty of Science
Supplier department: Department of Mathematics and Statistics – Departments – Faculty of Science
Timetable of Seminar Groups
M7DataSP/01: Mon 8:00–9:50 MP1,01014, P. Šimeček
Prerequisites
Students are expected to have some experience with a programming language suitable for data analysis, e.g. Python or R. Code examples will be given in Python.
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
The capacity limit for the course is 30 student(s).
Current registration and enrolment status: enrolled: 6/30, only registered: 0/30, only registered with preference (fields directly associated with the programme): 0/30
Course objectives
The main goal is to give students hands-on experience with data analysis and machine learning methods, and to deepen their programming skills.
Learning outcomes
This course will enable students to
- predict a dependent variable with linear or logistic regression
- examine unknown data using Principal Component Analysis and/or clustering
- split data into training and testing sets and understand the bias-variance trade-off
- use classification and regression trees, forests, bagging and boosting (XGBoost, LightGBM, CatBoost)
- learn the basics of TensorFlow 2.0 and Keras, applying neural networks and fine-tuning to image and NLP data
- work with large language models
- use recommendation algorithms (collaborative filtering)
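
Two of the outcomes above, splitting data into training and testing sets and predicting a dependent variable with linear regression, can be illustrated with a minimal sketch. It uses only the Python standard library and synthetic data; the helper names (`train_test_split`, `fit_line`, `mean_squared_error`) and the data itself are assumptions made for illustration and are not taken from the course materials.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(points, intercept, slope):
    """Average squared prediction error on `points`."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)


# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + noise.gauss(0, 0.5)) for x in range(40)]

# Fit on the training part only, then measure error on unseen test data.
train, test = train_test_split(data)
intercept, slope = fit_line(train)
test_mse = mean_squared_error(test, intercept, slope)

print(f"slope={slope:.3f} intercept={intercept:.3f} test MSE={test_mse:.3f}")
```

Evaluating on the held-out test set rather than on the training data is what gives an honest estimate of how the model behaves on new observations. In practice, a library such as scikit-learn would typically replace these hand-written helpers.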

As a by-product of this course, students will also practice
- data cleaning
- visualizations
- data transformation (group by, summary)
- working with Git and GitHub
- reproducible analysis and documents (Jupyter notebooks, Markdown, Quarto)
- social skills, working in groups
Syllabus
  • The details can be found on GitHub: https://github.com/simecek/dspracticum2023
Teaching methods
Each lecture focuses on one dataset and problem, which we use to demonstrate a new data science skill. Students are expected to submit homework before each lecture.
Assessment methods
Group homework (in groups of 2-4 students), plus an optional individual final project worth an extra 30%. To pass, you must achieve at least 70% of the points.
Language of instruction
Czech
Further Comments
Teacher's information
https://github.com/simecek/dspracticum2023
The course is also listed under the following terms: Autumn 2020, Autumn 2021, Autumn 2024.
  • Permalink: https://is.muni.cz/course/sci/autumn2023/M7DataSP