FA035 Advanced methods in data analysis

Faculty of Science
Spring 2021
Extent and Intensity
2/1/0. 4 credit(s). Type of Completion: zk (examination).
Teacher(s)
Mgr. Filip Hroch, Ph.D. (lecturer)
doc. Ernst Paunzen, Dr.rer.nat (lecturer)
Dr. Martin Topinka, PhD. (lecturer)
Guaranteed by
prof. Mgr. Jiří Krtička, Ph.D.
Department of Theoretical Physics and Astrophysics – Physics Section – Faculty of Science
Contact Person: Mgr. Filip Hroch, Ph.D.
Supplier department: Department of Theoretical Physics and Astrophysics – Physics Section – Faculty of Science
Prerequisites (in Czech)
We assume some skills in mathematics: common handling of functions (differentiation, integration), linear algebra for matrix manipulation, descriptive statistics, and basic distributions. Computer skills include a basic knowledge of programming and the usual Unix tools.
Course Enrolment Limitations
The course is offered to students of any study field.
Course objectives (in Czech)
This course is an introduction to selected methods of data analysis. The methods include time series processing, object classification, and similar concepts. The analysis of light curves, spectral classification, and related topics are common examples of these methods in astronomy. Only modern, effective methods will be presented. The statistical analysis will focus on detailed handling and interpretation of the full information contained in the data, as well as on reproducible processing. Some of the methods will be presented in R (the open statistical language project), in Gnuplot, and in Python.
Learning outcomes (in Czech)
Students will get an overview of modern methods for processing large datasets, as well as of common data processing tools.
Syllabus (in Czech)
  • 1. Robust methods: principles, definition of robustness, kinds of robust methods (M-, R-, L-estimates), properties of robust methods, influence curve
  • 2. Maximum likelihood method: principle, joint probability, Gauss, Poisson and uniform distributions, estimation of parameters
  • 3. Robust mean: estimation of central moments, median, quantiles, derivation of the robust mean, properties, asymptotic estimates
  • 4. Numerical methods: fast median and sorting algorithms, random number generators for various distributions, testing of statistical distributions
  • 5. Numerical optimisation: derivative-free methods, methods using derivatives, constrained optimisation, regularisation
  • 6. Robust regression: generalisation to multi-parameter estimates, covariance matrix
  • 7. Applications: photon (particle) fluxes, robust regression of a straight line, spectral line profiles
  • 1. Bayesian principles: prior (noninformative), marginalization, parameter penalty, Bayesian vs. frequentist, application to the general linear model
  • 2. Bayesian numerics: reparametrization (towards normality), integration, Monte Carlo, random variates
  • 3. Markov Chain Monte Carlo: random walks, Gibbs sampler, Metropolis (Hastings), simulated annealing, convergence
  • 4. Bayesian application: simultaneous parameter estimation, signal processing
  • Artificial neural networks
  • Automatic spectral classification methods
  • Error propagation
  • Isochrone fitting techniques
  • Monte Carlo methods
  • Multidimensional fitting techniques
  • Programming in "Gnuplot"
  • Programming in "R"
  • SED fitting techniques
  • Statistical tests
  • Time series analysis techniques
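As a small illustration of the robust-mean topic in the syllabus, the following Python sketch compares the ordinary mean with the median and the scaled median absolute deviation (MAD) on a sample containing one outlier. It is not course material: the function name `robust_location_scale` and the data values are made up for illustration, and the median/MAD pair is only one simple robust estimate among those the syllabus lists.

```python
import statistics


def robust_location_scale(values):
    """Return (median, scaled MAD) as outlier-resistant location and scale."""
    med = statistics.median(values)
    # Median absolute deviation from the median.
    mad = statistics.median([abs(v - med) for v in values])
    # The factor 1.4826 makes the MAD comparable to the standard
    # deviation when the data are normally distributed.
    return med, 1.4826 * mad


# Hypothetical measurements: five consistent values and one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]

mean = statistics.mean(data)
stdev = statistics.stdev(data)
med, scale = robust_location_scale(data)

print(f"mean   = {mean:.2f}, stdev      = {stdev:.2f}")
print(f"median = {med:.2f}, scaled MAD = {scale:.2f}")
```

The single outlier pulls the mean far away from the bulk of the data, while the median and MAD stay close to the values of the five consistent measurements.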
Literature
  • MATLOFF, Norman S. The art of R programming : a tour of statistical software design. Eleventh printing. San Francisco: No Starch Press, 2011, xxiii, 373. ISBN 1593273843. info
  • JANERT, Philipp K. Gnuplot in action : understanding data with graphs. Edited by Colin D. Kelley - Thomas Williams. Greenwich: Manning, 2010, xxxi, 360. ISBN 9781933988399. info
  • STARCK, Jean-Luc and Fionn MURTAGH. Astronomical image and data analysis. 2nd ed. Berlin: Springer, 2006, xiv, 335. ISBN 3540330240. info
Teaching methods (in Czech)
The usual form of teaching will be lectures that present the given topics step by step.
Assessment methods (in Czech)
To complete the course successfully, a final project must be prepared and presented. Homework may be included.
Language of instruction
English
Further comments (probably available only in Czech)
Study Materials
The course is taught once every two years.
The course is taught: every week.
General note: L.
The course is also listed under the following terms: Spring 2024.
  • Permalink: https://is.muni.cz/course/sci/spring2021/FA035