FI:IA176 Safe and Explainable AI - Course Information
Faculty of Informatics, Autumn 2025
- Extent and Intensity
- 2/1/1. 4 credit(s) (plus extra credits for completion). Type of Completion: zk (examination).
In-person direct teaching
- Teacher(s)
- prof. Dr. rer. nat. RNDr. Mgr. Bc. Jan Křetínský, Ph.D. (lecturer)
- Sabine Rieder, M.Sc. (seminar tutor)
- Guaranteed by
- prof. Dr. rer. nat. RNDr. Mgr. Bc. Jan Křetínský, Ph.D.
Department of Computer Science – Faculty of Informatics
Supplier department: Department of Computer Science – Faculty of Informatics
- Timetable
- Mon 15. 9. to Mon 15. 12. Mon 12:00–13:50 B204
- Timetable of Seminar Groups:
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- there are 39 fields of study the course is directly associated with
- Abstract
- We discuss various aspects of dependable and trustworthy use of AI. We focus on different ways of enhancing its safety and explainability.
- Learning outcomes
- The student should be aware of the various aspects related to dependability of AI-aided systems and able to choose and apply state-of-the-art techniques to ensure ethically correct design, construction, deployment and use of AI.
- Key topics
- Trustworthy AI
  - adaptability & intelligence vs. reliability & algorithmic transparency
  - multidisciplinary aspects (legal – AI Act & regulations and certifications; ethical – societal complexity and diversity; psychological – human oversight; mathematical and technical)
  - bias & fairness, robustness, explainability, safety, security, accountability, etc.; decision making under uncertainty, epistemic vs. aleatoric uncertainty
- Safety
  - notions of safety (accuracy, PAC – probably approximately correct, correctness w.r.t. specification), specifying requirements (King Midas problem, reward hacking and specification gaming)
  - training (training safely vs. training safe systems): (i) safe reinforcement learning, (ii) adversarial attacks & training, (iii) integrating NN learning and discrete solvers
  - testing and validation (statistics, predefined vs. generated data sets)
  - verification of (i) AI systems (techniques for NNs: SMT, abstract interpretation and bound propagation, abstraction), (ii) AI-controlled systems (AI controller + cyber-physical plant: probabilistic verification, Lyapunov and barrier functions, martingales)
  - runtime monitoring, runtime enforcement, shielding and sand-boxing
  - LLMs: temperature and hallucination, LLMs and knowledge graphs, LLM as a judge
  - agentic AI (agency, sensors, evolution, Gorilla problem)
- Explainability
  - explainability, transparency, interpretability
  - explanations: types and techniques – feature attribution (saliency), causal & counterfactual, rule-based (Horn clauses, decision trees), concept-based (bottleneck models), surrogate models, inverse reinforcement learning
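To illustrate one of the NN-verification techniques named above, the following is a minimal sketch of interval bound propagation (IBP) through a tiny ReLU network. The weights, input, and perturbation radius are toy assumptions for illustration, not course material:

```python
import numpy as np

def ibp_linear(lo, hi, W, b):
    """Propagate the input box [lo, hi] through x -> W @ x + b.

    The center moves through the affine map exactly; the half-width
    is scaled by |W|, giving sound (if loose) output bounds.
    """
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lo, hi):
    """ReLU is monotone, so it maps the box endpoint-wise."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Toy 2-layer network (illustrative weights) and an input
# perturbation ball of radius 0.1 around the point x.
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -0.5])
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.0])
x = np.array([0.3, 0.7])
lo, hi = x - 0.1, x + 0.1

lo, hi = ibp_relu(*ibp_linear(lo, hi, W1, b1))
lo, hi = ibp_linear(lo, hi, W2, b2)
# [lo, hi] now encloses the network's output for EVERY input in the
# ball: e.g. if lo exceeds a safety threshold, the property is
# certified for the whole perturbation set, not just sampled points.
```

Tighter relaxations (e.g. abstract domains richer than boxes) reduce the over-approximation, which is the trade-off the lecture topic refers to.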
- Approaches, practices, and methods used in teaching
- lectures, exercises, projects, homework, flipped classrooms
- Method of verifying learning outcomes and course completion requirements
- Final grading is based on homework and a closed-book written final exam (no reading materials permitted).
- Language of instruction
- English
- Further Comments
- The course is taught annually.
- Study Materials
- Enrolment Statistics (recent)
- Permalink: https://is.muni.cz/course/fi/autumn2025/IA176