IA176 Safe and Explainable AI

Faculty of Informatics
Autumn 2025
Extent and Intensity
2/1/1. 4 credit(s) (plus extra credits for completion). Type of Completion: zk (examination).
In-person direct teaching
Teacher(s)
prof. Dr. rer. nat. RNDr. Mgr. Bc. Jan Křetínský, Ph.D. (lecturer)
Sabine Rieder, M.Sc. (seminar tutor)
Guaranteed by
prof. Dr. rer. nat. RNDr. Mgr. Bc. Jan Křetínský, Ph.D.
Department of Computer Science – Faculty of Informatics
Supplier department: Department of Computer Science – Faculty of Informatics
Timetable
Mon 15. 9. to Mon 15. 12. Mon 12:00–13:50 B204
  • Timetable of Seminar Groups:
IA176/01: Fri 19. 9. to Fri 19. 12. Fri 10:00–11:50 C416, S. Rieder
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
Fields of study / plans the course is directly associated with
There are 39 fields of study the course is directly associated with.
Abstract
We discuss various aspects of dependable and trustworthy use of AI. We focus on different ways of enhancing its safety and explainability.
Learning outcomes
The student should be aware of the various aspects of the dependability of AI-aided systems, and be able to choose and apply state-of-the-art techniques to ensure the ethically sound design, construction, deployment, and use of AI.
Key topics
  • Trustworthy AI
    - adaptability & intelligence vs. reliability & algorithmic transparency
    - multidisciplinary aspects (legal - the AI Act, regulations, and certifications; ethical - societal complexity and diversity; psychological - human oversight; mathematical and technical)
    - bias & fairness, robustness, explainability, safety, security, accountability, etc.; decision making under uncertainty, epistemic vs. aleatoric uncertainty
  • Safety
    - notions of safety (accuracy, PAC - probably approximately correct, correctness w.r.t. a specification); specifying requirements (the King Midas problem, reward hacking, and specification gaming)
    - training (training safely vs. training safe systems): (i) safe reinforcement learning, (ii) adversarial attacks & training, (iii) integrating NN learning and discrete solvers
    - testing and validation (statistics, predefined vs. generated data sets)
    - verification of (i) AI systems (techniques for NNs: SMT, abstract interpretation and bound propagation, abstraction) and (ii) AI-controlled systems (AI controller + cyber-physical plant: probabilistic verification, Lyapunov and barrier functions, martingales)
    - runtime monitoring, runtime enforcement, shielding, and sandboxing
    - LLMs: temperature and hallucination, LLMs and knowledge graphs, LLM as a judge
    - agentic AI (agency, sensors, evolution, the Gorilla problem)
  • Explainability
    - explainability, transparency, interpretability
    - explanations: types and techniques - feature attribution (saliency), causal & counterfactual, rule-based (Horn clauses, decision trees), concept-based (bottleneck models), surrogate models, inverse reinforcement learning
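To give a flavour of one technique from the list above, the following sketch (illustrative only, not course material) shows a surrogate-model explanation with scikit-learn: a shallow decision tree is trained to mimic a black-box classifier's predictions, yielding human-readable rules whose faithfulness is measured by agreement with the black box. The data set and model choices here are arbitrary assumptions for the example.

```python
# Illustrative sketch: explaining a black-box classifier with a
# surrogate decision tree (one of the listed explanation techniques).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# A "black box": a random forest whose internal logic is hard to inspect.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Surrogate: a shallow, interpretable tree trained on the black box's
# *predictions* (not the original labels), so it approximates the model.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: fraction of inputs where the surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"fidelity: {fidelity:.2f}")

# The surrogate's rules serve as a global explanation of the black box.
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))
```

The fidelity score makes explicit the trade-off the course emphasises: a simpler surrogate is easier to interpret but may approximate the black box less faithfully.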
Approaches, practices, and methods used in teaching
lectures, exercises, projects, homework, flipped classrooms
Method of verifying learning outcomes and course completion requirements
Final grading is based on homework and a written closed-book final exam (no reading materials permitted).
Language of instruction
English
Further Comments
The course is taught annually.

  • Permalink: https://is.muni.cz/course/fi/autumn2025/IA176