IA176 Safe and Explainable AI

Faculty of Informatics
Autumn 2025
Extent and Intensity
2/1/1. 4 credit(s) (plus extra credits for completion). Type of Completion: zk (examination).
In-person direct teaching
Teacher(s)
prof. Dr. rer. nat. RNDr. Mgr. Bc. Jan Křetínský, Ph.D. (lecturer)
Sabine Rieder, M.Sc. (seminar tutor)
Guaranteed by
prof. Dr. rer. nat. RNDr. Mgr. Bc. Jan Křetínský, Ph.D.
Department of Computer Science – Faculty of Informatics
Supplier department: Department of Computer Science – Faculty of Informatics
Timetable
Mon 15. 9. to Mon 15. 12. Mon 12:00–13:50 B204
  • Timetable of Seminar Groups:
IA176/01: Fri 19. 9. to Fri 19. 12. Fri 10:00–11:50 C416, S. Rieder
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
Fields of study / plans the course is directly associated with
There are 39 fields of study the course is directly associated with.
Abstract
We discuss various aspects of dependable and trustworthy use of AI. We focus on different ways of enhancing its safety and explainability.
Learning outcomes
The student should be aware of the various aspects of the dependability of AI-aided systems, and be able to choose and apply state-of-the-art techniques to ensure the ethically sound design, construction, deployment, and use of AI.
Key topics
  • Trustworthy AI
    - adaptability & intelligence vs. reliability & algorithmic transparency
    - multidisciplinary aspects (legal - the AI Act, regulations, and certifications; ethical - societal complexity and diversity; psychological - human oversight; mathematical and technical)
    - bias & fairness, robustness, explainability, safety, security, accountability, etc.; decision making under uncertainty, epistemic vs. aleatoric uncertainty
  • Safety
    - notions of safety (accuracy, PAC - probably approximately correct, correctness w.r.t. a specification); specifying requirements (the King Midas problem, reward hacking, and specification gaming)
    - training (training safely vs. training safe systems): (i) safe reinforcement learning, (ii) adversarial attacks & training, (iii) integrating NN learning and discrete solvers
    - testing and validation (statistics, predefined vs. generated data sets)
    - verification of (i) AI systems (techniques for NNs: SMT, abstract interpretation and bound propagation, abstraction) and (ii) AI-controlled systems (AI controller + cyber-physical plant: probabilistic verification, Lyapunov and barrier functions, martingales)
    - runtime monitoring, runtime enforcement, shielding, and sandboxing
    - LLMs: temperature and hallucination, LLMs and knowledge graphs, LLM as a judge
    - agentic AI (agency, sensors, evolution, the Gorilla problem)
  • Explainability
    - explainability, transparency, interpretability
    - explanations: types and techniques - feature attribution (saliency), causal & counterfactual, rule-based (Horn clauses, decision trees), concept-based (bottleneck models), surrogate models, inverse reinforcement learning
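To give a flavour of one technique from the list above, the following sketch (illustrative only, not course material) shows a surrogate-model explanation with scikit-learn: a shallow decision tree is trained to mimic a black-box classifier's predictions, yielding human-readable rules whose faithfulness is measured by agreement with the black box. The data set and model choices here are arbitrary assumptions for the example.

```python
# Illustrative sketch: explaining a black-box classifier with a
# surrogate decision tree (one of the listed explanation techniques).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# A "black box": a random forest whose internal logic is hard to inspect.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Surrogate: a shallow, interpretable tree trained on the black box's
# *predictions* (not the original labels), so it approximates the model.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: fraction of inputs where the surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"fidelity: {fidelity:.2f}")

# The surrogate's rules serve as a global explanation of the black box.
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))
```

The fidelity score makes explicit the trade-off the course emphasises: a simpler surrogate is easier to interpret but may approximate the black box less faithfully.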
Approaches, practices, and methods used in teaching
lectures, exercises, projects, homework, flipped classrooms
Method of verifying learning outcomes and course completion requirements
Final grading is based on homework and a written closed-book final exam (no reading materials permitted).
Language of instruction
English
Further Comments
The course is taught annually.

  • Permalink: https://is.muni.cz/course/fi/autumn2025/IA176