AI Safety & Interpretability Lab

In the AI Safety & Interpretability Lab at SDU, we develop interpretability-informed control methods to ensure the safe and beneficial deployment of advanced AI systems. As AI systems grow more capable and autonomous, our ability to understand how and why they behave the way they do is crucial for retaining human control – from single models to agent populations.

From The Probe

Jun 25, 2026

The Auditability-Accuracy Tradeoff

Monitoring reasoning traces of large language models is currently one of the most promising methods to detect when language models do not behave as intended – for...

Contact us via galke@imada.sdu.dk