AI Safety & Interpretability Lab
In the AI Safety & Interpretability Lab at SDU, we develop interpretability-informed control methods to ensure the safe and beneficial deployment of advanced AI systems. As AI systems grow more capable and autonomous, our ability to understand how and why they behave the way they do is crucial for retaining human control – from single models to agent populations.
From The Probe
Jun 25, 2026
The Auditability-Accuracy Tradeoff
Monitoring reasoning traces of large language models is currently one of the most promising methods to detect when language models do not behave as intended¹ – for...
Contact us via galke@imada.sdu.dk
