3 Questions for

Wojciech Samek

Professor of Machine Learning and Communication at TU Berlin and Head of the Department of Artificial Intelligence and the Explainable AI Group at the Fraunhofer Heinrich Hertz Institute (HHI) Berlin / member of Plattform Lernende Systeme

Explainable AI: Game changer for the safe and responsible use of modern AI systems

Systems designed by humans and built on a modular basis can be highly complex. An Airbus A340-600, for example, consists of over four million individual parts - each of which is independently tested for quality and must fulfil certain requirements. Modern AI systems exceed this complexity many times over: large language models consist of hundreds of thousands of components (neurons) that are interconnected in complex ways. This leads to billions of freely adjustable parameters that are optimised during training on large amounts of data. However, it remains largely unclear exactly what is learnt and what functions are assigned to individual neurons in the model.

In this interview, Wojciech Samek explains why AI systems should be explainable in view of this complexity, how this works and where Germany stands in terms of research. He is Professor of Machine Learning and Communication at TU Berlin, Head of the Department of Artificial Intelligence and the Explainable AI Group at the Fraunhofer Heinrich Hertz Institute (HHI) Berlin, and a member of the working group Technological Enablers and Data Science of Plattform Lernende Systeme.

1

Mr Samek, do we really need to understand AI to be able to use and trust it?

Wojciech Samek: A common point of view is: no - after all, we also take medicines whose exact mechanism of action is not yet fully understood. What matters instead, the argument goes, are good evaluation procedures for testing an AI system's performance. But this is precisely where the problem begins. For years, AI models were evaluated using performance metrics alone. With the development of explainability methods, however, it became apparent that models with good performance do not always ‘understand’ their task - they can also cheat particularly effectively. In one well-known example, a classifier recognised horse images not by the horse itself, but by a copyright watermark frequently found in horse images.

2

What does the explainability of AI systems change?

Wojciech Samek: In any case, explainability is crucial for recognising errors in AI models at an early stage and for ensuring that a model's decision-making processes are comprehensible and meaningful. This applies to horse image classifiers as much as to hallucinating language models. With the development of ever more complex models, explainability is becoming increasingly important - both as a tool for human-AI interaction and for the systematic analysis, testing and improvement of models. Large language models in particular provide an ideal basis for analysing the role of individual components and for actively steering the model. The methods are thus developing further: from pure explanation to targeted intervention options - a decisive step for the safe and responsible use of modern AI systems.

However, explainability offers even more: with the help of explainable models, for example, a completely new structural class of antibiotics has been discovered. Explainability is also becoming increasingly important for legal reasons, for example due to new regulations such as the European Union's AI Act, which requires transparency for certain AI applications. Germany is very well positioned in the area of explainability: not only have many of the fundamental methods been developed here, but some of the leading researchers are also based here. This expertise and locational advantage should be used to create trustworthy and verifiable AI.

3

How has research into the explainability of AI developed?

Wojciech Samek: The development of explainable AI (XAI) can be roughly divided into three waves, each with a different focus and objective.

In the initial phase, the focus was on making individual model decisions comprehensible. The aim was to visualise the extent to which different input dimensions - such as individual pixels of an image - contributed to a model's prediction. A central method from this phase is Layer-wise Relevance Propagation (LRP). It is based on the idea of distributing the prediction backwards through the network: neurons that contributed more to the decision receive a proportionally larger share of the overall relevance. The relevance values assigned to each pixel of the input image show which image regions were decisive for the AI's decision.
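To make this redistribution idea more concrete, here is a minimal sketch of one common LRP rule (the epsilon rule) for a single linear layer, written in NumPy. The function name, toy dimensions and random data are purely illustrative assumptions, not the original LRP implementation.

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    """Redistribute the relevance R_out of a layer's outputs to its inputs
    (LRP epsilon rule): each input receives a share proportional to its
    contribution a_i * w_ij to the pre-activation z_j."""
    z = a @ W + b                      # pre-activations of the layer
    z = z + eps * np.sign(z)           # small stabiliser avoids division by zero
    s = R_out / z                      # relevance per unit of pre-activation
    return a * (s @ W.T)               # relevance of each input

# Toy usage: one linear layer, all relevance starts at the predicted class
rng = np.random.default_rng(0)
a = rng.random(4)                      # input activations (e.g. pixel features)
W = rng.standard_normal((4, 3))        # layer weights
R_out = np.array([0.0, 1.0, 0.0])      # relevance placed on the predicted class
R_in = lrp_epsilon(a, W, np.zeros(3), R_out)
print(R_in, R_in.sum())                # relevance is approximately conserved
```

Applied layer by layer from the output back to the input, this kind of rule yields the pixel-wise relevance heatmaps described above.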

The second wave of explainability research aimed to better understand the AI model itself. For example, the Activation Maximisation method can be used to show which features are encoded by individual neurons. The Concept Relevance Propagation (CRP) method extends this type of explanation and makes it possible to analyse the role and function of individual neurons in model decisions. These methods of the second XAI wave form the basis of the emerging field of mechanistic interpretability, which analyses functional subnetworks (‘circuits’) within the model.
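As an illustration of what Activation Maximisation does in practice, the following PyTorch sketch synthesises an input that strongly activates one chosen unit via gradient ascent. `model`, `layer` and `unit` are placeholders for whichever network and component one wants to inspect, and the image size is an assumption; this is not a specific published implementation.

```python
import torch

def activation_maximisation(model, layer, unit, steps=200, lr=0.1):
    """Gradient-ascent sketch: optimise a random input so that it maximally
    activates one unit, revealing the feature that unit encodes."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)     # random start image
    captured = {}
    handle = layer.register_forward_hook(
        lambda module, inp, out: captured.update(act=out))  # grab the layer's output
    optimiser = torch.optim.Adam([x], lr=lr)
    model.eval()
    for _ in range(steps):
        optimiser.zero_grad()
        model(x)                                             # forward pass fills `captured`
        loss = -captured["act"][0, unit].mean()              # maximise the unit's activation
        loss.backward()
        optimiser.step()
    handle.remove()
    return x.detach()                                        # the unit's 'preferred' input
```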

The third wave aims to use the latest methods of XAI research to gain a systematic understanding of the model, its behaviour and its representations. Methods such as SemanticLens attempt to understand the function and quality of each individual component (neuron) in the model. This holistic understanding allows systematic, automatable model checks - for example, whether a skin cancer model really follows the medical ABCDE rule.

The interview is released for editorial use (provided the source is cited © Plattform Lernende Systeme).
