Competencies and sovereign infrastructure: How successful AI is created with Data Science

Artificial Intelligence (AI) is the next stage of digitisation and promises great benefits for the economy and society. The basis for successful AI applications is data. To exploit the potential of Artificial Intelligence, comprehensive data management is needed to make the available data accessible in the first place. A recent white paper by Plattform Lernende Systeme shows the importance of the interdisciplinary research field of Data Science as a key discipline for both science and the economy, and sets out options for action. The experts suggest that data management skills should also be taught in schools and in degree programmes unrelated to IT, and they recommend an independent data infrastructure.

Download the white paper (executive summary)

Whether medical assistance systems, autonomous vehicles or the predictive maintenance of industrial plants: they all evaluate, within a short time, data that is available today in enormous quantities. Data science methods make all these AI applications possible. They are also regarded as paving the way for scientific findings in many data-intensive fields of research such as climate research, astronomy or chemistry.

"If data is the raw material of the digital age, then data science is the tool that can be used to unearth this treasure," says co-author Daniel Keim, Professor of Data Analysis and Visualisation at the University of Konstanz and member of the Technological Enablers and Data Science working group of Plattform Lernende Systeme. "Large amounts of data alone are not enough to develop AI applications. First of all, the data must be prepared for the training of learning algorithms, their validity must be checked and the data must be made accessible. To do this, we need comprehensive data management, including advanced possibilities for visualising the data. Unfortunately, in the discussion about Artificial Intelligence, this is often given too little attention".

In particular, the white paper states that making data accessible and ensuring data quality is often a complex process. For AI projects, for example, data acquisition and preparation are estimated to account for up to 80 percent of the effort. The results and recommendations of AI systems can only be as good as the data they are based on. Incorrect values, for example, must be identified and removed, and data must be annotated and, to make it comprehensible, described by metadata. In addition to aspects such as completeness and consistency, attention should be paid to possible bias as early as the data selection stage, for example to avoid discrimination. Software for evaluating job candidates should not be trained only on the data of applicants who were successful in the past. If these were predominantly male, the AI system will continue to rate female candidates less favourably in its recommendations.
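The preparation steps named above (removing incorrect values, describing data with metadata, checking the selection for imbalance) can be illustrated with a minimal Python sketch. It is not taken from the white paper: the records, the field names, the plausible age range and the helper functions `is_valid` and `group_shares` are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical applicant records; every field name and value is invented.
records = [
    {"age": 34, "gender": "m", "hired": True},
    {"age": 29, "gender": "f", "hired": False},
    {"age": -5, "gender": "m", "hired": True},    # implausible age
    {"age": 41, "gender": "m", "hired": True},
    {"age": None, "gender": "f", "hired": True},  # missing age
    {"age": 38, "gender": "f", "hired": True},
]


def is_valid(record):
    """Return True if the age is present and plausible (assumed range: 16 to 100)."""
    age = record["age"]
    return age is not None and 16 <= age <= 100


def group_shares(rows, key):
    """Return the share of each value of `key` among the given rows."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Step 1: identify and remove incorrect values.
clean = [r for r in records if is_valid(r)]

# Step 2: describe the cleaned data set with simple metadata.
metadata = {
    "source": "hypothetical example",
    "age_unit": "years",
    "rows_removed": len(records) - len(clean),
}

# Step 3: compare group shares in the full data and among successful applicants.
hired = [r for r in clean if r["hired"]]
shares_all = group_shares(clean, "gender")
shares_hired = group_shares(hired, "gender")

print(metadata)
print("all applicants:", shares_all)
print("successful applicants:", shares_hired)
```

In this invented data set, the groups are balanced among all applicants but not among the successful ones, so a model trained only on the successful applicants would see a skewed sample. Real projects would need far more careful validity rules and fairness checks than this sketch suggests.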

Expertise and Data Literacy

Data scientists therefore not only need skills in data management, Machine Learning, statistics and visualisation, but also knowledge of ethics and law in order to handle data responsibly. "This already shows that it is no longer just a question of classic software development. Rather, these are requirements that require an interdisciplinary approach: Application experts increasingly need so-called data literacy expertise, and data science experts must also understand the application domains. There will certainly be a great need for continuing education programmes here," says co-author Kai-Uwe Sattler, Professor of Databases and Information Systems at the TU Ilmenau and member of the Technological Enablers and Data Science working group of Plattform Lernende Systeme.

According to the white paper, the profession of data scientist is considered one of the most important professions. In addition, data science skills are already required in many other professions today. The authors recommend giving more space to data science in computer science courses. However, to promote understanding of data science processes and technologies, other degree programmes and schools should also teach data literacy skills. Beyond research, this is the basis for training talent who can ultimately bring AI knowledge into companies and support its transfer into application, the authors say. The AI map of Plattform Lernende Systeme shows those interested where to find degree programmes in AI and data science at universities.

In addition to training and further education, the authors believe that an independent infrastructure with sufficient storage and computing capacity is necessary to tap the potential of data for the economy and society. The seemingly successful models of the Internet corporations should not be adopted uncritically. To be able to design data management processes independently, European data spaces and infrastructures need to be expanded, the authors advise.

About the White Paper

The white paper "From data to AI. Intelligent Data Management as a Basis for Data Science and the Use of Learning Systems" was prepared by experts from the Technological Enablers and Data Science working group of Plattform Lernende Systeme. It is available for free download here.

Further information:

Linda Treugut / Birgit Obermeier
Press and Public Relations

Lernende Systeme – Germany's Platform for Artificial Intelligence
Managing Office | c/o acatech
Karolinenplatz 4 | 80333 Munich

T.: +49 89/52 03 09-54 /-51
M.: +49 172/144 58-47 /-39
presse@plattform-lernende-systeme.de
