TH Köln

Master Digital Sciences

Documents for Study Program Accreditation

Guided Project SS25_04 »AI based Audit of Sustainability Reports«

Organizational Details

Supervisor(s)
Prof. Dr. Daniel Gaida; Prof. Dr. Lilia Pasch
Team size
2-4
Language
English
Start
End of March
Offered as
GP-ID (6 ECTS), GP-GAK (12 ECTS)

Problem Description

From 2025, large listed companies will have to publish very detailed sustainability reports covering environmental, social and governance issues. These reports are required by new EU-level regulations, which consist of the following parts:

  • Corporate Sustainability Reporting Directive (CSRD) (general requirements)
  • European Sustainability Reporting Standards (ESRS) (data points): 2 general standards (ESRS 1 and ESRS 2), 5 standards dealing with the environment (ESRS E1 to E5), 4 standards dealing with social topics (ESRS S1 to S4), and 1 standard on governance (ESRS G1)

The (qualitative and quantitative) data points required by the ESRS are available at: https://efrag.sharefile.com/share/view/s1a12c193b86d406e90b1bcd7b6bb8f6f/fo37c90b-9d9b-4432-a76b-27760cfcc01b.

The first sustainability reports will be published by the companies concerned in March/April 2025 (the list of companies is available at: https://www.xetra.com/resource/blob/67858/a74c988df5a50c597f9e88c7d5c355c6/data/Listed-companies.xlsx). In the meantime, it is possible to work with reports published last year.

The sustainability reports must be audited by an auditor. Because these reports will be very long and complex (100-150 pages), AI support for the auditor could significantly simplify the audit process and reduce costs.

Project Definition

The aim of the project is to use a Large Language Model (LLM) to check whether a published report contains all the data points required by one of the ESRS mentioned above. Students can choose which of the ten topical ESRS (E1 to E5, S1 to S4, G1) they want to use for the project.

The LLM's review of a sustainability report should produce a protocol with three columns:

  • the data points from the chosen ESRS (see link above)
  • the text passages from the sustainability report that were assigned to each data point (the field stays empty if no suitable passage was found)
  • the assessment of the AI, indicating the degree of fulfilment of the requirement (low, medium, high)

Such a protocol could serve as a basis for the auditor to decide whether further audit procedures are required for certain data points.

You can start with ChatGPT, DeepSeek, etc. and write prompts that yield good results. In the end, you will need to provide a small application that calls an LLM through an API and accepts two uploads: an Excel sheet with the ESRS data points and the report to be examined. The application then generates the protocol described above.

For 12 ECTS, you should additionally analyse different LLMs (local and remote) and develop and test different architectures (retrieval augmented generation and variations thereof). The application should also provide a chat interface for discussing the protocol with the LLM after it has been generated.
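To make the three-column protocol more concrete, the following Python sketch shows one possible shape for it. It is an illustration under stated assumptions, not a prescribed design. All names (`ProtocolRow`, `retrieve`, `build_protocol`, `placeholder_assess`) are hypothetical. Passage retrieval is a naive keyword overlap that only stands in for a real retrieval step, and `placeholder_assess` stands in for the actual LLM API call. Rating a data point without any matching passage as "low" is likewise an assumption, and the output is CSV here only to keep the sketch free of third-party libraries.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Callable, List, Set

LEVELS = ("low", "medium", "high")  # degrees of fulfilment named in the project text


@dataclass
class ProtocolRow:
    data_point: str  # requirement text from the ESRS data point list
    passages: str    # report passages assigned to the data point ("" if none found)
    assessment: str  # one of LEVELS


def split_into_passages(report_text: str) -> List[str]:
    """Split the report into paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", report_text) if p.strip()]


def tokenize(text: str) -> Set[str]:
    """Lower-case word set; words of three characters or fewer are ignored."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def retrieve(data_point: str, passages: List[str], top_k: int = 2) -> List[str]:
    """Naive keyword-overlap retrieval (assumption: a stand-in for real retrieval)."""
    query = tokenize(data_point)
    scored = [(len(query & tokenize(p)), p) for p in passages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_protocol(
    data_points: List[str],
    report_text: str,
    assess: Callable[[str, List[str]], str],
) -> List[ProtocolRow]:
    """Create one protocol row per data point."""
    passages = split_into_passages(report_text)
    rows = []
    for data_point in data_points:
        hits = retrieve(data_point, passages)
        if hits:
            label = assess(data_point, hits)
            if label not in LEVELS:
                raise ValueError(f"unexpected assessment: {label!r}")
        else:
            label = "low"  # assumption: no passage found means low fulfilment
        rows.append(ProtocolRow(data_point, "\n".join(hits), label))
    return rows


def protocol_to_csv(rows: List[ProtocolRow]) -> str:
    """Serialise the protocol as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["data_point", "passages", "assessment"])
    for row in rows:
        writer.writerow([row.data_point, row.passages, row.assessment])
    return buffer.getvalue()


def placeholder_assess(data_point: str, passages: List[str]) -> str:
    """Hypothetical stand-in for the LLM call: passages with figures rate higher."""
    has_figures = any(ch.isdigit() for p in passages for ch in p)
    return "high" if has_figures else "medium"


if __name__ == "__main__":
    example_points = ["Gross Scope 1 greenhouse gas emissions"]
    example_report = "Our gross Scope 1 greenhouse emissions were 1200 tCO2e."
    print(protocol_to_csv(build_protocol(example_points, example_report, placeholder_assess)))
```

Because the assessment step is passed in as a function, the placeholder can later be replaced by a function that sends the data point and the retrieved passages to an LLM, and different models or retrieval strategies can be compared without changing the protocol code.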

Learning Outcome

Use of LLMs for the systematic analysis of large reports; prompting of LLMs and, possibly, advanced methods such as Retrieval Augmented Generation.

Participation Requirements

Python, basic knowledge of prompting LLMs

External Partner

Prof. Dr. Lilia Pasch, Professor of External Accounting at the Schmalenbach Institute of Economics at Cologne University of Applied Sciences. She is available to the project team for all questions relating to the content of the ESRS data points.