From 2025, large listed companies must publish detailed sustainability reports covering environmental, social and governance (ESG) issues. These reports are required by new EU-level regulations, which consist of the following parts
The qualitative and quantitative data points required by the European Sustainability Reporting Standards (ESRS) are available at: https://efrag.sharefile.com/share/view/s1a12c193b86d406e90b1bcd7b6bb8f6f/fo37c90b-9d9b-4432-a76b-27760cfcc01b.
The companies concerned will publish their first sustainability reports in March/April 2025 (the list of companies is available at: https://www.xetra.com/resource/blob/67858/a74c988df5a50c597f9e88c7d5c355c6/data/Listed-companies.xlsx). Until then, it is possible to work with reports published last year.
The sustainability reports must be audited by an external auditor. They will be long and complex (100-150 pages). Using AI to support the auditor could simplify the audit process considerably and reduce costs.
The aim of the project is to use a Large Language Model (LLM) to check whether a published report contains all the data points required by one of the ESRS mentioned above (students can choose which of the ten ESRS to use for the project). The result of the LLM's review of a sustainability report should be a protocol with three columns: the first contains the data points from the chosen ESRS (see link above), the second contains the text passages from the sustainability report that were assigned to each data point (the field stays empty if no suitable passage was found), and the third contains the AI's assessment of the degree to which the requirement is fulfilled (low, medium, high). Such a protocol could serve as a basis for the auditor to decide whether further audit procedures are required for certain data points.
You can start with ChatGPT, DeepSeek, etc. and write prompts that yield good results. In the end, you will need to provide a small application that calls an LLM through an API and lets the user upload an Excel sheet with the ESRS data points together with the report to be examined. The application then generates the protocol described above.
For 12 ECTS, you should additionally analyse different LLMs (local and remote) and develop and test different architectures (retrieval augmented generation and variations thereof). The application should also have a chat interface so that users can chat with the LLM about the protocol after it has been generated.
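As a starting point, the protocol generation described above could be sketched roughly as follows. This is only an illustration under several assumptions, not a prescribed design: all names (`ProtocolRow`, `build_prompt`, `parse_reply`, `generate_protocol`, `protocol_to_csv`) are hypothetical, the LLM is passed in as a plain function `ask_llm` so that any API or local model can be plugged in, the model is asked to reply in a JSON format invented for this sketch, and the output is written as CSV rather than Excel to stay within the standard library. Passing the whole report in one prompt is also a simplification; a report of 100-150 pages will likely need to be split up first.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Iterable

FULFILMENT_LEVELS = ("low", "medium", "high")


@dataclass
class ProtocolRow:
    data_point: str  # requirement text taken from the ESRS data point list
    passages: str    # report passages assigned to the data point ("" if none)
    fulfilment: str  # "low", "medium" or "high"


def build_prompt(data_point: str, report_text: str) -> str:
    """Ask for a JSON answer (format assumed here) so the reply can be parsed."""
    return (
        "You support an auditor reviewing a sustainability report.\n"
        f"ESRS data point: {data_point}\n"
        f"Report text:\n{report_text}\n\n"
        'Answer only with JSON of the form {"passages": ["..."], '
        '"fulfilment": "low|medium|high"}. '
        "Use an empty list if no passage addresses the data point."
    )


def parse_reply(data_point: str, reply: str) -> ProtocolRow:
    """Turn a model reply into a protocol row.

    Assumption of this sketch: unparsable replies and replies without any
    passage are rated "low" and leave the passage field empty.
    """
    try:
        answer = json.loads(reply)
        passages = [str(p) for p in answer.get("passages", [])]
        fulfilment = str(answer.get("fulfilment", "")).lower()
    except (json.JSONDecodeError, AttributeError, TypeError):
        passages, fulfilment = [], ""
    if fulfilment not in FULFILMENT_LEVELS or not passages:
        fulfilment = "low"
    return ProtocolRow(data_point, "\n".join(passages), fulfilment)


def generate_protocol(
    data_points: Iterable[str],
    report_text: str,
    ask_llm: Callable[[str], str],
) -> list[ProtocolRow]:
    """Query the model once per data point and collect the protocol rows."""
    return [
        parse_reply(dp, ask_llm(build_prompt(dp, report_text)))
        for dp in data_points
    ]


def protocol_to_csv(rows: Iterable[ProtocolRow]) -> str:
    """Serialise the protocol with the three columns described in the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["ESRS data point", "Report passages", "Fulfilment"])
    for row in rows:
        writer.writerow([row.data_point, row.passages, row.fulfilment])
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for a real API call; it returns a canned answer for the demo.
    def fake_llm(prompt: str) -> str:
        return '{"passages": ["Placeholder passage."], "fulfilment": "medium"}'

    rows = generate_protocol(["Placeholder data point"], "Placeholder report.", fake_llm)
    print(protocol_to_csv(rows))
```

Keeping the model behind a single function argument makes it easy to compare ChatGPT, DeepSeek or a local model later without touching the rest of the pipeline.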
Using LLMs for the systematic analysis of large reports. Prompting LLMs and, possibly, advanced methods such as retrieval-augmented generation (RAG).
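To give an idea of the retrieval step in retrieval augmented generation, the following minimal sketch splits a report into overlapping chunks and selects the chunks most relevant to one data point, so that only those chunks need to be sent to the LLM. The function names and parameters are hypothetical, and the scoring is a deliberately simple word-overlap measure used here as a stand-in; a real system would more likely use embedding similarity and handle stop words.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, size: int = 120, overlap: int = 30) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    chunks = [" ".join(words[i:i + size]) for i in starts]
    return [chunk for chunk in chunks if chunk]


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return up to `top_k` (score, chunk) pairs, best match first.

    The score is the share of distinct query tokens that also occur in the
    chunk. Chunks without any shared token are dropped.
    """
    query_tokens = set(tokenize(query))
    scored = []
    for chunk in chunks:
        if not query_tokens:
            break
        score = len(query_tokens & set(tokenize(chunk))) / len(query_tokens)
        if score > 0:
            scored.append((score, chunk))
    # The sort is stable, so chunks with equal scores keep their report order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The retrieved chunks would then replace the full report text in the prompt for the corresponding data point; an empty result suggests that the report may not address the data point at all.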
Python, basic knowledge of prompting LLMs
Prof Dr Lilia Pasch, Professor of External Accounting at the Schmalenbach Institute of Economics at Cologne University of Applied Sciences. She is available to the project team for all questions relating to the content of the ESRS data points.