TH Köln

Master Digital Sciences

Documents for Study Program Accreditation

Guided Project SS26_13 »The Synthetic Researcher - Simulation of Dataset Search«

Organizational Details

Supervisor(s)
Nolwenn Bernard, Timo Breuer, Philipp Schaer
Team size
3-5
Language
English
Start
March 2026
Offered as
GP-GAK (12 ECTS)

Problem Description

Conversational agents are becoming an integral means of accessing information online. However, their evaluation remains a challenge, as releasing a poorly performing conversational agent to the public can hurt users’ trust and developers’ reputation. One solution for offline evaluation is user simulation: synthetic users interact with the conversational agent to test it and identify potential flaws before public release. The objective of this project is to develop user simulators that produce realistic interactions with conversational agents in the domain of dataset search.
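To make the idea of user simulation more concrete, the following minimal sketch shows a synthetic user exchanging turns with a conversational agent and recording the transcript for offline inspection. It is only an illustration under stated assumptions: the names `toy_agent`, `ScriptedUser`, and `simulate`, the keyword catalog, and the stopping rule are all hypothetical and are not part of the U-Sim track or any existing library. A simulator developed in the project would likely replace the scripted user with a more realistic model, for example one based on a large language model.

```python
from typing import Callable, List, Optional, Tuple


def toy_agent(utterance: str) -> str:
    """Hypothetical stand-in for a conversational dataset search agent."""
    # Made-up catalog entries, used only for illustration.
    catalog = {"climate": "toy-climate-dataset", "survey": "toy-survey-dataset"}
    for keyword, dataset in catalog.items():
        if keyword in utterance.lower():
            return f"Suggested dataset: {dataset}"
    return "Could you describe the dataset you need in more detail?"


class ScriptedUser:
    """Hypothetical simulated user that follows a fixed list of utterances."""

    def __init__(self, utterances: List[str]) -> None:
        self._utterances = list(utterances)

    def respond(self, agent_reply: Optional[str]) -> Optional[str]:
        # Assumed stopping rule: the user is satisfied once a dataset is suggested.
        if agent_reply is not None and agent_reply.startswith("Suggested dataset"):
            return None
        # Stop as well when the script has no utterances left.
        if not self._utterances:
            return None
        return self._utterances.pop(0)


def simulate(
    user: ScriptedUser,
    agent: Callable[[str], str],
    max_turns: int = 5,
) -> List[Tuple[str, str]]:
    """Run one simulated conversation and return (user, agent) turn pairs."""
    transcript: List[Tuple[str, str]] = []
    reply: Optional[str] = None
    for _ in range(max_turns):
        utterance = user.respond(reply)
        if utterance is None:
            break
        reply = agent(utterance)
        transcript.append((utterance, reply))
    return transcript


if __name__ == "__main__":
    user = ScriptedUser(["I need data for my thesis", "Something about climate change"])
    for user_turn, agent_turn in simulate(user, toy_agent):
        print("USER: ", user_turn)
        print("AGENT:", agent_turn)
```

The resulting transcript can then be inspected, for instance to check whether the agent asks for clarification on vague requests and whether the conversation ends with a dataset suggestion.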

Project Definition

The project is aligned with the U-Sim track @ TREC 2026, in which participants are tasked with developing user simulators that interact with conversational systems to retrieve datasets in the scholarly domain. A critical question is whether the user simulators developed are good enough to substitute for real users and to support the training and evaluation of conversational agents. The project has two main objectives:

  1. Conceptualizing and developing a user simulator
  2. Performing a qualitative analysis of the user simulator to assess its value for the evaluation or training of conversational agents

The timeline for the project is as follows:

  • Kick-off workshop (mandatory): 1st or 2nd week of March
  • Implementation phase: March-June
  • Evaluation phase: July

If there is interest, the Cologne Information Retrieval group is happy to support student groups in submitting their user simulator along with a lab report to the U-Sim track in September 2026.

Links and Resources:

  • U-Sim track @ TREC: https://trec.usersim.ai/
  • Sim4IA - Simulations for Information Access: http://sim4ia.org/
  • Cologne Information Retrieval: https://ir.web.th-koeln.de/
  • Related literature: https://arxiv.org/pdf/2405.14249, https://arxiv.org/pdf/2406.19007

Learning Outcome

  • Practical implementation of a user simulator using cutting-edge technologies (including large language models and conversational agents)
  • Project management and software development skills from concept to implementation
  • Experiments in an active field of research with the possibility to participate in an international benchmarking campaign

Participation Requirements

  • Strong coding skills, preferably in Python
  • Interest in natural language processing and information retrieval
  • Willingness to familiarize yourself with user simulation practices

External Partner

-