
Problem Description
Conversational agents are becoming a key means of accessing information online. However, evaluating them remains a challenge: releasing a flawed conversational agent to the public can hurt users' trust and developers' reputation. One solution for offline evaluation is user simulation, in which synthetic users interact with the conversational agent to test it and identify potential flaws before public release. The objective of this project is to develop user simulators that produce realistic interactions with conversational agents in the domain of dataset search.
Project Definition
The project is aligned with the U-Sim track @ TREC 2026, in which participants are tasked with developing user simulators that interact with conversational systems to retrieve datasets in the scholarly domain. A critical question is whether the developed user simulators are good enough to substitute for real users and to support the training and evaluation of conversational agents. The project has two main objectives:
- Conceptualizing and developing a user simulator
- Performing a qualitative analysis of the user simulator to assess its value for the evaluation or training of conversational agents
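To make the simulation setup concrete, the interaction loop between a user simulator and an agent under test can be sketched as below. This is a minimal, illustrative sketch only: the class names, the rule-based user policy, and the stub agent are assumptions for demonstration, not part of the U-Sim track specification. In the actual project, the scripted policy would typically be replaced by an LLM-driven one.

```python
from typing import Optional

class StubSearchAgent:
    """Stand-in for the conversational dataset-search agent under test."""
    def respond(self, utterance: str) -> str:
        if "climate" in utterance.lower():
            return "I found 3 datasets on climate. Do you need a specific time range?"
        return "Could you tell me more about the topic you are looking for?"

class ScriptedUserSimulator:
    """Simulated user driven by a fixed information need and simple rules."""
    def __init__(self, information_need: str):
        self.information_need = information_need
        self.turns_taken = 0

    def next_utterance(self, agent_reply: Optional[str]) -> Optional[str]:
        self.turns_taken += 1
        if agent_reply is None:                     # opening turn
            return f"I am looking for datasets about {self.information_need}."
        if "time range" in agent_reply:             # answer a clarification question
            return "Yes, from 2000 onwards, please."
        if self.turns_taken >= 4:                   # simple stop criterion
            return None
        return "It is for my research on climate trends."

def run_dialogue(agent, user, max_turns: int = 6):
    """Alternate user and agent turns until the simulator stops."""
    transcript, agent_reply = [], None
    for _ in range(max_turns):
        utterance = user.next_utterance(agent_reply)
        if utterance is None:
            break
        agent_reply = agent.respond(utterance)
        transcript.append((utterance, agent_reply))
    return transcript

log = run_dialogue(StubSearchAgent(), ScriptedUserSimulator("climate change"))
for user_turn, agent_turn in log:
    print("USER :", user_turn)
    print("AGENT:", agent_turn)
```

The resulting transcripts are the kind of synthetic interaction logs that the project's qualitative analysis would then examine for realism.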
The timeline for the project is as follows:
- Kick-off workshop (mandatory): 1st or 2nd week of March
- Implementation phase: March-June
- Evaluation phase: July
If there is interest, the Cologne Information Retrieval group is happy to support student groups in submitting their user simulator along with a lab report to the U-Sim track in September 2026.
Links and Resources
- U-Sim track @ TREC: https://trec.usersim.ai/
- Sim4IA - Simulations for Information Access: http://sim4ia.org/
- Cologne Information Retrieval: https://ir.web.th-koeln.de/
- Related literature: https://arxiv.org/pdf/2405.14249, https://arxiv.org/pdf/2406.19007
Learning Outcome
- Practical implementation of a user simulator using cutting-edge, timely technologies (including large language models and conversational agents)
- Project management and software development skills from concept to implementation
- Experiments in an active field of research with the possibility to participate in an international benchmarking campaign
Participation Requirements
- Strong coding skills, preferably in Python
- Interest in natural language processing and information retrieval
- Willingness to familiarize yourself with user simulation practices
External Partner