TH Köln

Master Digital Sciences

Accreditation documents for the degree program

Module »Web Information Retrieval« (WIR)

Module organization

Module coordinator: Prof. Dr. Philipp Schaer (Faculty F03)
Offered in: Winter semester (duration: 1 semester)
Number of participants: minimum 5, maximum 20
Prerequisites: Basic knowledge of IR, NLP, or text mining; basic Python skills and some statistics
Total workload: 180 h
Contact time: 60 h (30 h lecture / 15 h exercises / 15 h project supervision)
Self-study: 120 h (of which 120 h independent project work)
Examination: Written exam combined with coursework completed during the semester (2 partial examinations)
Competencies acquired: Model Systems, Implement Concepts, Optimize Systems
Relation to overall program criteria

Contribution to fields of action

The following lists how the module is assigned to the program's fields of action, given as a proportional contribution (in ECTS and in terms of content). This also indicates whether the module can be used in other degree programs and how it relates to other modules in the same program.

Field of action | ECTS (proportional) | Module contribution to the field of action
Generating and Accessing Knowledge | 5 | Students learn about the use cases and dimensions of web search, with a special focus on scientific and academic use cases.
Architecting and Coding Software | 1 | The module requires some coding expertise.

Learning Outcome

Students learn about the use cases and dimensions of web search, with a special focus on scientific and academic use cases. After a brief introduction to (or recap of) Information Retrieval and search engine technologies, the course dives into the current state of the art.

To understand issues specific to academic search, such as domain-specific languages, expertise, and entities, students implement their own search environment, deepening their knowledge of and hands-on experience with the latest Information Retrieval approaches in the field. By the end of the course, they know the issues and solutions implemented in large academic search systems such as Google Scholar, PubMed, or arXiv, and how this knowledge can be transferred to other domains such as enterprise search or expertise retrieval. Along the way, they analyze and evaluate these search systems to identify and explain their differences.

With the knowledge acquired in this course, students are able to apply existing search solutions to commercial or research-related search problems and, at a later stage, to design their own search systems and use state-of-the-art IR methods to gain insights from data sets such as web corpora, user logs, or large-scale academic collections.

Module content

  1. Information Retrieval in a nutshell
  2. Search engine architectures
  3. Indexing and query processing
  4. Retrieval evaluation
  5. Retrieval models
  6. Text classification and clustering
  7. Academic Search
  8. Quantifying (scientific) information
  9. Citation analysis
  10. Semantic Search
  11. Entity Linking

Teaching and learning methods

The course follows a hybrid format, where lecture videos are provided online and classroom time is used for discussion, exercises, and working on assignments.

  • This course involves self-study (which can be completed online): You’re expected to watch the lecture videos, read the corresponding book chapters/sections listed on the last slide of each lecture deck, and complete the exercises on GitHub.
  • There is also a classroom component that is not mandatory but highly recommended for an optimal learning experience. It consists of discussion and exercises in a regular or virtual classroom setting.

Teaching materials provided

  • slides and recorded lectures
  • exercises

Further reading

  • ChengXiang Zhai and Sean Massung (2016), “Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining”, Association for Computing Machinery and Morgan & Claypool.
  • Krisztian Balog, Yi Fang, Maarten de Rijke, Pavel Serdyukov and Luo Si (2012), “Expertise Retrieval”, Foundations and Trends® in Information Retrieval: Vol. 6: No. 2–3, pp. 127–256.
  • Krisztian Balog (2017), “Entity Retrieval”, Springer.
  • Mark Sanderson (2010), “Test Collection Based Evaluation of Information Retrieval Systems”, Foundations and Trends® in Information Retrieval: Vol. 4: No. 4, pp. 247–375.
  • Peter Ingwersen (2012), “Scientometric Indicators and Webometrics - and the Polyrepresentation Principle in Information Retrieval”, Bangalore: Ess Ess Publications, New Delhi, India.