TH Köln

Master Digital Sciences

Documents for Study Program Accreditation

Module »Natural Language Processing« (NLP)

Organizational Details

Responsible for the module: Prof. Dr. Philipp Schaer (Faculty F03)
Lecturer(s): Prof. Dr. Philipp Schaer (Faculty F03), Prof. Dr. Klaus Lepsky (Faculty F03)
Language: English
Offered in: Summer semester (duration: 1 semester)
Location: Remote
Number of participants: minimum 1, maximum 20
Precondition: Not open to students coming from the DIS B.Sc. program
Recommendation: Basic knowledge of Python
ECTS: 3
Effort: Total effort 90 h
Total contact time: 30 h (20 h lecture / 10 h exercise)
Time for self-learning: 60 h
Exam: Expert talk or written exam
Competences taught by the module: Analyze Domains, Model Systems, Implement Concepts
General criteria covered by the module: Interdisciplinarity, Digitization

Mapping to Focus Areas

Below is the module's mapping to the study program's focus areas. It states the module's contribution to each relevant focus area, both in ECTS and in content. The mapping also relates the module to other modules and shows to what extent it could be part of other study programs.

Focus Area | ECTS (proportional) | Module Contribution to Focus Area
Generating and Accessing Knowledge | 2 | Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in natural language.
Acting Responsibly | 1 | The module also addresses the impacts of these techniques.

Learning Outcome

Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in natural language. NLP is therefore essential for addressing modern text-based challenges. As a scientific discipline, NLP can be seen as the field where Computer Science, Artificial Intelligence, Machine Learning, and Linguistics overlap.

In this course, students learn the basic techniques and theories of NLP. The lecture covers not only the theory but also the implementation of relevant, state-of-the-art NLP procedures. Topics include well-established approaches such as tool-based language processing and dictionary-based or lexical approaches.

By applying state-of-the-art techniques to real-world data sets, students learn to extract knowledge from natural language corpora. This enables them to analyze, discover, and evaluate phenomena hidden in texts.

NLP enables applications such as intelligent search engines, dialog systems, question-answering systems, machine translation, document classification, sentiment analysis, and opinion mining. Because the course combines theory with implementation, students are able to conduct their own applied research on given or self-crawled data from a variety of sources, for example in commercial or research-related scenarios.

Module Content

  1. Introduction; showcase indexing: thesauri vs. statistical approaches (presentation, comparison, and discussion)
  2. Accessing and preprocessing text (accessing text files, websites, and corpora; segmentation into words and sentences; regular expressions)
  3. Morphology (normalisation, stemming, lemmatisation, POS tagging, …)
  4. Lexical processing (WordNet, DBpedia)
  5. Basic language modelling
  6. Information extraction
  7. Sentiment analysis

Forms of Teaching and Learning

The course follows a hybrid format, where lecture videos are provided online and classroom time is used for discussion, exercises, and working on assignments.

  • This course involves self-study, which can be completed online: students are expected to watch the lecture videos, read the corresponding book chapters or sections listed on the last slide of each lecture deck, and complete the exercises on GitHub.
  • There is also a classroom component, which is optional but highly recommended for an optimal learning experience. It consists of discussion and exercises in a regular or virtual classroom setting.

Learning Material Provided by Lecturer

  • slides and recorded lectures
  • exercises
  • access to standard NLP text corpora

Literature

  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (Prentice Hall, 2009) by Daniel Jurafsky, James H. Martin
  • Foundations of Statistical Natural Language Processing (MIT Press, 1999) by Christopher D. Manning, Hinrich Schütze
  • Natural Language Processing with Python (O’Reilly, 2009) by Steven Bird, Ewan Klein, Edward Loper
  • Neural Network Methods for Natural Language Processing (Morgan & Claypool, 2017) by Yoav Goldberg
  • Natural Language Processing with PyTorch (O’Reilly, 2019) by Delip Rao, Brian McMahan
  • Natural Language Processing in Action (Manning, 2019) by Hobson Lane, Hannes Hapke, Cole Howard