
Natural Language Processing Advanced Learning Path - MAI4CAREU Master in AI

Natural language processing (NLP) seeks to provide computers with the ability to intelligently process human language, extracting meaning, information, and structure from text, speech, web pages, and social networks. This curriculum presents a structured learning path that begins with the fundamental elements of NLP systems, advances through evolving techniques for text representation, introduces the latest deep learning advancements in NLP, and explores their application to current, relevant problems. It is based on the elective course offered in the Master in Artificial Intelligence programme of the University of Cyprus, which was developed with co-funding from the MAI4CAREU European project. The course is organised in four parts, and its eleven units are further categorised into Introductory (four units) and Advanced (seven units) according to their difficulty. The recommended order for studying the materials is the one shown below (from 1 to 11).

Part I: Introduction

  1. Introduction to Natural Language Processing
  2. Fundamental Text Pre-Processing

Part II: Language Modelling and Classification

  3. Language Modelling
  4. Text Classification

Part III: Vector Semantics and Word Embeddings

  5. Vector Semantics
  6. Word Vector Semantics
  7. Distributed Contextual Embeddings

Part IV: NLP Applications and Advancements

  8. Using Hybrid Models to Detect Online Hate Speech
  9. Using Linguistic Features to Identify Fake News
  10. Modelling Polarisation in News Media using NLP
  11. Understanding Large Language Models

Introductory learning materials

MAI4CAREU - Natural Language Processing - Introduction to Natural Language Processing

This lecture serves as an entry point into Natural Language Processing (NLP), providing a panoramic view of the field. It introduces the task of automatically extracting meaning from language across various platforms, the application of NLP in commercial and industrial domains, and its role in conversational systems. The lecture also touches on the challenges of language ambiguity and the use of machine learning models to build NLP tools, setting the stage for the deeper material that follows.

MAI4CAREU - Natural Language Processing - Fundamental Text Pre-Processing

This lecture focuses on the initial steps required to prepare text data for deeper analytical processing. It covers the use of Regular Expressions (RegEx) to identify and manipulate textual data efficiently, with practical applications such as searching and modifying text strings in various formats. The lecture also introduces more advanced tokenization methods, including Byte-Pair Encoding (BPE) and WordPiece, which are crucial for handling rare and out-of-vocabulary words as well as languages that do not use spaces to separate words. Additionally, it addresses the complexities of word normalisation, covering methods such as case folding, stop word removal, lemmatization, and stemming. These techniques reduce text to a standard form, enhancing both the efficiency and accuracy of subsequent NLP tasks.
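
As a concrete illustration of these normalisation steps, here is a minimal pre-processing sketch in Python. It assumes the NLTK library for the stop word list, Porter stemmer, and WordNet lemmatizer; the example sentence and function name are illustrative, not taken from the lecture.

```python
# A minimal pre-processing sketch (illustrative only; the lecture's own
# examples may differ). Uses Python's standard re module and NLTK, whose
# resources require a one-off nltk.download(...).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(text):
    # Case folding: map everything to lower case.
    text = text.lower()
    # RegEx tokenization: keep runs of alphabetic characters only.
    tokens = re.findall(r"[a-z]+", text)
    # Stop word removal.
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stops]
    # Stemming vs. lemmatization: two different normalisation strategies.
    stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
    return ([stemmer.stem(t) for t in tokens],
            [lemmatizer.lemmatize(t) for t in tokens])

stems, lemmas = preprocess("The leaves are falling, and the studies continue.")
print(stems)   # e.g. ['leav', 'fall', 'studi', 'continu']
print(lemmas)  # e.g. ['leaf', 'falling', 'study', 'continue']
```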

MAI4CAREU - Natural Language Processing - Language Modelling

This lecture focuses on the development and application of language models that compute the probabilities of sequences of words. It examines various types of language models, such as unigram, bigram, and trigram models, which are fundamental for tasks like machine translation, spell correction, and speech recognition. It covers the mathematical foundations of language modelling, including the computation of joint and conditional probabilities using the chain rule. The lecture also explores practical challenges, such as handling sparse data through smoothing techniques, and discusses the limitations and generalisation of N-gram models, highlighting their application across diverse NLP tasks.
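
To make the chain rule and smoothing concrete, the following sketch builds a bigram model with add-one (Laplace) smoothing over a toy two-sentence corpus. The corpus and function names are illustrative assumptions, not material from the lecture.

```python
# Minimal bigram language model with add-one (Laplace) smoothing.
# <s> and </s> mark sentence boundaries, as is standard in N-gram modelling.
from collections import Counter

corpus = [["<s>", "i", "like", "nlp", "</s>"],
          ["<s>", "i", "like", "deep", "learning", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1])
                  for sent in corpus for i in range(len(sent) - 1))
V = len(unigrams)  # vocabulary size

def p_bigram(w_prev, w):
    # P(w | w_prev) with add-one smoothing:
    # (count(w_prev, w) + 1) / (count(w_prev) + V)
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

def sentence_prob(sent):
    # Chain rule approximated by bigrams: P(w_1..w_n) = product of
    # P(w_i | w_{i-1}) over consecutive word pairs.
    p = 1.0
    for i in range(len(sent) - 1):
        p *= p_bigram(sent[i], sent[i + 1])
    return p

print(sentence_prob(["<s>", "i", "like", "nlp", "</s>"]))
```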

Advanced learning materials

MAI4CAREU - Natural Language Processing - Vector Semantics

This lecture marks the first part of our exploration of word embeddings in the Natural Language Processing course, examining how word meanings can be represented in multi-dimensional space. It starts by questioning the traditional view of words as mere sequences of characters or indices, instead introducing lexical semantics, which studies words, their lemmas, and their various senses. The lecture then explains the importance of capturing semantic relationships such as synonymy, antonymy, and hierarchical relationships between words using vector spaces. It covers techniques such as one-hot encoding and Bag of Words (BoW), as well as more sophisticated methods such as word embeddings, which capture nuanced semantic relationships in dense vector form. This approach allows for a deeper understanding of language that transcends simple word forms, facilitating advanced applications such as machine translation and sentiment analysis.
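
The contrast between sparse one-hot vectors and dense embeddings can be seen in a few lines of Python. The three-dimensional "embedding" values below are made-up toy numbers chosen purely to illustrate cosine similarity; real embeddings are learned from data.

```python
# One-hot vectors vs. dense embeddings (toy values, illustrative only).
import numpy as np

vocab = ["king", "queen", "banana"]

# One-hot: every word is orthogonal to every other, so one-hot vectors
# cannot express that "king" is closer to "queen" than to "banana".
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Dense embeddings place related words close together in vector space.
emb = {"king":   np.array([0.90, 0.80, 0.10]),
       "queen":  np.array([0.85, 0.82, 0.12]),
       "banana": np.array([0.10, 0.05, 0.95])}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(one_hot["king"], one_hot["queen"]))  # 0.0 - no similarity signal
print(cosine(emb["king"], emb["queen"]))          # close to 1.0
print(cosine(emb["king"], emb["banana"]))         # much lower
```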

MAI4CAREU - Natural Language Processing - Word Vector Semantics

The second part of the lecture continues to delve into the practical and theoretical aspects of word embeddings within Natural Language Processing. In this session, the focus shifts to the distributional hypothesis, which posits that words appearing in similar contexts tend to have similar meanings. This part explores various methods of constructing word embeddings, including detailed explanations of the Skip-gram and Continuous Bag of Words (CBOW) techniques from the Word2Vec framework. Additionally, the lecture discusses how embeddings capture semantic and syntactic relationships between words, and their application in tasks such as sentiment analysis and machine translation, illustrating the pivotal role of embeddings in modern computational linguistics.
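
For readers who want to try Word2Vec directly, here is a minimal sketch using the gensim library (our choice; the lecture does not prescribe a particular toolkit). Setting sg=1 selects Skip-gram, while sg=0 selects CBOW; the toy corpus is illustrative.

```python
# Training a toy Word2Vec model with gensim (illustrative sketch).
from gensim.models import Word2Vec

sentences = [["natural", "language", "processing", "is", "fun"],
             ["deep", "learning", "advances", "language", "processing"],
             ["word", "embeddings", "capture", "word", "meaning"]]

# sg=1 -> Skip-gram; sg=0 -> CBOW. Tiny corpus, so results are noisy.
model = Word2Vec(sentences, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=50)

vec = model.wv["language"]            # a 50-dimensional dense vector
print(model.wv.most_similar("language", topn=3))
```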

MAI4CAREU - Natural Language Processing - Distributed Contextual Embeddings

In this lecture we move beyond static word representations to explore the dynamic nature of language through distributed contextual embeddings. It delves into how contextual embeddings generated by models such as ELMo (built on bidirectional LSTMs) and the transformer-based GPT and BERT provide a deeper understanding of word meanings by considering the entire sentence context, which enhances their application in complex NLP tasks. The lecture explains the architecture of these models, such as the transformer mechanism in BERT, which allows for bidirectional understanding, a crucial development over earlier unidirectional models. It also covers the practical applications of these models in tasks like text classification, sentiment analysis, and language generation, demonstrating their superiority in handling nuances of language compared to traditional embeddings.
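
The effect of context on a word's representation can be demonstrated with a short sketch using the Hugging Face Transformers library (our choice of toolkit; the lecture itself is library-agnostic). The same surface form "bank" receives different vectors in different sentences.

```python
# Contextual embeddings with BERT: the vector for "bank" depends on
# its sentence context (illustrative sketch).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence, word):
    # Return the contextual vector of the first sub-token matching `word`.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v1 = embed("she sat on the river bank", "bank")
v2 = embed("he deposited cash at the bank", "bank")
# Below 1.0: each occurrence of "bank" reflects its own sentence context.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```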
