Natural Language Processing Advanced Learning Path - MAI4CAREU Master in AI
Natural language processing (NLP) seeks to provide computers with the ability to intelligently process human language, extracting meaning, information, and structure from text, speech, web pages, and social networks. This curriculum presents a structured learning path that begins with the fundamental elements of NLP systems, advances through the evolving techniques for text representation, introduces the latest deep learning advancements in NLP, and explores their applications in addressing current and relevant issues. It is based on the elective course offered in the Master in Artificial Intelligence programme at the University of Cyprus, which was developed with co-funding from the MAI4CAREU European project. The course is organised in four parts, which are further categorised into Introductory (four units) and Advanced (seven units) according to their difficulty. The recommended order for studying all materials is the one shown below (from 1 to 11).
Part I: Introduction
- Introduction to Natural Language Processing
- Fundamental Text Pre-Processing
Part II: Language Modeling and Classification
- Language Modelling
- Text Classification
Part III: Vector Semantics and Word Embeddings
- Vector Semantics
- Word Vector Semantics
- Distributed Contextual Embeddings
Part IV: NLP Applications and Advancements
- Using Hybrid Models to Detect Online Hate-Speech
- Linguistic Features to Identify Fake News and Misinformation
- Modelling Polarisation in News Media using NLP
- Understanding Large Language Models
MAI4CAREU - Natural Language Processing - Introduction to Natural Language Processing
This lecture serves as an entry point into Natural Language Processing (NLP), providing a panoramic view of the field. It introduces the task of automated extraction of meaning from language across various platforms, the application of NLP in commercial and industrial domains, and its role in conversational systems. The lecture also touches on the challenges of language ambiguity and the use of machine learning models to build NLP tools, setting the stage for a deep dive into the course.
MAI4CAREU - Natural Language Processing - Fundamental Text Pre-Processing
This lecture focuses on the initial steps required to prepare text data for deeper analytical processes. It dives into the use of Regular Expressions (RegEx) to identify and manipulate textual data efficiently, covering practical applications such as searching and modifying text strings in various formats. The lecture also introduces more advanced text processing techniques, such as tokenization methods including Byte-Pair Encoding (BPE) and WordPiece, which are crucial for handling languages that do not use spaces to separate words. Additionally, it addresses the complexities of word normalisation, covering methods such as case folding, stop word removal, lemmatization, and stemming. These techniques adjust text data to a standard form, enhancing both the efficiency and accuracy of subsequent NLP tasks.
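As a concrete illustration, the pre-processing steps above can be sketched in a few lines of Python; the stop-word list and suffix-stripping rules below are toy stand-ins for the fuller methods (e.g. Porter stemming) covered in the lecture.

```python
import re

def normalise(text):
    """Toy pre-processing pipeline: case folding, regex tokenisation,
    stop-word removal, and a crude suffix-stripping 'stemmer'."""
    stop_words = {"the", "a", "an", "is", "are", "and", "of"}  # tiny illustrative list
    text = text.lower()                    # case folding
    tokens = re.findall(r"[a-z]+", text)   # regex-based tokenisation
    tokens = [t for t in tokens if t not in stop_words]
    stemmed = []
    for t in tokens:
        # naive stemming: strip a few common English suffixes
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

print(normalise("The cats are sleeping and the dog barked"))
# ['cat', 'sleep', 'dog', 'bark']
```

Real pipelines would of course use a proper stemmer or lemmatizer and a curated stop-word list rather than these hard-coded rules.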
MAI4CAREU - Natural Language Processing - Language Modelling
This lecture focuses on the development and application of language models that compute the probabilities of sequences of words. The lecture dives into various types of language models, such as unigram, bigram, and trigram models, which are fundamental for tasks like machine translation, spell correction, and speech recognition. It covers the mathematical foundations of language modelling, including the computation of joint probabilities and conditional probabilities using the chain rule. The lecture also explores practical challenges like handling sparse data through techniques such as smoothing and discusses the limitations and generalisation of N-gram models, highlighting their application across diverse NLP tasks.
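The bigram probabilities and add-one (Laplace) smoothing described above can be sketched as follows; the tiny two-sentence corpus is purely illustrative.

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate add-one (Laplace) smoothed bigram probabilities
    P(w_i | w_{i-1}) from a list of tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        # add-one smoothing: every bigram gets at least one pseudo-count
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob

corpus = [["i", "like", "nlp"], ["i", "like", "cats"]]
p = train_bigram(corpus)
print(p("i", "like"))  # high: "like" follows "i" in both sentences
print(p("like", "i"))  # low: this bigram was never observed
```

Without smoothing, the unseen bigram would receive probability zero and make any sentence containing it impossible, which is exactly the sparse-data problem the lecture addresses.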
MAI4CAREU - Natural Language Processing - Text Classification
This lecture covers text classification, focusing on the methodologies and applications of assigning text to predefined categories. It surveys a variety of classification problems and introduces students to fundamental concepts such as feature extraction, vectorization techniques like Bag-of-Words (BoW), and classification algorithms including Naive Bayes. The lecture also delves into practical applications, demonstrating how text classification can be applied to areas such as sentiment analysis, spam detection, and topic categorization. Additionally, it explores advanced topics like handling imbalanced data, the importance of feature engineering, and evaluating classifier performance using metrics like precision, recall, and F-measure.
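A minimal multinomial Naive Bayes classifier over Bag-of-Words counts, with add-one smoothing, might look like the sketch below; the labelled training documents are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial Naive Bayes with add-one smoothing over
    bag-of-words counts; `docs` is a list of (tokens, label) pairs."""
    class_docs = Counter()               # documents per class
    word_counts = defaultdict(Counter)   # word counts per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())

    def classify(tokens):
        best_label, best_lp = None, -math.inf
        for label in class_docs:
            lp = math.log(class_docs[label] / total_docs)          # log prior
            denom = sum(word_counts[label].values()) + len(vocab)  # smoothed denominator
            for t in tokens:
                lp += math.log((word_counts[label][t] + 1) / denom)
            if lp > best_lp:
                best_lp, best_label = lp, label
        return best_label

    return classify

# hypothetical labelled training data
train = [(["free", "prize", "now"], "spam"),
         (["meeting", "at", "noon"], "ham"),
         (["free", "offer"], "spam"),
         (["lunch", "at", "noon"], "ham")]
clf = train_nb(train)
print(clf(["free", "prize"]))    # spam
print(clf(["meeting", "noon"]))  # ham
```

Working in log space avoids numerical underflow when multiplying many small per-word probabilities, a standard trick in practice.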
MAI4CAREU - Natural Language Processing - Vector Semantics
This lecture marks the first part of our exploration of word embeddings in the Natural Language Processing course, examining the representation of word meanings in multi-dimensional space. It starts by questioning the traditional views of words as mere sequences of characters or indices, instead introducing the concept of lexical semantics, which delves into understanding words, their lemmas, and various senses. The lecture further explains the importance of capturing semantic relationships such as synonymy, antonymy, and hierarchical relationships between words using vector spaces. Techniques like one-hot encoding, Bag of Words (BoW), and more sophisticated methods such as word embeddings that capture nuanced semantic relationships in a dense vector form are covered. This approach allows for a deeper understanding of language that transcends simple word forms, facilitating advanced applications such as machine translation and sentiment analysis.
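To make the vector-space view concrete, here is a minimal sketch of cosine similarity computed over simple word-count (BoW) vectors; real systems would typically use dense embeddings rather than raw counts.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse bag-of-words vectors
    represented as {word: count} mappings."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

doc1 = Counter("the cat sat on the mat".split())
doc2 = Counter("the cat lay on the rug".split())
doc3 = Counter("stock markets fell sharply today".split())
print(cosine(doc1, doc2))  # 0.75: substantial word overlap
print(cosine(doc1, doc3))  # 0.0: no shared words at all
```

The zero similarity for the third document illustrates a key limitation of sparse count vectors that dense embeddings overcome: two texts about related topics score zero unless they share surface word forms.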
MAI4CAREU - Natural Language Processing - Word Vector Semantics
The second part of the lecture continues to delve into the practical and theoretical aspects of word embeddings within Natural Language Processing. In this session the focus shifts to the distributional hypothesis, which posits that words which appear in similar contexts possess similar meanings. This part explores various methods of constructing word embeddings, including detailed explanations of techniques like Skip-gram and Continuous Bag of Words (CBOW) from the Word2Vec framework. Additionally, the lecture discusses how embeddings capture semantic and syntactic word relationships, and their application in tasks such as sentiment analysis and machine translation, illustrating the pivotal role of embeddings in modern computational linguistics.
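The skip-gram objective trains each word to predict its neighbours; the sketch below generates the (target, context) training pairs for a given window size, the first step of the Word2Vec pipeline (the neural training itself is omitted).

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as used by Word2Vec's
    skip-gram objective: each word predicts its neighbours within
    `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["king", "rules", "the", "land"], window=1))
# [('king', 'rules'), ('rules', 'king'), ('rules', 'the'),
#  ('the', 'rules'), ('the', 'land'), ('land', 'the')]
```

CBOW simply reverses the direction of prediction: the context words jointly predict the target instead.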
MAI4CAREU - Natural Language Processing - Distributed Contextual Embeddings
This lecture moves beyond static word representations to explore the dynamic nature of language through distributed contextual embeddings. It delves into how contextual embeddings generated by models like ELMo, GPT, and BERT provide a deeper understanding of word meanings by considering the entire sentence context, which enhances their application in complex NLP tasks. The lecture explains the architecture of these models, such as the transformer mechanism in BERT, which allows for bidirectional understanding, a crucial development over previous unidirectional models. It also covers the practical applications of these models in tasks like text classification, sentiment analysis, and language generation, demonstrating their superiority in handling nuances of language compared to traditional embeddings.
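At the heart of the transformer mechanism is scaled dot-product attention; the minimal sketch below (plain Python, a single head, no learned projections) shows how softmax-weighted scores mix value vectors, the operation that lets models like BERT condition each token's representation on its full sentence context.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists of vectors:
    each query scores every key, the scores are softmax-normalised,
    and the resulting weights mix the value vectors."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        m = max(scores)                          # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]  # softmax
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# one query attending over three key/value pairs (toy numbers)
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(attention(q, k, v))
```

Production transformers add learned query/key/value projections, multiple heads, and stacked layers on top of exactly this core operation.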
MAI4CAREU - Natural Language Processing - Using Hybrid Models to Detect Online Hate-Speech
This lecture, marking the first in a series on NLP applications, delves into the crucial task of hate-speech detection, a complex and timely issue in online communication. It begins by defining hate speech and highlighting the challenges inherent in doing so, since definitions vary significantly across different legal and cultural contexts, complicating detection efforts. The lecture then transitions to a technical exploration of the models used for hate-speech detection, introducing students to the basics of Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), and Convolutional Neural Networks (CNNs). The integration of RNNs with CNNs is discussed, illustrating how this combination effectively harnesses the spatial pattern recognition capabilities of CNNs and the sequential data processing strength of RNNs for hate-speech detection. Additionally, it discusses the combination of text-, character-, and metadata-level models, and how these can be integrated and complement each other to provide more effective hate-speech identification.
MAI4CAREU - Natural Language Processing - Linguistic Features to Identify Fake News and Misinformation
The second lecture in the NLP applications series addresses the critical issue of misinformation and disinformation. It begins by defining both terms and discussing the current strategies and challenges in mitigating the spread of fake news, emphasising the sophisticated nature of misinformation and the role of compelling language and source credibility in its dissemination. The core of the lecture explores an advanced system designed to identify misinformation. Among others, the system includes a deep learning model that leverages an extensive set of linguistic features. These features include the number of words, punctuation, sentiment, and vocabulary richness, each contributing to the model's ability to discern legitimate news from fake news effectively. Furthermore, the lecture emphasises the importance of feature selection in optimising the model’s size and efficiency, ensuring it remains potent yet lightweight enough for practical use.
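A few of the hand-crafted linguistic features mentioned above can be sketched as follows; the sentiment lexicons are tiny hypothetical word lists standing in for a real sentiment resource, and a deployed system would feed such features into a trained model rather than inspect them directly.

```python
import re

def linguistic_features(text):
    """Extract a handful of surface linguistic features: word count,
    punctuation count, vocabulary richness (type-token ratio), and a
    crude lexicon-based sentiment score."""
    positive = {"great", "good", "amazing"}    # illustrative lexicon only
    negative = {"bad", "terrible", "shocking"}
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "num_words": len(words),
        "num_punct": sum(1 for ch in text if ch in "!?.,;:"),
        "vocab_richness": len(set(words)) / len(words) if words else 0.0,
        "sentiment": sum((w in positive) - (w in negative) for w in words),
    }

feats = linguistic_features("SHOCKING! You won't believe this amazing, terrible story.")
print(feats)
```

Keeping each feature cheap to compute is what makes the resulting model lightweight enough for practical use, which is why feature selection matters for the system's size and efficiency.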
MAI4CAREU - Natural Language Processing - Modelling Polarisation in News Media using NLP
The third and final lecture in the NLP applications series focuses on the phenomenon of polarisation in social and political contexts. It begins by defining polarisation and providing an overview of the theoretical background surrounding the phenomenon. The lecture then explores how Natural Language Processing (NLP) can be utilised to model, analyse and understand polarisation through various techniques. Key NLP tasks such as Named Entity Recognition, Entity Relationship Modeling, and Sentiment Analysis are employed to extract and interpret data from text, identifying polarised topics and the sentiment attitudes of entities towards these topics. This involves analysing the nature of relationships between different entities and the sentiment conveyed in their mentions within news articles or discussions. By constructing models that map out these relationships and sentiments, NLP helps reveal underlying patterns of polarisation.
MAI4CAREU - Natural Language Processing - Understanding Large Language Models (LLMs)
This lecture provides an in-depth exploration of the evolution and functioning of Large Language Models (LLMs) in NLP. It begins by defining what constitutes an LLM, connecting it with the previous lecture on language modelling, and distinguishing between medium-sized models like BERT and RoBERTa and "very" large models such as GPT-3 and GPT-4. It discusses the computational demands and the advanced capabilities of these models, emphasising their application across a broad range of NLP tasks through fine-tuning and in-context learning, including zero-shot and few-shot techniques. The lecture also covers the pre-training and adaptation processes that enable these models to perform diverse tasks. It delves deeper into the transformer architecture that underpins these models, discussing the impact of model scale on performance and the emergent properties of larger models. The lecture provides a comprehensive view of how LLMs have transformed the landscape of NLP, offering unprecedented accuracy and efficiency in language understanding and generation.