
Explainable Artificial Intelligence in Medicine (xAIM) Learning Path - Text Mining


The xAIM project provides a comprehensive learning path designed to equip individuals with the knowledge and skills needed to leverage Explainable Artificial Intelligence (xAI) in healthcare. In collaboration with Goethe University (Germany), Keele University (UK), Leibniz University Hannover (Germany), and the University of Ljubljana (Slovenia), the learning path offers a selected course from the xAIM Master’s program to introduce students to Explainable AI.
The xAIM Master’s program covers core principles across three main areas: healthcare management, artificial intelligence, and ethical and legal considerations. Key topics include the role and applications of AI techniques in the healthcare sector, opportunities and challenges of data-driven approaches in medical environments, methods for analyzing and interpreting complex healthcare datasets and communicating insights to stakeholders, as well as addressing ethical and social implications of AI and new technologies. Additionally, the students develop advanced programming skills, including deep learning, text mining, and computer vision.

Text mining seeks to extract insights, patterns, and knowledge from large sets of textual data, transforming unstructured text into structured information for analysis and decision-making. This curriculum presents a structured learning path that begins with techniques for text pre-processing and visualisation, introduces document vectorisation, applies natural language processing approaches to document clustering and classification, and explains topic modelling. It is based on the elective course offered in the xAIM Master’s program at the University of Pavia, developed with co-funding from the xAIM European project.

The course is organised into nine topics, categorised into Introductory (five units) and Advanced (four units). The introductory material covers core concepts of text mining, while advanced units offer further insight into topic modelling, sentiment analysis, keyword extraction, and co-occurrence networks.

Introductory materials

  1. Introduction to Text Mining
  2. Document Vectorisation
  3. Document Classification
  4. Document Clustering
  5. Topic Modelling

Advanced materials

  1. Explaining LDA
  2. Sentiment Analysis
  3. Semantic Search
  4. Document Networks
     
Introductory learning materials

xAIM - Text Mining course: Introduction to Text Mining

Let's dive into text mining with this first introductory unit. Introduction to Text Mining details the initial steps for preparing text data for analysis. It covers essential concepts like tokenisation, lemmatisation, stemming, and filtering. Filtering can remove punctuation and stopwords or apply custom filters. Techniques such as n-grams and part-of-speech (POS) tagging are also discussed. These steps are crucial for converting raw text into a format suitable for downstream analysis using tools like Orange.
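The pre-processing steps named above can be sketched in a few lines of plain Python. This is only an illustration, not the Orange workflow used in the unit: the stopword list is a tiny invented sample, and `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was"}

def tokenise(text):
    """Lowercase the text and keep alphabetic tokens, which also drops punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Filter out tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def toy_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def ngrams(tokens, n=2):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def preprocess(text):
    """Tokenise, filter stopwords, then stem each remaining token."""
    return [toy_stem(t) for t in remove_stopwords(tokenise(text))]

tokens = preprocess("The patient was treated, and the symptoms improved.")
bigrams = ngrams(tokens)
print(tokens)
print(bigrams)
```

Stemming can produce non-words (here "improved" becomes "improv"), which is one reason lemmatisation is sometimes preferred when readable tokens matter.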

xAIM - Text Mining: Document vectorisation

We take a step further with Document Vectorisation, which discusses techniques for converting text documents into numerical vectors suitable for machine learning tasks. It covers two main techniques: Bag of Words (BOW) with the TF-IDF transform, and Document Embedding. BOW involves counting word occurrences, while Document Embedding uses pre-trained models to create vectors capturing semantic relationships between words. The chapter highlights the advantages and limitations of both methods, including their application contexts and pre-processing requirements. The focus is on transforming text data into formats that algorithms can process effectively.
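As a rough illustration of the bag-of-words representation with a TF-IDF transform, the sketch below uses raw term counts and the weighting `count * log(N / df)`. This is one common variant among several, chosen here as an assumption; the unit's tooling may use a different formula. The three toy documents are invented.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Build a sorted vocabulary and a document-by-term count matrix."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def tf_idf(matrix):
    """Weight each count by log(N / df), where df is the number of documents containing the term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]

# Already tokenised toy documents (invented for illustration).
docs = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["fever", "rash"],
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
print(vocab)
print(counts)
```

Terms that occur in fewer documents (here "headache" and "rash") receive a larger IDF factor than terms spread across many documents.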

xAIM - Text Mining: Document Classification

Document Classification explores techniques for categorising text documents into predefined classes or categories using machine learning algorithms. It discusses logistic regression, a simple machine learning classifier, and its application in automated document sorting and classification tasks. It covers converting text into numerical representations, training predictive models, evaluating performance using metrics like classification accuracy and AUC, and making predictions on new data. The chapter includes practical examples, such as classifying Grimm's tales based on their content and predicting tale types for Andersen's stories. The emphasis is on explaining the results of a logistic regression classifier and predicting on new data.
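To make the pipeline concrete, here is a minimal sketch of logistic regression trained with stochastic gradient descent on bag-of-words counts. The vocabulary, the counts, the class labels, and the hyperparameters (`lr`, `epochs`) are all invented for illustration and are not taken from the unit's Grimm or Andersen data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 for a single feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical vocabulary and word counts; class 1 and class 0 are placeholder labels.
vocab = ["king", "princess", "fox", "wolf"]
X = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
y = [1, 1, 0, 0]

w, b = train_logistic_regression(X, y)

# Classification accuracy on the training data (a real evaluation would use held-out data).
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)

# Predicting on a new, unseen document.
new_doc = [1, 1, 0, 0]
print(accuracy, predict_proba(w, b, new_doc))

# The sign of each weight shows which class a word pushes the prediction towards.
for term, weight in zip(vocab, w):
    print(term, round(weight, 2))
```

Reading the learned weights is what makes the classifier explainable: positive weights pull a document towards class 1, negative weights towards class 0.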

Advanced learning materials

xAIM - Text Mining: Explaining Latent Dirichlet Allocation (LDA)

We start the more advanced part of the course by discovering more about Latent Dirichlet Allocation (LDA). This unit explains the assumptions behind the LDA method. It provides a detailed step-by-step explanation of the algorithm (based on Gibbs sampling) with an example from medicine. It also compares Gibbs sampling with variational inference. Finally, it gives an overview of tools that implement LDA, including additional options that go beyond the described method.
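For readers who want to see the sampling loop, the following is a compact sketch of collapsed Gibbs sampling for LDA. It is a textbook-style formulation and not necessarily the exact procedure presented in the unit; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and total tokens per topic.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics

    # Assign a random initial topic to every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                # Sample a new topic and put the token back.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed document-topic proportions.
    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, nkw, vocab

# Invented toy documents with two loosely medical themes.
docs = [
    ["fever", "cough", "fever", "flu"],
    ["cough", "flu", "fever"],
    ["fracture", "cast", "bone"],
    ["bone", "fracture", "cast", "cast"],
]
theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Because the sampler is stochastic, the topic indices and the exact proportions depend on the seed and the number of iterations.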

xAIM – Text Mining: Sentiment Analysis

Sentiment Analysis explores techniques for automatically determining the sentiment expressed in text data. It gives an overview of methods such as lexicon-based and machine-learning-based approaches, with an emphasis on the former. The focus is on analysing the emotional valence in text, visualising the results, and extracting relevant documents. Sentiment analysis (or opinion mining) is the task of extracting sentiment from text data. Sentiment comprises the opinion holder (e.g. a reviewer) + (time of event) + sentiment target (product, movie, service...) + sentiment (positive, negative, neutral). Furthermore, we are interested in polarity (+/-/0), intensity (high, medium, low), and/or specific emotion (fear, anger, joy, surprise, disgust).

This unit dives into the three approaches to sentiment extraction: lexicon-based, machine learning, and hybrid.
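A lexicon-based approach can be sketched as a dictionary lookup. The lexicon below is a tiny invented sample (real sentiment lexicons contain thousands of scored entries), and the scoring rule, the mean word score per token, is one simple choice among many.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "great": 1, "helpful": 1,
    "bad": -1, "painful": -1, "worse": -1,
}

def sentiment_score(text):
    """Mean lexicon score over all tokens; words outside the lexicon count as 0."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(LEXICON.get(t, 0) for t in tokens) / len(tokens)

def polarity(score):
    """Map a numeric score to a polarity label (+/-/0)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for text in [
    "The staff were great and helpful",
    "The treatment was painful",
    "The appointment is on Monday",
]:
    score = sentiment_score(text)
    print(text, round(score, 2), polarity(score))
```

The magnitude of the score gives a rough notion of intensity, while its sign gives the polarity. A plain lookup like this ignores negation ("not good"), which is one limitation that machine learning and hybrid approaches try to address.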

 

xAIM – Text Mining: Semantic Analysis

We now turn to Semantic Analysis, which discusses techniques for uncovering meaning in unstructured text and organising documents based on conceptual similarity. It covers constructing annotated document maps using tools like t-SNE and Gaussian mixture models. An example with patient notes from PubMed Central demonstrates creating a 2D document map to identify clusters related to different medical conditions.
