BioNT - Applied Machine Learning for Biological Data

This intensive workshop focuses on applying machine learning techniques to biological and genomic data, combining theoretical foundations with hands-on coding experience. Participants will work through real-world scenarios using Python-based tools and frameworks that are critical for modern bioinformatics.
Structure of the workshop
Module 1 (optional)
This module provides a solid foundation in scientific computing with Python. Across two half-day sessions, participants will explore essential data handling techniques using NumPy and pandas, two libraries widely adopted for manipulating and analyzing biological data.
Module 2 (mandatory)
This module spans five full days and begins by introducing core concepts in machine learning. On the first day, participants will be introduced to unsupervised learning and will implement clustering algorithms and dimensionality reduction techniques on real-world genomics data. The module then moves to supervised learning, with a focus on classification and regression, including logistic regression and tree-based methods. In hands-on sessions tailored to cancer genomics datasets, participants will construct and evaluate machine learning models, perform cross-validation, and tune hyperparameters. Later sessions introduce deep learning concepts and the PyTorch framework. Participants will learn to build and train simple neural networks and will explore a deep learning-based bioinformatics tool used in genomic variant calling. The final day introduces accelerated genomics through GPU-powered workflows. Participants will learn about GPU technology, how to use containerized bioinformatics tools, and how to implement high-performance, GPU-accelerated pipelines using Parabricks.
Please note: Registration is required only for Module 2. BioNT can accommodate up to 40 participants for Module 2 due to logistical constraints, including access to virtual machines and GPUs. Module 1 is optional and does not require registration.
Who is this course for?
Join this workshop if you are:
● A life scientist, bioinformatician, or data analyst working with biological or genomic data
● Curious about how machine learning can be applied to biological research questions
● Looking to strengthen your Python skills for data handling and analysis
● Interested in implementing classification, regression, or clustering models on real-world datasets
● Exploring the use of deep learning techniques in bioinformatics
● Involved in next-generation sequencing (NGS) workflows and looking to optimize them with GPU acceleration
● Committed to building reproducible and scalable analysis pipelines using container technology
● Eager to understand and apply best practices in model evaluation, tuning, and validation
● New to machine learning and seeking a hands-on, structured introduction
Practical information and deadline to apply
Module 1: May 27-28, 2025, 09:00-12:00 CEST (no registration required)
Module 2: June 2-6, 2025, 09:00-16:00 CEST
Registration deadline: May 19, 2025
Location: Online