Data Advanced Learning Path - Transforming data into knowledge

The first learning path on “Data analytics” introduced the basics of the field. In this second learning path, we will dig much deeper into advanced data analytics. This broad term covers many activities that all revolve around the transformation of data into knowledge. In other words, it means analyzing raw data beyond common statistical analysis and using it to build a general model that can make predictions, classify new data, or group similar data together (also known as data clustering). Advanced analytics is used in many fields to optimize activities and processes, provide reliable forecasts, or improve our understanding of the surrounding world. The boundary between advanced data analytics and data science is blurred; both fields share algorithms and approaches. But data science goes well beyond the methods presented here and includes more advanced techniques such as neural networks, deep learning and Natural Language Processing. This second learning path is much more technical and practical than the first one, and includes detailed descriptions of some mathematical methods as well as tools to implement them efficiently.

Introductory learning materials

An Introduction to Scikit-Learn: Machine Learning in Python

Let us start by introducing tools that help with data analytics tasks. Scikit-learn is an open-source machine learning library for the Python programming language. It offers simple and efficient tools for predictive data analysis, such as logistic regression, clustering and density estimation, as well as tools for data transformation and visualization. You can complement this reading with this other tutorial. If you don’t feel like installing Python and all related packages on your own computer, you can simply use Google Colab, an online platform provided by Google to develop and run your code directly in your browser. An alternative is to install JupyterLab, which runs on your own computer and provides a user-friendly environment to write and organize your code.
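
To give a feel for how scikit-learn is used, here is a minimal sketch of the clustering task mentioned above, using the library's KMeans estimator. The six points are made up for illustration, and the choice of two clusters is an assumption that happens to suit this toy data. It should run unchanged in Google Colab or JupyterLab.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two well-separated groups (illustrative data only).
points = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # group near (1, 1)
    [8.0, 8.2], [8.1, 7.9], [7.8, 8.0],   # group near (8, 8)
])

# scikit-learn estimators share one pattern: create the model, fit it, use it.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

print(labels)                  # cluster index assigned to each point
print(model.cluster_centers_)  # coordinates of the two cluster centres
```

The first three points should receive one label and the last three the other; which group is numbered 0 and which is numbered 1 is arbitrary.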

D3.js tutorial

Another useful tool for data visualization is D3.js, a widely used, free, open-source JavaScript library for visualizing data. It relies on open standards like SVG and Canvas and is mainly meant to be used on the web. It is a low-level toolbox: rather than ready-made charts, it provides building blocks that you combine into the visualization you need. It supports a wide variety of chart types, data loading, user interactions and layouts.

This tutorial takes you through the main components of D3.js and its prominent use cases. It also includes some additional resources. Another introductory resource is offered here.

Linear regression: House Price Prediction in Python

Let’s now dive into a simple yet powerful data analytics method that looks at existing data, derives a model from that data and uses it to make predictions about previously unseen data. This kind of method is called a regression algorithm.

For instance, imagine you have collected data on the buying habits of a supermarket’s customers. For each customer, you know where they live, their age, and maybe a few more details; you also know how much they spend in your shop. By analyzing this data, you can derive a model of how a typical customer behaves according to these characteristics, and thus predict how much a new customer would spend given their place of residence, age and other details. This method is called linear regression; this video introduces it and presents its implementation using scikit-learn.
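
The supermarket example can be sketched in a few lines with scikit-learn's LinearRegression. This is only an illustration, not the code from the video: the customers are hypothetical, the place of residence is simplified to a distance to the shop in kilometres, and the spending figures are generated without noise from a known rule so that the fitted model is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical customers: [age in years, distance to the shop in km].
X = np.array([
    [25, 2], [32, 10], [47, 4],
    [51, 1], [38, 7], [60, 3],
], dtype=float)

# Synthetic, noise-free spending generated from a known rule
# (assumption for illustration): 20 + 2 * age - 3 * distance.
y = 20 + 2 * X[:, 0] - 3 * X[:, 1]

# Fit the model: it estimates one coefficient per feature plus an intercept.
model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_)

# Predict the spending of a previously unseen customer (age 40, 5 km away).
new_customer = np.array([[40.0, 5.0]])
predicted = model.predict(new_customer)[0]
print(predicted)
```

Because the data follows the rule exactly, the fitted intercept and coefficients should come out close to 20, 2 and -3, and the prediction close to 85. With real customer data the fit would only be approximate.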

Advanced learning materials

Python for Data Science and Machine Learning Bootcamp

If you want to dig further into the technical tools presented earlier, deepen your knowledge of the methods we saw and learn about other ones, just follow this online course. It covers some of the elements learned previously in more detail, and also addresses other methods such as decision trees, Principal Component Analysis and Natural Language Processing. It is a very nice way to take your skills to the next level.

Mastering data visualization in D3.js

If you are more into the visualization part, this online course is also for you. It is a well-structured, extensive presentation of the main features of D3.js. They are introduced progressively and illustrated by 4 class projects that give you the opportunity to practice your skills.

Logistic Regression in Machine Learning

We can now have a look at another data analysis method, which predicts the probability that a new item belongs to one class or another. For instance, it could be a spam classifier that looks at some characteristics of the messages you receive and decides whether or not each one is spam. It is a supervised classification algorithm, because it has to be trained on data (i.e. emails) that have already been labelled as spam or non-spam. This article presents the method, giving its mathematical foundation and its scikit-learn implementation. If you are not a math fan, at least read through the first part to get an intuition of how this algorithm works.
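
As a rough illustration of the spam example, the sketch below trains scikit-learn's LogisticRegression on a handful of labelled emails. It is not taken from the article: the two features (number of links and number of exclamation marks) and all the values are hypothetical, chosen only to make the two classes easy to tell apart.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per email: [number of links, number of exclamation marks].
X = np.array([
    [0, 0], [1, 0], [0, 1], [1, 1],   # emails labelled as non-spam
    [5, 4], [6, 3], [7, 5], [4, 6],   # emails labelled as spam
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = non-spam, 1 = spam

# Supervised training: the model learns from the already labelled emails.
clf = LogisticRegression()
clf.fit(X, y)

# Two new, unlabelled emails to classify.
new_emails = np.array([[0.0, 1.0], [6.0, 5.0]])
probabilities = clf.predict_proba(new_emails)[:, 1]  # estimated probability of spam
predictions = clf.predict(new_emails)                # class with the higher probability

print(probabilities)
print(predictions)
```

The first new email should get a spam probability below 0.5 and the second one above 0.5. Unlike linear regression, the output is a probability between 0 and 1 that is turned into a class, not a quantity such as an amount spent.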

As a bonus, have a look at this video for a comparison between linear and logistic regression.
