Data Advanced Learning Path - Transforming data into knowledge
The first learning path on “Data analytics” introduced the basics of the field. In this second learning path, we will dig much deeper into advanced data analytics. This broad term covers many activities that all revolve around the transformation of data into knowledge: in other words, analyzing raw data beyond common statistical analysis and using it to establish a generic model that lets us make predictions, classify new data, or group similar data together (also known as data clustering). Advanced analytics is used in many fields to optimize activities and processes, provide reliable forecasts or improve our understanding of the surrounding world. The boundary between advanced data analytics and data science is a blurred one; both fields share algorithms and approaches. But data science goes way beyond the methods that will be presented here, including very advanced data processing such as neural networks, deep learning, Natural Language Processing… This second learning path will be much more technical and practical than the first one, and will include detailed descriptions of some mathematical methods as well as tools to implement them efficiently.
An Introduction to Scikit-Learn: Machine Learning in Python
Let us first start by introducing tools that help with data analytics tasks. Scikit-learn is an open-source library of tools developed for the Python programming language. It offers simple and efficient tools for predictive data analysis, like logistic regression, clustering, density estimation… as well as tools for data transformation and visualization. You can complete this reading with this other tutorial. If you don’t feel like installing Python and all the related packages on your own computer, you can simply go to Google Colab, an online platform provided by Google that lets you develop and run your code directly in your browser. An alternative is to install JupyterLab, which runs on your own computer and provides a user-friendly environment to write and organize your code.
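To give you a first feel for scikit-learn, here is a minimal sketch of its typical workflow: load a dataset, split it into training and test sets, fit a model and score it. It assumes scikit-learn is installed (for instance with pip install scikit-learn); the Iris dataset and the scaler-plus-logistic-regression pipeline are illustrative choices, not something prescribed by the tutorials linked above.

```python
# Minimal scikit-learn workflow: load data, split, fit, score.
# Illustrative sketch; the dataset and model are arbitrary choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                 # features and labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)        # hold out a test set

# A pipeline combining a data transformation step and a predictive model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)                       # learn from the training data
print(model.score(X_test, y_test))                # accuracy on unseen data
```

Almost every estimator in scikit-learn follows this same fit/predict/score pattern, which is what makes the library so easy to explore.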
D3.js tutorial
Another useful tool for data visualization is D3.js, a widely used, free, open-source JavaScript library for visualizing data. It relies on open standards like SVG and Canvas and is mainly meant to be used on the web. It is a low-level toolbox that lets you compose its building blocks into a workflow that is truly useful to you. It supports a wide variety of graph types, data ingestion, user interactions and layouts.
This tutorial takes you through the main components of D3.js and its prominent use cases. It also includes some additional resources. Another introductory resource is offered here.
Linear regression: House Price Prediction in Python
Let’s now dive into a simple yet powerful data analytics method that looks at existing data, derives a model from it and uses that model to make predictions about previously unseen data. This is called a regression algorithm.
For instance, imagine you have collected data about the buying preferences of a supermarket’s customers. For each customer, you know where they live, their age, and maybe a few other details; you also know how much they spend in your shop. By analyzing this kind of data, you can derive a model of a customer, i.e. how a typical customer behaves according to their characteristics, and thus you can predict how much a new customer would spend, given where they live, their age and a few other details. This method is called linear regression, and this video introduces it and presents its implementation using scikit-learn.
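As a rough illustration of the idea, here is a small sketch of linear regression with scikit-learn. The customer features (age, distance to the shop) and the spending figures are invented for the example and do not come from the video.

```python
# Linear regression sketch on made-up customer data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [age, distance to the shop in km]; target: yearly spend in euros
X = np.array([[25, 1.0], [34, 3.5], [47, 0.5], [52, 8.0], [61, 2.0]])
y = np.array([1200, 950, 2100, 600, 1700])

model = LinearRegression()
model.fit(X, y)                         # derive the model from existing data

new_customer = np.array([[40, 2.5]])    # a previously unseen customer
print(model.predict(new_customer))      # predicted yearly spend
print(model.coef_, model.intercept_)    # the fitted linear coefficients
```

The coefficients tell you how much each characteristic contributes to the prediction, which is one of the reasons linear regression is so easy to interpret.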
k-Nearest Neighbours algorithm
After looking at an algorithm that predicts values for previously unseen data, i.e. a regression algorithm, let us now look at an approach to identify the class of new data, based on existing data. This is called a classification problem: for instance, telling cats and other animals apart, predicting the vote cast by a citizen, or the rating a user would give a song on a streaming platform. One way to achieve this type of task is to look at the ‘neighbors’ of the new data point you want to classify and, based on their class or choice, decide the one for your new data. This may sound complicated, but after reading this article, everything will become clearer. And for a scikit-learn implementation of the algorithm, you can watch this video.
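To make the idea more concrete, here is a minimal sketch of the k-Nearest Neighbours classifier in scikit-learn; the toy points and their two classes are made up purely for illustration.

```python
# k-Nearest Neighbours sketch on invented 2-D points.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.2], [1.5, 0.8], [1.1, 1.0],    # class 0 examples
              [4.0, 4.2], [4.5, 3.8], [3.9, 4.1]])   # class 1 examples
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # look at the 3 closest neighbours
knn.fit(X, y)

print(knn.predict([[1.2, 1.1]]))           # lands among class 0 points
print(knn.predict_proba([[3.0, 3.0]]))     # class shares among the 3 neighbours
```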
k-Means algorithm for data clustering
So far, we have looked at labelled data, i.e. data already associated with a value or a class. In machine learning terms, this is called supervised learning. But now, imagine you have data with no such label, like a raw list of customers. You may still want to check whether some patterns are shared by groups of customers. This task is called ‘clustering’, i.e. identifying groups of data that share similarities, that are ‘close’ to each other. It could be groups of products or groups of users; this would then allow you, for instance, to build a recommender system by proposing to a customer products that are in the same group as those they previously bought. In machine learning terms, this is called unsupervised learning, because we work with unlabelled data.
In this article, you will learn about the principles of the k-Means algorithm; some animations will help you understand it properly. You will then find a description of an implementation of the algorithm using scikit-learn.
You can find additional information in this other article or in this video.
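As a quick illustration, here is a small sketch of k-Means clustering with scikit-learn; the artificial two-dimensional points and the choice of three clusters are assumptions made for the example, not taken from the articles above.

```python
# k-Means sketch on artificial, unlabelled 2-D points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three artificial groups of points scattered around different centres
X = np.vstack([rng.normal(centre, 0.5, size=(30, 2))
               for centre in ([0, 0], [5, 5], [0, 5])])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # cluster index assigned to each point

print(kmeans.cluster_centers_)      # the 3 centres found by the algorithm
print(labels[:10])                  # cluster membership of the first points
```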
Python for Data Science and Machine Learning Bootcamp
If you want to dig further into the technical tools presented earlier, deepen your knowledge of the methods we saw and learn about new ones, just follow this online course. It covers some of the elements learned previously, but in more detail, and also addresses other types of methods, like Decision Trees, Principal Component Analysis, Natural Language Processing and others… It is definitely a very nice way to take your skills to the next level.
Mastering data visualization in D3.js
If you are more into the visualization part, this online course is also for you. It is a well-organized, well-structured and extensive presentation of all of D3.js’s main features. They are presented progressively and illustrated by 4 class projects that give you the opportunity to practice your skills.
Logistic Regression in Machine Learning
We can now have a look at another data analysis method, one that predicts the probability that a new item belongs to one class or another. For instance, it could be a spam classifier that looks at some characteristics of the messages you receive and decides whether or not they are spam. It falls into the class of supervised classification algorithms, because it has to be trained on data (i.e. emails) that has already been labelled as spam or non-spam. This article presents the method, giving its mathematical foundation and its scikit-learn implementation. If you are not a math fan, at least read through the first part to get an intuition of how this algorithm works.
As a bonus, have a look at this video for a comparison between linear and logistic regression.
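To give a flavour of how this looks in practice, here is a hedged sketch of a tiny spam filter built with scikit-learn. The messages and labels are invented, and turning the text into word counts with CountVectorizer is just one possible preprocessing choice, not necessarily the one used in the article.

```python
# Logistic regression spam filter sketch on made-up messages (1 = spam).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = ["win a free prize now", "meeting moved to friday",
            "cheap pills, limited offer", "lunch tomorrow?",
            "claim your free reward", "project report attached"]
labels = [1, 0, 1, 0, 1, 0]

# Turn the text into word counts, then fit a logistic regression on top
spam_filter = make_pipeline(CountVectorizer(), LogisticRegression())
spam_filter.fit(messages, labels)

print(spam_filter.predict(["free offer just for you"]))        # likely spam
print(spam_filter.predict_proba(["see you at the meeting"]))   # class probabilities
```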
Which machine learning algorithm should I use?
In this learning path, we have presented various data analysis algorithms, which are also heavily used in machine learning. Of course, many more exist, targeted at different kinds of data and different types of output or analysis.
But how do you choose between those approaches? How do you know which one best fits the task and data you are working on?
Have a look at this article, which reviews the most commonly used approaches, compares them and condenses them into a useful cheat sheet that you can use to select the right algorithm.
How to Compare Machine Learning Models and Algorithms
A final important point we wanted to touch upon in this journey through data analytics and data science is the evaluation of an algorithm’s performance. Are there good or bad algorithms? What criteria can be taken into account to evaluate their quality, or that of their outcome? The time required to analyse the data? The amount of data needed before predictions or classifications can be made? How should misclassifications or regression errors be interpreted and dealt with?
Those questions are addressed in this article, which also includes interactive resources to get a better understanding of this central question in data analysis: ‘how well am I doing?’
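As a modest illustration of how such a comparison can be set up, here is a sketch that scores two candidate models with 5-fold cross-validation in scikit-learn; the dataset and the two models are arbitrary examples, not the ones discussed in the article.

```python
# Comparing two classifiers with 5-fold cross-validation (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "k-nearest neighbours": make_pipeline(StandardScaler(),
                                          KNeighborsClassifier(n_neighbors=5)),
}

for name, model in candidates.items():
    # Train and score on 5 different splits of the data
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The mean and spread of the fold scores give one simple basis for comparing models; accuracy is only one possible metric among many.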