Big Data Basic Learning Path - Making value out of data: data analytics
With the world becoming increasingly digital, data is turning into an incredibly high-value asset; nowadays, it is called “the fuel of the digital economy” or “the most valuable asset of an organization”. But what exactly is data? Are there different kinds of data? Where does it come from? What is big data? How can you extract value out of data? What tools and methods can be used to analyze data? What is the Data, Information, Knowledge, Wisdom pyramid? What are the opportunities and challenges of data analytics? As you can see, the topic is very broad, touching many aspects of computer science, such as AI, networks, computing platforms and software. This learning path will take you through different resources that introduce those subjects and provide answers to these questions (and probably raise many others). It also includes some practical resources so that you can get a grasp of how all this works in practice.
Introduction to Data Analytics
This introductory course covers the main aspects of data analytics in five modules. It starts with an overview of the data analytics landscape, defining key terms, actors and processes. It then dives into more technical aspects by presenting the different types of data, where they come from, what a database is, the difference between a data mart and a data lake, what the ETL (extract, transform, load) process is… It finally presents data mining and visualization approaches, which turn raw data into meaningful, shareable information. The course includes some additional readings and quizzes.
From data to knowledge
Now that the scene has been set, let’s define the core object of the learning path: data. This article explains the difference between data, information and knowledge, and the chain of transformations that leads from one to the next. It defines different types of data and data sources, before making the link with the field of machine learning as a way of exploiting data and turning it into a highly valuable asset. The second part of the article focuses on the concept of the data lake, a repository that holds an enormous amount of raw data and makes it available for on-demand access.
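To make the transformation chain a little more concrete, here is a minimal Python sketch. It is not taken from the article: the readings, the function names and the 20 °C threshold are all assumptions made up for illustration. Raw readings (data) are averaged per city (information), then a simple rule is applied to the averages (a very modest form of knowledge).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (city, temperature in °C) readings with no context.
readings = [
    ("Lyon", 21.0),
    ("Lyon", 23.0),
    ("Oslo", 12.0),
    ("Oslo", 14.0),
]


def to_information(rows):
    """Group raw readings by city and average them (data -> information)."""
    by_city = defaultdict(list)
    for city, temperature in rows:
        by_city[city].append(temperature)
    return {city: mean(values) for city, values in by_city.items()}


def to_knowledge(averages, threshold=20.0):
    """Label each city with an assumed rule (information -> knowledge)."""
    return {
        city: "warm" if average >= threshold else "cool"
        for city, average in averages.items()
    }


information = to_information(readings)
knowledge = to_knowledge(information)
print(information)
print(knowledge)
```

In a real pipeline, the rule in `to_knowledge` would typically be replaced by something learned from the data, which is where machine learning comes in.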
Python Basics for Data Science MOOC
Python is one of the most widely used programming languages when it comes to analysing and visualizing data. It comes with a host of functions and libraries that make it easy to build extremely powerful tools to process and display data in a useful way. Since it is also heavily used in machine learning, it is the language of choice for developing a complete pipeline of data processing and analysis tools. This course will provide you with the basics of the Python programming language, which you can run on your own computer.
Data Visualisation Online Course
This course, offered by the EU Academy, gives an introduction to data visualisation. It starts by presenting common graph formats for displaying various kinds of data; it then provides guidelines to follow and pitfalls to avoid when building a data visualization, gives recommendations on how to tell a story through data, and finally introduces some non-standard ways of visualizing data. Although the course doesn’t include any practical exercises, the principles it sets out are clearly the basis for building a proper data analytics pipeline.
Explained: What is Big Data?
The title of this video is probably not the best chosen, but the video itself is really interesting in that it explains in simple terms the different kinds of data that web companies collect about individuals, sometimes without informing them, and what kind of risks this can pose to user privacy. It also briefly touches upon the EU General Data Protection Regulation (GDPR). A very nice introduction to the topic of data privacy.
Data analytics in a privacy-concerned world
This article from the Journal of Business Research offers an in-depth reflection on the privacy concerns raised by data analytics, looking at its different functions and studying how well they address user privacy. It is a thorough and rigorous academic analysis of the subject and is worthwhile reading for any user or data analyst concerned about protecting privacy.
This article introduces the concept of Advanced Analytics, i.e. going beyond traditional analysis and visualisation approaches to integrate the prediction of future trends and the likelihood of potential events. It is a short yet worthwhile read in that, beyond presenting the concept, it also introduces several common terms in the data analytics field that may not have been covered earlier, and includes some interesting resources for further reading.
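As a rough illustration of what "prediction of future trends" can mean in its simplest form, the sketch below fits a straight line to a few past values and extrapolates one step ahead. It is not taken from the article: the monthly figures and the function names are hypothetical, and real advanced analytics would normally rely on richer models and dedicated libraries.

```python
# Hypothetical monthly figures, invented for illustration only.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


slope, intercept = fit_line(months, sales)
forecast = predict(5, slope, intercept)
print(f"Trend: {slope:+.1f} per month, forecast for month 5: {forecast:.1f}")
```

A straight line is only a reasonable model when the past values really do follow a steady trend; it is used here because it keeps the idea visible in a few lines.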
Fundamentals of Data Analytics in the Public Sector with R
If you are working in the public sector, this online course will probably be of interest to you. It starts by looking at the various functions of the public sector and how data analytics may support them. It also introduces R, a free and open language and environment for statistical computing and graphics, which is used throughout the modules to run data analyses. The course then focuses on the analysis of survey data and population data, two common types of data used in the sector. Finally, it ends with some real-life stories and scenarios.
EUHubs4Data Training Catalogue
Maybe you are a music fan, maybe you are not. If you are, you may find this dataset useful for practising some of the data analytics methods you have learned. If you are not, you will find plenty of other datasets freely available on a wide variety of topics: health, business, environment, culture… Whether for practice or out of a real business interest, these are truly interesting data sources. In general, the https://euhubs4data.eu/ platform is a very useful resource, in that it collates a catalogue of many courses and datasets.
Matplotlib Tutorial
If you are working with the Python language to develop your data analytics tools, you will most likely come across Matplotlib, a low-level and very powerful data visualization library. It lets you create a wide range of graphics from data. This website offers a tutorial on how to use Matplotlib, in a very progressive way. If you are using other tools (like R), they probably offer their own visualization tools.
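For a first taste before starting the tutorial, here is a minimal sketch of a bar chart built with Matplotlib. The topic names echo those mentioned earlier in this learning path, but the counts are made-up values and the variable names are assumptions for illustration. The chart is rendered to memory so that the script also runs on a machine without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # Non-interactive backend: no window is opened.
import matplotlib.pyplot as plt

# Hypothetical figures, invented for illustration only.
topics = ["health", "business", "environment"]
dataset_counts = [12, 7, 9]

# One figure with a single set of axes, holding one bar per topic.
fig, ax = plt.subplots()
ax.bar(topics, dataset_counts)
ax.set_xlabel("Topic")
ax.set_ylabel("Number of datasets")
ax.set_title("Datasets per topic (made-up values)")

# Render to an in-memory PNG; pass a filename such as "chart.png"
# to savefig instead if you want a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
print(f"Rendered {len(png_bytes)} bytes of PNG data")
```

When working interactively, you would usually drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the chart in a window.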
OECD's Good Practice Principles for Data Ethics in the Public Sector
Whether or not you are working in the public sector, whenever you work with data, one of your concerns should be its ethical use. This guide published by the OECD presents 10 good practice principles for processing data in an ethical way. The principles are supported by implementation recommendations. It is very valuable reading in that it reminds us of an important consideration that needs to be incorporated as early as possible in the data analytics process.
Introducing Open Data
When working on data analytics, one needs data; although this may sound obvious, it is not always simple to find enough good-quality, reliable data. In recent years, the open data movement has grown, and many datasets have been made publicly available in various domains of activity (road works, flood maps, petrol prices…). This course, offered on the portal for European data from the European Union, presents the concept of open data as well as its impact as a driver for innovation. The portal itself is an open window on the open data world. It provides information, online courses and datasets that are available to anyone.