Delivering evidence from online job advertisements

For over 10 years, Cedefop and Eurostat have been developing a system for extracting labour market information from online job advertisements (OJAs) through a multilingual Data Production System (DPS). The system follows a pipeline structure: from scraping and pre-processing job postings to extracting structured information such as occupations, skills, economic activity and field of study, based on the ISCO and ESCO classifications.
For this report, Cedefop and Eurostat have added data scraped by a Large Language Model (LLM) to improve categorisation accuracy and skills extraction, given the more complex phrasing used in OJAs. This work aims to provide a better match between skills supply and demand. These systems extract detailed and timely information from OJAs, which should be used to support evidence-based policy-making and labour market analysis.
For digital skills, community-driven platforms such as GitHub and Stack Overflow were used to identify nearly 400 relevant skills for potential ESCO updates, mainly in the fields of software tools, computer networks and programming languages. For green skills, experts found that in “green by definition” occupations, explicit green skills may not be listed because they are implied in the job title or the company’s mission rather than stated in the job requirements. Experts also found problematic classifications in field of study/qualification extraction, due to insufficient coverage and uneven terminology across languages.
Ultimately, this OJA-based skills intelligence analysis provides a more specific understanding of job market demands, offers insights into dynamic regional variations and highlights skills gaps.