Master's – Data scientists: proposal of a conceptual model considering their definition, education, skills and tools they use

Event type: 
Date and time: 
29/09/2021 - 09:00 to 12:00


Fabiano Castello De Campos Pereira

Advisor: Prof. Dr. Cesar Alexandre de Souza

Committee: Profs. Drs. Daielly Melina Nassif Mantovani, Rodrigo Baroni de Carvalho and Alexandre Del Rey

YouTube link:


Data scientists are among the actors who explore the potential of big data to generate insights and create new forms of value that transform organizations and society. Few published studies address what a data scientist is and which professional skills the role demands. The aim of this study is to propose a conceptual model for data scientists, considering their definition, education, skills and the tools they use. The study is exploratory, takes a qualitative approach, and was conducted from three perspectives. First, from academia, through a systematic literature review based on 2,245 documents. Second, from the market, through the collection and analysis of 1,308 job openings. Third, from people who practice data science, through the analysis of secondary data from the 2019 Data Hackers BR survey. Together, the three perspectives generated a conceptual model for data scientists, which was then validated with 201 experts from academia and the market. This process produced a robust and comprehensive conceptual model, from which it is possible to conclude that there is (a) a significant gap in understanding of the profession between data science practitioners and the companies that seek to hire them; (b) an issue of "over-qualification", in the sense that the many skills required are rarely found in a single professional, which suggests that there are groups or families of data scientists with more specific skill sets that are not yet explicitly defined; and (c) a need for continuous improvement, given the heterogeneity of the techniques and tools required and the speed at which they evolve. Additionally, the method used, which relies extensively on natural language processing (NLP) and text mining, can be automated and applied by other authors in future studies of professional profiles, including professions other than data scientist.

*Abstract provided by the author


