The Data Professions

#data #datascience #dataengineering #datastructures

With the development of Machine Learning, new professions have emerged. There are eight main roles, in addition to more traditional positions like project manager or developer. Together, these roles form a team capable of managing a project from start to finish. Creating models is certainly important, but it is not enough: applications must also be deployed and maintained, which requires a wide range of skills.
These professions are divided into three branches:

A "model" oriented branch
An "integration" oriented branch
A "support" oriented branch

In the "model" branch, the first role is the Data Analyst. Their primary task is to prepare and format data, either to extract Key Performance Indicators (KPIs) directly or to feed the data into Machine Learning algorithms. They may also build dashboards, which calls for data visualization skills.
The second role is the Data Scientist. Their task is to create models from the prepared data. This role requires strong knowledge of mathematics and statistics as well as programming skills, mainly in R or Python. The Data Scientist role is often seen as an evolution of the Data Analyst role, reached either through specialized training or through on-the-job experience, though the skill sets of the two positions do not completely overlap.
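To make the Data Analyst's preparation step more concrete, here is a minimal Python sketch that cleans a few raw records and extracts one KPI from them. The order records, the field names, and the choice of "monthly revenue from paid orders" as the KPI are all assumptions made for illustration; they do not come from any particular system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw order records, as they might arrive from a source system.
RAW_ORDERS = [
    {"order_date": "2024-01-15", "amount": "120.50", "status": "paid"},
    {"order_date": "2024-01-20", "amount": " 80.00", "status": "PAID"},
    {"order_date": "2024-02-03", "amount": "n/a", "status": "paid"},
    {"order_date": "2024-02-10", "amount": "200.00", "status": "cancelled"},
    {"order_date": "2024-02-11", "amount": "50.00", "status": "paid"},
]


def clean_orders(rows):
    """Normalize types and drop rows whose amount cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # unusable amount: skip the row
        cleaned.append({
            "order_date": date.fromisoformat(row["order_date"]),
            "amount": amount,
            "status": row["status"].strip().lower(),
        })
    return cleaned


def monthly_revenue(orders):
    """Illustrative KPI: total amount of paid orders per (year, month)."""
    totals = defaultdict(float)
    for order in orders:
        if order["status"] == "paid":
            key = (order["order_date"].year, order["order_date"].month)
            totals[key] += order["amount"]
    return dict(totals)


orders = clean_orders(RAW_ORDERS)
kpi = monthly_revenue(orders)
print(kpi)
```

The same cleaned records could just as well be handed to a Data Scientist as model input; only the last step differs.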

In the "integration" branch, there are three roles. Their purpose is to enable models to interact with more complex IT systems. Their responsibilities range from retrieving input data to delivering results and retraining models.

The first role is the Data Architect. Similar to the Solution Architect in more traditional IT, their task is to create an architecture that allows all elements to interact, with a focus on data flow. Knowledge of Big Data is often necessary because of the volumes handled (skills in Spark and Hadoop are sought after).
The second role is the Data Engineer. Their task is to implement the architecture defined by the Data Architect. This requires good programming and process automation skills. The Data Architect also needs to understand this technical aspect to create high-quality architectures.
The third role is the Data Integrator. Their task is to ensure that data can move from one system to another, in the correct format and with the right syntax. This role requires knowledge of data buses, middleware, and data transformation.
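The kind of transformation a Data Integrator deals with can be sketched as follows: a CSV export from one system is converted into the JSON records another system expects. The column names, the target field names, and the date formats are hypothetical and chosen only for this example.

```python
import csv
import io
import json

# Hypothetical mapping from the source system's columns to the target's fields.
FIELD_MAP = {"cust_id": "customerId", "full_name": "name", "signup": "signupDate"}


def csv_to_json_records(csv_text):
    """Convert a CSV export into a JSON array with renamed, retyped fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Rename the fields and trim stray whitespace.
        record = {target: row[source].strip() for source, target in FIELD_MAP.items()}
        # The target is assumed to expect an integer identifier.
        record["customerId"] = int(record["customerId"])
        # The source is assumed to use DD/MM/YYYY; the target, ISO 8601 dates.
        day, month, year = record["signupDate"].split("/")
        record["signupDate"] = f"{year}-{month}-{day}"
        records.append(record)
    return json.dumps(records)


SOURCE_CSV = (
    "cust_id,full_name,signup\n"
    "42,Ada Lovelace,10/12/2023\n"
    "7, Alan Turing ,01/02/2024\n"
)

payload = csv_to_json_records(SOURCE_CSV)
print(payload)
```

In a real integration this logic would usually live in middleware or a data bus rather than in a standalone script, but the concerns are the same: field names, types, and formats.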

A sixth role, increasingly mentioned in the literature and in articles, is the Machine Learning Engineer. This is either a Data Engineer trained in Data Science and Machine Learning, or a Data Scientist who has strengthened their programming skills.

Finally, in the "support" branch, two new roles have emerged. Their task is not to execute projects but to support the other roles, and the project itself once it is in production.

The first role is Data Support. This is a helpdesk with added data project skills, which allow its members to monitor models and act quickly when a problem arises. The more models there are in production, the more crucial this role becomes in ensuring service continuity.
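As a rough idea of what monitoring a model can look like, the sketch below tracks the accuracy of recent predictions over a sliding window and flags the model when it falls below a threshold. The `AccuracyMonitor` class, the window size, and the threshold are assumptions for illustration; real monitoring setups typically track more signals than accuracy alone.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical helper: rolling accuracy over the most recent predictions."""

    def __init__(self, window_size=100, threshold=0.8):
        # A deque with maxlen silently discards the oldest outcome when full.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether a prediction matched the value observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return the accuracy over the window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True when the rolling accuracy has dropped below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window_size=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 1)]:
    monitor.record(predicted, actual)
alert_before = monitor.needs_attention()

# One more wrong prediction pushes the oldest correct one out of the window.
monitor.record(0, 1)
alert_after = monitor.needs_attention()
print(alert_before, alert_after)
```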
The Data Steward, on the other hand, is a project manager with additional data skills. Since Machine Learning projects cannot be managed in the same way as more traditional development projects, adapted project management is necessary. Their role includes discussing with clients, estimating the remaining work, ensuring good communication among all team members, and, in large projects, managing the parallel progress of various roles.

Most of these professions did not exist a few years ago, and many more are likely to emerge in the coming years, which will inevitably change the scope of each of these roles. Similarly, as technologies evolve very rapidly, individuals working in these various professions must continuously follow developments and undergo training. Without this ongoing learning, they risk quickly finding themselves with outdated skills.
