Data Science: Skills Not Titles

The article reviews a list of typical data scientist job titles since titles often do not reflect the actual job function or the experience and skill sets required in the role.

Recruiting data science talent involves many of the same challenges as any hiring: writing an effective job description, attracting talented, qualified candidates, evaluating them, successfully recruiting those who receive offers, and retaining talent. But many hiring managers and HR specialists skip the most important and most obvious step: clearly defining the specific role and skills needed for the actual data science work involved.

Beware any job posting for a “data scientist.” While many job titles are not very informative, the “data scientist” title is especially problematic. It has no clear definition in terms of scope and is generally overused by recruiters and human resources departments. Many data science job titles fall into the Goldilocks trap of being either too broad or too narrow. On the broad end, it seems like nearly every technical worker, regardless of training or responsibilities, has rebranded as a “data scientist” to capitalize on a trend that emerged about 15 years ago. At the other end of the spectrum, many companies have a plethora of seemingly granular job titles, such as data scientist, data engineer, data analyst, machine learning engineer, data visualization expert, and statistician, which can cause confusion for recruiters.

The inclination is often to start with a job title and then build a job description and responsibilities list, followed by the qualifications for those duties. But when it comes to data science, it is critical to flip this order. Start with understanding the exact skill sets and experience required to do the work. Then, use this information to build the critical components of a job description, including a title.

That said, job titles do exist in the real world, so it is important to review the commonly used titles.

Commonly used titles

Data engineers (and the frequently synonymous data architects) are fundamental to any data science project. They play a critical role in designing, building, and maintaining the data structures used. They create the data pipelines and storage solutions that are later used by the other members of the data science team. Their engagement is necessary to ensure that the data solutions will support the needs of both users and the business while staying within the constraints of the organization’s policies and standards.

Data analysts are the first line in the analytics process and play a major role in data preparation processes such as data cleaning. They also provide a second round of quality checking to correct mistakes in the data itself. Additionally, they are often the main people involved in data wrangling, also called data munging. This is the process of exploring and developing ways to transform the data into formats that are more valuable for further analysis and modeling work.
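To make the data analyst's cleaning and wrangling work concrete, here is a minimal Python sketch using only the standard library. The records, field names, and date formats are hypothetical and invented for illustration; real projects would typically rely on dedicated data tools and far messier inputs.

```python
from datetime import datetime

# Hypothetical raw records: stray whitespace, mixed date formats,
# a missing value, and a duplicate row (all values are made up).
raw_rows = [
    {"patient_id": " 001 ", "visit_date": "2023-01-15", "weight_kg": "70.5"},
    {"patient_id": "002", "visit_date": "15/02/2023", "weight_kg": ""},
    {"patient_id": "001", "visit_date": "2023-01-15", "weight_kg": "70.5"},
]

def parse_date(text):
    """Try each assumed date format in turn; return None if none match."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def clean(rows):
    """Standardize each field, then drop rows that repeat an earlier one."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["patient_id"].strip(),
            parse_date(row["visit_date"]),
            float(row["weight_kg"]) if row["weight_kg"].strip() else None,
        )
        if record in seen:
            continue  # exact duplicate after standardization
        seen.add(record)
        cleaned.append(dict(zip(("patient_id", "visit_date", "weight_kg"), record)))
    return cleaned

cleaned_rows = clean(raw_rows)
```

In this sketch the third row is dropped as a duplicate only after whitespace has been stripped, which shows why cleaning usually comes before deduplication.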

Data visualization specialists or data storytellers use data as inputs to communicate results, often to a non-technical audience. They often produce the key communications content used to explain the project’s value and the utility of its deliverables to the board and senior management. Ideally, this work goes beyond charts, tables, and graphs to develop a narrative that can support effective storytelling.

Machine learning scientists (sometimes research scientists or research engineers) focus on creating and using the newest technology and innovative approaches. They can be engaged in developing and applying new algorithms and data manipulation approaches. Because their work sits at the cutting edge of research and application, they often work in research and development departments. They actively read and engage with the data science community to understand new innovations through user groups, scientific journals, and conferences.

Some of these data scientists have a particular specialty, such as network analysis, computer vision, natural language processing (NLP), or spatial data, while others are more generalists.

Data scientist specialties

The network analysis specialist focuses on technologies used to analyze the structure of networks to determine, for example, how different customers are connected to each other or to different healthcare providers. The computer vision specialist could focus on using algorithms to automatically detect cancers or other medical conditions in an image.

The NLP expert looks at ways to automatically extract usable information from free-form text fields—for example, by converting clinical notes into clinical codes for diagnoses, procedures, and drugs or converting consumer comments into measures of sentiment. They can also be involved in translating texts to other languages and in using text-to-speech conversion processes. Today, the task of NLP has become vastly simpler with the release of large language models like GPT, BLOOM, and LaMDA. Relatively simple programs leveraging these large language models can perform many of the tasks that previously required deep NLP experience.

The spatial data scientist leverages GPS and other location data to create systems used for site selection (the best place for a new store), navigation (the best route to take), and other geospatial-specific tasks.

Some “old school” titles that may appear in your data science team include mathematician and statistician. Mathematicians often have degrees in operations research or applied mathematics and tend to focus on optimization problems. Statisticians have experience in theoretical and applied statistics. They would have a deep understanding of how to apply specific statistical tests or models and quantify uncertainty for specific problems and data sets.

Lastly, since many companies use the “data scientist” title, it is important to recognize that it can refer specifically to those who are engaged in designing, building, and implementing machine learning models. These individuals are experienced in the main programming languages and routinely do tasks such as feature engineering (creating new variables from available data), feature selection (selecting input variables that are the most informative for the task), and dimensionality reduction (removing variables that are not useful for the task or creating compact projections of the data that can be readily used).
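As a rough illustration of two of these tasks, the Python sketch below derives a new variable (feature engineering) and then drops a variable that carries no information (a very simple form of feature selection). The customer records, field names, and variance threshold are hypothetical, and the sketch does not attempt projection-based dimensionality reduction.

```python
from statistics import pvariance

# Hypothetical customer records; every field name and value is made up.
records = [
    {"total_spend": 120.0, "num_orders": 4, "country_code": 1},
    {"total_spend": 300.0, "num_orders": 5, "country_code": 1},
    {"total_spend": 45.0, "num_orders": 3, "country_code": 1},
    {"total_spend": 500.0, "num_orders": 10, "country_code": 1},
]

def add_engineered_features(row):
    """Feature engineering: create a new variable from available data."""
    enriched = dict(row)
    enriched["avg_order_value"] = row["total_spend"] / row["num_orders"]
    return enriched

def select_features(rows, min_variance=1e-9):
    """Feature selection (simplified): keep columns whose values vary."""
    names = rows[0].keys()
    return [n for n in names if pvariance([r[n] for r in rows]) > min_variance]

enriched = [add_engineered_features(r) for r in records]
selected = select_features(enriched)

# Keep only the selected columns in each record.
reduced = [{n: r[n] for n in selected} for r in enriched]
```

Here `country_code` is identical for every customer, so it cannot help a model distinguish between them and is removed, while the engineered `avg_order_value` is kept.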

The list of typical data scientist job titles above is meant only as a rough guide, since titles often don’t reflect the actual job function or the experience and skill sets required.

The task of recruiting top talent is daunting — the talent pool is global, as is the competition for recruiting those highly trained professionals. By moving beyond generic or misleading titles like “data scientist” and instead focusing on the specific skill sets, experience, and roles needed for each position, organizations can more effectively attract and recruit the right talent. This approach not only simplifies the hiring process but also sets the stage for more successful and productive data science teams. Embracing this strategy is essential for organizations to stay competitive in the ever-growing and rapidly changing field of data science.

Howard Friedman and Akshay Swaminathan
Howard Steven Friedman is a data scientist, health economist, and adjunct professor at Columbia University. Akshay Swaminathan leads the data science team at Cerebral and is a Knight-Hennessy scholar at Stanford University School of Medicine. Together they are authors of Winning with Data Science: A Handbook for Business Leaders (Columbia Business School Press, January 2024).