So, What is a Data Scientist Anyway?

Advancements in technology have disrupted nearly every industry and created career opportunities that were once implausible. So, it should come as no surprise that nearly half of the 50 Best Jobs in America, according to Glassdoor, are tech-related. What may be surprising, however, is that in 2016, data scientist came in at the top of the list.

Simply put, data scientists are big data wranglers. They explore and analyze datasets to understand and organize data, identify underlying patterns and trends, and develop methods for extracting and summarizing the information that can inform better decision-making.

A McKinsey study predicts that by 2018, the number of data science jobs in the United States alone will exceed 490,000. However, despite demand, there will be fewer than 200,000 available data scientists to fill these positions. Globally, this demand is projected to exceed supply by more than 50 percent in the next two years. I was lucky enough to find my calling in numerical analytics and scientific computing, but how can we inspire an entire generation to track along career paths that emphasize quantitative reasoning as industries place more importance on technology and data insights?


It all starts with math

A career in data science begins not only with a love for mathematics, but also with a knack for applying mathematical concepts to other areas of life, both academic and general. Traditionally, school curriculums have not emphasized many of the quantitative toolsets required for analyzing and manipulating large volumes of data, such as statistics, matrix algebra, and hands-on exercises geared toward translating these methods into numerical algorithms. While this is starting to change as more emphasis is placed on science, technology, engineering and math (STEM) education, middle school and high school mathematics curriculums still tend to focus primarily on preparing students for calculus. However, other analytical toolsets, such as statistics and discrete math, offer critical and different ways of thinking that are key to data science.

I’ve personally always had a passion for math, but it wasn’t until my junior year of college that I decided to become a math major. Like many others, I initially thought the only thing you could do with a mathematics degree was teach high school students, but technology has opened the door to a range of career possibilities.

After college, I pursued a graduate degree, studying applied mathematics and scientific computing. For my post-doctorate, I focused on biomathematics and held a joint appointment in critical care medicine designing data-driven models to better understand complex medical processes. It was my varied educational background, coupled with real-world experience in data modeling, that led me to my current role as the first data scientist at a global software company.

A Growing Industry

IoT is driving demand

When I was hired as a data scientist in 2014, it was still a relatively new field. The growth of connected devices, sensors, and better Internet access globally, however, has created an abundance of messy data—driving the demand for data scientists across industries.

When I say data is messy, I’m referring to data quality. Think of it as missing fields from manual entry. To bring it to a consumer level, fitness trackers are a perfect example of disorganized data. When you enter information into a fitness tracker, you tend to do it quickly. For example, after you ride a bike or go for a run, you may input the distance you traveled; however, there is so much additional information that could also have been added. How many minutes did you exercise? Did you ride a road bike, a mountain bike, or a beach cruiser? Did you run on a treadmill or a trail? At what resistance or pace did you ride? What about your age, weight and activity level? All of these factors help improve the data quality and tell a more complete story about your fitness and health.
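
One way to make "missing fields" concrete is to measure how often each field is actually filled in. The sketch below is only an illustration: the workout records, the field names, and the `completeness` helper are all hypothetical and do not come from any real fitness tracker.

```python
# Hypothetical workout log entries; field names are illustrative only.
# None marks a field the user skipped during manual entry.
workouts = [
    {"activity": "run", "distance_km": 5.0, "minutes": None, "surface": None},
    {"activity": "bike", "distance_km": 20.0, "minutes": 55, "surface": "road"},
    {"activity": "run", "distance_km": 3.2, "minutes": 21, "surface": None},
]


def completeness(records):
    """Return the fraction of non-missing values for each field."""
    fields = records[0].keys()
    return {
        field: sum(r.get(field) is not None for r in records) / len(records)
        for field in fields
    }


report = completeness(workouts)
for field, share in report.items():
    print(f"{field}: {share:.0%} filled")
```

A report like this shows at a glance which fields people skip, which is usually the first question to answer before trusting any summary built on the data.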

When it comes to enterprise-level initiatives, data science teams tackle the challenge of identifying and developing ways to produce measurable outputs of value from data of variable quality originating from disparate sources. Decision-makers want to see summary numbers presented in an informative and consumable way. In the desire to see whole numbers, users do not always understand the importance of also looking at the statistical certainty around data measurements. It is my team’s job to take statistical validity into account while evaluating metrics for both data quality and performance benchmarking. The data science team scours data in order to create and measure benchmarks for tracking improvement efforts and for identifying trends or opportunities for growth.
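
To see why a single summary number can mislead, consider two sets of readings with the same average but very different spread. The following is a minimal sketch, assuming made-up readings and a normal approximation for the 95 percent interval; the `mean_with_interval` helper is hypothetical, and with so few data points a t-distribution would give a somewhat wider and more appropriate interval.

```python
import math
import statistics


def mean_with_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    # Two-sided critical value, e.g. about 1.96 for 95 percent confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical readings from two assets that share the same average.
steady = [98, 101, 100, 99, 102, 100, 101, 99]
noisy = [70, 130, 95, 110, 60, 140, 105, 90]

for name, readings in (("steady", steady), ("noisy", noisy)):
    mean, low, high = mean_with_interval(readings)
    print(f"{name}: mean={mean:.1f}, 95% interval=({low:.1f}, {high:.1f})")
```

Both assets report the same headline average, but the interval around the noisy one is far wider, which is exactly the kind of context a whole number on a dashboard leaves out.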

Every organization’s data might start messy, but it all holds valuable insights that can affect the bottom line. Data scientists can help organizations transform the data being collected in ways that will ultimately help achieve business objectives.

Talent Pipeline

Opening the door for data scientists

My job is to help industrial companies, such as oil and gas refineries or utility providers, organize, qualify, and manage data from digital assets, and then use this data to draw strategic insights around how to improve asset performance and reduce risks. In a turbulent energy market, identifying efficiencies and realizing cost savings from data is critical for many of these businesses to stay afloat. But this is just in one sector—many other organizations have identified the need for a data science team, though few have thus far been able to fill these types of roles.

In order to effectively build a talent pipeline for data scientists, there needs to be more of a focus on teaching quantitative skills beyond calculus prep in a mathematics education. There must be increased awareness at the high school and college levels of what skill sets are in demand so programs may be tailored accordingly. Every year the number of opportunities for this skill set grows, and the need for data scientists at a range of companies has never been greater.

Beyond math skills, prospective data scientists need to know how to think creatively and develop context and a story for the data they are analyzing. Data scientists need to be talented with numbers, but they must also excel at problem solving by leveraging various types of data. The art of taking a qualitative phenomenon and quantifying it in a meaningful way is a difficult challenge, largely because it is an open-ended task rather than a straightforward number-crunching process. However, everything can be modeled into a mathematical story, and having the ability to look at data sets and develop strategic insights from a business mindset is what makes data scientists so valuable.

Sarah Lukens, Data Scientist, Asset Performance Management, GE Digital