Data science is a relatively new career that has moved quickly to the forefront of many companies, so the first question tends to be: what is a data scientist? As the name implies, this is someone who combines data capabilities (extracting, preparing, storing, and so on) with the scientific method, which guides sampling, analysis, and reporting. Given the morass of advice on how to become a data scientist, it can be hard to work out what the fundamentals of data science actually are. Data scientist qualifications fall into two domains, technical and non-technical, and design thinking (in both a technical and a business sense) is key to both.
Data Science Concepts: Technical
1. Math
Math sits at the core of most technical data science basics, making it the primary fundamental data science concept. An aptitude for (and ideally an enjoyment of) math is a key building block for developing the technical data scientist qualifications. The logic and theory of math feed directly into the development of the data models and algorithms needed to solve business problems.
2. Statistics
Mathematical skills are necessary for developing the second fundamental data science concept: statistics. Key statistical concepts include:
- Appropriate sampling techniques (particularly crucial when applying experimental conditions and/or evaluating the provenance of the data)
- Data distributions (the shape of the data, which determines which analyses will be most accurate)
- Central tendency (finding the centre of the distribution)
- Dispersion (how much the data varies).
A background in statistics guides the data scientist in choosing which statistical test is appropriate for the data and the business problem at hand. Statistical theory is sometimes undervalued in a fast-paced data science world where the emphasis seems to fall more on coding and data processing. However, such theory marks a vital difference between two kinds of analyst: those who can employ various models and algorithms (but aren’t sure why they are using some instead of others), and those who are clear on why particular models and algorithms are selected and how they actually work.
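To make central tendency and dispersion concrete, here is a minimal sketch using Python's standard `statistics` module. The `daily_orders` values are made up for illustration and are not taken from any real data set; they include one deliberate outlier to show why the mean and the median can tell different stories.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
daily_orders = [12, 15, 11, 14, 13, 15, 60]

# Central tendency: three ways of describing the "centre" of the data.
mean_orders = statistics.mean(daily_orders)      # pulled upward by the outlier (60)
median_orders = statistics.median(daily_orders)  # middle value, robust to the outlier
mode_orders = statistics.mode(daily_orders)      # most frequent value

# Dispersion: how much the data varies.
stdev_orders = statistics.stdev(daily_orders)    # sample standard deviation
value_range = max(daily_orders) - min(daily_orders)

print(f"mean={mean_orders:.2f} median={median_orders} mode={mode_orders}")
print(f"stdev={stdev_orders:.2f} range={value_range}")
```

A large gap between the mean and the median, as in this sample, is often a hint that the distribution is skewed and that a test assuming normality may not be appropriate.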
3. Programming Languages
Python leads the pack among programming-related data scientist qualifications. Many in the role credit its steady rise to its ease of use and flexibility. R, a language built for statistical computing and graphics, is still preferred at many companies. Learning both is strongly recommended. SQL is also sought after for querying relational databases. Other programming languages, such as C/C++, are still considered quite useful as well.
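As a rough illustration of how Python and SQL work together, the sketch below queries a small relational table through Python's built-in `sqlite3` module. The `sales` table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 200.0), ("south", 50.0)],
)

# Aggregate revenue per region, largest total first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

for region, total in rows:
    print(region, total)
```

The same `SELECT ... GROUP BY` pattern carries over to other relational databases, although connection details and SQL dialects differ from one system to another.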
4. Data Engineering
It surprises some to learn that much of a data scientist’s time is spent sourcing and preparing data, certainly as much as is spent developing algorithms and models. One data scientist qualification is therefore familiarity with ETL:
- Extract (get data from a source)
- Transform (put it into a format for the destination database/storage)
- Load (get it into the system used for analysis)
The ability to wrangle data is particularly crucial when dealing with unstructured, undefined data sets (such as those found when extracting from social media or blogs).
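The three ETL steps can be sketched in a few lines of standard-library Python. Everything specific here is an assumption made for illustration: the `RAW_CSV` sample, the `customers` table, the function names, and the decision to store a missing `spend` value as NULL rather than guessing a number.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """name,signup_date,spend
 Alice ,2023-01-05,100.50
bob,2023-02-10,
Carol,2023-03-15,75
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names and convert spend to a float (missing -> None)."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        spend = float(row["spend"]) if row["spend"].strip() else None
        cleaned.append((name, row["signup_date"], spend))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the analysis database."""
    conn.execute("CREATE TABLE customers (name TEXT, signup_date TEXT, spend REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, spend FROM customers ORDER BY name").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easier to swap the source or the destination later without touching the cleaning logic.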
5. Machine Learning
Machine learning is a subset of artificial intelligence. Here, software learns to detect patterns and themes in data without being given an explicit set of rules to follow. Machine learning is among the data science basics: it helps the data scientist analyze huge volumes of data efficiently and can support real-time data processing.
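As one small illustration of detecting patterns without hand-written rules, the sketch below runs a tiny k-means clustering on one-dimensional data in plain Python. The `purchases` values, the `kmeans_1d` function name, and the choice of k-means itself are assumptions made for this example; real projects would normally rely on an established machine learning library.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Tiny k-means for 1-D data; returns the cluster centres in ascending order."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    # Deterministic start: pick k centres spread evenly across the sorted values.
    ordered = sorted(values)
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster.
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centres)

# Hypothetical purchase amounts: no labels, no rules about what is "small" or "large".
purchases = [5, 6, 7, 5.5, 48, 50, 52, 6.5, 49]
centres = kmeans_1d(purchases, k=2)
print(centres)
```

Nothing in the code states where the boundary between small and large purchases lies; the two groups emerge from the data alone, which is the sense in which the machine "detects" the pattern.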
6. Data Visualization/Storytelling
The ability to display complex information in a form that is readily understood is a critical data science concept. Well-designed charts and graphs can convey at a glance what might otherwise take several paragraphs of summary text. Design thinking is a technical skill in this context: data visualizations help non-technical decision-makers and stakeholders understand the data and the results. Learning how to become a data scientist involves learning how to tell the story of the data, from its origins through to what the findings mean.
Data Science Concepts: Non-Technical
7. Critical Thinking
When answering the question “what is a data scientist,” we pointed to the science aspect of the role. A key feature of any scientist is the ability to remain objective in the face of a problem and let evidence and theory guide the solution. This is critical thinking at work: that objectivity lets the data scientist minimize bias from the beginning (data sources) to the end (reporting). Critical thinking underlies the ability to ask the right questions for the most meaningful result, and to separate relevant information from noise in the data.
8. Communication
As in most disciplines, the ability to communicate is fundamental to succeeding as a data scientist. Communication occurs at all levels. Data scientists must be able to accurately convey their needs and objectives to:
- Others in their team (such as developers)
- Other units (such as marketing, where the need might be identifying the most cost-effective campaign)
The data scientist must also be capable of accurately and clearly communicating the results to the key decision-makers and stakeholders, so that the most effective decisions and strategies are put into place.
9. Business Acumen
It isn’t enough to come into the role of data scientist equipped with technical skills and ability – you will not succeed if you do not also possess the capability to learn the business culture, mission, and processes. Another data scientist qualification is understanding the importance of the business context that frames the question at hand. The context provides the framework for the data sources, the models, the algorithms, and the meaning of the results. A data scientist who possesses business acumen will also be able to engage in the non-technical aspect of design thinking: determining the most effective means for leveraging the data for a competitive market edge. Data scientists are not only analysts – they are leaders.
Bringing it all Together
Running through the technical and non-technical basics of data science is one element that brings them together: intellectual persistence and the drive to learn and enhance existing skills. Approximately 75% of data scientists possess a master’s degree or Ph.D. While there are certainly informal paths to becoming a data scientist, most would benefit from an advanced degree, particularly one targeted at data science. This takes us to the last of the data scientist qualifications:
10. Data Science Education
Formal training in data science should be robust, rigorous, and relevant. However, many people seeking to become data scientists are not in a position to attend a traditional brick-and-mortar program. Fortunately, there are online advanced degree programs to fulfil this need. The JCU Online Master of Data Science offers an opportunity to develop the technical and non-technical skills necessary to succeed as a data scientist.
A career as a data scientist can certainly be lucrative, and the need is growing, but primarily the career of a data scientist is one of intellectual stimulation. When your career marries your technical skills with your business savvy, your work will rarely be routine.