Subject code: MA5800:03
This subject will provide students with an overview of data science as a discipline as well as an introduction to a number of topics that play fundamental roles across various subjects in this area.
Students will learn different forms of representing and pre-processing data for further analysis and visualisation. They will also learn principles of algorithm analysis that will allow them to assess and compare the scalability of different algorithms to be studied across other subjects in the realm of data science. Core elements of this subject include: An Introduction to Data Science and Big Data; Data Types and Representation; Essentials on Data Visualisation of Tabular Data; Data Pre-Processing; Data Wrangling and Tidying; Algorithm Analysis; Case Studies; Software Practice (R).
Software platform: RStudio
My name is Ricardo Campello. I'm a professor in applied mathematics and statistics at James Cook University.
We will learn the main techniques of data representation, visualisation, pre-processing and analysis, using a modern software package for data science, RStudio, and the R programming language.
We will discuss and practise relevant concepts using not only pedagogical but also real data sets, including interesting case studies in domains such as text and network data analytics. Foundational concepts learnt in this subject will be used for most, if not all, subjects in the program.
It's common to hear from experienced data analysts that more than 50 per cent of the time in practical data analysis is spent just pre-processing and preparing data for analysis. This is so-called data wrangling: selecting, cleaning, transforming and tidying data, all of which are essential for a successful data analysis task.
Also, there are multiple ways of representing data and describing its relationships, each of which can be more or less appropriate depending on the application scenario at hand.
Choosing appropriate representations and analysis tools can make the difference between success and failure in your data analysis task.
Students will be able to explain what data science is about, and the areas that play major roles within the realm of data science. They will be able to describe and apply the most common forms of data types and representations.
They will be able to describe and apply a core collection of simple, yet powerful, techniques for data visualisation and exploration; describe and apply a core collection of elementary techniques for data pre-processing; and interpret, compare and explain results of algorithm complexity analysis, as well as the importance of this type of analysis in data science.
You will also be able to apply common data representation and pre-processing techniques, such as wrangling and tidying, using the software package RStudio and the programming language R.
- Explain what data science is about and the areas that play major roles within the realm of data science
- Explain and exemplify the most common forms of data types and representations
- Identify and describe at a conceptual level a core collection of simple yet powerful techniques for data visualisation in the realm of data science
- Conceptually describe and apply a core collection of elementary techniques for data pre-processing
- Interpret and explain, at a conceptual level, results of algorithm analyses
- Apply common data representation and data pre-processing techniques, such as wrangling and tidying, using the software package and language R
This is one of the interdisciplinary subjects studied in the online Master of Data Science.
Please note, course structure and content are subject to change. For information on all course subjects download the course guide.