Subject code: MA5831:03
This subject will provide students with cutting-edge tools and techniques for high-performance and large-scale computing, with a focus on computing models and software designed to handle Big Data sets in a distributed and/or parallel fashion. Particular emphasis will be given to distributed and parallel computing using Map-Reduce/Hadoop and similar models for processing Big Data sets.
Software platform: SAS exclusively and Hadoop
- Compare and evaluate different systems and approaches for high-performance and large-scale computing for analytics on standard data and big data
- Manage and prepare data using standard management frameworks for the purpose of transforming and cleaning it to ensure classical characteristic outcomes are achieved
- Perform data management tasks to improve data quality, entity resolution and data monitoring
- Examine and deploy data processing tasks in the Hadoop ecosystem for big data and critically evaluate the combination of Hadoop and SAS to overcome big data challenges
- Choose and apply different techniques and software for distributed and cloud computing of big data
- Conduct a written review of a current data processing technology to establish a critical baseline understanding of the academic research with regard to a new trend
Assessment for this course will occur at various times across the seven-week study period. Tasks may include online quizzes, discussion board activity, portfolio development, case studies, reflection, literature reviews, presentations and reports. Feedback will be provided to you throughout the study period, as well as a final grade at the conclusion of the study period.
This is one of the interdisciplinary subjects studied in the online Master of Data Science.
Please note that course structure and content are subject to change. For information on all course subjects, download the course guide.