MTU Courses - SOFT8033 - Big Data & Analytics

Module Details

Module Code:	SOFT8033
Title:	Big Data & Analytics
Long Title:	Big Data & Analytics
NFQ Level:	Advanced
Valid From:	Semester 1 - 2017/18 (September 2017)

Duration:	1 Semester

Credits:	5

Field of Study:	4811 - Computer Science

Module Delivered in:	2 programme(s)

Module Description:	On completion of this module students will be competent in the application of Big Data frameworks for parallelized processing of data. This module will also equip students with the skills to perform analytics on big data problems using scalable machine learning algorithms and interpret the results.

Learning Outcomes
On successful completion of this module the learner will be able to:
#	Learning Outcome Description
LO1	Utilise a Big Data platform to facilitate the pre-processing and exploration of a large dataset.
LO2	Apply machine learning techniques for parallel analysis of large data sets using a distributed framework.
LO3	Implement streaming analytics to facilitate the processing of real-time data streams.
LO4	Perform analytics on graphs in a scalable manner.

Dependencies
Module Recommendations This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named MTU module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).

Incompatible Modules These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list.
No incompatible modules listed
Co-requisite Modules
No Co-requisite modules listed
Requirements This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol on this module if you have not acquired the learning specified in this section.
No requirements listed

Indicative Content
Introduction Introduction to the concept and characteristics of Big Data (volume, velocity, variety) and associated challenges. Overview of Big Data applications in domains such as finance, medicine, social media and transportation.
Big Data Platforms Introduction to Apache Hadoop and HDFS. Processing large datasets using MapReduce. Managing resources with YARN. Limitations of MapReduce. Introduction to Apache Spark. Installing Hadoop with Spark Clusters.
Big Data Analytics Introduction to algorithms for the analysis of high-velocity data. Creating Spark sessions, DataFrames and Datasets. Performing analytics with the Dataset API. Machine learning for parallel analysis of large data sets using a distributed framework such as Spark MLlib.
Stream Processing Distributed stream processing for real-time data analysis using a distributed framework such as Spark Streaming. Advantages and disadvantages of Spark Streaming. Architecture and application flow for Spark Streaming. Stateless and stateful processing. Fault tolerance. Spark Streaming with Kafka and HBase. Performance monitoring and tuning.
Graph Analytics Introduction to graph processing using a package such as Spark's GraphX. Graph algorithms and views. Creating, transforming, modifying and joining graphs. VertexRDD and EdgeRDD operations. Loading and saving GraphFrames.
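
Illustrative sketch (not part of the official descriptor): the MapReduce model named under Big Data Platforms can be outlined in plain Python as three phases: map, shuffle and reduce. The function names below (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. The code runs on a single machine and does not use Hadoop or Spark, so it shows the shape of the computation rather than a distributed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory collection of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    sample = ["big data analytics", "big data platforms"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run in parallel on separate partitions of the data, and the shuffle would move data across the network. That cost is one of the MapReduce limitations the module contrasts with Apache Spark.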

Module Content & Assessment
Assessment Breakdown	%
Coursework	50.00%
End of Module Formal Examination	50.00%

Assessments

Coursework

Assessment Type	Project	% of Total Mark	25
Timing	Week 7	Learning Outcomes	1,2
Assessment Description Big Data Analytics Project. Perform necessary pre-processing and analytical exploration of a Big Data problem. Apply distributed machine learning techniques and evaluate the results. Findings should be documented in a report.

Assessment Type	Project	% of Total Mark	25
Timing	Week 12	Learning Outcomes	3,4
Assessment Description Big Data Case Study. Students will implement an analytics solution for a Big Data case study requiring stream processing and graph analytics.

End of Module Formal Examination

Assessment Type	Formal Exam	% of Total Mark	50
Timing	End-of-Semester	Learning Outcomes	1,2,3,4
Assessment Description End of Semester Formal Examination.

Reassessment Requirement
Repeat examination Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.

The University reserves the right to alter the nature and timings of assessment.

Module Workload

Workload: Full Time
Workload Type	Contact Type	Workload Description	Frequency	Average Weekly Learner Workload	Hours
Lecture	Contact	Delivers the concepts and theories underpinning the learning outcomes.	Every Week	2.00	2
Lab	Contact	Application of learning to case studies and project work.	Every Week	2.00	2
Independent Learning	Non Contact	Student reads recommended papers and practices implementation.	Every Week	3.00	3
Total Hours					7.00
Total Weekly Learner Workload					7.00
Total Weekly Contact Hours					4.00

Workload: Part Time
Workload Type	Contact Type	Workload Description	Frequency	Average Weekly Learner Workload	Hours
Lecture	Contact	Delivers the concepts and theories underpinning the learning outcomes.	Every Week	2.00	2
Lab	Contact	Application of learning to case studies and project work.	Every Week	2.00	2
Independent Learning	Non Contact	Student reads recommended papers and practices implementation.	Every Week	3.00	3
Total Hours					7.00
Total Weekly Learner Workload					7.00
Total Weekly Contact Hours					4.00

Module Resources
Recommended Book Resources
Venkat Ankam. (2016), Big Data Analytics, 1st. Packt Publishing, [ISBN: 9781785884696].
Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. (2015), Learning Spark: Lightning-Fast Big Data Analysis, O'Reilly Media, [ISBN: 9781449358624].
Nick Pentreath. (2015), Machine Learning with Spark, 1st. Packt Publishing, [ISBN: 9781783288519].
Supplementary Book Resources
Mohammed Guller. (2015), Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, 1st. Apress, [ISBN: 9781484209653].
This module does not have any article/paper resources
Other Resources
Website, Udacity - Intro to Hadoop and MapReduce, https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617
Website, Coursera: Big Data Specialization, https://www.coursera.org/specializations/big-data
Website, Apache Hadoop, http://hadoop.apache.org/
Website, Apache Spark, http://spark.apache.org/

Module Delivered in
Programme Code	Programme	Semester	Delivery
CR_KSDEV_8	Bachelor of Science (Honours) in Software Development	8	Mandatory
CR_KWEBD_8	Bachelor of Science (Honours) in Web Development	8	Mandatory