Module Details

Module Code: SOFT8033
Title: Big Data & Analytics
Long Title: Big Data & Analytics
NFQ Level: Advanced
Valid From: Semester 1 - 2017/18 ( September 2017 )
Duration: 1 Semester
Credits: 5
Field of Study: 4811 - Computer Science
Module Delivered in: 2 programme(s)
Module Description: On completion of this module students will be competent in the application of Big Data frameworks for parallelized processing of data. This module will also equip students with the skills to perform analytics on big data problems using scalable machine learning algorithms and interpret the results.
 
Learning Outcomes
On successful completion of this module the learner will be able to:
# Learning Outcome Description
LO1 Utilise a Big Data platform to facilitate the pre-processing and exploration of a large dataset.
LO2 Apply machine learning techniques for parallel analysis of large data sets using a distributed framework.
LO3 Implement streaming analytics to facilitate the processing of real-time data streams.
LO4 Perform analytics on graphs in a scalable manner.
Dependencies
Module Recommendations

This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named MTU module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).

Incompatible Modules
These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list.
No incompatible modules listed
Co-requisite Modules
No Co-requisite modules listed
Requirements

This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol on this module if you have not acquired the learning specified in this section.

No requirements listed
 
Indicative Content
Introduction
Introduction to the concept and characteristics of Big Data (volume, velocity, variety) and associated challenges. Overview of Big Data applications in domains such as finance, medicine, social media, transportation, etc.
Big Data Platforms
Introduction to Apache Hadoop and HDFS. Processing large datasets using MapReduce. Managing resources with YARN. Limitations of MapReduce. Introduction to Apache Spark. Installing Hadoop with Spark Clusters.
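As an illustration of the MapReduce model named above, the following is a minimal single-process sketch in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to show how the map, shuffle and reduce steps fit together.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster the map calls run in parallel on different nodes;
    # here they run sequentially in one process (an assumption of this sketch).
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
sample_lines = ["big data needs big clusters", "data beats opinions"]
counts = word_count(sample_lines)
```

Because each map call sees only its own line and each reduce call sees only one key, both phases can in principle be distributed across machines.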
Big Data Analytics
Introduction to algorithms for the analysis of high-velocity data. Creating Spark sessions, DataFrames and Datasets. Performing analytics with the Dataset API. Machine learning for parallel analysis of large data sets using a distributed framework such as Spark MLlib.
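One way to picture machine learning over partitioned data is sketched below in plain Python. It is not the Spark MLlib API. The helper names (`partial_gradient`, `fit_slope`), the toy data and the hyperparameters are all assumptions made for illustration. Each partition computes a partial gradient independently, and the partial results are then summed to update a single shared model parameter.

```python
def partial_gradient(partition, w):
    """Return the summed squared-error gradient and row count for one partition."""
    grad = sum(2.0 * (w * x - y) * x for x, y in partition)
    return grad, len(partition)


def fit_slope(partitions, steps=200, learning_rate=0.01):
    """Fit y = w * x by gradient descent, combining per-partition gradients."""
    w = 0.0
    for _ in range(steps):
        # Each partition could be handled by a different worker; this sketch
        # simply loops over them in one process.
        partials = [partial_gradient(p, w) for p in partitions]
        total_grad = sum(g for g, _ in partials)
        total_rows = sum(n for _, n in partials)
        w -= learning_rate * total_grad / total_rows
    return w


# Hypothetical data generated from y = 3x, split into two partitions.
partitions = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
slope = fit_slope(partitions)
```

Only the small partial gradients need to be combined, not the raw rows, which is why this pattern scales to data that does not fit on one machine.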
Stream Processing
Distributed stream processing for real-time data analysis using a distributed framework such as Spark Streaming. Advantages and disadvantages of Spark Streaming. Architecture and application flow for Spark Streaming. Stateless and stateful processing. Fault tolerance. Spark Streaming with Kafka and HBase. Performance monitoring and tuning.
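The distinction between stateless and stateful processing can be sketched in plain Python over a list of micro-batches. This is an illustrative model only, not Spark Streaming code: the function names and the sensor readings are hypothetical.

```python
def stateless_filter(batch, threshold):
    """Stateless: each micro-batch is processed with no memory of earlier ones."""
    return [(key, value) for key, value in batch if value >= threshold]


def stateful_running_totals(batches):
    """Stateful: a per-key running total is carried across micro-batches."""
    state = {}
    for batch in batches:
        for key, value in batch:
            state[key] = state.get(key, 0) + value
        yield dict(state)  # snapshot of the state after this batch


# Hypothetical micro-batches of (sensor, reading) pairs.
micro_batches = [
    [("sensor-a", 5), ("sensor-b", 1)],
    [("sensor-a", 2)],
    [("sensor-b", 7), ("sensor-a", 1)],
]

filtered = [stateless_filter(batch, threshold=2) for batch in micro_batches]
snapshots = list(stateful_running_totals(micro_batches))
```

The stateful version is the one that raises fault-tolerance questions, since the `state` dictionary would have to be recoverable if a worker failed between batches.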
Graph Analytics
Introduction to graph processing using a package such as Spark's GraphX. Graph algorithms and views. Creating, transforming, modifying and joining graphs. VertexRDD and EdgeRDD operations. Loading and saving GraphFrames.
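As an example of the kind of graph algorithm this topic covers, the following is a minimal PageRank sketch in plain Python over an edge list. It does not use GraphX or GraphFrames. The function name `page_rank`, the damping factor, the iteration count and the toy graph are assumptions for illustration, and the sketch assumes every vertex has at least one outgoing edge.

```python
def page_rank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over (source, destination) pairs.

    Assumes every vertex has at least one outgoing edge (no dangling vertices).
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    ranks = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(iterations):
        # Each vertex splits its current rank evenly among its out-neighbours.
        incoming = {v: 0.0 for v in vertices}
        for src, targets in out_links.items():
            for dst in targets:
                incoming[dst] += ranks[src] / len(targets)
        ranks = {
            v: (1.0 - damping) / len(vertices) + damping * incoming[v]
            for v in vertices
        }
    return ranks


# Hypothetical three-vertex graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = page_rank(edges)
```

Each iteration only passes values along edges, which is the message-passing pattern that distributed graph engines parallelise across partitions of vertices and edges.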
Module Content & Assessment
Assessment Breakdown
Coursework: 50.00%
End of Module Formal Examination: 50.00%

Assessments

Coursework
Assessment Type: Project
% of Total Mark: 25
Timing: Week 7
Learning Outcomes: 1,2
Assessment Description
Big Data Analytics Project. Perform necessary pre-processing and analytical exploration of a Big Data problem. Apply distributed machine learning techniques and evaluate the results. Findings should be documented in a report.
Assessment Type: Project
% of Total Mark: 25
Timing: Week 12
Learning Outcomes: 3,4
Assessment Description
Big Data Case Study. Students will implement an analytics solution for a Big Data case study requiring stream processing and graph analytics.
End of Module Formal Examination
Assessment Type: Formal Exam
% of Total Mark: 50
Timing: End-of-Semester
Learning Outcomes: 1,2,3,4
Assessment Description
End of Semester Formal Examination.
Reassessment Requirement
Repeat examination
Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.

The University reserves the right to alter the nature and timings of assessment.

 

Module Workload

Workload: Full Time
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours
Lecture | Contact | Delivers the concepts and theories underpinning the learning outcomes. | Every Week | 2.00 | 2
Lab | Contact | Application of learning to case studies and project work. | Every Week | 2.00 | 2
Independent Learning | Non Contact | Student reads recommended papers and practices implementation. | Every Week | 3.00 | 3
Total Hours: 7.00
Total Weekly Learner Workload: 7.00
Total Weekly Contact Hours: 4.00
Workload: Part Time
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours
Lecture | Contact | Delivers the concepts and theories underpinning the learning outcomes. | Every Week | 2.00 | 2
Lab | Contact | Application of learning to case studies and project work. | Every Week | 2.00 | 2
Independent Learning | Non Contact | Student reads recommended papers and practices implementation. | Every Week | 3.00 | 3
Total Hours: 7.00
Total Weekly Learner Workload: 7.00
Total Weekly Contact Hours: 4.00
 
Module Resources
Recommended Book Resources
  • Venkat Ankam. (2016), Big Data Analytics, 1st. Packt Publishing, [ISBN: 9781785884696].
  • Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. (2015), Learning Spark: Lightning-Fast Big Data Analysis, O'Reilly Media, [ISBN: 9781449358624].
  • Nick Pentreath. (2015), Machine Learning with Spark, 1st. Packt Publishing, [ISBN: 9781783288519].
Supplementary Book Resources
  • Mohammed Guller. (2015), Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, 1st. Apress, [ISBN: 9781484209653].
This module does not have any article/paper resources
Other Resources
 
Module Delivered in
Programme Code | Programme | Semester | Delivery
CR_KSDEV_8 | Bachelor of Science (Honours) in Software Development | 8 | Mandatory
CR_KWEBD_8 | Bachelor of Science (Honours) in Web Development | 8 | Mandatory