Module Details
Module Code: | SOFT8033 |
Title: | Big Data & Analytics |
Long Title: | Big Data & Analytics |
NFQ Level: | Advanced |
Valid From: | Semester 1 - 2017/18 (September 2017) |
Field of Study: | 4811 - Computer Science |
Module Description: |
On completion of this module students will be competent in the application of Big Data frameworks for parallelized processing of data. This module will also equip students with the skills to perform analytics on big data problems using scalable machine learning algorithms and interpret the results.
|
Learning Outcomes |
On successful completion of this module the learner will be able to: |
# | Learning Outcome Description |
LO1 | Utilise a Big Data platform to facilitate the pre-processing and exploration of a large dataset. |
LO2 | Apply machine learning techniques for parallel analysis of large data sets using a distributed framework. |
LO3 | Implement streaming analytics to facilitate the processing of real-time data streams. |
LO4 | Perform analytics on graphs in a scalable manner. |
Dependencies |
Module Recommendations
This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named MTU module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).
|
|
Incompatible Modules
These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list.
|
No incompatible modules listed |
Co-requisite Modules
|
No Co-requisite modules listed |
Requirements
This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol on this module if you have not acquired the learning specified in this section.
|
No requirements listed |
Indicative Content |
Introduction
Introduction to the concept and characteristics of Big Data (volume, velocity, variety) and associated challenges. Overview of Big Data applications in domains such as finance, medicine, social media, transportation, etc.
|
Big Data Platforms
Introduction to Apache Hadoop and HDFS. Processing large datasets using MapReduce. Managing resources with YARN. Limitations of MapReduce. Introduction to Apache Spark. Installing Hadoop with Spark Clusters.
|
Big Data Analytics
Introduction to algorithms for the analysis of high-velocity data. Creating Spark sessions, DataFrames and Datasets. Performing analytics with the Dataset API. Machine learning for parallel analysis of large data sets using a distributed framework such as Spark MLlib.
|
Stream Processing
Distributed stream processing for real-time data analysis using a distributed framework such as Spark Streaming. Advantages and disadvantages of Spark Streaming. Architecture and application flow for Spark Streaming. Stateless and stateful processing. Fault tolerance. Spark Streaming with Kafka and HBase. Performance monitoring and tuning.
|
Graph Analytics
Introduction to graph processing using a package such as Spark's GraphX. Graph algorithms and views. Creating, transforming, modifying and joining graphs. VertexRDD and EdgeRDD operations. Loading and saving GraphFrames.
|
Module Content & Assessment
|
Assessment Breakdown | % |
Coursework | 50.00% |
End of Module Formal Examination | 50.00% |
Assessments
End of Module Formal Examination |
|
Reassessment Requirement |
Repeat examination
Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.
|
The University reserves the right to alter the nature and timings of assessment
Module Workload
Workload: Full Time |
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours |
Lecture | Contact | Delivers the concepts and theories underpinning the learning outcomes. | Every Week | 2.00 | 2 |
Lab | Contact | Application of learning to case studies and project work. | Every Week | 2.00 | 2 |
Independent Learning | Non Contact | Student reads recommended papers and practices implementation. | Every Week | 3.00 | 3 |
Total Hours | 7.00 |
Total Weekly Learner Workload | 7.00 |
Total Weekly Contact Hours | 4.00 |
Workload: Part Time |
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours |
Lecture | Contact | Delivers the concepts and theories underpinning the learning outcomes. | Every Week | 2.00 | 2 |
Lab | Contact | Application of learning to case studies and project work. | Every Week | 2.00 | 2 |
Independent Learning | Non Contact | Student reads recommended papers and practices implementation. | Every Week | 3.00 | 3 |
Total Hours | 7.00 |
Total Weekly Learner Workload | 7.00 |
Total Weekly Contact Hours | 4.00 |
Module Resources
|
Recommended Book Resources |
---|
- Venkat Ankam. (2016), Big Data Analytics, 1st. Packt Publishing, [ISBN: 9781785884696].
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. (2015), Learning Spark: Lightning-Fast Big Data Analysis, O'Reilly Media, [ISBN: 9781449358624].
- Nick Pentreath. (2015), Machine Learning with Spark, 1st. Packt Publishing, [ISBN: 9781783288519].
Supplementary Book Resources |
---|
- Mohammed Guller. (2015), Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, 1st. Apress, [ISBN: 9781484209653].
This module does not have any article/paper resources |
---|
Other Resources |
---|
- Website, Udacity - Intro to Hadoop and MapReduce
- Website, Coursera: Big Data Specialization
- Website, Apache Hadoop
- Website, Apache Spark
|
|