Module Details
Module Code: | STAT8010 |
Title: | Intro to R for Data Science |
Long Title: | Intro to R for Data Science |
NFQ Level: | Advanced |
Valid From: | Semester 1 - 2018/19 (September 2018) |
Field of Study: | 4620 - Statistics |
Module Description: | In this module, students will learn how to clean, manipulate and visualise data using the statistical software package R. Students will create and analyse statistical models and simulations with R. |
Learning Outcomes |
On successful completion of this module the learner will be able to: |
# | Learning Outcome Description |
LO1 | Evaluate the functionality of the R statistical programming language. |
LO2 | Apply data cleaning, manipulation and wrangling techniques to specified data problems. |
LO3 | Implement appropriate data visualisation techniques to examine real world datasets. |
LO4 | Investigate statistical modelling and simulation techniques. |
LO5 | Develop best practice in terms of reproducible documentation and version control. |
Dependencies |
Module Recommendations
This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named MTU module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).
|
|
Incompatible Modules
These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list.
|
No incompatible modules listed |
Co-requisite Modules
|
No Co-requisite modules listed |
Requirements
This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol in this module if you have not acquired the learning specified in this section.
|
No requirements listed |
Indicative Content |
Base R
Learn how to navigate RStudio or a similar IDE: how to load and save files, load packages, access help, etc. Examine the base R objects (vectors, matrices, arrays, lists, factors and tables), their respective characteristics, naming conventions and structures. Understand the subsetting, filtering and creation of these objects. Examine the implementation of control structures (conditionals and loops) and functions in R. Investigate how R can be used for mathematical and statistical calculations.
|
Data Cleaning and Manipulation in R
Understand the tidyverse suite of packages and how they can be used for data wrangling and data manipulation. Learn how to use regular expressions and pattern matching in R for data cleaning purposes.
|
Visualisation
Learn how basic plots such as histograms and X-Y plots are generated in R. Understand the ggplot2 package for advanced plotting. Examine Shiny for the creation of web-based dashboards and interactive plots.
|
Statistical Testing
Understand how R can be used for sampling and simulation techniques such as bootstrapping, Monte Carlo methods, simulating sampling distributions and checking hypothesis tests. Investigate how R can be used for statistical modelling techniques (e.g. naive Bayes classifiers).
|
Reproducible Documentation and Version Control
Learn how R and R Markdown can be used to produce documents for reproducible research and results. Implement version control through the integration of Git in R.
|
Module Content & Assessment
|
Assessment Breakdown | % |
Coursework | 100.00% |
Assessments
No End of Module Formal Examination |
Reassessment Requirement |
Coursework Only
This module is reassessed solely on the basis of re-submitted coursework. There is no repeat written examination.
|
The University reserves the right to alter the nature and timings of assessment.
Module Workload
Workload: Full Time |
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours |
Lecture | Contact | Module Content delivery | Every Week | 1.00 | 1 |
Lab | Contact | Programming laboratory | Every Week | 3.00 | 3 |
Independent & Directed Learning (Non-contact) | Non Contact | Study, practice and completion of worksheets | Every Week | 3.00 | 3 |
Total Hours | 7.00 |
Total Weekly Learner Workload | 7.00 |
Total Weekly Contact Hours | 4.00 |
Workload: Part Time |
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours |
Lecture | Contact | Module Content delivery | Every Week | 1.00 | 1 |
Lab | Contact | Programming laboratory | Every Week | 2.00 | 2 |
Independent & Directed Learning (Non-contact) | Non Contact | Study, practice and completion of worksheets | Every Week | 4.00 | 4 |
Total Hours | 7.00 |
Total Weekly Learner Workload | 7.00 |
Total Weekly Contact Hours | 3.00 |
Module Resources
|
Recommended Book Resources |
- Garrett Grolemund and Hadley Wickham. (2017), R for Data Science, O'Reilly Media, http://r4ds.had.co.nz/, [ISBN: 9781491910399].
- Robert Kabacoff. (2015), R in Action, 2nd ed., Manning, New York, [ISBN: 1617291382].
- Norman Matloff. (2011), The Art of R Programming, No Starch Press, San Francisco, [ISBN: 9781593273842].
This module does not have any article/paper resources |
Other Resources |
- Website, R Statistical Programming
- Website, RStudio IDE
- Website, Tidyverse - R packages for Data Science
- Website, Ggplot2 gallery
- Website, R Markdown