Description
R’s power as a programming language comes from its ability to work seamlessly with data. Even so, base R has some syntactical quirks that make common data analysis tasks, such as joining and summarizing, more difficult than they need to be. In this course, we will cover the dplyr package, written by Hadley Wickham, which provides a data analysis workflow that improves programmer efficiency and integrates well with other tools. We will also cover related packages such as tidyr, which fit into the same framework and provide tools for reshaping and restructuring datasets as needed.
Prerequisites
- Basic understanding of spreadsheets
- Knowledge of R Programming
- R 3.4.x
- RStudio 1.0.x
Outline
1. Introduction to dplyr – We’ll start by introducing dplyr, a package for data manipulation, using practical examples that highlight both its power and its flexibility.
2. The dplyr Verbs – dplyr provides a set of verbs, each performing one independent operation, that can be chained together. We will cover each of these verbs.
3. Restructuring Data with tidyr – While dplyr handles datasets that are already in a long-form structure, sometimes we need to do some cleaning before we can begin working with them. We will introduce tidyr as a tool for this task.
4. The Theory of Tidy Data – Understanding tidyr is one piece of the puzzle. We also need to understand when and how data needs to be restructured. We will cover the theory behind this.
5. Statistical Models in a dplyr Framework – The broom package allows us to chain together dplyr and tidyr commands and ultimately retrieve results from a statistical model in a clean, easy-to-use format. We will discuss this package.
6. Special Topics – Here we’ll discuss recent developments in data manipulation.
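
To give a flavor of the workflow the outline describes, here is a minimal sketch that chains a few dplyr verbs, reshapes a table with tidyr, and tidies a model with broom. It is an illustration only, not course material: it assumes the dplyr, tidyr, and broom packages are installed, uses R's built-in `mtcars` dataset, and the variable names (`mpg_by_cyl`, `long`, `coefs`) are made up for the example.

```r
library(dplyr)
library(tidyr)
library(broom)

# dplyr: chain verbs with the pipe to filter, group, and summarize
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())

# tidyr: reshape three wide columns into long key/value pairs
# (gather() is used here because it also exists in older tidyr releases;
#  newer releases offer pivot_longer() for the same job)
long <- mtcars %>%
  select(mpg, hp, wt) %>%
  gather(key = "variable", value = "value", mpg, hp, wt)

# broom: turn a fitted model into a data frame, one row per coefficient
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- tidy(fit)
```

Each step returns a data frame, which is what lets the results of one package feed directly into the next.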