Description
Unsupervised learning involves the use of data to group similar observations into cohesive, separable clusters without a reference label to validate the results. Given this constraint, clustering requires a combination of objective science and an artful approach, both to construct groupings intelligently and to assess the quality of the solution in its proper context. This course will focus on developing intuition and establishing best practices for deriving, analyzing and presenting actionable clustering solutions. With a focus on practical application, the course will include both a case study of unsupervised learning applied to price segmentation and a discussion of special topics on the current state of the art.
Prerequisites
- Basic understanding of statistics
- Knowledge of R Programming
- R 3.4.x
- RStudio 1.0.x
Outline
1. Motivation – The course will kick off with an explanation of machine learning and the unique context that defines unsupervised learning tasks.
2. Preliminaries – This section will cover preliminary concepts about data, their representations and the use of distance calculations to measure object similarity.
3. Clustering Techniques – This section will survey techniques such as K-means, Self-Organizing Maps and several other modern clustering algorithms.
4. Exploratory Methods and Validation – Here we’ll focus on summary statistics and multivariate visualization techniques best suited for the analysis of cluster solutions.
5. Applications and Best Practices – This section will focus on teaching the high-level workflow required to complete a clustering project by walking through a sample case study.
6. Special Topics – Here we’ll conclude the course with a discussion of recent developments in unsupervised machine learning.