Description
This survey course provides a broad overview of the important concepts, techniques and applications of data science in industry. Participants will get exposure to data structures, visualization techniques, machine learning and selected topics in classical statistics, all of which will form a foundation in analytics well suited for the construction and execution of data science projects.
Prerequisites
- Basic understanding of statistics
- Knowledge of R programming
- R 3.4.x (installed)
- RStudio 1.0.x (installed)
Outline
1. Motivation – The course will kick off with case studies of successful industry projects that leveraged data analytics to improve profits through revenue increases and cost reductions.
2. Data Structures – Here we will present an overview of data types, formats, and structures used in modern analytics applications.
3. Data Visualization – In this section, we’ll build a fundamental understanding of cognitive visual perception and learn about graphical visualizations of data.
4. Special Topics in Multivariate Statistics – This section will touch on important topics such as hypothesis testing and confidence intervals, and will lay the foundation for deeper statistical understanding by explaining the Central Limit Theorem and the Law of Large Numbers.
5. Unsupervised Learning – Techniques designed to uncover groupings of similar observations, such as K-means, Self-Organizing Maps and Hierarchical Clustering, will be introduced in this section of the course.
6. Supervised Learning – For observations with a well-defined response, we use techniques such as Random Forests, Classification Trees and Regression to make predictions. This section will discuss best practices for modeling supervised machine learning tasks.
7. Wrap-up – The course will conclude with a guided question-and-answer work session in which participants will propose business scenarios and solutions to each other.