An Introduction to Statistical Learning with Applications in Python

Transform your data science career by mastering statistical learning, the definitive skill set for the modern data professional.

(STATS-PYTHON.AU1)
Lessons
Lab
AI Tutor (Add-on)

About This Course

Are you ready to move beyond basic data manipulation and truly leverage machine learning in Python to revolutionize your decision-making process? The role of the data analyst is undergoing a fundamental shift, requiring specialized knowledge of how to model complex systems strategically. This ISLP course moves you past simple summary statistics and dives deep into the art and science of supervised learning and high-dimensional data analysis.

You will master the foundational mathematical frameworks, learn professional cross-validation techniques for model selection, and explore unsupervised learning to uncover hidden patterns in unlabeled data. Whether you are aiming for precise predictions using linear regression, building robust classifiers with support vector machines, or exploring the frontier of deep learning, this program provides the practical, hands-on knowledge to design and launch advanced models. From the bias-variance trade-off to modern resampling methods, you will learn to build systems that are both accurate and interpretable.
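The bias-variance trade-off mentioned above is easy to see in miniature. The sketch below fits polynomials of increasing degree to noisy synthetic data with scikit-learn; the library choice, dataset, and degrees are illustrative assumptions, not the course's own lab code.

```python
# Hedged sketch of the bias-variance trade-off on synthetic data.
# All names and settings here are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # nonlinear truth + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

test_mse = {}
for degree in (1, 3, 12):
    # Polynomial basis expansion followed by ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    test_mse[degree] = mean_squared_error(y_test, model.predict(X_test))

# A degree-1 fit underfits (high bias); a very high degree risks overfitting
# (high variance); an intermediate degree usually scores best on held-out data.
print(test_mse)
```

Evaluating on a held-out test set, rather than training error, is what exposes the trade-off: training error always falls as the degree grows, while test error does not.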

Skills You’ll Get

  • Foundations & Linear Models: Master the core of statistical learning, building from basic matrix algebra to multiple linear regression and logistic regression for powerful predictive modeling.
  • Resampling & Regularization: Assess model accuracy with cross-validation and the bootstrap, and optimize high-dimensional performance using ridge and lasso shrinkage methods.
  • Tree-Based & Support Vector Machines: Move beyond simple linearity with Decision Trees, Random Forests, and Support Vector Machines to handle complex, non-linear datasets with precision.
  • Deep & Unsupervised Learning: Explore the power of Neural Networks alongside Unsupervised Learning techniques like Clustering and PCA to find insights in data without predefined labels.
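As a taste of the regularization skills listed above, here is a hedged sketch contrasting ridge and lasso in scikit-learn on synthetic sparse data; the dataset, coefficients, and penalty strengths are assumptions chosen for illustration, not the course's own lab code.

```python
# Illustrative contrast of ridge and lasso shrinkage on sparse synthetic data.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.5]          # only 3 of 10 predictors truly matter
y = X @ beta + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # can set some coefficients exactly to zero

ridge_nonzero = int(np.sum(ridge.coef_ != 0))
lasso_nonzero = int(np.sum(lasso.coef_ != 0))
print("ridge nonzero coefficients:", ridge_nonzero)
print("lasso nonzero coefficients:", lasso_nonzero)
```

The qualitative difference is the point: ridge keeps every predictor with a damped coefficient, while the lasso's L1 penalty performs variable selection by zeroing out weak ones.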

Lessons

1. Preface

2. Introduction

  • An Overview of Statistical Learning
  • A Brief History of Statistical Learning
  • This Course
  • Who Should Read This Course?
  • Notation and Simple Matrix Algebra
  • Organization of This Course
  • Data Sets Used in Labs and Exercises

3. Statistical Learning

  • What is Statistical Learning?
  • Assessing Model Accuracy
  • Lab: Introduction to Python
  • Exercises

4. Linear Regression

  • Simple Linear Regression
  • Multiple Linear Regression
  • Other Considerations in the Regression Model
  • The Marketing Plan
  • Comparison of Linear Regression with K-Nearest Neighbors
  • Lab: Linear Regression
  • Exercises

5. Classification

  • An Overview of Classification
  • Why Not Linear Regression?
  • Logistic Regression
  • Generative Models for Classification
  • A Comparison of Classification Methods
  • Generalized Linear Models
  • Lab: Logistic Regression, LDA, QDA, and KNN
  • Exercises

6. Resampling Methods

  • Cross-Validation
  • The Bootstrap
  • Lab: Cross-Validation and the Bootstrap
  • Exercises

7. Linear Model Selection and Regularization

  • Subset Selection
  • Shrinkage Methods
  • Dimension Reduction Methods
  • Considerations in High Dimensions
  • Lab: Linear Models and Regularization Methods
  • Exercises

8. Moving Beyond Linearity

  • Polynomial Regression
  • Step Functions
  • Basis Functions
  • Regression Splines
  • Smoothing Splines
  • Local Regression
  • Generalized Additive Models
  • Lab: Non-Linear Modeling
  • Exercises

9. Tree-Based Methods

  • The Basics of Decision Trees
  • Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees
  • Lab: Tree-Based Methods
  • Exercises

10. Support Vector Machines

  • Maximal Margin Classifier
  • Support Vector Classifiers
  • Support Vector Machines
  • SVMs with More than Two Classes
  • Relationship to Logistic Regression
  • Lab: Support Vector Machines
  • Exercises

11. Deep Learning

  • Single Layer Neural Networks
  • Multilayer Neural Networks
  • Convolutional Neural Networks
  • Document Classification
  • Recurrent Neural Networks
  • When to Use Deep Learning
  • Fitting a Neural Network
  • Interpolation and Double Descent
  • Lab: Deep Learning
  • Exercises

12. Survival Analysis and Censored Data

  • Survival and Censoring Times
  • A Closer Look at Censoring
  • The Kaplan-Meier Survival Curve
  • The Log-Rank Test
  • Regression Models With a Survival Response
  • Shrinkage for the Cox Model
  • Additional Topics
  • Lab: Survival Analysis
  • Exercises

13. Unsupervised Learning

  • The Challenge of Unsupervised Learning
  • Principal Components Analysis
  • Missing Values and Matrix Completion
  • Clustering Methods
  • Lab: Unsupervised Learning
  • Exercises

14. Multiple Testing

  • A Quick Review of Hypothesis Testing
  • The Challenge of Multiple Testing
  • The Family-Wise Error Rate
  • The False Discovery Rate
  • A Re-Sampling Approach to p-Values and False Discovery Rates
  • Lab: Multiple Testing
  • Exercises

Labs

1. Introduction

  • Analyzing the Wage Dataset
  • Analyzing Stock Market Trends Using the Smarket Dataset

2. Statistical Learning

  • Implementing the Bayes Classifier
  • Implementing the Bias-Variance Trade-Off
  • Indexing the Data

3. Linear Regression

  • Implementing Qualitative Predictors Using the Credit Dataset
  • Implementing Non-Linear Transformations of Predictors
  • Performing Multiple Linear Regression
  • Implementing Simple Linear Regression

4. Classification

  • Implementing Multiple Logistic Regression
  • Implementing Multinomial Logistic Regression
  • Generating and Visualizing a Multivariate Gaussian Distribution
  • Implementing GLM
  • Implementing Poisson Regression
  • Implementing KNN on the Caravan Dataset
  • Implementing Naive Bayes Classification
  • Implementing QDA
  • Implementing LDA

5. Resampling Methods

  • Implementing LOOCV
  • Implementing Bootstrapping Techniques on the Portfolio Dataset
  • Implementing K-Fold Cross-Validation
  • Implementing the Validation Set Approach

6. Linear Model Selection and Regularization

  • Implementing Forward and Backward Stepwise Selection
  • Improving Predictions with PCR
  • Implementing PLS
  • Implementing Lasso Regression
  • Implementing Ridge Regression
  • Implementing Subset Selection Methods Using the Hitters Dataset

7. Moving Beyond Linearity

  • Implementing Splines
  • Implementing a Step Function
  • Implementing GAMs
  • Implementing Polynomial Regression

8. Tree-Based Methods

  • Building and Analyzing a Classification Tree Using the Carseats Dataset
  • Improving Model Performance Using Boosting
  • Implementing Bagging and Random Forests
  • Fitting Regression Trees

9. Support Vector Machines

  • Implementing the Maximal Margin Classifier
  • Creating and Analyzing an ROC Curve
  • Implementing SVM with Multiple Classes
  • Implementing SVC

10. Deep Learning

  • Implementing RNN for Time Series Prediction
  • Creating an Image Classifier Using CNNs

11. Survival Analysis and Censored Data

  • Implementing the Kaplan-Meier Survival Curve
  • Applying the Log-Rank Test
  • Incorporating Shrinkage Techniques into the Cox Model

12. Unsupervised Learning

  • Implementing a Dendrogram
  • Analyzing the NCI60 Dataset
  • Implementing K-Means Clustering

13. Multiple Testing

  • Implementing Holm's Step-Down Procedure
  • Implementing the BH Procedure
  • Implementing FDR
  • Implementing FWER

Any questions?
Check out the FAQs

Still have unanswered questions and need to get in touch?

Contact Us Now

Who is this course for?

This program is ideal for data scientists, statisticians, and software developers who want to master statistical learning using Python. It is perfect for those transitioning from basic analytics to advanced predictive modeling.

Does the course cover modern topics beyond classical statistics?

Yes! Beyond classical models, the course features dedicated modules on deep learning (CNNs and RNNs) and unsupervised learning techniques like clustering and matrix completion.

How deeply are Support Vector Machines covered?

We go deep into the mechanics of Support Vector Machines, covering everything from Maximal Margin Classifiers to kernels and ROC curves, ensuring you can handle even the most complex classification boundaries.
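As a flavor of that material, here is a minimal sketch of a kernel SVM and its ROC AUC, using scikit-learn on a synthetic two-class dataset; the dataset and hyperparameters are illustrative assumptions, not the course's lab code.

```python
# Hedged sketch: RBF-kernel SVM plus an AUC computed from its decision function.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Two interleaving half-moons: a classic non-linearly-separable toy problem
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale")   # non-linear decision boundary
svm.fit(X_train, y_train)

scores = svm.decision_function(X_test)          # signed distances to the boundary
auc = roc_auc_score(y_test, scores)
print(f"test AUC: {auc:.3f}")
```

Using the continuous decision-function scores, rather than hard class labels, is what lets the ROC curve trace out every possible classification threshold.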

Is the course theoretical or practical?

It is a balanced approach. While we cover the mathematical notation, the core of the course is heavily focused on practice, featuring extensive labs on linear regression, resampling methods, and validation strategies like cross-validation.
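For instance, the cross-validation workflow taught in the resampling labs can be sketched in a few lines of scikit-learn; the synthetic dataset and settings below are assumptions for illustration, not the course's own labs.

```python
# Hedged sketch of 5-fold cross-validation for a linear regression model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem with 5 informative predictors
X, y = make_regression(n_samples=150, n_features=5, n_informative=5,
                       noise=10.0, random_state=0)

# 5-fold CV: fit on 4 folds, score on the held-out fold, repeat 5 times
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("per-fold R^2:", np.round(cv_r2, 3))
print("mean R^2:", cv_r2.mean())
```

Averaging the five held-out scores gives a lower-variance estimate of test performance than a single train/validation split.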
