Data Science 2 with Python (303-2-21)

Instructors

Arvind Krishna

Meeting Info

Harris Hall L07: Mon, Wed 5:30PM - 6:50PM

Overview of class

Only Statistics majors, Data Science majors, Data Science minors, and Applied Statistics Masters students assigned to take STAT 303-2 in this quarter are able to register for this course.

This course introduces supervised machine learning in Python, with a focus on linear and logistic regression. It prepares students for learning advanced machine learning methods.

Registration Requirements

Prerequisite: STAT 303-1 or consent of the instructor.

Learning Objectives

At the end of the course, students should be able to:
1. Translate a problem described in layman's terms into a regression problem.
2. Identify the suitability of regression for a given problem.
3. Develop, interpret, and validate regression models.
4. Integrate regression modeling as a component of a larger data science project.
5. Demonstrate proficiency with coding in the Python programming language, in the context of regression.

Teaching Method

Lectures will consist of presentations covering the theory and coding notebooks demonstrating its implementation. There may be an in-class quiz in any class session.

Evaluation Method

1. Weekly assignments: Students will have weekly assignments to practice and demonstrate the coding techniques, tools, and methods taught in class. These assignments will test students on learning objectives 2, 3, and 5.
2. Mid-term exam: Students will be given a dataset and asked to develop a linear regression model. This assessment will test students on learning objectives 1, 2, 3, and 5.
3. Final exam: Students will be given multiple datasets and a problem, and asked to develop a data science solution involving regression. This assessment will test students on learning objectives 1, 2, 3, 4, and 5.
4. Prediction problem: Students will be asked to develop a regression model whose predictions reach a specified level of accuracy.

Class Materials (Required)

'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, and Tibshirani (2013), ISBN-13: 978-1461471370 (available for free online). Python code for the book: https://github.com/JWarmenhoven/ISLR-python

Class Materials (Suggested)

Linear Models with Python by Julian J. Faraway
Python Data Science Handbook by Jake VanderPlas
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman

Class Notes

https://nustat.github.io/STAT303-2-class-notes/

Class Attributes

Formal Studies Distro Area

Enrollment Requirements

Enrollment Requirements: Prerequisite: STAT 303-1
Add Consent: Department Consent Required