Data Science 2 with R (301-2-20)

Instructors

Arend Matthew Kuyper
IPR, 2040 Sheridan Road, Evanston

Meeting Info

Swift Hall 107: Mon, Wed 12:30PM - 1:50PM

Overview of class

Only Statistics majors, Data Science majors, Data Science minors, and Statistics Master's students assigned to take STAT 301-2 this quarter are able to register for this course.

As the second course in the STAT 301 Data Science with R series, our objective is to build upon and extend the foundational analytical skills and knowledge gained in STAT 301-1. While many data science problems can be solved using knowledge from multiple coding languages and technologies, this course uses R through the RStudio IDE to conduct data science. Students will be introduced to machine learning and will develop skills, tools, and strategies for modeling and understanding complex data. Methods to be addressed include linear regression and classification, tree-based models, model selection and assessment, resampling methods, and regularization methods. These skills will be developed through project/lab-based learning (1-2 projects/labs per week). The skills developed in this course are necessary for STAT 301-3 Data Science 3 with R.

Registration Requirements

This course requires STAT 301-1 Data Science 1 with R or the instructor's permission to enroll. We assume students will be well versed in the skills covered in STAT 301-1 Data Science 1 with R. That is, students should be comfortable using R and RStudio to manage, manipulate, and visualize data. Students should be prepared to spend approximately 5 to 9 hours per week working outside of class on this course, depending on their knowledge of R.

Learning Objectives

Students will be able to (1) explain/define each of the statistical learning methods introduced throughout the quarter; (2) demonstrate application of statistical learning methods on real datasets; (3) explain/define model building, selection, and assessment techniques; and (4) demonstrate the application of model building, selection, and assessment techniques to real datasets.

Teaching Method

A typical class will devote about 10-20 minutes to discussion/lecture, with the remaining time devoted to projects/labs where students will work either by themselves or in groups. Students will be expected to prepare adequately for each discussion/lecture by reviewing assigned material (e.g., readings and videos), because the majority of class time will be spent working on projects/labs designed around the assigned material. Students will be expected to collaborate and engage with other students to help each other learn and solve problems.

Evaluation Method

There will be a final project in place of a written exam. We will also evaluate progress throughout the quarter using project/lab-based learning (1-2 projects/labs per week) and other miscellaneous assessment strategies (for example: short discussions, surveys, and proficiency quizzes).

Class Materials (Required)

(1) Laptop for in class projects/labs — contact department if access to a laptop is an issue.
(2) Free online textbook, Tidy Modeling with R: https://www.tmwr.org/
(3) Free downloadable PDF, An Introduction to Statistical Learning with Applications in R, 2nd Edition: https://www.statlearning.com/
(4) Free statistical software R (https://cran.rstudio.com/)
(5) Free integrated development environment software RStudio (https://www.rstudio.com/). Think of R as the car engine needed to power and run everything while RStudio is the steering wheel/dashboard that we use to run and control the car.

Class Notes

ATTENDANCE AT THE FIRST CLASS IS MANDATORY

Class Attributes

Formal Studies Distro Area

Enrollment Requirements

Prerequisite: STAT 301-1
Add Consent: Department Consent Required