# Data Science 3 with R (301-3-21)

## Instructors

Arend Matthew Kuyper

IPR, 2040 Sheridan Road, Evanston

## Meeting Info

Harris Hall L07: Mon, Wed 5:30PM - 6:50PM

## Overview of class

Only Statistics majors, Data Science majors, Data Science minors, and Statistics master's students assigned to take STAT 301-3 this quarter may register for this course.

As the third course in the Data Science series, this course builds on and extends the foundational analytical skills and knowledge gained in STAT 301-1 and STAT 301-2. It introduces additional machine learning models (SVMs, MARS, ensembles, etc.), deepens predictive modeling skills (feature engineering, model assessment and evaluation, etc.), and surveys topics such as unsupervised learning and version control. While many data science problems require knowledge of multiple programming languages and technologies, this course uses R, through the RStudio IDE, to conduct data science. Students will develop and extend skills, tools, and strategies for modeling and understanding complex data. These skills will be developed through project/lab-based learning (1-2 projects/labs per week). The course dedicates a significant portion of time to a few culminating experiences, one of which is a final project.

## Registration Requirements

At least an introductory understanding of statistics is necessary (e.g., STAT 202 or 210). The course requires STAT 301-2 Data Science 2 with R or the instructor's permission to enroll; we assume students are well versed in the skills covered in Data Science 2 with R.

## Learning Objectives

Students will be able to explain and define each of the statistical learning methods introduced throughout the quarter and apply these methods to real datasets. Students will also be introduced to, and use, version control through GitHub.

## Teaching Method

A typical class will devote about 10-20 minutes to discussion/lecture, with the remaining time devoted to projects/labs in which students work either individually or in groups. Because the majority of class time will be spent on projects/labs designed around the assigned material, students are expected to prepare adequately for each discussion/lecture by reviewing that material (e.g., readings, videos). Students are also expected to collaborate and engage with one another to learn and solve problems.

## Evaluation Method

There will be a final project in place of a written exam. We will also evaluate progress throughout the quarter using project/lab-based learning (1-2 projects/labs per week).

## Class Materials (Required)

(1) Free online textbook, Tidy Modeling with R: https://www.tmwr.org/

(2) Free online textbook, Feature Engineering and Selection: A Practical Approach for Predictive Models: http://www.feat.engineering/

(3) Free online textbook, Text Mining with R: https://www.tidytextmining.com/

(4) Free downloadable textbook, An Introduction to Statistical Learning with Applications in R, 2nd edition: https://www.statlearning.com/

(5) Free statistical software R (https://cran.rstudio.com/)

(6) Free integrated development environment software RStudio (https://www.rstudio.com/). Think of R as the car engine that powers and runs everything, while RStudio is the steering wheel and dashboard we use to control the car.

## Class Materials (Suggested)

(1) Deep Learning with R by François Chollet with J. J. Allaire (ISBN 9781617295546)

## Class Notes

ATTENDANCE AT THE FIRST CLASS IS MANDATORY

## Class Attributes

Formal Studies Distro Area

## Enrollment Requirements

Enrollment Requirements: Students must have completed STAT 301-2 to enroll in this course.

Add Consent: Department Consent Required