Data Science 1 with Python (303-1-20)

Instructors

Lizhen Shi

Meeting Info

Tech Institute Lecture Room 4: Mon, Wed 12:30PM - 1:50PM
Location of Midterm TBD: Fri 7:00PM - 9:00PM

Overview of class

Only Statistics majors, Data Science majors, Data Science minors, and Applied Statistics Masters students assigned to take 303-1 in this quarter are able to register for this course.

Python has emerged as a powerful tool for data science in recent years, thanks to its rich ecosystem of libraries and easy-to-understand syntax. This quarter, we cover the fundamentals of Python for data science, focusing on the essential libraries NumPy, Matplotlib, Seaborn, and Pandas. Through hands-on exercises and real-world examples, students will gain proficiency in data manipulation, data visualization, and exploratory data analysis (EDA). The course emphasizes EDA as a way to understand the underlying patterns and relationships within datasets: students will learn how to use statistical measures and visualization tools to uncover insights, identify outliers, and detect patterns in data.
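As a rough illustration of using a statistical measure to identify outliers, the sketch below applies the common 1.5 × IQR rule. It is not taken from the course materials: the function name `iqr_outliers`, the multiplier `k`, and the sample values are all hypothetical, and it uses only the Python standard library so that it runs without extra installs. The same idea is usually expressed with NumPy or Pandas.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Hypothetical helper for illustration; k = 1.5 is a common convention,
    not a value specified by the course.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample: commute times in minutes, with one suspicious entry.
commute_minutes = [12, 15, 14, 13, 16, 15, 14, 95]

print("median:", statistics.median(commute_minutes))
print("flagged:", iqr_outliers(commute_minutes))
```

A flagged value is only a candidate for closer inspection. Whether it is a data-entry error or a genuine extreme observation depends on the context of the dataset.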

Students may not receive credit for both this course and STAT 301-1.

Registration Requirements

Prerequisites: STAT 201 or equivalent and STAT 202-0 or STAT 210 or consent of the instructor.

Learning Objectives

At the end of the course, students should be able to:
1. Translate a problem described in layman's terms into a data science project.
2. Acquire, integrate, and store data from various sources.
3. Manipulate, clean, and transform data to make it suitable for answering the question at hand.
4. Visualize, explore, and analyze data to identify patterns and gather insights.
5. Demonstrate proficiency with coding in the Python programming language, in the context of data science.
6. Collaborate in a team to develop a complete data science solution that answers a question of interest.

Teaching Method

Classes will be a combination of "lectures" and "lab sessions". The course material will be introduced in the "lectures" portion of each class, and lectures are expected to be interactive. In the lab sessions, students will be given one or more problems to solve, and they are encouraged to ask questions and collaborate. Everyone must bring their own laptop to each class, with Python already installed, as coding in Python will be required.

Evaluation Method

Students will be assessed on the learning objectives with:
1. Weekly assignments: Students will have weekly assignments to practice and demonstrate the coding techniques, tools, and methods taught during class hours. These assignments will test students on learning objectives 2, 3, 4, and 5.
2. Mid-term exam: Students will have a mid-term exam, where they will be provided with a dataset to answer a set of questions. This assessment will test students on learning objectives 1, 2, 3, and 5.
3. Final exam: Students will have a final exam, where they will be provided with multiple datasets to answer a few broad questions. This assessment will test students on learning objectives 1, 2, 3, 4, and 5.
4. Course project: Students will have the freedom to identify a problem of their choice, and leverage data to solve it. This assessment will test students on all the learning objectives.
5. Class participation: Students will have the opportunity to earn bonus class participation points by answering questions in class or on the online class forum.

Class Materials (Required)

Krishna, A., Shi, L., Besler, E., and Kuyper A., 'Introduction to Data Science with Python' (2022), https://nustat.github.io/DataScience_Intro_python/.

McKinney, W. (2017). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. 2nd Edition. O'Reilly Media, Inc. ISBN-13: 978-1491957660, ISBN-10: 1491957662

Class Materials (Suggested)

Reference book: VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O'Reilly Media, Inc. ISBN-13: 978-1491912058, ISBN-10: 1491912057

Sample online material for Python and libraries:
Python for Beginners: https:/

Class Attributes

Formal Studies Distro Area

Enrollment Requirements

Enrollment Requirements: Prerequisite: STAT 201-0 or COMP_SCI 110-0 and STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0 or equivalent.
Add Consent: Department Consent Required