Data Science 1 with Python (303-1-21)
Instructors
Shreeya Behera
Meeting Info
Annenberg Hall G21: Mon, Wed 5:30PM - 6:50PM
Overview of class
Only Statistics majors, Data Science majors, Data Science minors, and Applied Statistics Masters students assigned to take 303-1 in this quarter are able to register for this course.
Python has emerged as a powerful tool for data science in recent years, thanks to its rich ecosystem of libraries and easy-to-understand syntax. This quarter, we delve into the fundamentals of Python for data science, focusing on essential libraries such as NumPy, Matplotlib, Seaborn, and Pandas. Through hands-on exercises and real-world examples, students will gain proficiency in data manipulation and data visualization. Throughout the course, emphasis is placed on exploratory data analysis (EDA) techniques for understanding the underlying patterns and relationships within datasets. Students will learn how to use statistical measures and visualization tools to uncover insights, identify outliers, and detect patterns in data.
Students may not receive credit for both this course and STAT 301-1.
Registration Requirements
Prerequisites: STAT 201 or equivalent and STAT 202-0 or STAT 210 or consent of the instructor.
Learning Objectives
At the end of the course, students should be able to:
1. Translate a problem described in layman's terms into a data science project.
2. Acquire, integrate, and store data from various sources.
3. Manipulate, clean, and transform data to make it suitable for answering the question at hand.
4. Visualize, explore, and analyze data to identify patterns and gather insights.
5. Demonstrate proficiency with coding in the Python programming language, in the context of data science.
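As a rough illustration of what objectives 3 and 4 can look like in practice, the sketch below cleans a small table with Pandas and flags outliers using the common 1.5 × IQR rule. The data, the column names (`day`, `sales`), and the choice of the IQR rule are assumptions made for this example only; they are not taken from the course materials.

```python
import pandas as pd

# Hypothetical toy data (not from the course): daily sales with one
# missing value and one suspiciously large value.
raw = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    "sales": [10.0, 12.0, None, 11.0, 13.0, 100.0],
})

# Clean: drop rows where the sales figure is missing.
clean = raw.dropna(subset=["sales"]).reset_index(drop=True)

# Explore: basic summary statistics (count, mean, std, quartiles, ...).
summary = clean["sales"].describe()

# Flag outliers with the 1.5 * IQR rule (one common convention among several).
q1 = clean["sales"].quantile(0.25)
q3 = clean["sales"].quantile(0.75)
iqr = q3 - q1
is_outlier = (clean["sales"] < q1 - 1.5 * iqr) | (clean["sales"] > q3 + 1.5 * iqr)
outliers = clean[is_outlier]

print(summary)
print(outliers)
```

With this toy data, only the Saturday row falls outside the IQR fences. On a real dataset, whether to drop, cap, or keep such a value depends on the question being asked.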
Teaching Method
There will be two 80-minute lectures per week. The lectures will consist mostly of in-class coding, with explanatory notes as comments in the script and some diagrams to help visualize coding concepts. The last 10 minutes of each lecture will be an in-class quiz. Students are required to bring their own laptop to each class, as coding in Python will be required. Installation of a Python environment is also required.
Evaluation Method
Students will be evaluated through (1) homework assignments, (2) in-class quizzes, (3) three exams and (4) a project.
The first and second exams will be in person, on paper, and closed-notes. The third exam will be take-home, online, and open-notes. For the third exam only, students will be allowed to use online resources, including generative AI. The exam dates will be posted on the Canvas page. Seven homework assignments will be given throughout the quarter, and the lowest homework score will be dropped. There will be an in-class quiz on every lecture day, and only the top 10 quiz scores will be counted. More information about the project deadlines and deliverables will be given throughout the quarter.
Class Materials (Required)
Krishna, A., Shi, L., Besler, E., and Kuyper, A. (2022). Introduction to Data Science with Python. https://nustat.github.io/DataScience_Intro_python/
McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd Edition. O'Reilly Media, Inc. ISBN-13: 978-1491957660, ISBN-10: 1491957662
Enrollment Requirements
Pre-registration is not allowed for this class. Please try again during regular registration.
Prerequisite: (STAT 201-0 or GEN_ENG 150-0 or GEN_ENG 151-0) and (STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0)
Add Consent: Department Consent Required