Information Management for Data Science (305-0-20)
Instructors
Emre Besler
Meeting Info
Online: Mon, Wed 2:00PM - 4:40PM
Overview of class
The Information Management for Data Science course aims to give students an extensive skillset to load, clean, process, store, and use data from various sources. Starting with the main libraries and data structures in Python, it moves on to advanced techniques for obtaining data: extracting HTML text from websites using CSS selectors and XPath, and interacting with Application Programming Interfaces (APIs) that return JavaScript Object Notation (JSON) data, using the corresponding libraries. Students are expected to have fundamental Python skills from STAT 303-1 or CS 110. The course then moves on to relational databases and how to store data in them and retrieve data from them using Structured Query Language (SQL). Students are not expected to have any prior knowledge of SQL; it will be introduced from scratch and applied during the lectures. Once a working understanding of SQL is established, database design will be the last main topic of the course.
Registration Requirements
STAT 303-1 or CS 110 (or equivalent Python knowledge). If you have introductory Python knowledge but are not sure whether it is sufficient for the course, please email emre.besler@northwestern.edu to check.
Learning Objectives
At the completion of this course, students should be able to:
- Identify parts of the data that are misleading, wrong, irrelevant, or redundant for the task at hand, and process the dataset they loaded accordingly.
- Create new variables from the data they have, in a new data type if necessary.
- Visualize the data in an interactive and visually appealing manner.
- Scrape different types of data from online sources and process it for further analysis.
- Write SQL queries to obtain data that is spread across multiple relational databases.
- Obtain data from a mobile application or a website and process it for numerical analysis.
- Design relational databases according to the needs of the datasets at hand.
Teaching Method
Remote Synchronous - We will meet twice a week (Mon and Wed) on Zoom for 160-minute sessions. Each session will combine lectures given by the instructor with in-class coding sessions in which students practice on their own.
Evaluation Method
There will be 5 homework assignments (9% each), in-class exercises (25%), an in-class midterm exam (15%), and an in-class final exam (15%).
Class Materials (Required)
A laptop that can run Anaconda Navigator (for Python programming) and SQLiteStudio (for working with databases using Structured Query Language).
Class Attributes
Formal Studies Distro Area
Course Meets Online
Synchronous: Class meets remotely at scheduled time