Topics in Statistics (359-0-20)

Topic

Large Language Models

Instructors

Lizhen Shi

Meeting Info

Harris Hall L07: Mon, Wed 2:00PM - 3:20PM

Overview of class

This course provides a comprehensive introduction to large language models (LLMs) and the foundational transformer architecture that powers them. Students will explore core principles, mathematical foundations, and key innovations behind transformers. The course traces the evolution of word embeddings and NLP models — from word2vec, to RNN-based models, to encoder-decoder architectures with attention, and on to state-of-the-art LLM systems like GPT and beyond. While transformers represent a major breakthrough and the current state of the art, they are not the endpoint of this journey. This course will help students build a solid foundation for understanding the ongoing evolution of LLMs and prepare them to stay current in this rapidly advancing field.

Registration Requirements

For Undergraduate Students:
- Completion of either the Python sequence or the R sequence (STAT 303-1,2,3 or STAT 301-1,2,3)
- Students coming from the R sequence should be comfortable with Python programming, as the course materials and assignments will primarily use Python.

For Graduate Students:
- A solid understanding of traditional machine learning concepts (e.g., regression, classification, model evaluation).
- Proficiency in Python programming for data analysis and model implementation.

Learning Objectives

By the end of this course, students will be able to:
- Explain the evolution of large language models (LLMs) — from early word embeddings and RNN-based architectures to modern transformer-based systems — and their impact on natural language processing tasks.
- Understand the core principles of transformer architecture, including self-attention, positional encoding, multi-head attention, and feedforward components.
- Implement and experiment with transformer-based models using PyTorch for applications such as text generation, classification, and fine-tuning (a minimal self-attention sketch follows this list).
- Build a strong conceptual foundation for understanding emerging architectures beyond transformers and staying current with advances in the rapidly evolving field of large language models.
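To ground the objectives above, the following is a minimal, illustrative PyTorch sketch of single-head scaled dot-product self-attention. The dimensions, variable names, and toy input are assumptions for illustration only and are not taken from the course materials:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model = 8                        # embedding dimension (illustrative choice)
x = torch.randn(1, 5, d_model)     # (batch, seq_len, d_model) toy input

# Learned projections for queries, keys, and values
W_q = torch.nn.Linear(d_model, d_model, bias=False)
W_k = torch.nn.Linear(d_model, d_model, bias=False)
W_v = torch.nn.Linear(d_model, d_model, bias=False)
q, k, v = W_q(x), W_k(x), W_v(x)

# Attention weights: softmax over scaled query-key dot products
scores = q @ k.transpose(-2, -1) / (d_model ** 0.5)   # (batch, seq_len, seq_len)
weights = F.softmax(scores, dim=-1)

# Each output position is a weighted average of the value vectors
output = weights @ v               # (batch, seq_len, d_model)
print(output.shape)                # torch.Size([1, 5, 8])

Multi-head attention repeats this computation in parallel over several lower-dimensional projections and concatenates the results; the course covers that extension along with positional encoding and the feedforward components.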

Teaching Method

The primary teaching method will be lectures

Evaluation Method

1) Homework assignments, 2) Final project, 3) Participation

Class Materials (Required)

Course materials will be distributed via Canvas

Class Attributes

Formal Studies Distro Area