Foundation Models | May 10, 2025 | 5 min read

Sebastian Raschka: Coding LLMs from the Ground Up course

A ~15-hour, seven-video series by Sebastian Raschka covers tokenization, attention, pretraining, finetuning and a 2.5h bonus.

The Brieftide | May 10, 2025

TL;DR

01 A ~15-hour, seven-video series by Sebastian Raschka covers tokenization, attention, pretraining, finetuning and a 2.5h bonus.
02 Sebastian Raschka published a ~15-hour, seven-video course titled "Coding LLMs from the Ground Up" on May 10, 2025, aimed at teaching how to build large language models by coding them from scratch.
03 Raschka notes the videos originally started as supplementary content for his book but also work as standalone learning material.

Sebastian Raschka published a ~15-hour, seven-video course titled "Coding LLMs from the Ground Up" on May 10, 2025, aimed at teaching how to build large language models by coding them from scratch. The series includes seven substantive videos plus a 2.5-hour bonus talk recorded in April, and the materials began life as supplementary content for his book "Build a Large Language Model (From Scratch)".

Course breakdown

The course is organized into numbered videos, with explicit runtimes and focused topics:

1 - Set up your code environment (0:21:01)
2 - Working with text data (1:28:01), covering tokenization, byte pair encoding and data loaders
3 - Coding attention mechanisms (2:15:40), explaining self-attention, causal attention and multi-head attention
4 - Set up your code environment (0:21:01) (a second environment video listed in the materials)
5 - Pretraining on Unlabeled Data (2:36:44)
6 - Finetuning for Classification (2:15:29), using a spam classification example as a gentle introduction
7 - Instruction Finetuning (1:46:04)
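
To make the attention topics from video 3 concrete, here is a minimal sketch of causal self-attention. It is not taken from the course: it uses plain Python lists so that it runs without dependencies, it skips the learned query, key and value projections, and the function names and toy input are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i attends only to positions <= i.

    Each argument is a list of equal-length vectors (lists of floats),
    one vector per token position.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: only keys at positions 0..i are visible to query i.
        scores = [
            sum(qx * kx for qx, kx in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        attn = softmax(scores)
        # Weighted sum of the visible value vectors.
        context = [
            sum(a * values[j][dim] for j, a in enumerate(attn))
            for dim in range(len(values[0]))
        ]
        # Pad with zeros so each row has one weight per position.
        weights.append(attn + [0.0] * (len(queries) - i - 1))
        outputs.append(context)
    return outputs, weights

# Toy input: three token positions with 2-dimensional vectors, used as
# queries, keys and values alike (no learned projections in this sketch).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal, which is what keeps a token from attending to later positions. Multi-head attention would run several such computations in parallel on different projections and concatenate the results.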

A bonus non-coding talk, "LLMs Then And Now (From 2018 to 2025)", runs approximately 2.5 hours and was recorded earlier in April, approximately two days after the Llama 4 release. Raschka notes the videos originally started as supplementary content for his book but also work as standalone learning material.

Installation, GPT-2 weights and practical notes

The course includes a short environment setup video that explains how to use the uv tool and its "uv pip" workflow. Raschka warns that installation may cause issues on certain versions of Windows, likely because of the TensorFlow dependency used to load the original GPT-2 model weights in video 5. Windows users who run into trouble can skip the TensorFlow installation by removing the TensorFlow line from the requirements file.
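
The workaround of removing the TensorFlow line can be sketched as a small helper. This is an illustration only: the function name, the example package pins and the assumption that the requirements file is a standard pip-style text file are not taken from the course materials.

```python
import re

def drop_requirement(requirements_text, package="tensorflow"):
    """Return requirements text without lines that pin the given package."""
    # Match the package name at line start, followed by a version specifier,
    # extras, an environment marker, a comment, or the end of the line.
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*($|[=<>!~;\[#])", re.IGNORECASE
    )
    kept = [
        line for line in requirements_text.splitlines()
        if not pattern.match(line)
    ]
    return "\n".join(kept) + "\n"

# Hypothetical requirements content, for illustration only.
example = "torch>=2.3.0\ntensorflow>=2.18.0\ntiktoken>=0.5.1\n"
cleaned = drop_requirement(example)
```

After writing the cleaned text back to the file, the usual install step would be something like `uv pip install -r requirements.txt`, assuming that is the file name the repository uses. Editing the line out by hand works just as well; the helper only shows what "removing the TensorFlow line" amounts to.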

To avoid the TensorFlow dependency entirely, Raschka converted the GPT-2 model weights from TensorFlow format to PyTorch and published them on the Hugging Face model hub at https://huggingface.co/rasbt/gpt2-from-scratch-pytorch. The course also links to a Build an LLM from Scratch GitHub repository and references the book "Build a Large Language Model (From Scratch)", available from Manning and Amazon.
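
As a rough sketch of how the converted weights could be fetched without TensorFlow, the snippet below builds a direct-download URL for a file in that Hugging Face repository using only the standard library. The repository name comes from the article; the filename is a placeholder assumption, so check the model page for the actual file names before downloading.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

REPO_ID = "rasbt/gpt2-from-scratch-pytorch"  # repository named in the article

def weights_url(filename, repo_id=REPO_ID, revision="main"):
    """Build the direct-download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"

def download_weights(filename, destination=None):
    """Download a weights file to disk and return the local path (needs network)."""
    local_path, _headers = urlretrieve(weights_url(filename), destination or filename)
    return local_path

# "gpt2-small-124M.pth" is a hypothetical filename used for illustration;
# the real file names are listed on the model page.
url = weights_url("gpt2-small-124M.pth")
```

Once a file is downloaded, a PyTorch checkpoint of this kind is typically read with `torch.load`; that step is left out here to keep the sketch dependency-free.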

Raschka explains the pedagogy behind the project: he presents building an LLM from scratch as one of the most efficient ways to learn how LLMs work. He uses an extended analogy to building a go-kart, arguing that starting with a simpler, hands-on project teaches fundamentals such as steering and motor function before moving on to more complex systems.

Raschka also discloses a personal note: he is dealing with a bad neck injury and has been unable to work on a computer for the past three weeks, trying conservative treatment before considering a suggested surgical route. He says sharing the recorded videos during recovery felt like a useful interim contribution.

Why it matters

The course foregrounds fundamentals at a time when Raschka identifies reasoning and "agentic" capabilities as among the biggest LLM topics in 2025. By walking learners through tokenization, attention, pretraining and instruction finetuning, the material aims to produce practitioners who understand inner mechanics rather than only applying high-level APIs. That practical grounding, Raschka argues by analogy, can enable participants to give more informed feedback on model design and behavior.

What to watch

Track Raschka's Build an LLM from Scratch GitHub repository for code, and the Hugging Face model page for the converted GPT-2 weights and example checkpoints. Also watch for his upcoming articles and publications: he says he has "lots of ideas for upcoming articles" and hopes to return to writing as his recovery allows.

Course video sequence and runtimes

01 - Set up your code environment (0:21:01): uv pip environment setup
02 - Working with text data (1:28:01): tokenization, BPE, data loaders
03 - Coding attention mechanisms (2:15:40): self-attention, causal and multi-head attention
04 - Set up your code environment (repeat) (0:21:01): second environment video listed in the materials
05 - Pretraining on Unlabeled Data (2:36:44): pretraining workflow
06 - Finetuning for Classification (2:15:29): spam classification example
07 - Instruction Finetuning (1:46:04): instruction finetuning techniques
08 - Bonus: LLMs Then And Now (≈2.5h): non-coding talk recorded earlier in April, ~2 days after the Llama 4 release