Sebastian Raschka: Coding LLMs from the Ground Up course
A ~15-hour, seven-video series by Sebastian Raschka covers tokenization, attention, pretraining, finetuning and a 2.5h bonus.
TL;DR
- A ~15-hour, seven-video series by Sebastian Raschka covers tokenization, attention, pretraining, finetuning and a 2.5h bonus.
- Sebastian Raschka published a ~15-hour, seven-video course titled "Coding LLMs from the Ground Up" on May 10, 2025, aimed at teaching how to build large language models by coding them from scratch.
- Raschka notes the videos originally started as supplementary content for his book but also work as standalone learning material.
Sebastian Raschka published a ~15-hour, seven-video course titled "Coding LLMs from the Ground Up" on May 10, 2025, aimed at teaching how to build large language models by coding them from scratch. The series includes seven substantive videos plus a 2.5-hour bonus talk recorded in April, and the materials began life as supplementary content for his Build a Large Language Model (From Scratch) book.
Course breakdown
The course is organized into numbered videos, with explicit runtimes and focused topics:
- 1 - Set up your code environment (0:21:01)
- 2 - Working with text data (1:28:01), covering tokenization, byte pair encoding and data loaders
- 3 - Coding attention mechanisms (2:15:40), explaining self-attention, causal attention and multi-head attention
- 4 - Set up your code environment (0:21:01) (a second environment video listed in the materials)
- 5 - Pretraining on Unlabeled Data (2:36:44)
- 6 - Finetuning for Classification (2:15:29), using a spam classification example as a gentle introduction
- 7 - Instruction Finetuning (1:46:04)
A bonus non-coding talk, "LLMs Then And Now (From 2018 to 2025)", runs about 2.5 hours and was recorded earlier in April, roughly two days after the Llama 4 release. Raschka notes the videos originally started as supplementary content for his book but also work as standalone learning material.
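To make the causal attention topic from video 3 concrete, here is a minimal sketch in plain Python. It is not Raschka's code: the course works in PyTorch with trainable query, key and value projections, which this illustration omits, and the function names are hypothetical. It only shows the core idea that each token attends to itself and earlier tokens, never to later ones.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length float vectors, one per token.
    Learned projection matrices are left out (an assumption made to keep
    the sketch short).
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask: token i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the visible value vectors.
        context = [
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ]
        outputs.append(context)
    return outputs


# Toy embeddings for three tokens (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_self_attention(tokens, tokens, tokens):
    print(row)
```

Because of the mask, changing the third token's embedding leaves the first two output rows unchanged, which is the property that lets a GPT-style model be trained to predict the next token.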
Installation, GPT-2 weights and practical notes
The course includes a short environment setup video that explains the uv tool and its "uv pip" workflow. Raschka warns that installation may cause issues on certain versions of Windows, likely due to a TensorFlow dependency used to load the original GPT-2 model weights in video 5. Windows users who run into trouble can skip the TensorFlow installation by removing the TensorFlow line from the requirements file, he advises.
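The workaround of removing the TensorFlow line can be done by hand in a text editor. For readers who prefer to script it, the sketch below filters a requirements file in Python. The helper name `drop_requirement` and the sample package pins are hypothetical and do not reproduce the course's actual requirements file.

```python
import re


def drop_requirement(requirements_text: str, package: str = "tensorflow") -> str:
    """Return the requirements text without lines that install `package`."""
    # Match the package name at the start of a line when it is followed by a
    # version specifier, extras, an environment marker, a comment, or nothing.
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*($|[<>=!~;\[#])", re.IGNORECASE
    )
    kept = [
        line for line in requirements_text.splitlines()
        if not pattern.match(line)
    ]
    return "\n".join(kept) + "\n"


# Hypothetical file contents, for illustration only.
sample = """torch >= 2.3.0
tensorflow >= 2.18.0
tiktoken >= 0.5.1
"""

cleaned = drop_requirement(sample)
print(cleaned)
```

In practice you would read the real requirements file, write the filtered text back, and then run the install step from the setup video as usual.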
To avoid the TensorFlow dependency entirely, Raschka converted the GPT-2 model weights from TensorFlow format to PyTorch and published them on the Hugging Face model hub at https://huggingface.co/rasbt/gpt2-from-scratch-pytorch. The course also links to a Build an LLM from Scratch GitHub repository and references the Build a Large Language Model (From Scratch) book (Manning | Amazon).
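As a rough sketch of how the converted weights could be fetched without TensorFlow, the snippet below builds a direct download URL using the Hugging Face Hub's `resolve` URL pattern and the Python standard library. The repository ID comes from the link above; the checkpoint filename is an assumption, so check the model page for the actual file names before downloading.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

# Repository ID taken from the model page URL cited in the article.
REPO_ID = "rasbt/gpt2-from-scratch-pytorch"


def weights_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build a direct-download URL for one file in a Hugging Face repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def download_weights(filename: str, destination: str) -> str:
    """Download one checkpoint file to `destination` and return the local path."""
    path, _headers = urlretrieve(weights_url(filename), destination)
    return path


# "gpt2-small-124M.pth" is an assumed filename, used only for illustration.
print(weights_url("gpt2-small-124M.pth"))
```

Calling `download_weights("gpt2-small-124M.pth", "gpt2-small-124M.pth")` would then save the file locally, after which it can be loaded with PyTorch as shown in the course materials.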
Raschka explains the pedagogy behind the project: building an LLM from scratch is presented as one of the most efficient ways to learn how LLMs work. He uses an extended analogy that likens a learner's first project to building a go-kart, arguing that starting with a simpler, hands-on build teaches fundamentals such as steering and motor function before moving to more complex systems.
Raschka also discloses a personal note: he is dealing with a bad neck injury and has been unable to work on a computer for the past three weeks, trying conservative treatment before considering a suggested surgical route. He says sharing the recorded videos during recovery felt like a useful interim contribution.
Why it matters
The course foregrounds fundamentals at a time when Raschka identifies reasoning and "agentic" capabilities as among the biggest LLM topics in 2025. By walking learners through tokenization, attention, pretraining and instruction finetuning, the material aims to produce practitioners who understand inner mechanics rather than only applying high-level APIs. That practical grounding, Raschka argues by analogy, can enable participants to give more informed feedback on model design and behavior.
What to watch
Track Raschka's Build an LLM from Scratch GitHub repository for code, and the Hugging Face model page for the converted GPT-2 checkpoints. Also watch for his planned articles and publications: he says he has "lots of ideas for upcoming articles" and hopes to return to writing as his recovery allows.
Written by The Brieftide · Source: Ahead of AI