
Sebastian Raschka: Coding LLMs from the Ground Up course

A ~15-hour, seven-video series by Sebastian Raschka covers tokenization, attention, pretraining and finetuning, with a 2.5-hour bonus talk.

The Brieftide

TL;DR

  • A ~15-hour, seven-video series by Sebastian Raschka covers tokenization, attention, pretraining and finetuning, with a 2.5-hour bonus talk.
  • Sebastian Raschka published a ~15-hour, seven-video course titled "Coding LLMs from the Ground Up" on May 10, 2025, aimed at teaching how to build large language models by coding them from scratch.
  • Raschka notes the videos started as supplementary content for his book but also work as standalone learning material.

Sebastian Raschka published a ~15-hour, seven-video course titled "Coding LLMs from the Ground Up" on May 10, 2025, aimed at teaching how to build large language models by coding them from scratch. The series includes seven substantive videos plus a 2.5-hour bonus talk recorded in April, and the materials began life as supplementary content for his Build a Large Language Model (From Scratch) book.

Course breakdown

The course is organized into numbered videos, with explicit runtimes and focused topics:

  • 1 - Set up your code environment (0:21:01)
  • 2 - Working with text data (1:28:01), covering tokenization, byte pair encoding and data loaders
  • 3 - Coding attention mechanisms (2:15:40), explaining self-attention, causal attention and multi-head attention
  • 4 - Set up your code environment (0:21:01), as listed in the materials; the entry repeats video 1's title and runtime, so it may be a labeling error rather than a second environment video
  • 5 - Pretraining on Unlabeled Data (2:36:44)
  • 6 - Finetuning for Classification (2:15:29), using a spam classification example as a gentle introduction
  • 7 - Instruction Finetuning (1:46:04)
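
As a rough illustration of the byte pair encoding topic named for video 2, the sketch below performs a single BPE-style merge on a toy character sequence: it counts adjacent pairs, picks the most frequent one, and fuses it into a new token. This is a minimal teaching sketch under stated assumptions, not the course's code; the function names and the toy string are hypothetical, and a real tokenizer repeats this step many times and works on bytes with a learned merge table.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent token pair, or None if there is none."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    if not pair_counts:
        return None
    return pair_counts.most_common(1)[0][0]


def merge_pair(tokens, pair):
    """Replace each left-to-right occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy example (hypothetical input): start from characters, apply one merge.
tokens = list("aaabdaaabac")
best = most_frequent_pair(tokens)  # ('a', 'a') is the most frequent pair here
tokens_after = merge_pair(tokens, best)
print(best, tokens_after)
```

Repeating the two calls in a loop, and recording each chosen pair, would yield the ordered merge list that a BPE vocabulary is built from.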

A bonus non-coding talk, "LLMs Then And Now (From 2018 to 2025)", runs approximately 2.5 hours and was recorded earlier in April, about two days after the Llama 4 release. Raschka notes the videos started as supplementary content for his book but also work as standalone learning material.

Installation, GPT-2 weights and practical notes

The course includes a short environment setup video that explains how to use the uv tool and its "uv pip" workflow. Raschka warns that installation may fail on certain versions of Windows, likely because of a TensorFlow dependency that is used to load the original GPT-2 model weights in video 5. Windows users who run into trouble can, he advises, skip TensorFlow by removing its line from the requirements file.
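
The workaround of removing the TensorFlow line can be sketched as a small filter over the requirements text. This is a minimal sketch under assumptions: the helper name, the sample package list and the version pins are hypothetical, and the actual requirements file in the course materials may name or pin TensorFlow differently.

```python
import re


def drop_package(requirements_text, package="tensorflow"):
    """Return requirements text without lines that require the given package."""
    # Match the package name at the start of a line, followed by a version
    # specifier, extras, an environment marker, a comment, or end of line.
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*($|[=<>!~;\[#])", re.IGNORECASE
    )
    kept = [
        line for line in requirements_text.splitlines()
        if not pattern.match(line)
    ]
    return "\n".join(kept) + "\n"


# Hypothetical file contents, for illustration only.
sample = """torch >= 2.3.0
tensorflow >= 2.18.0
tiktoken >= 0.5.1
"""

print(drop_package(sample))
```

In practice the filtered text would be written back to the requirements file before running the install step, for example with `uv pip install -r requirements.txt` (the file name here is an assumption). Editing the file by hand achieves the same result.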

To avoid the TensorFlow dependency entirely, Raschka converted the GPT-2 model weights from TensorFlow format to PyTorch and published them on the Hugging Face model hub at https://huggingface.co/rasbt/gpt2-from-scratch-pytorch. The course also links to a Build an LLM from Scratch GitHub repository and references the Build a Large Language Model (From Scratch) book (Manning | Amazon).
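
For readers who want to fetch the converted weights programmatically, the sketch below builds a direct-download URL from the model page address using the Hugging Face Hub's `/resolve/` path layout. It is a sketch under assumptions rather than the course's own loader: the helper names are hypothetical, and the file name shown is a placeholder, so the real file names should be taken from the model page.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

REPO_ID = "rasbt/gpt2-from-scratch-pytorch"  # from the model page URL above


def weights_url(filename, revision="main"):
    """Build a direct-download URL using the Hub's /resolve/ path layout."""
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{quote(filename)}"
    )


def download_weights(filename, destination=None):
    """Download one file from the repository and return the local path."""
    local_path, _headers = urlretrieve(
        weights_url(filename), destination or filename
    )
    return local_path


# Placeholder file name (an assumption); this only prints the URL shape
# and does not download anything.
print(weights_url("example-weights.pth"))
```

Once downloaded, a PyTorch checkpoint of this kind would typically be read with `torch.load`, with no TensorFlow installation involved.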

Raschka explains the pedagogy behind the project: building an LLM from scratch is presented as one of the most efficient ways to learn how LLMs work. He uses an extended analogy to building a go-kart, arguing that starting with a simpler, hands-on project teaches fundamentals such as how steering and the motor work before moving on to more complex systems.

Raschka also shares a personal note: he is dealing with a bad neck injury and has been unable to work at a computer for the past three weeks. He is trying conservative treatment before considering the surgical route that was suggested. He says sharing the recorded videos during his recovery felt like a useful interim contribution.

Why it matters

The course foregrounds fundamentals at a time when Raschka identifies reasoning and "agentic" capabilities as among the biggest LLM topics in 2025. By walking learners through tokenization, attention, pretraining and instruction finetuning, the material aims to produce practitioners who understand inner mechanics rather than only applying high-level APIs. That practical grounding, Raschka argues by analogy, can enable participants to give more informed feedback on model design and behavior.

What to watch

Track Raschka's Build an LLM from Scratch GitHub repository for code and the Hugging Face model page for the converted GPT-2 checkpoints. Also watch for his upcoming writing: he says he has "lots of ideas for upcoming articles" and hopes to return to writing as his recovery allows.

Course video sequence and runtimes

  • 1 - Set up your code environment (0:21:01): uv pip environment setup
  • 2 - Working with text data (1:28:01): tokenization, BPE, data loaders
  • 3 - Coding attention mechanisms (2:15:40): self-attention, causal and multi-head attention
  • 4 - Set up your code environment (repeat) (0:21:01): second environment video listed in the materials
  • 5 - Pretraining on Unlabeled Data (2:36:44): pretraining workflow
  • 6 - Finetuning for Classification (2:15:29): spam classification example
  • 7 - Instruction Finetuning (1:46:04): instruction finetuning techniques
  • 8 - Bonus: LLMs Then And Now (≈2.5h): non-coding talk recorded earlier in April, ~2 days after the Llama 4 release
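
As a quick arithmetic check on the runtimes listed above (a reader's calculation, not something stated in the source), the sketch below totals the seven listed videos and then adds the bonus talk, taking "approximately 2.5 hours" as exactly 2 hours 30 minutes.

```python
from datetime import timedelta

# Runtimes exactly as listed for videos 1-7 in this article.
LISTED_RUNTIMES = [
    "0:21:01", "1:28:01", "2:15:40", "0:21:01",
    "2:36:44", "2:15:29", "1:46:04",
]
BONUS = timedelta(hours=2, minutes=30)  # assumption: "≈2.5h" taken as 2:30:00


def parse_runtime(text):
    """Convert an H:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


listed_total = sum((parse_runtime(r) for r in LISTED_RUNTIMES), timedelta())
print(listed_total)          # 11:04:00
print(listed_total + BONUS)  # 13:34:00
```

Both totals come in under the ~15-hour headline figure. One possible explanation is that the entry for video 4 repeats video 1's 0:21:01 runtime, although the source does not confirm this.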

Written by The Brieftide · Source: Ahead of AI
