TL;DR

Andrej Karpathy published a step-by-step course that builds neural networks from first principles, with an emphasis on language models. The syllabus walks from backpropagation through multilayer perceptrons, training diagnostics and stability, up to building a GPT and its tokenizer, delivered as code-focused lectures with listed prerequisites.

What happened

Andrej Karpathy released a multipart, code-first course called "Neural Networks: Zero to Hero" that teaches how to build and train neural networks from scratch. The series begins with an extended, hands-on explanation of backpropagation (micrograd) and progresses through character-level language models (makemore), multilayer perceptrons, diagnostics of activations and gradients, and Batch Normalization. Later modules backpropagate manually through a two-layer MLP, build a deeper tree-like network akin to WaveNet, and give a detailed, spelled-out implementation of a Generative Pretrained Transformer (GPT). The series currently ends with a dedicated lecture on building the GPT tokenizer from first principles, covering Byte Pair Encoding and the encode/decode pipeline, and noting that tokenization explains many odd behaviors of LLMs. The listed prerequisites are solid Python programming and introductory calculus, a Discord channel is offered for group learning, and the series is marked as ongoing.
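
To give a flavor of what the micrograd portion covers, here is a heavily simplified scalar-autograd sketch in the spirit of that lecture; the class name, operations, and example below are illustrative, not the lecture's actual code.

```python
# Simplified, illustrative sketch of a scalar autograd object (micrograd-style).
# Each op records its inputs and a local chain-rule step; backward() replays
# those steps in reverse topological order.

class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._backward = lambda: None  # local chain-rule step, set by each op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(out)/d(self)  = 1
            other.grad += out.grad      # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(out)/d(self)  = other
            other.grad += self.data * out.grad  # d(out)/d(other) = self
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply each local step in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# Example: L = a*b + a  =>  dL/da = b + 1 = 4, dL/db = a = 2
a, b = Value(2.0), Value(3.0)
L = a * b + a
L.backward()
print(a.grad, b.grad)  # 4.0 2.0
```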

Why it matters

  • Offers a hands-on, code-first path to learning deep learning fundamentals rather than purely theoretical exposition.
  • The focus on language models provides transferable skills that the author says apply to other domains such as computer vision.
  • Detailed coverage of internals (activations, gradients, BatchNorm, manual backprop) aims to build debugging and innovation ability.
  • A dedicated lecture on tokenizers highlights a frequently overlooked component that can cause many model-level issues.

Key facts

  • Course author: Andrej Karpathy (confirmed in the source).
  • Prerequisites: solid programming (Python) and intro-level math (e.g., derivatives, Gaussians).
  • Micrograd / backpropagation lecture: 2h25m; a step-by-step build-up of backpropagation and neural network training.
  • Intro to language modeling (makemore) lecture: 1h57m; implements a bigram character-level model and introduces torch.Tensor (a counting-based sketch follows this list).
  • MLP lecture (makemore Part 2): 1h15m; covers training basics, hyperparameters, and evaluation splits.
  • Activations & gradients / BatchNorm lecture: 1h55m; analyzes forward/backward statistics and introduces Batch Normalization.
  • Becoming a Backprop Ninja (manual backprop through MLP) lecture: 1h55m.
  • WaveNet-style deeper architecture lecture: 56m; constructs a tree-like CNN similar to WaveNet.
  • Let's build GPT: from scratch lecture: 1h56m; follows Transformer/GPT papers and connects to ChatGPT and Copilot.
  • Let's build the GPT Tokenizer lecture: 2h13m; builds a BPE-based tokenizer and examines tokenization-related issues.
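
As a taste of the first makemore lecture, the following is a rough, counting-based bigram sketch; the toy word list, seed, and variable names are placeholders, not the lecture's dataset or exact code.

```python
# Illustrative sketch of a counting-based bigram character model (makemore-style).
import torch

words = ["emma", "olivia", "ava"]   # placeholder data, not the real dataset
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0                        # '.' marks the start/end of a word
itos = {i: ch for ch, i in stoi.items()}

# Count how often each character follows each other character.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Turn counts into next-character probabilities and sample a new "name".
P = (N + 1).float()                  # +1 smoothing so no probability is exactly 0
P /= P.sum(dim=1, keepdim=True)

g = torch.Generator().manual_seed(0)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```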

What to watch next

  • Watch the earlier makemore videos before the GPT lecture — the course itself recommends this sequence.
  • View the BatchNorm and activations lecture to learn diagnostics and why training deep nets can be fragile.
  • Study the tokenizer lecture to understand how encoding/decoding and Byte Pair Encoding affect LLM behavior.
  • Release timing and future lecture dates (e.g., when residual connections or Adam optimizer coverage will appear) are not confirmed in the source.

Quick glossary

  • Backpropagation: An algorithm for computing gradients of a loss with respect to neural network parameters by propagating error signals backward through the network.
  • Transformer / GPT: A neural network architecture based on attention mechanisms; GPT refers to autoregressive transformer models trained to predict the next token.
  • Tokenizer: A component that converts text strings into discrete tokens and back; tokenizers are trained separately (often with Byte Pair Encoding) and implement encode/decode functions (a toy BPE sketch follows this list).
  • Batch Normalization: A technique that normalizes layer inputs across a minibatch to stabilize and accelerate training of deep networks.
  • Multilayer Perceptron (MLP): A feedforward neural network composed of multiple fully connected layers, often used as a basic building block in deep learning.
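
To make the tokenizer entry concrete, here is a toy sketch of the core Byte Pair Encoding training loop; the helper names and training text are illustrative, and a real tokenizer like the one built in the course handles considerably more (e.g., special tokens and larger vocabularies).

```python
# Toy sketch of the core BPE idea: repeatedly find the most frequent adjacent
# pair of token ids and merge it into a new token id.

def get_pair_counts(ids):
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "low lower lowest"            # placeholder training text
ids = list(text.encode("utf-8"))     # start from raw UTF-8 bytes (ids 0..255)
merges = {}                          # (id, id) -> new id, in the order learned
num_merges = 5
for k in range(num_merges):
    counts = get_pair_counts(ids)
    pair = max(counts, key=counts.get)   # most frequent adjacent pair
    new_id = 256 + k
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens after merges")
```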

Reader FAQ

Who created the course?
Andrej Karpathy (confirmed in the source).

What prior knowledge do I need?
Solid Python programming skills and intro-level math such as derivatives; the source lists these prerequisites.

Does the course include code and hands-on builds?
Yes — the syllabus repeatedly emphasizes building networks and components from scratch in code.

Are topics like residual connections and the Adam optimizer covered?
The source says residual connections and the Adam optimizer remain notable todos for later videos.

Is there a schedule for future videos?
Not confirmed in the source.

Sources

  • Neural Networks: Zero to Hero, course page: "A course by Andrej Karpathy on building neural networks, from scratch, in code. We start with the basics of backpropagation and build up to modern…"
