
Neural Networks: Zero to Hero

Build neural networks from scratch, progressing from micrograd to GPT. Based on Andrej Karpathy's legendary course — covers backpropagation, language modeling, transformers, and tokenization.

8 modules · 40 lessons · ~10h · AI voice coach

Course Outline

1. Micrograd: Backpropagation Engine

5 lessons

Build an autograd engine and a small neural network library from scratch. Based on Karpathy's 'The spelled-out intro to neural networks and backpropagation: building micrograd' (https://www.youtube.com/watch?v=VMj-3S1tku0).

What is a Neural Network?
Building a Value Class
Backpropagation from Scratch
Building a Neuron/Layer/MLP
Training Loop
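The core idea of this module, a scalar `Value` that records how it was computed so gradients can flow backward through the chain rule, can be sketched in a few dozen lines. This is a minimal illustration in the spirit of micrograd, not the course's full implementation: it supports only `+` and `*`.

```python
class Value:
    """A scalar that tracks its gradient, micrograd-style (only + and *)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's gradient is the other input's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule in reverse order
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a          # c = 8
c.backward()           # a.grad = b + 1 = 4, b.grad = a = 2
```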
2. Bigram Language Model

5 lessons

Build a character-level language model from counting to neural networks. Based on Karpathy's 'The spelled-out intro to language modeling: building makemore' (https://www.youtube.com/watch?v=PaCmpygFfXo).

Language Modeling Basics
Bigram Model
PyTorch Tensors & Broadcasting
Training a Neural Bigram
Sampling & Evaluation
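The counting half of this module fits in plain Python: tally character pairs, then normalize rows into conditional probabilities. The `'.'` boundary token follows makemore's convention; the tiny word list and the `prob` helper are illustrative.

```python
from collections import Counter

words = ["emma", "olivia", "ava"]

# count every adjacent character pair, with '.' marking word start/end
counts = Counter()
for w in words:
    chars = ['.'] + list(w) + ['.']
    for c1, c2 in zip(chars, chars[1:]):
        counts[(c1, c2)] += 1

def prob(c1, c2):
    # P(c2 | c1) = count(c1, c2) / count(c1, *)
    total = sum(n for (a, _), n in counts.items() if a == c1)
    return counts[(c1, c2)] / total if total else 0.0
```

Sampling a name is then repeated draws from `prob(current_char, ·)` until `'.'` comes up again.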
3. MLP Language Model

5 lessons

Build a multi-layer perceptron language model following Bengio et al. 2003. Based on Karpathy's 'Building makemore Part 2: MLP' (https://www.youtube.com/watch?v=TCH_1BHY58I).

Multi-Layer Perceptrons
Embedding Layers
Training MLPs
Hyperparameter Tuning
Overfitting & Regularization
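A forward pass of the Bengio-style model can be sketched with NumPy (the course itself uses PyTorch). The shapes and names here (`C` for the embedding table, a context of 3 characters) are illustrative: look up each context character's embedding, concatenate, pass through a tanh hidden layer, and softmax over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, block, emb_dim, hidden = 27, 3, 2, 16

C = rng.normal(size=(vocab, emb_dim))            # embedding table
W1 = rng.normal(size=(block * emb_dim, hidden))  # hidden layer
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, vocab))            # output layer
b2 = np.zeros(vocab)

def forward(idx):
    # idx: (batch, block) integer context windows
    e = C[idx].reshape(len(idx), -1)  # lookup and concatenate embeddings
    h = np.tanh(e @ W1 + b1)
    logits = h @ W2 + b2
    # softmax over the vocabulary
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

probs = forward(np.array([[0, 0, 0], [0, 1, 2]]))
```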
4. Activations & Batch Normalization

5 lessons

Diagnose and fix training problems with proper initialization and normalization. Based on Karpathy's 'Building makemore Part 3: Activations & Gradients, BatchNorm' (https://www.youtube.com/watch?v=P6sfmUTpUmc).

Activation Statistics
Batch Normalization
Residual Connections
Kaiming Initialization
Diagnostic Tools
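The forward pass of batch normalization is short: standardize each feature over the batch, then apply a learnable scale and shift. A NumPy sketch, with `gamma` and `beta` as the learnable parameters:

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # normalize each feature over the batch dimension, then scale and shift
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    xhat = (x - mean) / np.sqrt(var + eps)
    return gamma * xhat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # badly scaled activations
y = batchnorm(x, gamma=np.ones(4), beta=np.zeros(4))
```

After the call, every column of `y` has roughly zero mean and unit standard deviation, which is exactly the statistic the module's diagnostic plots check for.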
5. Manual Backpropagation

5 lessons

Derive and implement gradients by hand for every operation in the MLP. Based on Karpathy's 'Building makemore Part 4: Becoming a Backprop Ninja' (https://www.youtube.com/watch?v=q8SA3rM6ckI).

Why Manual Backprop?
Gradients Through Linear Layers
Gradients Through Batch Norm
Gradients Through Cross-Entropy & Softmax
Putting It All Together
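The spirit of this module, derive a gradient by hand and then verify it numerically, shows up most cleanly in the fused softmax/cross-entropy case, where the hand-derived gradient is simply `probs - onehot(y)`. A NumPy sketch with a central-difference check:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, y):
    # cross-entropy of the softmax distribution against label y
    return -np.log(softmax(z)[y])

z = np.array([0.5, -1.0, 2.0])
y = 2

# hand-derived gradient: dL/dz = softmax(z) - onehot(y)
grad_manual = softmax(z).copy()
grad_manual[y] -= 1.0

# numerical check by central differences
eps = 1e-5
grad_num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    grad_num[i] = (loss(zp, y) - loss(zm, y)) / (2 * eps)
```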
6. WaveNet & Deeper Models

4 lessons

Build a hierarchical language model inspired by DeepMind's WaveNet. Based on Karpathy's 'Building makemore Part 5: Building a WaveNet' (https://www.youtube.com/watch?v=t3YJ5hKiMQ0).

WaveNet Architecture
Building a Tree-Structured Network
PyTorch nn.Module
Performance Analysis
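The structural trick behind the hierarchical model is fusing pairs of consecutive timesteps so each layer halves the sequence length, building a binary tree over the context. A NumPy sketch of that reshape (the function name is illustrative):

```python
import numpy as np

def flatten_consecutive(x, n):
    # (B, T, C) -> (B, T // n, n * C): fuse n consecutive timesteps
    # into one wider feature vector, the hierarchical merge step
    B, T, C = x.shape
    assert T % n == 0, "sequence length must be divisible by n"
    return x.reshape(B, T // n, n * C)

x = np.arange(2 * 8 * 3).reshape(2, 8, 3)  # batch of 2, 8 steps, 3 channels
h = flatten_consecutive(x, 2)              # -> (2, 4, 6)
```

Stacking linear/nonlinearity layers between repeated fusions gives the tree-structured network the module builds.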
7. Building GPT from Scratch

6 lessons

Implement a transformer-based language model from scratch. Based on Karpathy's 'Let's build GPT: from scratch, in code, spelled out' (https://www.youtube.com/watch?v=kCc8FmEb1nY) and the paper 'Attention is All You Need' (https://arxiv.org/abs/1706.03762).

Attention is All You Need
Self-Attention Mechanism
Multi-Head Attention
Transformer Block
Positional Encoding
Training GPT
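Causal self-attention, the centerpiece of this module, fits in a short NumPy sketch: a single head with illustrative projection matrices `Wq`, `Wk`, `Wv`, scaled dot-product scores, and a mask so each position attends only to the past.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    # single-head scaled dot-product attention with a causal mask
    T, d = x.shape[0], Wq.shape[1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)               # (T, T) affinities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                      # block attention to the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)       # softmax over past positions
    return w, w @ v                             # weights and weighted values

rng = np.random.default_rng(0)
T, C, d = 5, 8, 4
x = rng.normal(size=(T, C))
w, out = causal_self_attention(x, *(rng.normal(size=(C, d)) for _ in range(3)))
```

Multi-head attention runs several of these in parallel and concatenates the outputs; the transformer block wraps that with residual connections, layer norm, and an MLP.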
8. Tokenization & BPE

5 lessons

Build a tokenizer from scratch using Byte Pair Encoding. Based on Karpathy's 'Let's build the GPT Tokenizer' (https://www.youtube.com/watch?v=zduSFxRajkE).

Why Tokenization Matters
Byte Pair Encoding
Implementing BPE from Scratch
GPT Tokenizer
Tokenizer Design Decisions
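The core BPE loop, find the most frequent adjacent pair and replace it with a new token id, can be sketched in plain Python. The `merge` helper and the training string are illustrative; a real tokenizer repeats this until the vocabulary reaches its target size.

```python
from collections import Counter

def most_common_pair(ids):
    # most frequent adjacent pair of token ids
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # replace every occurrence of `pair` with the single token `new_id`
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw UTF-8 bytes
pair = most_common_pair(ids)               # ('a', 'a') is the most frequent
ids2 = merge(ids, pair, 256)               # 256 is the first new token id
```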