
Neural Networks: Zero to Hero

Build neural networks from scratch, progressing from micrograd to GPT. Based on Andrej Karpathy's legendary course — covers backpropagation, language modeling, transformers, and tokenization.

8 modules · 40 lessons · ~10h · AI voice coach

Course Outline

1. Micrograd: Backpropagation Engine

5 lessons

Build an autograd engine and a small neural network library from scratch. Based on Karpathy's 'The spelled-out intro to neural networks and backpropagation: building micrograd' (https://www.youtube.com/watch?v=VMj-3S1tku0).

What is a Neural Network?
Building a Value Class
Backpropagation from Scratch
Building a Neuron/Layer/MLP
Training Loop
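The core idea of this module, a scalar `Value` that records how it was computed so gradients can flow backward through the chain rule, can be sketched in a few dozen lines. This is a minimal illustration in the spirit of micrograd, not the course's full implementation: it supports only `+` and `*`.

```python
class Value:
    """A scalar that tracks its gradient, micrograd-style (only + and *)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's gradient is the other input's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule in reverse order
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a          # c = 8
c.backward()           # a.grad = b + 1 = 4, b.grad = a = 2
```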
2. Bigram Language Model

5 lessons

Build a character-level language model from counting to neural networks. Based on Karpathy's 'The spelled-out intro to language modeling: building makemore' (https://www.youtube.com/watch?v=PaCmpygFfXo).

Language Modeling Basics
Bigram Model
PyTorch Tensors & Broadcasting
Training a Neural Bigram
Sampling & Evaluation
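The counting half of this module fits in plain Python: tally character pairs, then normalize rows into conditional probabilities. The `'.'` boundary token follows makemore's convention; the tiny word list and the `prob` helper are illustrative.

```python
from collections import Counter

words = ["emma", "olivia", "ava"]

# count every adjacent character pair, with '.' marking word start/end
counts = Counter()
for w in words:
    chars = ['.'] + list(w) + ['.']
    for c1, c2 in zip(chars, chars[1:]):
        counts[(c1, c2)] += 1

def prob(c1, c2):
    # P(c2 | c1) = count(c1, c2) / count(c1, *)
    total = sum(n for (a, _), n in counts.items() if a == c1)
    return counts[(c1, c2)] / total if total else 0.0
```

Sampling a name is then repeated draws from `prob(current_char, ·)` until `'.'` comes up again.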
3. MLP Language Model

5 lessons

Build a multi-layer perceptron language model following Bengio et al. 2003. Based on Karpathy's 'Building makemore Part 2: MLP' (https://www.youtube.com/watch?v=TCH_1BHY58I).

Multi-Layer Perceptrons
Embedding Layers
Training MLPs
Hyperparameter Tuning
Overfitting & Regularization
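A forward pass of the Bengio-style model can be sketched with NumPy (the course itself uses PyTorch). The shapes and names here (`C` for the embedding table, a context of 3 characters) are illustrative: look up each context character's embedding, concatenate, pass through a tanh hidden layer, and softmax over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, block, emb_dim, hidden = 27, 3, 2, 16

C = rng.normal(size=(vocab, emb_dim))            # embedding table
W1 = rng.normal(size=(block * emb_dim, hidden))  # hidden layer
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, vocab))            # output layer
b2 = np.zeros(vocab)

def forward(idx):
    # idx: (batch, block) integer context windows
    e = C[idx].reshape(len(idx), -1)  # lookup and concatenate embeddings
    h = np.tanh(e @ W1 + b1)
    logits = h @ W2 + b2
    # softmax over the vocabulary
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

probs = forward(np.array([[0, 0, 0], [0, 1, 2]]))
```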
4. Activations & Batch Normalization

5 lessons

Diagnose and fix training problems with proper initialization and normalization. Based on Karpathy's 'Building makemore Part 3: Activations & Gradients, BatchNorm' (https://www.youtube.com/watch?v=P6sfmUTpUmc).

Activation Statistics
Batch Normalization
Residual Connections
Kaiming Initialization
Diagnostic Tools
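The forward pass of batch normalization is short: standardize each feature over the batch, then apply a learnable scale and shift. A NumPy sketch, with `gamma` and `beta` as the learnable parameters:

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # normalize each feature over the batch dimension, then scale and shift
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    xhat = (x - mean) / np.sqrt(var + eps)
    return gamma * xhat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # badly scaled activations
y = batchnorm(x, gamma=np.ones(4), beta=np.zeros(4))
```

After the call, every column of `y` has roughly zero mean and unit standard deviation, which is exactly the statistic the module's diagnostic plots check for.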
5. Manual Backpropagation

5 lessons

Derive and implement gradients by hand for every operation in the MLP. Based on Karpathy's 'Building makemore Part 4: Becoming a Backprop Ninja' (https://www.youtube.com/watch?v=q8SA3rM6ckI).

Why Manual Backprop?
Gradients Through Linear Layers
Gradients Through Batch Norm
Gradients Through Cross-Entropy & Softmax
Putting It All Together
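The spirit of this module, derive a gradient by hand and then verify it numerically, shows up most cleanly in the fused softmax/cross-entropy case, where the hand-derived gradient is simply `probs - onehot(y)`. A NumPy sketch with a central-difference check:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, y):
    # cross-entropy of the softmax distribution against label y
    return -np.log(softmax(z)[y])

z = np.array([0.5, -1.0, 2.0])
y = 2

# hand-derived gradient: dL/dz = softmax(z) - onehot(y)
grad_manual = softmax(z).copy()
grad_manual[y] -= 1.0

# numerical check by central differences
eps = 1e-5
grad_num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    grad_num[i] = (loss(zp, y) - loss(zm, y)) / (2 * eps)
```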
6. WaveNet & Deeper Models

4 lessons

Build a hierarchical language model inspired by DeepMind's WaveNet. Based on Karpathy's 'Building makemore Part 5: Building a WaveNet' (https://www.youtube.com/watch?v=t3YJ5hKiMQ0).

WaveNet Architecture
Building a Tree-Structured Network
PyTorch nn.Module
Performance Analysis
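The structural trick behind the hierarchical model is fusing pairs of consecutive timesteps so each layer halves the sequence length, building a binary tree over the context. A NumPy sketch of that reshape (the function name is illustrative):

```python
import numpy as np

def flatten_consecutive(x, n):
    # (B, T, C) -> (B, T // n, n * C): fuse n consecutive timesteps
    # into one wider feature vector, the hierarchical merge step
    B, T, C = x.shape
    assert T % n == 0, "sequence length must be divisible by n"
    return x.reshape(B, T // n, n * C)

x = np.arange(2 * 8 * 3).reshape(2, 8, 3)  # batch of 2, 8 steps, 3 channels
h = flatten_consecutive(x, 2)              # -> (2, 4, 6)
```

Stacking linear/nonlinearity layers between repeated fusions gives the tree-structured network the module builds.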
7. Building GPT from Scratch

6 lessons

Implement a transformer-based language model from scratch. Based on Karpathy's 'Let's build GPT: from scratch, in code, spelled out' (https://www.youtube.com/watch?v=kCc8FmEb1nY) and the paper 'Attention is All You Need' (https://arxiv.org/abs/1706.03762).

Attention is All You Need
Self-Attention Mechanism
Multi-Head Attention
Transformer Block
Positional Encoding
Training GPT
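Causal self-attention, the centerpiece of this module, fits in a short NumPy sketch: a single head with illustrative projection matrices `Wq`, `Wk`, `Wv`, scaled dot-product scores, and a mask so each position attends only to the past.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    # single-head scaled dot-product attention with a causal mask
    T, d = x.shape[0], Wq.shape[1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)               # (T, T) affinities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                      # block attention to the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)       # softmax over past positions
    return w, w @ v                             # weights and weighted values

rng = np.random.default_rng(0)
T, C, d = 5, 8, 4
x = rng.normal(size=(T, C))
w, out = causal_self_attention(x, *(rng.normal(size=(C, d)) for _ in range(3)))
```

Multi-head attention runs several of these in parallel and concatenates the outputs; the transformer block wraps that with residual connections, layer norm, and an MLP.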
8. Tokenization & BPE

5 lessons

Build a tokenizer from scratch using Byte Pair Encoding. Based on Karpathy's 'Let's build the GPT Tokenizer' (https://www.youtube.com/watch?v=zduSFxRajkE).

Why Tokenization Matters
Byte Pair Encoding
Implementing BPE from Scratch
GPT Tokenizer
Tokenizer Design Decisions
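The core BPE loop, find the most frequent adjacent pair and replace it with a new token id, can be sketched in plain Python. The `merge` helper and the training string are illustrative; a real tokenizer repeats this until the vocabulary reaches its target size.

```python
from collections import Counter

def most_common_pair(ids):
    # most frequent adjacent pair of token ids
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # replace every occurrence of `pair` with the single token `new_id`
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw UTF-8 bytes
pair = most_common_pair(ids)               # ('a', 'a') is the most frequent
ids2 = merge(ids, pair, 256)               # 256 is the first new token id
```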