nanoGPT

nanoGPT is a rewrite of minGPT that prioritizes teeth over education. It is still under active development, but currently the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training. The project is written in Python: https://github.com/karpathy/nanoGPT
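As a rough sketch, that GPT-2 reproduction is launched in two steps: tokenize OpenWebText, then train across the 8 GPUs with torchrun. The prepare script and config file names below reflect the repository layout at the time of writing and may change, so check the repo for the current entry points:

    # tokenize the OpenWebText dataset (assumes the repo's data/openwebtext/prepare.py)
    python data/openwebtext/prepare.py
    # launch distributed training on one node with 8 GPUs
    torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py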

The code itself is plain and readable: train.py is a ~300-line boilerplate training loop, and model.py is a ~300-line GPT model definition that can optionally load the GPT-2 weights from OpenAI.
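For example, loading the OpenAI GPT-2 weights through model.py looks roughly like the following; the GPT.from_pretrained classmethod name is taken from the repository and may change, so treat this as a sketch rather than a stable API:

    from model import GPT  # nanoGPT's model.py

    # pull the 124M GPT-2 checkpoint weights into the nanoGPT module
    model = GPT.from_pretrained('gpt2')
    model.eval()  # inference mode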

install
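The dependencies are the usual PyTorch/HuggingFace stack. The list below mirrors the upstream README at the time of writing and may drift, so verify it against the repo:

    pip install torch numpy transformers datasets tiktoken wandb tqdm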

usage
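A quick way to exercise the code end to end is the character-level Shakespeare example; the script and config names below are assumptions based on the current repository layout:

    # prepare a tiny character-level dataset
    python data/shakespeare_char/prepare.py
    # train a small model (use CPU-friendly settings if you have no GPU)
    python train.py config/train_shakespeare_char.py
    # sample from the trained checkpoint
    python sample.py --out_dir=out-shakespeare-char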

baselines
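The OpenAI GPT-2 checkpoints make it possible to establish loss baselines on OpenWebText without training anything. As a sketch, assuming the eval configs currently shipped in the repo:

    python train.py config/eval_gpt2.py
    python train.py config/eval_gpt2_medium.py
    python train.py config/eval_gpt2_large.py
    python train.py config/eval_gpt2_xl.py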

finetuning

Finetuning takes very little time, e.g. just a few minutes on a single GPU. It works the same way as training from scratch, except the run initializes from a pretrained checkpoint and uses a smaller learning rate, as in the sketch below.
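Assuming the Shakespeare finetuning config in the repo (names may have changed):

    # prepare the Shakespeare dataset
    python data/shakespeare/prepare.py
    # finetune starting from the pretrained GPT-2 weights
    python train.py config/finetune_shakespeare.py
    # sample from the finetuned model
    python sample.py --out_dir=out-shakespeare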

efficiency notes
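The training loop leans on PyTorch 2.0's torch.compile for speed, and the repo's bench.py is intended for simple benchmarking and profiling. A minimal sketch of the compile step, assuming a PyTorch >= 2.0 install (GPT and GPTConfig names assumed from the repo's model.py):

    import torch
    from model import GPT, GPTConfig

    model = GPT(GPTConfig())      # default config; adjust n_layer/n_head/n_embd as needed
    model = torch.compile(model)  # JIT-compiles the forward pass; large speedup on recent GPUs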

