NanoGPT
nanoGPT is a rewrite of minGPT that prioritizes teeth over education. It is still under active development, but currently train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8xA100 40GB node in about 4 days of training. https://github.com/karpathy/nanoGPT (written in Python)
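That run is launched with PyTorch DDP via torchrun. A rough sketch of the commands from the README (the exact config filename and flags may have changed between versions of the repo):

    # tokenize OpenWebText into train.bin / val.bin (uses tiktoken's GPT-2 BPE)
    python data/openwebtext/prepare.py
    # train GPT-2 (124M) with DDP across the 8 GPUs of one node
    torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py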
The code itself is plain and readable: train.py is a ~300-line boilerplate training loop and model.py a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI.
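A minimal sketch of loading the OpenAI GPT-2 weights through model.py and sampling from them, assuming nanoGPT's model.py is importable and provides the GPT.from_pretrained classmethod and generate method it currently ships with (names may differ across versions):

    # sketch: pull OpenAI's GPT-2 (124M) weights into nanoGPT's model definition and sample
    import torch
    import tiktoken
    from model import GPT  # nanoGPT's model.py

    model = GPT.from_pretrained('gpt2', dict(dropout=0.0))
    model.eval()

    enc = tiktoken.get_encoding('gpt2')  # GPT-2 BPE tokenizer
    idx = torch.tensor(enc.encode('Hello, my name is'), dtype=torch.long)[None, ...]
    with torch.no_grad():
        out = model.generate(idx, max_new_tokens=20, temperature=0.8, top_k=200)
    print(enc.decode(out[0].tolist()))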
The README covers install, usage, baselines, finetuning, and efficiency notes. Finetuning in particular takes very little time, e.g. on a single GPU just a few minutes (see the sketch below).
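As an illustration, the repo ships a small Shakespeare finetuning config; assuming the data/shakespeare/prepare.py script and config/finetune_shakespeare.py file as in current versions of the repo, a finetune from pretrained GPT-2 weights looks roughly like:

    # fetch and tokenize the tiny Shakespeare dataset
    python data/shakespeare/prepare.py
    # finetune starting from OpenAI GPT-2 weights (the model size is set via init_from in the config)
    python train.py config/finetune_shakespeare.py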