Training Data
Provide a text file or use a default set of George Orwell's books.
Architecture
Select a preset, or adjust the architecture settings in more detail.
Size
Parameters: 5.7M
Batch Size: 32
Learning Rate: 0.001
Tokenizer
Choose how input text is tokenized.
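As a rough illustration of what tokenization means here, the sketch below builds a character-level tokenizer. The page does not say which tokenizers it offers, so the character-level scheme and the helper names `build_vocab`, `encode`, and `decode` are assumptions made for this example only.

```python
# Hypothetical character-level tokenizer; not taken from the page itself.
def build_vocab(text):
    """Map each distinct character to an integer id, in sorted order."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of token ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)


sample = "war is peace"
stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
roundtrip = decode(ids, itos)
```

A character-level vocabulary stays very small but produces long sequences; subword schemes trade a larger vocabulary for shorter sequences.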
Hyperparameters
Decide on core settings for training.
Understand
Better understand the model architecture, code, and math.
Train
Train your model.
Demo mode: pre-training disabled.
Metrics
Training loss and evaluation checkpoints.
Progress: 0.0%
Elapsed: -
ETA: -
Loss: -
Running Loss: -
Grad Norm: -
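The page does not define how Running Loss and Grad Norm are computed. A common reading, sketched below under that assumption, is an exponential moving average of the per-step loss and the L2 norm of all gradients taken together. The function names and the smoothing factor `beta=0.9` are hypothetical.

```python
import math


def update_running_loss(running, loss, beta=0.9):
    """Exponential moving average of the loss; beta is an assumed smoothing factor."""
    if running is None:
        # First step: nothing to smooth against yet.
        return loss
    return beta * running + (1.0 - beta) * loss


def global_grad_norm(grads):
    """L2 norm over every gradient value, treating all tensors as one flat vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


running = None
for step_loss in [4.0, 3.0, 2.0]:
    running = update_running_loss(running, step_loss)

# Two toy "parameter tensors" with one gradient value each.
grad_norm = global_grad_norm([[3.0], [4.0]])
```

The smoothed value lags behind the raw loss, which makes the overall trend easier to read when individual batches are noisy.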
Inspect Batch
Peek at tokens, next-token predictions, and attention.
Select a sample to inspect tokens and predictions.
Attention map axes: Query ↓ (rows), Key → (columns)
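To make the axis layout concrete, the sketch below computes an attention matrix in which row i belongs to query i and column j to key j. It assumes scaled dot-product attention with a causal mask, which is typical for next-token prediction but is not confirmed by the page; the function name and the toy vectors are hypothetical.

```python
import math


def causal_attention_weights(queries, keys):
    """Return a matrix where row i holds query i's weights over keys 0..i."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        # Scaled dot-product scores; keys after position i are masked out.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if j <= i else float("-inf")
            for j, k in enumerate(keys)
        ]
        # Numerically stable softmax over the row.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = causal_attention_weights(vectors, vectors)
```

Under the causal mask everything above the diagonal is zero, so each token can attend only to itself and to earlier tokens.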
Evaluation
Train and validation losses recorded at eval intervals.
Logs
No logs yet.