diff --git a/README.md b/README.md
index 3e63e51dc..aa08b5b4c 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,17 @@
OLMo: Open Language Model
+
+OLMo is a repository for training and using state-of-the-art open language models.
+It is built by scientists, for scientists.
## Installation
@@ -23,6 +34,16 @@ Otherwise you can install the model code by itself directly from PyPI with:
pip install ai2-olmo
```
+## Models overview
+
+The core models released so far in the OLMo family, all trained on the [Dolma dataset](https://huggingface.co/datasets/allenai/dolma), are:
+
+| Model | Training Tokens | Context Length (tokens) |
+|-------|-----------------|--------------------------|
+| [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion | 2048 |
+| [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 2048 |
+| [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 2048 |
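+
+These checkpoints can be loaded for inference through the Hugging Face `transformers` API. Below is a minimal sketch, assuming the `allenai/OLMo-7B` checkpoint loads via `trust_remote_code=True` as its Hugging Face model card describes; treat it as illustrative rather than the repository's official inference path:
+
+```python
+# Hedged sketch: load an OLMo checkpoint with transformers and generate text.
+# Assumes `pip install ai2-olmo transformers` and that the checkpoint supports
+# trust_remote_code loading; see the Hugging Face model card for specifics.
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
+
+inputs = tokenizer("Language modeling is ", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=50)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```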
+
+
## Fine-tuning
To fine-tune an OLMo model using our trainer you'll first need to prepare your dataset by tokenizing it and saving the token IDs to a flat numpy memory-mapped array. See [`scripts/prepare_tulu_data.py`](./scripts/prepare_tulu_data.py) for an example with the Tulu V2 dataset, which can easily be modified for other datasets.
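+
+The expected on-disk format is simply a flat 1-D array of token IDs. The snippet below is an illustrative sketch only; the filename, dtype, and tokenizer here are assumptions, and `scripts/prepare_tulu_data.py` remains the authoritative example:
+
+```python
+# Hedged sketch: tokenize documents and write the concatenated token IDs
+# to a flat numpy memory-mapped array.
+import numpy as np
+from transformers import AutoTokenizer  # tokenizer choice is an assumption
+
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
+docs = ["First training document.", "Second training document."]
+
+# Concatenate the token IDs of all documents into one flat list.
+ids = [tid for doc in docs for tid in tokenizer.encode(doc)]
+
+# uint16 assumes the vocabulary fits in 16 bits; filename is illustrative.
+arr = np.memmap("train_tokens.npy", dtype=np.uint16, mode="w+", shape=(len(ids),))
+arr[:] = ids
+arr.flush()
+```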
@@ -46,3 +67,8 @@ torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
```
Note: passing CLI overrides like `--reset_trainer_state` is only necessary if you didn't update those fields in your config.
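+
+OLMo's configs are OmegaConf-based, so a CLI override simply merges over the corresponding config field. A toy sketch of that merge semantics (conceptual only, not the trainer's actual argument parsing):
+
+```python
+# Hedged sketch of dotted-override merging with OmegaConf; OLMo's train.py
+# does its own CLI handling, so treat this as conceptual only.
+from omegaconf import OmegaConf
+
+base = OmegaConf.create({"reset_trainer_state": False})
+cli = OmegaConf.from_dotlist(["reset_trainer_state=true"])
+cfg = OmegaConf.merge(base, cli)
+print(cfg.reset_trainer_state)  # True
+```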
+
+
+## Evaluation
+
+Additional tools for evaluating OLMo models are available at the [OLMo Eval](https://github.com/allenai/ai2-olmo-eval) repo.
\ No newline at end of file