Releases: allenai/OLMo-core

v1.8.0

30 Jan 01:12

What's new

Added 🎉

  • Added support for tensor parallelism. See the TransformerConfig class for usage.
  • Added more downstream tasks from the model ladder.
  • Added io.copy_dir() function.
  • Added new LR schedulers: LinearWithWarmup, InvSqrtWithWarmup, ConstantWithWarmup, and SequentialScheduler (see the sketch after this list).
  • Added option to pre-download checkpoint files from remote storage before trying to load a checkpoint.
  • Added a callback for sending Slack notifications.
  • Made the MPS device work on Apple Silicon.
  • Added SkipStepAdamW optimizer.
  • The trainer can load model-only checkpoints now.
  • Added the option to throttle checkpoint uploads to one rank from each node at a time.
  • Added support for logging rich Table objects as text in source mixture datasets.
  • Added unshard_strategy parameter to unshard_checkpoint() function in olmo_core.distributed.checkpoint.
  • Added function load_keys() to olmo_core.distributed.checkpoint.
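
A minimal sketch of composing the new LR schedulers. The class names come from the list above; the import path and every constructor argument shown here are assumptions, so check the scheduler docstrings for the real signatures.

```python
# Hedged sketch: warm up and hold the LR, then decay linearly.
# Class names are from this release; the import path and all argument
# names below are assumptions, not the verified API.
from olmo_core.optim import (
    ConstantWithWarmup,
    LinearWithWarmup,
    SequentialScheduler,
)

scheduler = SequentialScheduler(
    schedulers=[
        ConstantWithWarmup(warmup_steps=2_000),  # assumed field name
        LinearWithWarmup(warmup_steps=0),        # assumed field name
    ],
    schedulers_max=[10_000, 90_000],             # assumed: steps spent in each stage
)
```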

Changed ⚠️

  • Changed storage of shared shard state in sharded checkpoints from smallest shard to lowest rank (normally 0).
  • Changed how the trainer handles loading a checkpoint when load_path is provided: load_path is now only used if no checkpoint is found in the save_folder (see the sketch below).
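
The sketch below mirrors the documented checkpoint-resolution order in plain Python; it is not the Trainer's actual implementation, and treating "step<N>" subdirectories as checkpoints is an assumption for illustration.

```python
# Hedged sketch of the new load_path semantics: prefer a checkpoint already in
# save_folder (resuming a run), and fall back to load_path only when none exists.
from pathlib import Path
from typing import Optional

def resolve_checkpoint(save_folder: str, load_path: Optional[str]) -> Optional[str]:
    checkpoints = {
        int(p.name[4:]): p
        for p in Path(save_folder).glob("step*")
        if p.name[4:].isdigit()
    }
    if checkpoints:
        return str(checkpoints[max(checkpoints)])  # resume from the latest step
    return load_path  # fresh run: start from load_path if one was given
```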

Fixed ✅

  • Added missing weights_only=False argument to fix loading train checkpoints with newer versions of PyTorch (see the snippet after this list).
  • Fixed bug where GCS upload does not retry on transient failures.
  • Fixed bug where source mixture datasets were truncating source files instead of randomly sampling.
  • Fixed bug in source mixture datasets where sampling from small npy files raised an mmap exception due to 0 instances in the sampled index.
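
For context on the weights_only fix above: newer PyTorch releases default torch.load() to weights_only=True, which refuses to unpickle the arbitrary Python objects stored in train-state checkpoints. A minimal illustration (the checkpoint path is hypothetical):

```python
import torch

# Newer PyTorch defaults to weights_only=True, which rejects pickled Python
# objects (e.g. RNG and data-loader state in train checkpoints). Passing
# weights_only=False restores the old behavior; only do this for checkpoints
# you trust.
state = torch.load("train/rank0.pt", map_location="cpu", weights_only=False)
```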

Commits

7899e7c (chore) prepare for release v1.8.0
907b9c5 Send Slack notification on releases (#151)
1ef7851 fix get_mock_batch() when training on MPS again
29a468d Fix mixture dataset class (#147)
98ccb67 remove ganymede cluster
205fe90 remove deleted cluster
7ec9114 always make mock batch on CPU
7122b1d save max steps to trainer state (#143)
9a78829 Log elapsed time per eval (#149)
075a36a Make training on the MPS device work (#131)
b4a195b Add more options to the unshard_checkpoint function to help scale (#145)
16885ab fix merge list with prefix
7b755c9 minor logging improvement
212108f Add option to throttle checkpoint uploads to one rank from each node at a time (#142)
7633461 pull fixes from 32B branch (#139)
48abe8c checkpoint hot fix (#140)
0c096e2 Handle model-only checkpoints with the trainer
9818232 move release scripts to subfolder (#137)
05ab673 update cluster list (#136)
7ccf726 add pr comments on release
0ff19d7 update citation
7519e0a Change the way load_path is handled (#132)
03a597a limit the number of exception lines posted to Slack
c634066 include link to Beaker job with Slack noties
3505660 Make context manager set original state correctly (#126)
9e0992b Add a callback for sending Slack notifications (#125)
6d60464 fix
ee27348 Sync eval changes in OLMo/ladder-1xC to here (#122)
0789479 Add option to pre-download checkpoint to load (#123)
1380f0e add copy_dir() io function
5cc704f Add learning rate schedulers (#119)
de5be27 don't check for beaker-py upgrades
b0103f0 Fix loading train state for newer versions of torch
5de774f updates
8474ee8 update docker image tags
d3f6f01 Update PyTorch and other deps in Docker images, change naming scheme of images (#120)
10c4978 Publish Docker images to GHCR (#118)
d6981b3 Add support for tensor parallelism and add OLMo2-26B model config / train script (#117)
aa4d188 Update table formatting

v1.7.0

27 Nov 22:31

What's new

Added 🎉

  • Added key_mapping argument to olmo_core.distributed.checkpoint.load_model_and_optim_state()
    for loading checkpoints with different key names (see the sketch after this list).
  • Added load_key_mapping field to the trainer, same idea as the new key_mapping argument above.
  • Added an implementation of nGPT called NormalizedTransformer.
  • Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
  • Added an API for scaling RoPE embeddings.
  • Added a ModelLadder API.
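
A hedged sketch of the key_mapping argument. The function, module, and argument name come from the notes above; the call layout, checkpoint path, and example key names are assumptions (the lm_head keys anticipate the change described below).

```python
# Hedged sketch: remap parameter names while loading an older checkpoint.
import torch.nn as nn
from olmo_core.distributed.checkpoint import load_model_and_optim_state

model = nn.Linear(8, 8)  # stand-in for your actual Transformer model

# Old checkpoint key -> current model key (illustrative values only).
key_mapping = {
    "w_out.weight": "lm_head.w_out.weight",
    "norm.weight": "lm_head.norm.weight",
}

load_model_and_optim_state(
    "/path/to/old/checkpoint",  # hypothetical checkpoint directory
    model,
    key_mapping=key_mapping,
)
```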

Changed ⚠️

  • The w_out and norm top-level children of the Transformer model are now wrapped together in an lm_head module. Training scripts remain backwards compatible with older checkpoints thanks to the load_key_mapping field explained above.

Fixed ✅

  • (Optimization) Mark model input sizes as dynamic for torch.compile() to avoid recompilation during evals or variable sequence length / batch size training (see the sketch after this list). This doesn't seem to hurt throughput.
  • Made HTTPS and GCS IO functions more robust.
  • Fixed a bug where we were always getting dolma2-tokenized validation data when generating a config with DataMix.v3_small_ppl_validation.
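
The dynamic-shape fix above amounts to telling torch.compile() up front that the batch and sequence dimensions may change. A sketch of the general technique (not OLMo-core's exact code):

```python
import torch
import torch._dynamo

# Mark the batch (dim 0) and sequence (dim 1) dimensions as dynamic so
# torch.compile() does not specialize on the first shape it sees and then
# recompile when evals or variable-length batches change the shape.
input_ids = torch.zeros(4, 1024, dtype=torch.long)
torch._dynamo.mark_dynamic(input_ids, 0)
torch._dynamo.mark_dynamic(input_ids, 1)
```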

Commits

62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script

v1.6.3

15 Nov 19:14

What's new

Added 🎉

  • Added olmo_core.distributed.checkpoint.get_checkpoint_metadata() function (see the sketch after this list).
  • (BETA) Added flag to compile the optimizer step. So far only tested with AdamW. May not work with other optimizers.
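
A small hedged sketch of the new metadata helper; the checkpoint path is hypothetical and what exactly the returned object exposes is an assumption.

```python
# Hedged sketch: peek at a distributed checkpoint without loading it.
from olmo_core.distributed.checkpoint import get_checkpoint_metadata

metadata = get_checkpoint_metadata("gs://my-bucket/checkpoints/step5000")  # hypothetical path
print(metadata)  # e.g. inspect which state-dict keys the checkpoint contains
```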

Fixed ✅

  • Old ephemeral checkpoints won't be removed until after the latest ephemeral checkpoint is saved successfully.
  • Made GCS uploads more robust.
  • Fixed single-node training on Google Augusta cluster.
  • Weights sampled from numpy.random.dirichlet() do not always sum exactly to 1.0, so allow a small tolerance when validating domain weights (illustrated below).
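
A quick illustration of the Dirichlet tolerance fix: floating-point sampling can leave the weights slightly off 1.0, so validation should use an allclose-style check rather than strict equality (a sketch of the idea, not the library's exact code).

```python
import numpy as np

# Domain weights sampled from a Dirichlet distribution should sum to 1.0,
# but floating-point rounding can leave the sum slightly off, so validate
# with a small tolerance instead of exact equality.
weights = np.random.dirichlet(alpha=[1.0, 1.0, 1.0])
assert np.allclose(weights.sum(), 1.0), f"bad domain weights: {weights}"
```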

Commits

9c52bea (chore) prepare for release v1.6.3
ad5e9e5 Upgrade flash-attn to v2.7.0 (#104)
b9e9193 [beta] Enable compiling optimizer step (tested with AdamW) (#103)
fdbb76e Use allclose for comparing sum of small numbers (#102)
3284742 make GCS uploads more robust (#101)
63b3f43 Update isort requirement from <5.13,>=5.12 to >=5.12,<5.14 (#93)
dcbd988 update docs and theme version
6615ba9 Bump actions/download-artifact from 3 to 4 (#100)
2e2b35b Add function to get checkpoint metadata
c0e47cc clean up Dockerfile (#99)
6300bc7 replace printing table with logging table (#98)
e522886 Don't prematurely delete old ephemeral checkpoints (#97)
dea10fd Bump actions/upload-artifact from 3 to 4 (#90)
c2fe2db skip another test when creds missing
3ea9fa2 Bump softprops/action-gh-release from 1 to 2 (#87)
5a5c17f Bump actions/checkout from 3 to 4 (#91)
9c99b9c skip some tests when missing relevant credentials (#96)
53efa8c Bump actions/setup-python from 4 to 5 (#88)
d548d3b Bump actions/cache from 3 to 4 (#86)
ab80395 add depandabot config

v1.6.2

08 Nov 18:29

What's new

Added 🎉

  • Added an option to disable the GarbageCollectorCallback. You usually wouldn't want to do this, but it was needed for an experiment showing how important that callback is.

Fixed ✅

  • Fixed a bug where some default callbacks could be added twice if given a different name by the user.
  • Fixed a bug where some Trainer bookkeeping tasks might not complete before .fit() returns.

Commits

2384472 (chore) prepare for release v1.6.2
f721fa1 Ensure all bookkeeping tasks complete (#85)
26a2c63 Some callback improvements (#84)

v1.6.1

07 Nov 00:28

What's new

Added 🎉

  • Added retries field to BeakerLaunchConfig.
  • Allow running on Augusta cluster with existing train scripts.
  • Added olmo_core.utils.logging_configured() function to check if logging has been configured (see the sketch after this list).
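
A minimal sketch of the new logging check; importing prepare_cli_environment from olmo_core.utils is an assumption.

```python
# Hedged sketch: only set up logging once, e.g. in library code that may be
# called from a script that already configured it.
from olmo_core.utils import logging_configured, prepare_cli_environment

if not logging_configured():
    prepare_cli_environment()
```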

Fixed ✅

  • Fixed a potential distributed deadlock bug when training without a separate CPU-only bookkeeping backend.
  • Removed some unnecessary host-device syncs in olmo_core.distributed.utils.
  • Added Trainer(Config).async_bookkeeping field to toggle async bookkeeping.

Commits

cae88f5 (chore) prepare for release v1.6.1
83db5f7 Some fixes/improvements around synchronous bookkeeping operations (#83)
c435c94 increase timeout for CI checks
4a56200 update cluster list (#82)
e27ba74 Update throughput numbers, add logging_configured() util function (#81)
bec0a3c Allow running on Augusta cluster (#80)
c7c3a5a Set env vars for Augusta cluster
b9351e2 Add retries field to BeakerLaunchConfig (#79)

v1.6.0

01 Nov 21:51

What's new

Added 🎉

  • Added option to compile the trainer's loss function (Trainer.compile_loss).
  • Added SourceMixtureDataset for composing a training mixture based on ratios of source datasets (see the conceptual sketch after this list).
  • Added NumpyFSLDatasetMixture for constructing a NumpyDatasetBase from a SourceMixtureDataset. Note this is only supported for FSL datasets.
  • Added tests for SourceMixture* and NumpyFSLDatasetMixture.
  • Added DownstreamEvaluatorCallbackConfig class for running in-loop downstream eval via OLMo-in-loop-evals.
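
The source-mixture additions are about sampling training data from several sources according to target ratios. The sketch below shows the idea in plain Python rather than through the actual SourceMixtureDataset config classes, whose field names are not reproduced here; all numbers and source names are made up.

```python
# Conceptual sketch of ratio-based source mixing (not the SourceMixtureDataset
# API itself): given per-source token counts and target ratios, work out how
# many tokens to sample from each source for a fixed-size mixture.
available = {"web": 500_000_000, "code": 120_000_000, "books": 80_000_000}
target_ratio = {"web": 0.7, "code": 0.2, "books": 0.1}
mixture_size = 100_000_000  # total tokens in the final mixture

for source, ratio in target_ratio.items():
    requested = int(ratio * mixture_size)
    assert requested <= available[source], f"not enough '{source}' tokens to sample"
    print(f"{source}: sample {requested:,} of {available[source]:,} tokens")
```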

Changed ⚠️

  • Moved some types into olmo_core.data.types to avoid circular dependencies.

Fixed ✅

  • Made GCS client more robust by automatically retrying timeout errors for most operations.

Commits

29e1276 (chore) prepare for release v1.6.0
da39e97 Add note about optional dependencies
81b1249 Missed _bust_index_cache in one spot (#78)
00d34f6 Add option to compile loss function, move logits FP32 casting into loss function (#77)
4928f82 Adds mixing loader for FSL datasets (#70)
ecb0686 Allow stopping the experiment on keyboard int
41400c4 Add Llama 8B config (#76)
282c120 Update Docker build (#75)
55d261e Make GCS client more robust (#74)
3fe59b6 Add a callback for downstream evals, update Docker builds (#73)
ecd523e include release chore commit in release notes

v1.5.0

23 Oct 17:36

What's new

Added 🎉

  • Added Google Cloud support for list_directory() and clear_directory().
  • Added CometCallback for logging training runs to Comet.ml.
  • Added DataMixBase class, to allow extending to new data mix groups.
  • Added support for MoE-based models.
  • Added method DataLoaderBase.get_mock_batch().
  • Trainer now starts with a dry-run of a fake batch created by DataLoaderBase.get_mock_batch().
  • Added Callback.pre_backward(), .pre_eval_batch(), and .post_eval_batch() methods (see the sketch after this list).
  • Added Trainer.model_forward(), .get_losses(), and .eval_batch() methods.
  • Added a new TransformerActivationCheckpointingMode, "selected_ops" (requires torch 2.5 or newer).
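
A hedged sketch of the new callback hooks; the hook names come from the list above, while the import path and hook signatures are assumptions (hence the *args/**kwargs).

```python
# Hedged sketch: a custom callback using the new hooks. Check the Callback
# base class for the real hook signatures; the import path is an assumption.
from olmo_core.train.callbacks import Callback

class TimingCallback(Callback):
    def pre_backward(self, *args, **kwargs):
        # Runs just before loss.backward(); e.g. start a timer here.
        ...

    def post_eval_batch(self, *args, **kwargs):
        # Runs after each eval batch; e.g. accumulate custom eval statistics.
        ...
```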

Changed ⚠️

  • BeakerLaunchConfig.setup_steps should now include steps to clone your repo (which it will by default). This change allows support for private repos.

Fixed ✅

  • prepare_cli_environment() now calls add_cached_path_clients().
  • Removed an unnecessary host-device sync.

Commits

984eb26 Update README.md
0f0d282 Update README.md
310866e Add FP8 numbers for the 13B
425f7db Add "selected_ops" transformer AC mode (#71)
d90292e Move transformer config components to its own submodule
4d3b231 Add support for MoE models with megablocks (#60)
6e32043 Add Google Cloud support for more io functions (#69)
5af60ba Avoid an unnecessary host-device sync when created initial loss tensors (#68)
ad4c8bb Switch to comet callback in official train scripts
d90f5da Add comet API key to launch config
0c75ef6 Do a dry-run batch before starting training (#67)
71bc5c8 Add save_state_dict function
9c25aed Update the Comet.ml callback (#66)
6e4ee4e Add BaseDataMix class (#65)
54a74c3 Add a Comet.ml trainer callback (#64)
9ba0e63 Update base image with newer torch and flash-attn versions (#63)
97172fc avoid omegaconf interpolation in setup steps
48892ee include clone commands in setup steps (#62)

v1.4.0

02 Oct 18:22

What's new

Changed ⚠️

  • Updated the default layer norm epsilon for OLMo models from 1e-5 to 1e-6 to match the latest model.
  • Renamed FSLDataLoader to NumpyFSLDataLoader.
  • Renamed VSLDataLoader to NumpyVSLDataLoader.
  • The trainer now takes a data_loader: DataLoaderBase instead of a dataset: NumpyDatasetBase (see the sketch after this list).
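
A before/after sketch of the trainer change. Only the class rename and the switch from a dataset to a data_loader argument come from the notes above; the import path and the commented-out constructor/builder arguments are illustrative.

```python
# Hedged sketch of the v1.4.0 data-loader change.
from olmo_core.data import NumpyFSLDataLoader  # was: FSLDataLoader (import path assumed)

# Before: the trainer consumed the dataset directly, e.g.
#   trainer = trainer_config.build(model, dataset=dataset)
# After: wrap the dataset in a data loader and pass that instead, e.g.
#   data_loader = NumpyFSLDataLoader(dataset, global_batch_size=1024)  # assumed args
#   trainer = trainer_config.build(model, data_loader=data_loader)
```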

Commits

55343dd fix loading training state dict
b921299 Allow unknown number of batches with data loaders
87f1e89 fix restarts for custom data loader
767c550 Add example of custom data loader
6237f7d Trainer now takes a data loader instead of a dataset (#59)
f6fc369 update default LN eps to match latest OLMo model (#58)
db522d1 allow loading via pickling
7d26589 make VSL curr config more flexible

v1.3.2

27 Sep 22:19

What's new

Added 🎉

  • Added Config.validate(), Config.replace(), and Config.apply() methods (see the sketch after this list).
  • Trainer now records sequence length as a metric.
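
A hedged sketch of the new Config helpers, shown on a made-up Config subclass; the methods come from this release, but the import path and their exact semantics are assumptions.

```python
from dataclasses import dataclass

from olmo_core.config import Config  # assumed import path

@dataclass
class MyRunConfig(Config):  # made-up example config
    lr: float = 3e-4
    warmup_steps: int = 2_000

config = MyRunConfig()
config.validate()                 # assumed: raises if fields are inconsistent
low_lr = config.replace(lr=1e-5)  # assumed: returns a copy with overridden fields
```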

Fixed ✅

  • Ensure additional cached-path clients are added in the process pool workers used by some dataset preparation methods.
  • Fixed label_mask tensor created by NumpyPaddedFSLDataset.
  • Removed redundant warning messages about CUDA alloc retries.
  • Fixed non-deterministic deadlock bug with async checkpointing.

Commits

a0a680d keep old beaker images instead of deleting them
25a71f4 Minor training improvements (#57)
f32d0bf Improve formatting of throughput table in readme (#56)
f7f3709 Add some more Config methods
916ecf9 Fix label mask tensor created by NumpyPaddedFSLDataset (#55)
b29838a Ensure additional scheme clients are added in worker procs

v1.3.1

26 Sep 18:33

What's new

Fixed ✅

  • Fixed the name given to logged evaluator metrics.

Commits

40057ff fix eval metric name