diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index adac82794..1093adb3d 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -127,6 +127,7 @@ jobs: spec: | version: v2 description: GPU Tests + budget: ai2/oe-training tasks: - name: tests image: diff --git a/CHANGELOG.md b/CHANGELOG.md index 7369aa7e2..fae7e99eb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## Unreleased +### Fixed + +- Fixed default value of `--tokenizer` argument to `scripts/prepare_tulu_data.py` to be an absolute path, not a relative path, so that the script can be run from other directories. +- Added the option to directly pass input embeddings to `OLMo` and `OLMoForCausalLM`. +- Added support for Python 3.8. +- Added code to throw an error if `output_attentions` is set to `True` in forward call to `OLMoForCausalLM`. This functionality hasn't been implemented yet. + +### Added +- Added `output_hidden_states` argument and associated functionality to `OLMo` and `OLMoForCausalLM` to return model intermediate hidden states. 
+ ## [v0.2.4](https://github.com/allenai/OLMo/releases/tag/v0.2.4) - 2024-02-02 ### Fixed diff --git a/README.md b/README.md index 0d672aa9e..75d6947cf 100644 --- a/README.md +++ b/README.md @@ -41,41 +41,17 @@ pip install ai2-olmo ## Models overview The core models in the OLMo family released so far are (all trained on the [Dolma dataset](https://huggingface.co/datasets/allenai/dolma)): -| Model | Training Tokens | Context Length | -|------|--------|---------| -| [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion | 2048 | -| [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 2048 | -| [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 2048 | - - -## Fine-tuning - -To fine-tune an OLMo model using our trainer you'll first need to prepare your dataset by tokenizing it and saving the tokens IDs to a flat numpy memory-mapped array. See [`scripts/prepare_tulu_data.py`](./scripts/prepare_tulu_data.py) for an example with the Tulu V2 dataset, which can be easily modified for other datasets. - -Next, prepare your training config. There are many examples in the [`configs/`](./configs) directory that you can use as a starting point. The most important thing is to make sure the model parameters (the `model` field in the config) match up with the checkpoint you're starting from. To be safe you can always start from the config that comes with the model checkpoint. At a minimum you'll need to make the following changes to the config or provide the corresponding overrides from the command line: - -- Update `load_path` to point to the checkpoint you want to start from. -- Set `reset_trainer_state` to `true`. -- Update `data.paths` to point to the `token_ids.npy` file you generated. -- Optionally update `data.label_mask_paths` to point to the `label_mask.npy` file you generated, unless you don't need special masking for the loss. -- Update `evaluators` to add/remove in-loop evaluations. 
- -Once you're satisfied with your training config, you can launch the training job via `torchrun`. For example: - -``` -torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \ - --data.paths=[{path_to_data}/input_ids.npy] \ - --data.label_mask_paths=[{path_to_data}/label_mask.npy] \ - --load_path={path_to_checkpoint} \ - --reset_trainer_state -``` - -Note: passing CLI overrides like `--reset_trainer_state` is only necessary if you didn't update those fields in your config. +| Model | Training Tokens | Context Length | Training Config | W&B Logs | Data Order File(s) ☨ | +|-------|-----------------|:--------------:|-----------------|----------|--------------------| +| [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion | 2048 | [configs/official/OLMo-1B.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-1B.yaml) | [wandb.ai/…/OLMo-1B](https://wandb.ai/ai2-llm/OLMo-1B/reports/OLMo-1B--Vmlldzo2NzY1Njk1) | [epoch 1](https://olmo-checkpoints.org/ai2-llm/olmo-small/46zc5fly/train_data/global_indices.npy) | +| [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 2048 | [configs/official/OLMo-7B.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-7B.yaml) | [wandb.ai/…/OLMo-7B](https://wandb.ai/ai2-llm/OLMo-7B/reports/OLMo-7B--Vmlldzo2NzQyMzk5) | [epoch 1](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy), [epoch 2](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wd2gxrza/train_data/global_indices.npy) | +| [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 2048 | [configs/official/OLMo-7B.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-7B.yaml) | [wandb.ai/…/OLMo-7B-Twin-2T](https://wandb.ai/ai2-llm/OLMo-7B/reports/OLMo-7B-Twin-2T--Vmlldzo2NzU0NTIz) | [epoch 1](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy) | +> ☨ *See [Inspecting training 
data](#inspecting-training-data) below for usage.* ## Inference -You can utilize our HuggingFace integration to run inference on the olmo checkpoints: +You can utilize our Hugging Face integration to run inference on the OLMo checkpoints: ```python from hf_olmo import * # registers the Auto* classes @@ -91,7 +67,7 @@ response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, print(tokenizer.batch_decode(response, skip_special_tokens=True)[0]) ``` -Alternatively, with the huggingface pipeline abstraction: +Alternatively, with the Hugging Face pipeline abstraction: ```python from transformers import pipeline @@ -99,10 +75,9 @@ olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B") print(olmo_pipe("Language modeling is")) ``` - ### Inference on finetuned checkpoints -If you finetune the model using the code above, you can use the conversion script to convert a native OLMo checkpoint to a HuggingFace-compatible checkpoint +If you finetune the model using the code above, you can use the conversion script to convert a native OLMo checkpoint to a Hugging Face-compatible checkpoint: ```bash python hf_olmo/convert_olmo_to_hf.py --checkpoint-dir /path/to/checkpoint @@ -116,7 +91,111 @@ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", torch_dtype=torch The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as inputs.input_ids.to('cuda') to avoid potential issues. +## Reproducibility + +### Training + +The configs used to train the official OLMo models are provided in the [`configs/official/`](https://github.com/allenai/OLMo/blob/main/configs/official) directory. + +Note that while the training and validation data is public and free to download, the data paths within those configs point to a Cloudflare R2 bucket, which requires an API key for programmatic access. 
+So in order to use any of these configs to reproduce a training run you'll first have to download the corresponding data to a location of your choosing and then update the paths in the config accordingly. + +You can derive the public HTTP URL from an R2 URL by replacing `r2://olmo-data` with `https://olmo-data.org`. +For example, if the R2 data URL is: + +`r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy` + +then the corresponding public URL is: + +`https://olmo-data.org/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy` + +Once you've updated the data paths in the config you can launch a training run via `torchrun`. For example, to launch the 1B model training on a single 8x GPU node, you would run: + +```bash +torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml +``` + +You can use the same method to launch multi-node jobs as well. See [the documentation](https://pytorch.org/docs/stable/elastic/run.html) for `torchrun` to understand the additional arguments you'll need to configure the rendezvous backend / endpoint. + +### Inspecting training data + +You may be interested in inspecting the exact tokens that composed a particular batch during the training of one of the OLMo models. +We provide tools to do this, but first you'll need to download the data as above (unless you have an R2 API key) and update the corresponding config accordingly. + +Then take note of the URL of the data order file you want, which can be found in the [Models Overview](#models-overview) table. For example, the data order file for the first epoch of the OLMo-7B model is [https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy). 
+ +Once you have that you can use this snippet to inspect the data within a particular batch: + +```python +import numpy as np +from cached_path import cached_path + +from olmo.config import TrainConfig +from olmo.data import build_memmap_dataset + +# Update these paths to what you want: +data_order_file_path = cached_path("https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy") +train_config_path = "configs/official/OLMo-7B.yaml" + + +cfg = TrainConfig.load(train_config_path) +dataset = build_memmap_dataset(cfg, cfg.data) +batch_size = cfg.global_train_batch_size +global_indices = np.memmap(data_order_file_path, mode="r", dtype=np.uint32) + + +def get_batch_instances(batch_idx: int) -> list[list[int]]: + batch_start = batch_idx * batch_size + batch_end = (batch_idx + 1) * batch_size + batch_indices = global_indices[batch_start:batch_end] + batch_instances = [] + for index in batch_indices: + token_ids = dataset[index]["input_ids"].tolist() + batch_instances.append(token_ids) + return batch_instances + + +# Get all 2048 x 2048 token IDs in the first batch. +get_batch_instances(0) +``` + + +## Fine-tuning + +To fine-tune an OLMo model using our trainer you'll first need to prepare your dataset by tokenizing it and saving the token IDs to a flat numpy memory-mapped array. See [`scripts/prepare_tulu_data.py`](./scripts/prepare_tulu_data.py) for an example with the Tulu V2 dataset, which can be easily modified for other datasets. + +Next, prepare your training config. There are many examples in the [`configs/`](https://github.com/allenai/OLMo/blob/main/configs) directory that you can use as a starting point. The most important thing is to make sure the model parameters (the `model` field in the config) match up with the checkpoint you're starting from. To be safe you can always start from the config that comes with the model checkpoint. 
At a minimum you'll need to make the following changes to the config or provide the corresponding overrides from the command line: + +- Update `load_path` to point to the checkpoint you want to start from. +- Set `reset_trainer_state` to `true`. +- Update `data.paths` to point to the `input_ids.npy` file you generated. +- Optionally update `data.label_mask_paths` to point to the `label_mask.npy` file you generated, unless you don't need special masking for the loss. +- Update `evaluators` to add/remove in-loop evaluations. + +Once you're satisfied with your training config, you can launch the training job via `torchrun`. For example: + +``` +torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \ + --data.paths=[{path_to_data}/input_ids.npy] \ + --data.label_mask_paths=[{path_to_data}/label_mask.npy] \ + --load_path={path_to_checkpoint} \ + --reset_trainer_state +``` + +Note: passing CLI overrides like `--reset_trainer_state` is only necessary if you didn't update those fields in your config. ## Evaluation Additional tools for evaluating OLMo models are available at the [OLMo Eval](https://github.com/allenai/ai2-olmo-eval) repo. + +## Citing + +```bibtex +@article{OLMo, + title={OLMo: Accelerating the Science of Language Models}, + author={Dirk Groeneveld and Iz Beltagy and Pete Walsh and Akshita Bhagia and Rodney Kinney and Oyvind Tafjord and A. Jha and Hamish Ivison and Ian Magnusson and Yizhong Wang and Shane Arora and David Atkinson and Russell Authur and Khyathi Raghavi Chandu and Arman Cohan and Jennifer Dumas and Yanai Elazar and Yuling Gu and Jack Hessel and Tushar Khot and William Merrill and Jacob Daniel Morrison and Niklas Muennighoff and Aakanksha Naik and Crystal Nam and Matthew E. 
Peters and Valentina Pyatkin and Abhilasha Ravichander and Dustin Schwenk and Saurabh Shah and Will Smith and Emma Strubell and Nishant Subramani and Mitchell Wortsman and Pradeep Dasigi and Nathan Lambert and Kyle Richardson and Luke Zettlemoyer and Jesse Dodge and Kyle Lo and Luca Soldaini and Noah A. Smith and Hanna Hajishirzi}, + year={2024}, + url={https://api.semanticscholar.org/CorpusID:267365485}, + journal={arXiv preprint}, +} +``` diff --git a/configs/official/OLMo-1B.yaml b/configs/official/OLMo-1B.yaml new file mode 100644 index 000000000..6f0d9c95a --- /dev/null +++ b/configs/official/OLMo-1B.yaml @@ -0,0 +1,446 @@ +run_name: OLMo-1B +seed: 6198 +dry_run: false + +wandb: + name: ${run_name} + project: olmo-small + +model: + d_model: 2048 + n_heads: 16 + n_layers: 16 + mlp_ratio: 8 + weight_tying: true + alibi: false + rope: true + flash_attention: false # not available on AMD + attention_dropout: 0.0 + attention_layer_norm: false + multi_query_attention: false + include_bias: false + block_type: sequential + layer_norm_type: default + layer_norm_with_affine: false + bias_for_layer_norm: false + attention_layer_norm_with_affine: false + activation_type: swiglu + residual_dropout: 0.0 + embedding_dropout: 0.0 + max_sequence_length: 2048 + vocab_size: 50280 + embedding_size: 50304 + eos_token_id: 50279 + pad_token_id: 1 + init_device: meta + init_fn: mitchell + +compile: null # causes instability on AMD GPUs + +optimizer: + name: adamw + learning_rate: 4.0e-4 + weight_decay: 0.1 + betas: + - 0.9 + - 0.95 + metrics_log_interval: 10 + +scheduler: + name: cosine_with_warmup + t_warmup: 2000 + alpha_f: 0.1 + +tokenizer: + identifier: tokenizers/allenai_eleuther-ai-gpt-neox-20b-pii-special.json + truncate_direction: right + +save_folder: ${path.choose:${oc.env:SCRATCH_DIR,no_exist}/checkpoints,/results}/${oc.env:SLURM_JOB_ID,${run_name}} +save_overwrite: false +# Sharded checkpoints (best for restarts) +save_interval: 1000 +save_num_checkpoints_to_keep: 9 +# 
Unsharded checkpoints (for final storage) +save_interval_unsharded: 10000 +save_num_unsharded_checkpoints_to_keep: -1 + +load_path: null + +max_duration: 739_328 # 3.1T tokens +global_train_batch_size: 2048 +device_train_microbatch_size: 8 + +precision: amp_bf16 + +fsdp: + wrapping_strategy: null + precision: mixed + +max_grad_norm: 1.0 +max_grad_norm_ratio: null + +speed_monitor: + window_size: 20 + +eval_interval: ${save_interval} +eval_subset_num_batches: -1 +device_eval_batch_size: ${device_train_microbatch_size} +evaluators: + # lump all the small datasets together (we still get separate metrics). + - label: v3-small-ppl-validation + data: + num_workers: 0 + drop_last: true + datasets: + v3-small-c4_en-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/c4_en/val/part-0-00000.npy + v3-small-dolma_books-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_books/val/part-0-00000.npy + v3-small-dolma_common-crawl-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_common-crawl/val/part-0-00000.npy + v3-small-dolma_pes2o-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_pes2o/val/part-0-00000.npy + v3-small-dolma_reddit-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_reddit/val/part-0-00000.npy + v3-small-dolma_stack-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_stack/val/part-0-00000.npy + v3-small-dolma_wiki-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_wiki/val/part-0-00000.npy + v3-small-ice-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/ice/val/part-0-00000.npy + v3-small-m2d2_s2orc-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/m2d2_s2orc/val/part-0-00000.npy + v3-small-pile-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/pile/val/part-0-00000.npy + v3-small-wikitext_103-validation: + - 
r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/wikitext_103/val/part-0-00000.npy + + - label: v2-small-ppl-validation + data: + num_workers: 0 + drop_last: true + datasets: + v2-small-4chan-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/4chan/val.npy + v2-small-c4_100_domains-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/c4_100_domains/val.npy + v2-small-c4_en-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/c4_en/val.npy + v2-small-gab-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/gab/val.npy + v2-small-ice-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/ice/val.npy + v2-small-m2d2_s2orc-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/m2d2_s2orc/val.npy + v2-small-m2d2_wiki-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/m2d2_wiki/val.npy + v2-small-manosphere-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/manosphere/val.npy + v2-small-mc4_en-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/mc4_en/val.npy + v2-small-pile-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/pile/val.npy + v2-small-ptb-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/ptb/val.npy + v2-small-twitterAEE-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/twitterAEE/val.npy + v2-small-wikitext_103-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/wikitext_103/val.npy + + - label: piqa + type: downstream + + - label: hellaswag + type: downstream + + - label: winogrande + type: downstream + + - label: openbook_qa + type: downstream + + # - label: boolq # requires implementation of the pmi_dc matrix + # type: downstream + + - label: sciq + type: downstream + + - label: arc_easy + type: downstream + + # - label: arc_challenge # requires implementation of the pmi_dc matrix + # type: 
downstream + + - label: copa + type: downstream + + - label: rte + type: downstream + + - label: commitment_bank + type: downstream + + - label: mrpc + type: downstream + + - label: sst2 + type: downstream + +data: + pad_direction: right + num_workers: 0 + drop_last: true + pin_memory: true + prefetch_factor: 16 + persistent_workers: true + timeout: 0 + paths: + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-001-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-002-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-003-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-004-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-004-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-005-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-005-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-006-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-006-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-007-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-008-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-008-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-009-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-009-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-010-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-010-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-011-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-012-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-013-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-014-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-015-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-016-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-017-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-017-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-018-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-018-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-019-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-020-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-020-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-021-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-022-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-023-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-024-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-025-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-025-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-026-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-026-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-027-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-027-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-028-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-029-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-030-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-031-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-032-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-033-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-033-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-034-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-034-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-035-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-035-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-036-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-036-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-037-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-038-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-039-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-039-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-040-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-041-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-042-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-043-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-044-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-045-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-045-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-046-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-047-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-047-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-048-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-049-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-050-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-051-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-052-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-053-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-054-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-055-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-056-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-057-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-058-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-059-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-060-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-061-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-062-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-063-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-064-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-064-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-065-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-065-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-066-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-066-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-067-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-067-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-068-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-068-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-069-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-069-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-070-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-071-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-072-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-073-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-074-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-074-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-075-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-075-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-076-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-076-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-077-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-078-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-078-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-079-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-079-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-080-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-081-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-082-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-083-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-083-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-084-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-085-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-086-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-087-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-088-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-088-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-089-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-089-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-090-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-090-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-091-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-092-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-093-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-094-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-095-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-096-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-096-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-097-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-098-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-099-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-100-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-101-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-102-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-102-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-103-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-104-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-105-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-105-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-106-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-107-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-108-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-109-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-110-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-111-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-112-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-112-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-113-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-114-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-115-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-116-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-117-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-118-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-118-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-119-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-120-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-120-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-121-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-122-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-123-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-124-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-125-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-126-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-126-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-127-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-128-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-129-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-130-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-131-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-132-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-133-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-134-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-135-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-136-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-137-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-138-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-139-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-139-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-140-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-141-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-142-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-143-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-143-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-144-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-145-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-145-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-146-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-147-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-147-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-148-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-149-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-149-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-150-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-151-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-151-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-152-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-152-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-153-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-153-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-154-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-155-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-156-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-156-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-157-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-158-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-158-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-159-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-160-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-160-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-161-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-161-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-162-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-163-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-163-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-164-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-165-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-165-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-166-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-166-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-167-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-167-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-168-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-169-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-170-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-171-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-172-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-173-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-174-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-174-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-175-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-176-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-177-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-178-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-179-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-179-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-180-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-181-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-182-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-183-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-184-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-185-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-185-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-186-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-187-00000.npy diff --git a/configs/official/OLMo-7B.yaml b/configs/official/OLMo-7B.yaml new file mode 100644 index 000000000..1f6aec03d --- /dev/null +++ b/configs/official/OLMo-7B.yaml @@ -0,0 +1,648 @@ +run_name: OLMo-7B +seed: 6198 +dry_run: false + +wandb: + name: ${run_name} + project: olmo-medium + group: OLMo-7B + +model: + d_model: 4096 + n_heads: 32 + n_layers: 32 + mlp_hidden_size: 22016 + weight_tying: false + alibi: false + rope: true + flash_attention: true + attention_dropout: 0.0 + attention_layer_norm: false + multi_query_attention: false + include_bias: false + block_type: sequential + layer_norm_type: default + layer_norm_with_affine: false + bias_for_layer_norm: false + attention_layer_norm_with_affine: false + activation_type: swiglu + residual_dropout: 0.0 + embedding_dropout: 0.0 + max_sequence_length: 2048 + vocab_size: 50280 + embedding_size: 50304 + eos_token_id: 50279 + pad_token_id: 1 + init_device: meta + init_fn: mitchell + +compile: + fullgraph: false + +optimizer: + name: adamw + 
learning_rate: 3.0e-4 + weight_decay: 0.1 + betas: + - 0.9 + - 0.95 + metrics_log_interval: 10 + +scheduler: + name: linear_with_warmup + t_warmup: 5000 + alpha_f: 0.1 + grad_clip_warmup_steps: 1000 + grad_clip_warmup_factor: 10.0 + +tokenizer: + identifier: tokenizers/allenai_eleuther-ai-gpt-neox-20b-pii-special.json + truncate_direction: right + +save_folder: runs/${run_name} +remote_save_folder: null +save_overwrite: true +# Sharded checkpoints (best for restarts) +save_interval: 1000 +save_num_checkpoints_to_keep: -1 +# Unsharded checkpoints (for final storage) +save_interval_unsharded: null +save_num_unsharded_checkpoints_to_keep: -1 + +load_path: null + +max_duration: 2e12T # 2T tokens +global_train_batch_size: 2048 +device_train_microbatch_size: 2 +time_limit: null + +precision: amp_bf16 + +fsdp: + wrapping_strategy: by_block + precision: mixed + +max_grad_norm: 1.0 +max_grad_norm_ratio: null + +speed_monitor: + window_size: 20 + +eval_interval: ${save_interval} +eval_subset_num_batches: -1 +device_eval_batch_size: ${device_train_microbatch_size} +evaluators: + - label: v3-small-ppl-validation + data: + num_workers: 0 + drop_last: true + datasets: + v3-small-c4_en-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/c4_en/val/part-0-00000.npy + v3-small-dolma_books-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_books/val/part-0-00000.npy + v3-small-dolma_common-crawl-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_common-crawl/val/part-0-00000.npy + v3-small-dolma_pes2o-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_pes2o/val/part-0-00000.npy + v3-small-dolma_reddit-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_reddit/val/part-0-00000.npy + v3-small-dolma_stack-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_stack/val/part-0-00000.npy + v3-small-dolma_wiki-validation: + - 
r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/dolma_wiki/val/part-0-00000.npy + v3-small-ice-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/ice/val/part-0-00000.npy + v3-small-m2d2_s2orc-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/m2d2_s2orc/val/part-0-00000.npy + v3-small-pile-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/pile/val/part-0-00000.npy + v3-small-wikitext_103-validation: + - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/wikitext_103/val/part-0-00000.npy + + - label: v2-small-ppl-validation + data: + num_workers: 0 + drop_last: true + datasets: + v2-small-4chan-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/4chan/val.npy + v2-small-c4_100_domains-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/c4_100_domains/val.npy + v2-small-c4_en-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/c4_en/val.npy + v2-small-gab-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/gab/val.npy + v2-small-ice-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/ice/val.npy + v2-small-m2d2_s2orc-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/m2d2_s2orc/val.npy + v2-small-m2d2_wiki-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/m2d2_wiki/val.npy + v2-small-manosphere-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/manosphere/val.npy + v2-small-mc4_en-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/mc4_en/val.npy + v2-small-pile-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/pile/val.npy + v2-small-ptb-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/ptb/val.npy + v2-small-twitterAEE-validation: + - r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/twitterAEE/val.npy + v2-small-wikitext_103-validation: + - 
r2://olmo-data/eval-data/perplexity/v2_small_gptneox20b/wikitext_103/val.npy + + ########################## + # Downstream evaluations # + ########################## + - label: piqa + type: downstream + + - label: hellaswag + type: downstream + + - label: winogrande + type: downstream + + - label: openbook_qa + type: downstream + + # - label: boolq # requires implementation of the pmi_dc matrix + # type: downstream + + - label: sciq + type: downstream + + - label: arc_easy + type: downstream + + # - label: arc_challenge # requires implementation of the pmi_dc matrix + # type: downstream + + - label: copa + type: downstream + + - label: rte + type: downstream + + - label: commitment_bank + type: downstream + + - label: mrpc + type: downstream + + - label: sst2 + type: downstream + +data: + pad_direction: right + num_workers: 16 + drop_last: true + pin_memory: true + prefetch_factor: 1 + persistent_workers: true + timeout: 0 + paths: + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-000-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-000-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-001-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-001-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-002-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-002-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-003-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-003-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-004-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-004-00001.npy + -
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-005-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-005-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-006-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-006-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-006-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-007-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-007-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-008-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-008-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-008-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-009-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-009-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-010-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-010-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-010-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-011-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-011-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-012-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-012-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-013-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-013-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-013-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-014-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-014-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-014-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-015-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-015-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-016-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-016-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-017-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-017-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-018-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-018-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-019-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-019-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-020-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-020-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-021-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-021-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-022-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-022-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-023-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-023-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-024-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-024-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-025-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-025-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-025-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-026-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-026-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-027-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-027-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-027-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-028-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-028-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-028-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-029-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-029-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-030-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-030-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-031-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-031-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-032-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-032-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-033-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-033-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-033-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-034-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-034-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-034-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-035-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-035-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-036-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-036-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-037-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-037-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-038-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-038-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-039-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-039-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-040-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-040-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-041-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-041-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-042-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-042-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-042-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-043-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-043-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-043-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-044-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-044-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-044-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-045-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-045-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-046-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-046-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-046-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-046-00003.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-047-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-047-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-048-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-048-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-049-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-049-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-050-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-050-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-051-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-051-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-052-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-052-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-052-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-053-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-053-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-053-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-054-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-054-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-055-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-055-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-055-00002.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-056-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-056-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-056-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-057-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-057-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-057-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-058-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-058-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-059-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-059-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-060-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-060-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-061-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-061-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-062-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-062-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-062-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-063-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-063-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-063-00002.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-064-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-064-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-064-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-065-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-065-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-065-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-066-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-066-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-067-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-067-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-068-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-068-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-069-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-069-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-070-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-070-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-071-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-071-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-072-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-072-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-073-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-073-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-074-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-074-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-075-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-075-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-076-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-076-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-077-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-077-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-078-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-078-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-079-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-079-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-080-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-080-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-081-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-081-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-082-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-082-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-083-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-083-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-084-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-084-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-085-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-085-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-086-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-086-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-087-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-087-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-088-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-088-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-089-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-089-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-089-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-090-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-090-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-091-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-091-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-091-00002.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-092-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-092-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-093-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-093-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-093-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-094-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-094-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-094-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-095-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-095-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-096-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-096-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-097-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-097-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-097-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-098-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-098-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-099-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-099-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-100-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-100-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-100-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-101-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-101-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-102-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-102-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-103-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-103-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-104-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-104-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-105-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-105-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-106-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-106-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-106-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-107-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-107-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-108-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-108-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-109-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-109-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-109-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-110-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-110-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-110-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-111-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-111-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-112-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-112-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-113-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-113-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-114-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-114-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-114-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-115-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-115-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-116-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-116-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-117-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-117-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-118-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-118-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-119-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-119-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-120-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-120-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-120-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-121-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-121-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-122-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-122-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-122-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-123-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-123-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-123-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-124-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-124-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-125-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-125-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-126-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-126-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-127-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-127-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-127-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-128-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-128-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-129-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-129-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-129-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-130-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-130-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-131-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-131-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-132-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-132-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-133-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-133-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-133-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-134-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-134-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-134-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-135-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-135-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-135-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-136-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-136-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-137-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-137-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-137-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-138-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-138-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-139-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-139-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-140-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-140-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-141-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-141-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-141-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-142-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-142-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-142-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-143-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-143-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-144-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-144-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-144-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-145-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-145-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-145-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-146-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-146-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-146-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-147-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-147-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-147-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-148-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-148-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-149-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-149-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-149-00002.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-150-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-150-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-150-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-150-00003.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-151-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-151-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-152-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-152-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-153-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-153-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-154-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-154-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-155-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-155-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-155-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-156-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-156-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-157-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-157-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-157-00002.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-158-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-158-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-159-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-159-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-160-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-160-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-161-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-161-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-161-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-162-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-162-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-163-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-163-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-164-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-164-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-165-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-165-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-165-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-166-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-166-00001.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-166-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-167-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-167-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-167-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-168-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-168-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-169-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-169-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-170-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-170-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-171-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-171-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-172-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-172-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-173-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-173-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-173-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-174-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-174-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-174-00002.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-175-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-175-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-175-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-176-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-176-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-176-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-177-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-177-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-178-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-178-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-179-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-179-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-180-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-180-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-181-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-181-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-182-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-182-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-182-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-183-00000.npy + - 
r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-183-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-183-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-184-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-184-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-185-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-185-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-185-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-186-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-186-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-186-00002.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-187-00000.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-187-00001.npy + - r2://olmo-data/preprocessed/olmo-mix/v1_5-sample/gpt-neox-20b-pii-special/part-187-00002.npy diff --git a/hf_olmo/modeling_olmo.py b/hf_olmo/modeling_olmo.py index 8856be8ad..6a279cb10 100644 --- a/hf_olmo/modeling_olmo.py +++ b/hf_olmo/modeling_olmo.py @@ -48,7 +48,9 @@ def __init__(self, config: OLMoConfig, model: Optional[Olmo] = None, init_params def forward( self, input_ids: torch.LongTensor = None, + inputs_embeds: Optional[torch.FloatTensor] = None, attention_mask: Optional[torch.Tensor] = None, + attention_bias: Optional[torch.Tensor] = None, past_key_values: Optional[List[torch.FloatTensor]] = None, labels: Optional[torch.LongTensor] = None, use_cache: Optional[bool] = None, @@ -59,17 +61,24 @@ def forward( if use_cache is None: use_cache = self.config.use_cache + if output_attentions: + raise 
ValueError("output_attentions is not yet supported in OLMo")
+
         return_dict = return_dict if return_dict is not None else self.config.use_return_dict

         # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
         outputs = self.model.forward(
             input_ids=input_ids,
+            input_embeddings=inputs_embeds,
             attention_mask=attention_mask,
+            attention_bias=attention_bias,
             past_key_values=past_key_values,
             use_cache=use_cache,
+            output_hidden_states=output_hidden_states,
         )

         logits = outputs.logits
+        hidden_states = outputs.hidden_states

         loss = None
         if labels is not None:
@@ -92,6 +101,7 @@ def forward(
             loss=loss,
             logits=logits,
             past_key_values=outputs.attn_key_values,
+            hidden_states=hidden_states,
         )

     def can_generate(self) -> bool:
diff --git a/olmo/model.py b/olmo/model.py
index cc621a37b..a11eceb71 100644
--- a/olmo/model.py
+++ b/olmo/model.py
@@ -8,9 +8,9 @@

 import logging
 import math
+import sys
 from abc import abstractmethod
 from collections import defaultdict
-from collections.abc import MutableMapping
 from functools import partial
 from typing import (
     Callable,
@@ -46,6 +46,13 @@
 from .initialization import ModuleType, init_weights
 from .torch_util import ensure_finite_

+if sys.version_info.minor > 8:
+    from collections.abc import MutableMapping
+elif sys.version_info.minor == 8:
+    from typing import MutableMapping
+else:
+    raise SystemExit("This script supports Python 3.8 or higher")
+
 __all__ = [
     "LayerNormBase",
     "LayerNorm",
@@ -927,6 +934,11 @@ class OlmoOutput(NamedTuple):
     Attention keys and values from each block.
     """

+    hidden_states: Optional[Tuple[torch.Tensor]]
+    """
+    Hidden states from each block.
+    """
+

 class OlmoGenerateOutput(NamedTuple):
     token_ids: torch.LongTensor
@@ -1137,14 +1149,18 @@ def get_alibi_attention_bias(self, seq_len: int, device: torch.device) -> torch.
     def forward(
         self,
         input_ids: torch.LongTensor,
+        input_embeddings: Optional[torch.FloatTensor] = None,
         attention_mask: Optional[torch.Tensor] = None,
         attention_bias: Optional[torch.Tensor] = None,
         past_key_values: Optional[Sequence[Tuple[torch.Tensor, torch.Tensor]]] = None,
         use_cache: bool = False,
         last_logits_only: bool = False,
+        output_hidden_states: Optional[bool] = None,
     ) -> OlmoOutput:
         """
         :param input_ids: A tensor of shape `(batch_size, seq_len)`.
+        :param input_embeddings: A tensor of shape `(batch_size, seq_len, d_model)` with input
+            embeddings. When provided, it is treated as the output of the input embedding layer.
         :param attention_mask: A tensor of shape `(batch_size, seq_len)` that indicates
             which input IDs are masked. A `1` value in the mask means that
             the corresponding input ID should *not* be ignored. A `0` means
@@ -1171,10 +1187,12 @@ def forward(
         :param last_logits_only: If `True`, only compute the logits for the last token of each sequence.
             This can speed up decoding when you only care about the next token.
         """
+        output_hidden_states = output_hidden_states if output_hidden_states is not None else False
+
         if past_key_values:
             assert len(past_key_values) == self.config.n_layers

-        batch_size, seq_len = input_ids.size()
+        batch_size, seq_len = input_ids.size() if input_embeddings is None else input_embeddings.size()[:2]
         if past_key_values is None:
             past_length = 0
         else:
@@ -1182,14 +1200,12 @@ def forward(

         # Get embeddings of input.
         # shape: (batch_size, seq_len, d_model)
-        x = self.transformer.wte(input_ids)  # type: ignore
+        x = self.transformer.wte(input_ids) if input_embeddings is None else input_embeddings  # type: ignore

         if not (self.config.alibi or self.config.rope):
             # Get positional embeddings.
             # shape: (1, seq_len)
-            pos = torch.arange(
-                past_length, past_length + seq_len, dtype=torch.long, device=input_ids.device
-            ).unsqueeze(0)
+            pos = torch.arange(past_length, past_length + seq_len, dtype=torch.long, device=x.device).unsqueeze(0)
             # shape: (1, seq_len, d_model)
             pos_emb = self.transformer.wpe(pos)  # type: ignore
             x = pos_emb + x
@@ -1229,7 +1245,7 @@ def forward(
             if attention_mask is not None:
                 mask_len = attention_mask.shape[-1]
             elif past_key_values is not None:
-                mask_len = past_key_values[0][0].shape[-2] + input_ids.shape[-1]
+                mask_len = past_key_values[0][0].shape[-2] + seq_len
             attention_bias = attention_bias[:, :, :mask_len, :mask_len].to(dtype=torch.float)

             # Add in the masking bias.
@@ -1242,9 +1258,16 @@ def forward(

         attn_key_values: Optional[List[Tuple[torch.Tensor, torch.Tensor]]] = [] if use_cache else None

+        # decoder layers
+        all_hidden_states = []
+
         # Apply blocks one-by-one.
         if self.config.block_group_size == 1:
             for block_idx, block in enumerate(self.transformer.blocks):
+                if output_hidden_states:
+                    # add hidden states
+                    all_hidden_states.append(x)
+
                 layer_past = None if past_key_values is None else past_key_values[block_idx]
                 if (
                     (self.activation_checkpointing_strategy == ActivationCheckpointingStrategy.whole_layer)
@@ -1273,6 +1296,10 @@ def forward(
                     attn_key_values.append(cache)
         else:
             for group_idx, block_group in enumerate(self.transformer.block_groups):
+                if output_hidden_states:
+                    # add hidden states
+                    all_hidden_states.append(x)
+
                 layers_past = (
                     None
                     if past_key_values is None
@@ -1294,6 +1321,9 @@ def forward(
         # Apply final layer norm.
         # shape: (batch_size, seq_len or 1, d_model)
         x = self.transformer.ln_f(x)  # type: ignore
+        if output_hidden_states:
+            # add final hidden state post-final-layernorm, following HuggingFace's convention
+            all_hidden_states.append(x)

         # Get logits.
         # shape: (batch_size, seq_len or 1, vocab_size)
@@ -1304,30 +1334,42 @@ def forward(
         if self.config.scale_logits:
             logits.mul_(1 / math.sqrt(self.config.d_model))

-        return OlmoOutput(logits=logits, attn_key_values=attn_key_values)  # type: ignore[arg-type]
+        return OlmoOutput(logits=logits, attn_key_values=attn_key_values, hidden_states=tuple(all_hidden_states) if output_hidden_states else None)  # type: ignore[arg-type]

     def get_fsdp_wrap_policy(self, wrap_strategy: Optional[FSDPWrapStrategy] = None):
         if wrap_strategy is None:
             return None
+
+        # The 'recurse' mode for the wrap function does not behave like you'd expect.
+        # Even if we return False, it may still recurse because PyTorch does what it wants,
+        # not what you want. This causes issues when, for example, we want to wrap 'ff_out' (a linear layer)
+        # but not other linear layers within a block.
+        # So we have to explicitly tell PyTorch which linear layers to wrap, and we also just
+        # return True in 'recurse' mode for simplicity.
+        size_based_module_to_wrap = {self.transformer.wte}
+        if hasattr(self.transformer, "ff_out"):
+            size_based_module_to_wrap.add(self.transformer.ff_out)
+
         if wrap_strategy == FSDPWrapStrategy.by_block:

             def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
                 del nonwrapped_numel
+                wrap = isinstance(module, OlmoBlock)
                 if recurse:
-                    return True  # always recurse for simplicity
-                return isinstance(module, OlmoBlock)
+                    return True
+                else:
+                    return wrap

             return fsdp_wrap_fn
         elif wrap_strategy == FSDPWrapStrategy.by_block_and_size:

             def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
                 del nonwrapped_numel
+                wrap = isinstance(module, (OlmoBlock,)) or module in size_based_module_to_wrap
                 if recurse:
-                    # Determine if we should recurse.
-                    return not isinstance(module, OlmoBlock)
+                    return True
                 else:
-                    # Determine if we should wrap.
-                    return isinstance(module, (OlmoBlock, nn.Linear, nn.Embedding))
+                    return wrap
 
             return fsdp_wrap_fn
         elif wrap_strategy == FSDPWrapStrategy.by_block_group:
@@ -1338,9 +1380,11 @@ def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
 
             def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
                 del nonwrapped_numel
+                wrap = isinstance(module, OlmoBlockGroup)
                 if recurse:
-                    return True  # always recurse for simplicity
-                return isinstance(module, OlmoBlockGroup)
+                    return True
+                else:
+                    return wrap
 
             return fsdp_wrap_fn
         elif wrap_strategy == FSDPWrapStrategy.by_block_group_and_size:
@@ -1351,12 +1395,11 @@ def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
 
             def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
                 del nonwrapped_numel
+                wrap = isinstance(module, (OlmoBlockGroup,)) or module in size_based_module_to_wrap
                 if recurse:
-                    # Determine if we should recurse.
-                    return not isinstance(module, OlmoBlockGroup)
+                    return True
                 else:
-                    # Determine if we should wrap.
-                    return isinstance(module, (OlmoBlockGroup, nn.Linear, nn.Embedding))
+                    return wrap
 
             return fsdp_wrap_fn
         elif wrap_strategy == FSDPWrapStrategy.size_based:
@@ -1378,9 +1421,11 @@ def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
 
             def fsdp_wrap_fn(module, recurse: bool = True, nonwrapped_numel: int = 0):
                 del nonwrapped_numel
+                wrap = isinstance(module, OlmoBlock) and module.layer_id % c == 0
                 if recurse:
-                    return True  # always recurse for simplicity
-                return isinstance(module, OlmoBlock) and module.layer_id % c == 0
+                    return True
+                else:
+                    return wrap
 
             return fsdp_wrap_fn
         else:
@@ -1470,7 +1515,7 @@ def generate(
         tokens_generated = 0
 
         def flatten_past_key_values(
-            past_key_values: List[Tuple[torch.Tensor, torch.Tensor]]
+            past_key_values: List[Tuple[torch.Tensor, torch.Tensor]],
         ) -> Dict[str, torch.Tensor]:
             out = {}
             for i, (key, value) in enumerate(past_key_values):
@@ -1479,7 +1524,7 @@ def flatten_past_key_values(
             return out
 
         def unflatten_past_key_values(
-            past_key_values: Dict[str, torch.Tensor]
+            past_key_values: Dict[str, torch.Tensor],
         ) -> List[Tuple[torch.Tensor, torch.Tensor]]:
             out = []
             for i in range(self.config.n_layers):
diff --git a/olmo/train.py b/olmo/train.py
index f459ad88d..79132f0fc 100644
--- a/olmo/train.py
+++ b/olmo/train.py
@@ -877,7 +877,7 @@ def fit(self):
         if self.cfg.torch_profiling and get_global_rank() == 0:
             from torch.profiler import schedule
 
-            profiling_schedule = schedule(wait=1, warmup=5, active=3)
+            profiling_schedule = schedule(wait=1, warmup=5, active=3, repeat=1)
 
             def on_trace_ready(p):
                 profiler_output_dir = Path(self.cfg.save_folder) / "profiler"
diff --git a/olmo/util.py b/olmo/util.py
index 62e964b5d..71ee67e60 100644
--- a/olmo/util.py
+++ b/olmo/util.py
@@ -7,7 +7,6 @@
 import warnings
 from datetime import datetime
 from enum import Enum
-from functools import cache
 from itertools import cycle, islice
 from pathlib import Path
 from queue import Queue
@@ -34,6 +33,11 @@
 )
 from .torch_util import get_global_rank, get_local_rank, get_node_rank, is_distributed
 
+try:
+    from functools import cache
+except ImportError:
+    from functools import lru_cache as cache
+
 
 class StrEnum(str, Enum):
     """
diff --git a/scripts/prepare_tulu_data.py b/scripts/prepare_tulu_data.py
index 4eba35945..7994b406f 100644
--- a/scripts/prepare_tulu_data.py
+++ b/scripts/prepare_tulu_data.py
@@ -116,7 +116,7 @@ def get_parser() -> ArgumentParser:
         "--tokenizer",
         type=str,
         help="""Tokenizer path or identifier.""",
-        default="tokenizers/allenai_eleuther-ai-gpt-neox-20b-pii-special.json",
+        default=Path(__file__).parent / "tokenizers" / "allenai_eleuther-ai-gpt-neox-20b-pii-special.json",
     )
     parser.add_argument("-s", "--seq-len", type=int, help="""Max sequence length.""", default=2048)
     parser.add_argument("--eos", type=int, help="""EOS token ID.""", default=50279)