DNM: Loss issue checkpoint with refine1b setups #682

Open
wants to merge 34 commits into
base: main
34 commits
e7ab869
Adds refine launch/config
undfined Jul 29, 2024
89c1c0d
Drop new config option
undfined Jul 29, 2024
94c8ae8
Small change to rewrite config
undfined Jul 29, 2024
f9fcf16
Add baseline launch/config
undfined Jul 29, 2024
9018e88
Match rewrite config
undfined Jul 29, 2024
61ee2f0
Use s3 for now
undfined Jul 29, 2024
8c69b97
Cleanup
undfined Jul 29, 2024
4a85106
Only jupiter
undfined Jul 29, 2024
12b4f4c
Double CxN mixed data setup
undfined Jul 31, 2024
a858da6
Add 2ep setups
undfined Jul 31, 2024
0ba08d8
Task names
undfined Aug 1, 2024
15d58fd
Fix config for rewrite only 2ep
undfined Aug 2, 2024
bc59dcb
Filtered config and launch
undfined Aug 8, 2024
160e539
Use oe-data budget
undfined Aug 8, 2024
38d57c9
Cx2 setups
undfined Aug 12, 2024
f33451d
Typo
undfined Aug 13, 2024
1a58147
cx5 setup for baseline
undfined Aug 21, 2024
ebd7311
sync start
undfined Aug 21, 2024
80c5ff3
Try without saving
undfined Aug 21, 2024
bb1e53c
Drop olmo shared fs
undfined Aug 21, 2024
c8e4148
Add checkpoints back
undfined Aug 21, 2024
d9f0920
More Cx5 setups
undfined Aug 22, 2024
8635403
Mixed setup
undfined Aug 22, 2024
f10b1e2
Cx2 setups for mixed and unfiltered
undfined Aug 23, 2024
70462e3
copy pazta
undfined Aug 23, 2024
1e154a4
Use high/1 node for cx2 runs
undfined Aug 23, 2024
ea45835
gantry things
undfined Aug 23, 2024
1c38953
npy paths
soldni Aug 28, 2024
44df68a
Add fw filtered cx5 setup
undfined Sep 20, 2024
738ce8f
Merge branch 'loss-issue-cp' of github.com:allenai/OLMo into loss-iss…
undfined Sep 20, 2024
7eae80e
Adds 80th percentile fw score setup
undfined Sep 24, 2024
f0029e1
Add dclm ft delta setup
undfined Sep 25, 2024
6025d97
Cx5 rewrites inupt length filter
undfined Oct 8, 2024
6398fb4
50pctl + length filter
undfined Oct 8, 2024
321 changes: 321 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-2ep.yaml

Large diffs are not rendered by default.

480 changes: 480 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-50pctl-dclm-Cx5.yaml

Large diffs are not rendered by default.

893 changes: 893 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-50pctl-fw-Cx5.yaml

Large diffs are not rendered by default.

492 changes: 492 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-50pctl-length-filter-dclm-Cx5.yaml

Large diffs are not rendered by default.

893 changes: 893 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-80pctl-fw-Cx5.yaml

Large diffs are not rendered by default.

492 changes: 492 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-Cx2.yaml

Large diffs are not rendered by default.

779 changes: 779 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-Cx5.yaml

Large diffs are not rendered by default.

4,251 changes: 4,251 additions & 0 deletions configs/refine/olmo-1b-refine-mixed-length-filter-dclm-Cx5.yaml

Large diffs are not rendered by default.

321 changes: 321 additions & 0 deletions configs/refine/olmo-1b-refine-rewrite-only-2ep.yaml

Large diffs are not rendered by default.

523 changes: 523 additions & 0 deletions configs/refine/olmo-1b-refine-rewrite-only-Cx2.yaml

Large diffs are not rendered by default.

991 changes: 991 additions & 0 deletions configs/refine/olmo-1b-refine-rewrite-only-Cx5.yaml

Large diffs are not rendered by default.

417 changes: 417 additions & 0 deletions configs/refine/olmo-1b-refine-rewrite-only-filtered-Cx2.yaml

Large diffs are not rendered by default.

355 changes: 355 additions & 0 deletions configs/refine/olmo-1b-refine-rewrite-only-filtered.yaml

Large diffs are not rendered by default.

323 changes: 323 additions & 0 deletions configs/refine/olmo-1b-refine-rewrite-only.yaml

Large diffs are not rendered by default.

321 changes: 321 additions & 0 deletions configs/refine/olmo-1b-refine-source-only-2ep.yaml
@@ -0,0 +1,321 @@
run_name: olmo-1b-refine-source-only-2ep-001
seed: 6198
dry_run: false
no_pre_train_checkpoint: true

wandb:
  name: ${run_name}
  project: refine-train
  group: ${run_name}

model:
  d_model: 2048
  n_heads: 16
  n_layers: 16
  mlp_ratio: 8
  weight_tying: false
  alibi: false
  rope: true
  flash_attention: true
  attention_dropout: 0.0
  include_bias: false
  block_type: sequential
  layer_norm_type: rms
  layer_norm_with_affine: true
  layer_norm_eps: 1e-6
  attention_layer_norm: true
  bias_for_layer_norm: false
  attention_layer_norm_with_affine: false
  activation_type: swiglu
  residual_dropout: 0.0
  embedding_dropout: 0.0
  max_sequence_length: 2048
  vocab_size: 100278
  embedding_size: 100352
  eos_token_id: 100257
  pad_token_id: 100277
  init_device: cuda
  init_fn: normal
  init_std: 0.02
  init_cutoff_factor: 3

compile: null

optimizer:
  name: adamw
  learning_rate: 0.002
  eps: 1.0e-8
  weight_decay: 0.05
  decay_norm_and_bias: true
  decay_embeddings: true
  betas:
  - 0.9
  - 0.95
  metrics_log_interval: 10

# Cx1: t_max = 1.3B params * 20 = 26e9
# Cx2: t_max = 1.3B params * 40 = 52e9
# Cx3: t_max = 1.3B params * 60 = 78e9

scheduler:
  name: cosine_with_warmup
  units: tokens
  t_warmup: 2e9
  alpha_f: 0.01

tokenizer:
  identifier: allenai/dolma2-tokenizer
  truncate_direction: right

save_folder: runs/${run_name}
remote_save_folder: s3://ai2-llm/checkpoints/refine-1b/${run_name}
save_overwrite: false

save_interval: 5000
save_interval_ephemeral: null
save_num_checkpoints_to_keep: -1
sharded_checkpointer: olmo_core

save_interval_unsharded: null
save_num_unsharded_checkpoints_to_keep: -1

load_path: null

max_duration: 2ep
global_train_batch_size: 1024
device_train_microbatch_size: 4

fused_loss: true

ddp:
  grad_sync_mode: batch
  find_unused_params: false

precision: amp_bf16

distributed_strategy: ddp

max_grad_norm: 1.0
max_grad_norm_ratio: null

speed_monitor:
  window_size: 1

eval_interval: 1000
eval_subset_num_batches: -1
device_eval_batch_size: ${device_train_microbatch_size}
evaluators:
  - label: all-small-ppl-validation
    data:
      num_workers: 0
      drop_last: true
      memmap_dtype: uint32
      datasets:
        c4_en-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
        dolma_books-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
        dolma_common-crawl-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
        dolma_pes2o-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
        dolma_reddit-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
        dolma_stack-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
        dolma_wiki-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
        ice-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
        m2d2_s2orc-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
        pile-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
        wikitext_103-validation:
          - s3://ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy

  ##########################
  # Downstream evaluations #
  ##########################
  - label: piqa
    type: downstream

  - label: hellaswag
    type: downstream

  - label: winogrande
    type: downstream

  - label: openbook_qa
    type: downstream

  - label: boolq
    type: downstream

  - label: sciq
    type: downstream

  - label: arc_easy
    type: downstream

  - label: arc_challenge
    type: downstream

  - label: copa
    type: downstream

  - label: commonsense_qa
    type: downstream

  - label: social_iqa
    type: downstream

  - label: mmlu_stem_var
    type: downstream

  - label: mmlu_humanities_var
    type: downstream

  - label: mmlu_social_sciences_var
    type: downstream

  - label: mmlu_other_var
    type: downstream

  - label: mmlu_stem_mc_5shot
    type: downstream

  - label: mmlu_humanities_mc_5shot
    type: downstream

  - label: mmlu_social_sciences_mc_5shot
    type: downstream

  - label: mmlu_other_mc_5shot
    type: downstream

  - label: mmlu_stem_mc_5shot_test
    type: downstream

  - label: mmlu_humanities_mc_5shot_test
    type: downstream

  - label: mmlu_social_sciences_mc_5shot_test
    type: downstream

  - label: mmlu_other_mc_5shot_test
    type: downstream

data:
  pad_direction: right
  num_workers: 16
  drop_last: true
  pin_memory: true
  prefetch_factor: 8
  persistent_workers: true
  timeout: 0
  memmap_dtype: uint32
  instance_filter:
    repetition_max_period: 13
    repetition_min_period: 1
    repetition_max_count: 32
  paths:
    # Cx1 20b sample set 01
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-00-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-01-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-02-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-03-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-04-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-05-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-06-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-07-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-08-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-09-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-10-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-11-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-12-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-13-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-14-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-15-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-16-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-17-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-18-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-19-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-20-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-21-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-22-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-23-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-24-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-25-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-26-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-27-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-28-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-29-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-30-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-31-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-32-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-33-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-34-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-35-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-36-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-37-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-38-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-39-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-40-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-41-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-42-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-43-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-44-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-45-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-46-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/001/allenai/dolma2-tokenizer/part-47-00000.npy

    # Cx1 20b sample set 02
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-00-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-01-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-02-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-03-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-04-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-05-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-06-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-07-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-08-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-09-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-10-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-11-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-12-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-13-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-14-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-15-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-16-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-17-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-18-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-19-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-20-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-21-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-22-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-23-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-24-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-25-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-26-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-27-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-28-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-29-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-30-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-31-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-32-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-33-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-34-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-35-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-36-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-37-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-38-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-39-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-40-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-41-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-42-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-43-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-44-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-45-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-46-00000.npy
    - s3://ai2-llm/preprocessed/dclm/samples/src-20b/002/allenai/dolma2-tokenizer/part-47-00000.npy
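
The token-budget comments in this config (`t_max = 1.3B params * 20` per Cx unit) can be sanity-checked with a few lines of Python. This is an illustrative sketch only and not part of the PR: the function names are hypothetical, the constants are copied from the config above, and the world size of 8 in the usage note is an assumed value because the config does not state it. It also assumes that one optimizer step consumes `global_train_batch_size * max_sequence_length` tokens and that per-device gradient accumulation is the per-device batch divided by `device_train_microbatch_size`.

```python
# Values copied from the config above; helper names are hypothetical.
PARAMS = 1_300_000_000    # "1.3B params" from the t_max comments
TOKENS_PER_PARAM = 20     # multiplier implied by the Cx1 comment (26e9 / 1.3e9)
GLOBAL_BATCH = 1024       # global_train_batch_size (sequences per step)
SEQ_LEN = 2048            # model.max_sequence_length
MICROBATCH = 4            # device_train_microbatch_size


def token_budget(cx: int) -> int:
    """Token budget for a given Cx multiple, as in the config comments."""
    return PARAMS * TOKENS_PER_PARAM * cx


def optimizer_steps(tokens: int) -> int:
    """Number of full optimizer steps that fit in a token budget."""
    tokens_per_step = GLOBAL_BATCH * SEQ_LEN
    return tokens // tokens_per_step


def grad_accum_steps(world_size: int) -> int:
    """Micro-batches each device accumulates per optimizer step.

    world_size is an assumption supplied by the caller; it is not in the config.
    """
    if GLOBAL_BATCH % world_size != 0:
        raise ValueError("global batch must divide evenly across devices")
    per_device = GLOBAL_BATCH // world_size
    if per_device % MICROBATCH != 0:
        raise ValueError("per-device batch must be a multiple of the micro-batch")
    return per_device // MICROBATCH


if __name__ == "__main__":
    for cx in (1, 2, 3):
        budget = token_budget(cx)
        print(f"Cx{cx}: {budget:.2e} tokens, {optimizer_steps(budget)} steps")
    print("warmup steps:", optimizer_steps(2_000_000_000))  # t_warmup: 2e9
    print("grad accum at 8 devices:", grad_accum_steps(8))
```

Under these assumptions, Cx1 works out to 12,397 full steps and the 2e9-token warmup to 953 steps. This particular config sets `max_duration: 2ep` rather than a token count, so the actual run length depends on the size of the listed `.npy` files.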