Pull in upstream changes from argonne-lcf/Megatron-DeepSpeed #1

Open: wants to merge 442 commits into main
442 commits
96cb1e5
fixed overflow issue
zhenghh04 May 31, 2024
cb2f1dc
removed unnecessary mpi4py
zhenghh04 May 31, 2024
b48d6f8
Merge pull request #18 from argonne-lcf/distributed_loading_v2
zhenghh04 May 31, 2024
0dea6aa
Update dataset_utils.py
zhenghh04 Jun 4, 2024
f16416a
merge distributed_loading
zhenghh04 Jun 5, 2024
5d26dfe
fixed a minor bug
zhenghh04 Jun 5, 2024
3dc424f
remove unnecessary barrier
zhenghh04 Jun 5, 2024
60fc482
added pfw tracing for test_blendable_dataset
zhenghh04 Jun 5, 2024
b1f17d5
fixed bug
zhenghh04 Jun 5, 2024
10a3737
added more logging
zhenghh04 Jun 5, 2024
bc28f84
removed allreduce calls that are not needed
zhenghh04 Jun 5, 2024
6eb21b7
removed allreduce call that are not needed any more
zhenghh04 Jun 5, 2024
20a2430
fixed a bug
zhenghh04 Jun 5, 2024
f718694
added more logging info
zhenghh04 Jun 5, 2024
699bde4
Merge branch 'distributed_loading' of ../Megatron-DeepSpeed-distribut…
zhenghh04 Jun 5, 2024
dd3b070
Merge branch 'distributed_loading' of github.com:argonne-lcf/Megatron…
zhenghh04 Jun 5, 2024
b4c832e
added more logging for index_dataset
zhenghh04 Jun 5, 2024
1719b0e
added new log
zhenghh04 Jun 5, 2024
053b42d
changed things into helper
zhenghh04 Jun 5, 2024
52b2cca
fixed issue with dlioprofiler
zhenghh04 Jun 6, 2024
cbc7830
fixed some bugs
zhenghh04 Jun 6, 2024
03a9bfa
Merge branch 'pfw_trace' of github.com:argonne-lcf/Megatron-DeepSpeed…
zhenghh04 Jun 6, 2024
36a2671
fixed profiler issue
zhenghh04 Jun 6, 2024
5c8d376
reduced printing
zhenghh04 Jun 6, 2024
d9085b6
added more timing info
zhenghh04 Jun 10, 2024
0ef6bfd
fixed timing issue for all reduce
zhenghh04 Jun 10, 2024
26ee1c3
Merge pull request #20 from argonne-lcf/pfw_trace
zhenghh04 Jun 10, 2024
9413dc9
Merge pull request #21 from argonne-lcf/distributed_loading_v2
zhenghh04 Jun 10, 2024
f6363fb
changed init
zhenghh04 Jun 12, 2024
a55df51
reducing printing from non-root ranks
zhenghh04 Jun 12, 2024
a24f01b
reduce printing
zhenghh04 Jun 12, 2024
5a54149
reducing printing
zhenghh04 Jun 12, 2024
3acdda7
added MiCS as an option
zhenghh04 Jun 13, 2024
73f6cee
Merge branch 'mics' into distributed_loading
zhenghh04 Jun 13, 2024
712d08d
Update `dropout` in `ALCF/helpers.sh`
saforem2 Jun 14, 2024
482c235
Update {`ALCF/helpers.sh`, `train_llama_alcf.sh`}
saforem2 Jun 14, 2024
2e26950
Merge pull request #22 from argonne-lcf/sequence-parallel
saforem2 Jun 14, 2024
f4c2c16
Add `ALCF/data-lists/aurora/*.txt`
saforem2 Jun 14, 2024
231d2b5
Add `setup_conda_aurora` to `ALCF/helpers.sh`
saforem2 Jun 14, 2024
852575d
Merge pull request #23 from argonne-lcf/aurora-updates
saforem2 Jun 14, 2024
aaf6152
Fix `ezpz_{save,get}jobenv` in `ALCF/helpers.sh`
saforem2 Jun 14, 2024
56a1c37
Merge pull request #24 from argonne-lcf/ezpz-hotfix
saforem2 Jun 14, 2024
b905e53
Correctly set `dfl_fallback` on Aurora if no `DATA_FILE_LIST` specified
saforem2 Jun 14, 2024
4a07103
Merge pull request #25 from argonne-lcf/aurora-dfl-fix
saforem2 Jun 14, 2024
ba5f871
added warning if the file list is not provided correctly
zhenghh04 Jun 14, 2024
c690202
make it still compatible to previous
zhenghh04 Jun 14, 2024
a96bcea
added support for XPU
zhenghh04 Jun 14, 2024
30fe479
Update README.md
saforem2 Jun 14, 2024
9208eae
Update README.md
saforem2 Jun 14, 2024
caf82d7
Merge pull request #26 from argonne-lcf/saforem2-patch-1
saforem2 Jun 14, 2024
1f983f3
Create `llama-toggle` branch
saforem2 Jun 14, 2024
f902e91
Merge pull request #19 from argonne-lcf/checkpoint_convert
saforem2 Jun 15, 2024
67d6810
Update README.md
saforem2 Jun 15, 2024
3091871
Update `setEnv` for Aurora in `ALCF/helpers.sh`
saforem2 Jun 15, 2024
81fe55f
Update README.md
saforem2 Jun 15, 2024
983a0bd
Merge pull request #27 from argonne-lcf/saforem2-patch-1
saforem2 Jun 15, 2024
7d1784b
Updates to `NO_LLAMA` mode
saforem2 Jun 15, 2024
bf979a7
Update `pretrain_gpt_alcf.py`
saforem2 Jun 15, 2024
84fa77c
Update `pretrain_gpt_alcf.py`
saforem2 Jun 15, 2024
e6461f5
Merge pull request #28 from argonne-lcf/llama-toggle
saforem2 Jun 16, 2024
f138b27
added more log
zhenghh04 Jun 17, 2024
a7249fe
resolve conflict in file list
zhenghh04 Jun 17, 2024
a36569e
added warning info when XPU profiling is not available
zhenghh04 Jun 17, 2024
79d11a7
Create `alcf-patch-1` branch
saforem2 Jun 18, 2024
e058427
Update `ALCF/helpers.sh`
saforem2 Jun 18, 2024
1ae3768
Update `ALCF/helpers.sh`
saforem2 Jun 19, 2024
abead32
Update `ALCF/README.md`
saforem2 Jun 19, 2024
025ff3f
Update `ALCF/README.md`
saforem2 Jun 19, 2024
d012937
Merge pull request #29 from argonne-lcf/alcf-patch-1
saforem2 Jun 19, 2024
ef5356b
Merge pull request #16 from argonne-lcf/distributed_loading
saforem2 Jun 19, 2024
732e567
Add `ALCF/data-lists/aurora/*.txt`
saforem2 Jun 19, 2024
0320b69
Update `ALCF/data-lists/sunspot/*.txt`
saforem2 Jun 19, 2024
a51fb11
Update `ALCF/data-lists/polaris/*.txt`
saforem2 Jun 19, 2024
9d10704
Update `.gitignore`
saforem2 Jun 19, 2024
ec600e5
Update `ALCF/helpers.sh`
saforem2 Jun 19, 2024
168cdda
Add `ALCF/requirements/requirements.txt`
saforem2 Jun 19, 2024
7df9329
Update `ALCF/helpers.sh`
saforem2 Jun 19, 2024
77ffd10
Update `ALCF/helpers.sh`
saforem2 Jun 19, 2024
e884f15
Update `ALCF/{helpers.sh,requirements/requirements.txt}`
saforem2 Jun 19, 2024
10a17e2
Merge pull request #30 from argonne-lcf/distributed-data-lists
saforem2 Jun 19, 2024
fb49de8
Update `ALCF/helpers.sh`
saforem2 Jun 19, 2024
7272326
Update `ALCF/helpers.sh`
saforem2 Jun 19, 2024
18ca369
Merge pull request #31 from argonne-lcf/alcf-helpers-patch-1
saforem2 Jun 19, 2024
f826667
Update `ALCF/helpers.sh` with kvs fix on Aurora
saforem2 Jun 21, 2024
26b846a
Update `ALCF/helpers.sh`
saforem2 Jun 21, 2024
7cd5bfa
Merge pull request #32 from argonne-lcf/alcf-aurora-kvs-fix
saforem2 Jun 21, 2024
bc7fbc6
Update `ALCF/README.md`
saforem2 Jun 21, 2024
f94b845
Update `ALCF/README.md`
saforem2 Jun 21, 2024
6f98d5a
Merge pull request #33 from argonne-lcf/alcf-update-readme
saforem2 Jun 21, 2024
06357f4
Create `alcf-startup-time`
saforem2 Jun 21, 2024
c7a1e36
Add `ALCF/notes/deepspeed_init_time.md`
saforem2 Jun 24, 2024
0548bfb
Update `ALCF/notes/deepspeed_init_time.md`
saforem2 Jun 24, 2024
6a8f55c
Update deepspeed_init_time.md
saforem2 Jun 24, 2024
d0e3d79
Update `ALCF/helpers.sh`
saforem2 Jun 25, 2024
bb690e3
Update `pretrain_gpt_alcf.py`
saforem2 Jun 25, 2024
aa698da
Update `train_llama_alcf.sh`
saforem2 Jun 25, 2024
12baf30
Update `megatron/training.py`
saforem2 Jun 25, 2024
8eabb7a
Update `megatron/training.py`
saforem2 Jun 25, 2024
d9fc18e
Update `ALCF/helpers.sh`
saforem2 Jun 25, 2024
1d413c6
Update `megatron/training.py`
saforem2 Jun 25, 2024
93e4a51
Update `megatron/utils.py`
saforem2 Jun 25, 2024
99bddfa
Update `ALCF/helpers.sh`
saforem2 Jun 25, 2024
9a8ccfd
Update `ALCF/helpers.sh`
saforem2 Jun 25, 2024
c6a63bc
Merge pull request #34 from argonne-lcf/alcf-startup-time
saforem2 Jun 25, 2024
57ba1fb
Update `ALCF/helpers.sh`
saforem2 Jun 26, 2024
634e37b
Add steps and results for running ZeRO stage 3 with Universal Checkpoi…
xylian86 Jun 26, 2024
527957e
Add Zero Bubble Pipeline Parallelism H1 Schedule (#396)
nvmdava Jun 27, 2024
f2d7589
Fix ParallelMLP and enable accelerator test (#403)
xinyu-intel Jun 27, 2024
ea4b67a
Fix test_deallocate_output_tensor (#404)
xinyu-intel Jun 27, 2024
7388c1a
Update `ALCF/helpers.sh`
saforem2 Jun 29, 2024
37a7c5c
Merge pull request #36 from argonne-lcf/alcf-helpers-patch
saforem2 Jun 29, 2024
08f5a99
Fixed missing BookCorpus dataset. (#407)
costin-eseanu Jul 1, 2024
c3a13be
Set proper arguments when constructing models in unit tests (#408)
xinyu-intel Jul 1, 2024
b511a2e
Update `ALCF/helpers.sh`
saforem2 Jul 5, 2024
330f9f2
use split/squeeze instead of slice for performance (#409)
polisettyvarma Jul 8, 2024
af06d14
improve performance by keeping attention_mask on device and run ops f…
polisettyvarma Jul 8, 2024
561ddc1
Fix micro batch size on Polaris
saforem2 Jul 10, 2024
9ee09fe
Update `ALCF/helpers.sh`
saforem2 Jul 10, 2024
76209f4
Update `ALCF/helpers.sh`
saforem2 Jul 10, 2024
541ebf1
Update `ALCF/helpers.sh`
saforem2 Jul 10, 2024
d76331f
Update `ALCF/helpers.sh`
saforem2 Jul 10, 2024
d017b4c
Update `ALCF/helpers.sh`
saforem2 Jul 11, 2024
bac8aab
Update `ALCF/helpers.sh`
saforem2 Jul 11, 2024
ec3f1f4
Improve RoPE perf by using cached sin/cos tensors (#410)
polisettyvarma Jul 11, 2024
911cc5c
Update `ALCF/helpers.sh`
saforem2 Jul 12, 2024
354e420
Extend test utilities to support more accelerators (#418)
xinyu-intel Jul 12, 2024
73252c0
clear document (#395)
inkcherry Jul 12, 2024
0971e68
add PyTorch profiler support (#414)
polisettyvarma Jul 15, 2024
2ac4fb0
Update `ALCF/helpers.sh`
saforem2 Jul 15, 2024
4876eb8
Update `ALCF/helpers.sh` on Polaris
saforem2 Jul 16, 2024
7385e3b
Update `ALCF/helpers.sh`
saforem2 Jul 16, 2024
5f5bbd4
Update `pretrain_gpt_alcf.py`
saforem2 Jul 16, 2024
73029ed
[Wandb] Refine wandb logging function (#416)
billishyahao Jul 16, 2024
fc989b8
add kill switch file support to gracefully exit training at runtime (…
polisettyvarma Jul 17, 2024
7d23e33
add support to run custom Hf tokenizer for training and dataset pre-p…
polisettyvarma Jul 18, 2024
13f2673
improve repeat_kv GQA perf (#419)
polisettyvarma Jul 19, 2024
3af2e25
acquire device when required (#420)
polisettyvarma Jul 19, 2024
b38bcb6
Update `ALCF/helpers.sh`
saforem2 Jul 19, 2024
0999de2
Update `ALCF/requirements/requirements.txt`
saforem2 Jul 19, 2024
6ad3a99
Fix opt hyperparams in `ALCF/helpers.sh`
saforem2 Jul 19, 2024
08b9376
Add basic compilation test (#426)
loadams Jul 19, 2024
3afd267
Update yml to be valid (#427)
loadams Jul 19, 2024
019dc3c
Update `ALCF/helpers.sh`
saforem2 Jul 20, 2024
54bd608
Track grad_norm in `megatron/training.py`
saforem2 Jul 20, 2024
969f4c5
Update `train_aGPT_7B.sh`
saforem2 Jul 20, 2024
9550656
Update `train_llama_alcf.sh`
saforem2 Jul 22, 2024
5d96d64
Update `train_aGPT_7B.sh`
saforem2 Jul 22, 2024
8897dc2
Merge pull request #43 from argonne-lcf/alcf-helpers-patch-1
saforem2 Jul 22, 2024
8822a5c
Update/add GPT/Llama universal checkpointing scripts (#391)
lekurile Jul 29, 2024
bcbe75f
Update README.md
saforem2 Jul 31, 2024
0270321
Merge pull request #49 from argonne-lcf/saforem2-patch-2
saforem2 Jul 31, 2024
1bfc35c
fixing the bug of flash_attn import and the wrong gather index when u…
YJHMITWEB Aug 1, 2024
53b241f
add fused_rms_norm support on XPU device (#431)
ys950902 Aug 4, 2024
61350c5
pass batch_dim_idx to deepspeed sequence parallel distributed attenti…
YJHMITWEB Aug 7, 2024
f132876
[LLaMa] Adding support converting checkpoint from mds to hf (#432)
billishyahao Aug 10, 2024
cdf5194
add device check when import ipex (#436)
ys950902 Aug 14, 2024
b7b2d5e
fix TFLOPs calculation (#371)
polisettyvarma Aug 19, 2024
b7c17ca
Move `ALCF/mds_to_hf.py` to `mds_to_hf.py`
saforem2 Aug 23, 2024
81470e9
Merge pull request #51 from argonne-lcf/checkpoint-conversion
saforem2 Aug 23, 2024
4f9f1f6
fix nan issue when running megatron-deepspeed (#434)
ys950902 Aug 24, 2024
8e9d973
enable empty cache on XPU device (#438)
ys950902 Aug 26, 2024
543543a
[wandb] disable wandb more gracefully (#422)
billishyahao Aug 27, 2024
1280f59
[Bug] Fix crash when logging optimizer state to tb (#417)
billishyahao Aug 27, 2024
5001600
fixed data loader issue for TP>1 PP>1
zhenghh04 Aug 30, 2024
38b2505
Update `ALCF/data-lists/aurora/*.txt`
saforem2 Aug 30, 2024
461bc7f
Merge pull request #52 from argonne-lcf/bugfix/tp_pp_dataloader
saforem2 Aug 30, 2024
ea0c3c7
fixed dftracer compatibility
zhenghh04 Aug 30, 2024
50e2729
hf cp conversion and inference scripts added
Aug 31, 2024
464a0d2
Merge pull request #53 from argonne-lcf/checkpoint_hf
saforem2 Aug 31, 2024
a0ac750
added requirements.txt
zhenghh04 Sep 3, 2024
0d6e379
Enable Sequence Parallelism (#429)
polisettyvarma Sep 4, 2024
de7f22f
Update utils.py
zhenghh04 Sep 4, 2024
3edba7f
Add `--train-range-to-skip` to `megatron/arguments.py`
saforem2 Sep 9, 2024
76a259b
Add logic for `--train-range-to-skip` to `megatron/training.py`
saforem2 Sep 9, 2024
fd1ac6d
Update `ALCF/helpers.sh`
saforem2 Sep 10, 2024
6f27f5d
Update `train_aGPT_7B.sh`
saforem2 Sep 10, 2024
6df33ad
fix: `--override-opt_param-scheduler` if `OVERRIDE_CKPT_OPT_PARAM=1`
saforem2 Sep 11, 2024
73720c2
Merge pull request #56 from argonne-lcf/train-skip-range
saforem2 Sep 11, 2024
8bc5313
merge: Create `microsoft-main`
saforem2 Sep 12, 2024
a1ede68
Remove duplicate `--profile` arg
saforem2 Sep 12, 2024
6b32cff
debug: `sequence_parallel` issue in `RMSNorm` ??
saforem2 Sep 12, 2024
12f6f8e
fix check
zhenghh04 Sep 12, 2024
5ac877a
Update `megatron/training_log_alcf.py`
saforem2 Sep 12, 2024
b3e0f6f
Update `megatron/training.py`
saforem2 Sep 13, 2024
2113dbc
Update `megatron/utils.py`
saforem2 Sep 13, 2024
7f71572
Update `megatron/training_log.py`
saforem2 Sep 13, 2024
7cb9c11
Update `pretrain_gpt_alcf.py`
saforem2 Sep 15, 2024
e83de19
Update `megatron/training_log.py`
saforem2 Sep 15, 2024
29756d6
Warn if mismatch b/w iters in `megatron/checkpointing.py`
saforem2 Sep 15, 2024
1a7f03b
fix: `try/except` for non tensors in `megatron/training_log.py`
saforem2 Sep 16, 2024
828f6a9
fix: Correctly draw `grad_acc_steps` batches of data when skipping step
saforem2 Sep 17, 2024
295fcb3
Update `pretrain_gpt_alcf.py`
saforem2 Sep 17, 2024
598c092
grad_wei can't be NoneType when running with DeepSpeed, for zero3 wil…
ys950902 Sep 20, 2024
cf80e6b
added sophia
Sep 23, 2024
09accde
Merge pull request #59 from mngom2/spike-skipper
saforem2 Sep 30, 2024
8be7f48
fix init issue for rms_norm in sequence_parallel (#448)
ys950902 Oct 4, 2024
4448492
enable profiler for specific ranks (#451)
ranzhejiang Oct 8, 2024
cef3fc7
Merge pull request #58 from argonne-lcf/spike-skipper
saforem2 Oct 8, 2024
fd94b37
merge: Resolve merge conflicts pulling in from Microsoft upstream
saforem2 Oct 8, 2024
ecc248a
Merge branch 'microsoft:main' into main
saforem2 Oct 11, 2024
9b5be12
merge: `argonne-lcf-microsoft-main` into `main`
saforem2 Oct 11, 2024
5394156
shuffle concate dataset index
zhenghh04 Oct 12, 2024
573b668
fixed bugs
zhenghh04 Oct 12, 2024
41ff059
Update `ALCF/helpers.sh`, `train_aGPT_7B.sh`
saforem2 Oct 12, 2024
89db92a
merge: `feature/profile` with data fix into `microsoft-main`
saforem2 Oct 12, 2024
9de83a9
Fix `shuffle_idx` in `megatron/data/gpt_dataset.py`
saforem2 Oct 12, 2024
d7a2594
Fix `shuffle_idx` in `megatron/data/gpt_dataset.py`
saforem2 Oct 12, 2024
3e33a6a
Update `ALCF/helpers.sh`, `train_aGPT_7B.sh`
saforem2 Oct 13, 2024
43cde2b
Update `pretrain_gpt_alcf.py`
saforem2 Oct 13, 2024
9f09733
Update `megatron/data/{blendable,gpt,indexed}_dataset.py`
saforem2 Oct 13, 2024
2b31b44
Update `ALCF/requirements/requirements.txt`
saforem2 Oct 13, 2024
5e9eed0
Update `megatron/utils.py`
saforem2 Oct 13, 2024
3dcb297
fixed bugs and added commandline option
zhenghh04 Oct 14, 2024
bec9b7a
Merge branch 'debug-logging' into feature/profile
saforem2 Oct 14, 2024
43fc2fe
fixed typo
zhenghh04 Oct 14, 2024
94d5337
Merge branch 'feature/profile' of github.com:argonne-lcf/Megatron-Dee…
zhenghh04 Oct 14, 2024
bb55e97
Merge pull request #67 from argonne-lcf/feature/profile
saforem2 Oct 14, 2024
d50239f
added support for blending samples across different files in the same…
zhenghh04 Oct 14, 2024
9b4f510
Merge pull request #64 from argonne-lcf/debug-logging
saforem2 Oct 14, 2024
324ef11
Merge branch 'alcf-hzheng-data-fix' into hzheng-data-fix
saforem2 Oct 15, 2024
45ff652
Discard changes to megatron/data/gpt_dataset.py
saforem2 Oct 15, 2024
52a406c
Consistent logging in `megatron/data/*.py`
saforem2 Oct 15, 2024
63b1901
Update `megatron/data/gpt_dataset.py`
saforem2 Oct 16, 2024
7ef26bf
Use `time.perf_counter` in `megatron/data/blendable_dataset.py`
saforem2 Oct 16, 2024
deb95cd
fix init issue for silently ignoring the deepspeed config (#452)
xylian86 Oct 17, 2024
68da2db
Update `ALCF/helpers.sh`
saforem2 Oct 17, 2024
ab3a8ec
Merge branch 'main' of https://github.com/microsoft/Megatron-DeepSpee…
saforem2 Oct 18, 2024
ed21bd9
Merge branch 'hzheng-data-fix' of https://github.com/argonne-lcf/Mega…
saforem2 Oct 18, 2024
6acc370
fix moe tflops (#445)
ranzhejiang Oct 18, 2024
467279b
Merge 'upstream/main' into `hzheng-data-fix`
saforem2 Oct 18, 2024
9e015cc
Remove duplicate `gradient_accumulation_steps` in DS config
saforem2 Oct 18, 2024
58dc2d7
Update default EVAL args
saforem2 Oct 21, 2024
277d308
Catch eval metrics in `megatron/training.py`
saforem2 Oct 21, 2024
af4cba1
Save git branch to env in `train_aGPT_7B.sh`
saforem2 Oct 21, 2024
8a8472c
fixed print out bug
zhenghh04 Oct 21, 2024
dfd0643
Merge pull request #68 from argonne-lcf/feature/blending_corpus
saforem2 Oct 21, 2024
6cb727d
Fix `args.shuffle` in `megatron/data/gpt_dataset.py`
saforem2 Oct 21, 2024
5d10179
Update `--{shuffle,blend}-sample-in-corpus` arg in `ALCF/helpers.sh`
saforem2 Oct 24, 2024
160d6a6
fix: `GRAD_ACC_STEPS` when `NHOSTS == 256`
saforem2 Oct 31, 2024
40db8c2
Merge pull request #63 from argonne-lcf/hzheng-data-fix
saforem2 Nov 5, 2024
ce7d553
🚧 `ALCF/ds_to_universal.py`
saforem2 Nov 7, 2024
8e0bff8
docs: Add `ALCF/notes/checkpoints.md`
saforem2 Nov 7, 2024
bd8c246
feat: Enable `--use-flash-attn-builder` by default on Aurora
saforem2 Nov 7, 2024
26f2e71
Update python.yml
saforem2 Nov 7, 2024
48b3c81
Update python.yml
saforem2 Nov 7, 2024
0a997bb
Update python.yml
saforem2 Nov 7, 2024
c4de4d1
Merge pull request #62 from argonne-lcf/microsoft-main
saforem2 Nov 12, 2024
1a36004
Update `ALCF/helpers.sh`
saforem2 Nov 16, 2024
8d0b43b
Merge pull request #69 from argonne-lcf/saforem2-helpers-fix
saforem2 Nov 16, 2024
b8007f4
fix: `GRAD_ACC_STEPS` on 32 nodes of Aurora
saforem2 Nov 25, 2024

35 changes: 35 additions & 0 deletions .github/workflows/python.yml
@@ -0,0 +1,35 @@
name: python

on:
  workflow_dispatch:
  pull_request:
    branches:
      '**'
  schedule:
    - cron: "0 0 * * *"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    strategy:
      matrix:
        pyVersion: ["3.10"]
      fail-fast: false

    runs-on: ubuntu-22.04
    container:
      image: deepspeed/gh-builder:py${{ matrix.pyVersion }}

    steps:
      - uses: actions/checkout@v4

      - name: environment
        run: |
          which python
          python --version
      - name: Install Megatron-DeepSpeed
        run: |
          pip3 install .
42 changes: 42 additions & 0 deletions .gitignore
@@ -1,3 +1,45 @@
# User Added
.jobenv
**.e[0-9]**
**.o[0-9]**
**.e6**
**.o6**
**.e9**
**.o9**
**.e1**
**.o1**
*.o17*
*.e17*
*.o1
*.e1
deps/*
OUTPUTS/*
ALCF/OUTPUTS/*
*tmp*
*core.*
*old*
*.bak
**index-cache**
**pbslogs**
ezpz
*hostfile*
.deepspeed_env
*.DS_Store
old/*
**venv**
*.json
outputs/
venvs/
wandb/
llama-logs/
checkpoints/
*.gz
*.txt
*.idx
*.bin
*.log
__pycache__

.deepspeed_env
*.bak
.cache/*
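Note that the new `.gitignore` adds broad patterns such as `*.txt`, `*.json` and `*.log`, which also match files this PR adds (for example `ALCF/data-lists/aurora/*.txt` and `ALCF/requirements/requirements.txt`). Git applies ignore rules only to untracked files, so paths that are already tracked, or that are added with `git add -f`, remain tracked. The minimal Python sketch below approximates the check for slash-free patterns by testing each path's basename with `fnmatch`; it does not handle negation (`!`), directory-only (trailing `/`) or anchored patterns, so treat it as a rough check and use `git check-ignore -v <path>` for the authoritative answer. The helper name `roughly_ignored` is hypothetical; the pattern subset is copied from the diff above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns added to .gitignore in this PR.
PATTERNS = ["*.json", "*.txt", "*.idx", "*.bin", "*.log", "*.gz", "*tmp*", "*old*"]


def roughly_ignored(path: str, patterns=PATTERNS) -> list[str]:
    """Return the patterns that match the path's basename.

    Approximation only: gitignore lets a pattern without a slash match the
    basename at any depth; negation, directory-only and anchored patterns
    are not modelled here.
    """
    name = PurePosixPath(path).name
    return [p for p in patterns if fnmatchcase(name, p)]


if __name__ == "__main__":
    for path in [
        "ALCF/data-lists/aurora/arxiv.txt",
        "ALCF/requirements/requirements.txt",
        "ALCF/aws_ofi_nccl_plugin.sh",
        "megatron/training.py",
    ]:
        hits = roughly_ignored(path)
        status = "matched by " + ", ".join(hits) if hits else "not matched"
        print(f"{path}: {status}")
```

A match here only means the path would be ignored if it were untracked; whether it is committed in this PR depends on how it was added.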
1,088 changes: 1,030 additions & 58 deletions ALCF/README.md

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions ALCF/aws_ofi_nccl_plugin.sh
@@ -0,0 +1,20 @@
#!/bin/bash --login

# AWS NCCL OFI Plugin settings below
export NCCL_CROSS_NIC=1
export NCCL_COLLNET_ENABLE=1
export NCCL_NET="AWS Libfabric"
export LD_LIBRARY_PATH=/soft/libraries/aws-ofi-nccl/v1.9.1-aws/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/soft/libraries/hwloc/lib/:$LD_LIBRARY_PATH
export FI_CXI_DISABLE_HOST_REGISTER=1
export FI_MR_CACHE_MONITOR=userfaultfd
export FI_CXI_DEFAULT_CQ_SIZE=131072
#########################################################
# WARNING: !!!
# - Currently, `export NCCL_NET_GDR_LEVEL=PHB`
# causes a hang on Polaris.
# so, we don't set it for the time being [2024-05-14].
# - Seems to work on Perlmutter ???
#
# export NCCL_NET_GDR_LEVEL=PHB
#########################################################
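Because each `export LD_LIBRARY_PATH=...:$LD_LIBRARY_PATH` line in this script prepends, the line executed last ends up first on the search path: after sourcing, the hwloc directory precedes the aws-ofi-nccl plugin directory, and both precede whatever was set before. The minimal Python sketch below reproduces that composition so the resulting order can be checked; `prepend_path` and `apply_script` are hypothetical helpers, and only the two directories are taken from the script.

```python
import os

# Directories exactly as they appear in ALCF/aws_ofi_nccl_plugin.sh, in script order.
SCRIPT_PREPENDS = [
    "/soft/libraries/aws-ofi-nccl/v1.9.1-aws/lib",
    "/soft/libraries/hwloc/lib/",
]


def prepend_path(env: dict, key: str, directory: str) -> None:
    """Mimic `export KEY=directory:$KEY` (hypothetical helper).

    Like the shell form, an unset or empty KEY leaves a trailing ':'.
    """
    env[key] = f"{directory}:{env.get(key, '')}"


def apply_script(env: dict) -> dict:
    """Return a copy of env with the script's LD_LIBRARY_PATH prepends applied."""
    out = dict(env)
    for directory in SCRIPT_PREPENDS:
        prepend_path(out, "LD_LIBRARY_PATH", directory)
    return out


if __name__ == "__main__":
    result = apply_script(dict(os.environ))
    # The last prepend (hwloc) ends up first on the search path.
    for i, entry in enumerate(result["LD_LIBRARY_PATH"].split(":")):
        print(i, entry or "<empty entry>")
```

If `LD_LIBRARY_PATH` was unset beforehand, the shell form leaves a trailing `:` (an empty entry), which the glibc dynamic loader treats as the current directory; the bash expansion `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` is one way to avoid that.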
16 changes: 16 additions & 0 deletions ALCF/data-lists/aurora/algebraic.txt
@@ -0,0 +1,16 @@
0.0018520780893211373 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0000_text_document algebraic-stack-train
0.0017591050606817512 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0001_text_document algebraic-stack-train
0.001459052794333798 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0002_text_document algebraic-stack-train
0.0007405667281569194 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0003_text_document algebraic-stack-train
0.00019420030110896795 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0004_text_document algebraic-stack-train
0.0009008668715801845 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0005_text_document algebraic-stack-train
0.00015115827957143057 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0006_text_document algebraic-stack-train
0.0014552844319220648 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0007_text_document algebraic-stack-train
0.0012469861325685161 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0008_text_document algebraic-stack-train
0.00136412011372413 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0009_text_document algebraic-stack-train
0.0007064279699221103 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0010_text_document algebraic-stack-train
0.0008472240000687427 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0011_text_document algebraic-stack-train
0.0001984375713341955 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0012_text_document algebraic-stack-train
0.0005472773881697123 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0013_text_document algebraic-stack-train
0.001815779629850992 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0014_text_document algebraic-stack-train
0.0018313600689757324 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/algebraic-stack-train-0015_text_document algebraic-stack-train
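Each line of these data-file lists appears to follow a `weight prefix corpus` layout: a sampling weight, the path prefix of a tokenized `*_text_document` dataset (its `.bin`/`.idx` pair), and a corpus label. The Python sketch below parses that layout and totals the weight per corpus. It assumes whitespace-separated fields with no spaces inside paths, and the rescaling step assumes the weights are relative and normalized by the loader; `parse_data_file_list` and `corpus_shares` are hypothetical names, not the repository's data-loading API.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    weight: float  # relative sampling weight (first column)
    prefix: str    # path prefix of the tokenized .bin/.idx pair (second column)
    corpus: str    # corpus label (third column)


def parse_data_file_list(text: str) -> list[Entry]:
    """Parse assumed 'weight prefix corpus' lines; blank and '#' lines are skipped."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        weight, prefix, corpus = fields
        entries.append(Entry(float(weight), prefix, corpus))
    return entries


def corpus_shares(entries: list[Entry]) -> dict[str, float]:
    """Sum weights per corpus, then rescale so the shares sum to 1."""
    totals: dict[str, float] = defaultdict(float)
    for entry in entries:
        totals[entry.corpus] += entry.weight
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {corpus: total / grand_total for corpus, total in totals.items()}


if __name__ == "__main__":
    # Two lines copied from the algebraic.txt and arxiv.txt lists in this PR.
    sample = (
        "0.0018520780893211373 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/"
        "data_v1.7_Llama2Tokenizer/algebraic-stack-train-0000_text_document algebraic-stack-train\n"
        "0.0002583902668716813 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/"
        "data_v1.7_Llama2Tokenizer/arxiv-0000_text_document arxiv\n"
    )
    for corpus, share in corpus_shares(parse_data_file_list(sample)).items():
        print(f"{corpus}: {share:.3f}")
```

Grouping by the third column is what makes per-corpus totals easy to inspect when several list files (such as `algebraic.txt` and `arxiv.txt`) are concatenated into one `DATA_FILE_LIST`.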
100 changes: 100 additions & 0 deletions ALCF/data-lists/aurora/arxiv.txt
@@ -0,0 +1,100 @@
0.0002583902668716813 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0000_text_document arxiv
0.0002646575141232155 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0001_text_document arxiv
0.0003165521247456758 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0002_text_document arxiv
0.0002920706460176214 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0003_text_document arxiv
0.00028396813182810215 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0004_text_document arxiv
0.00030445161883108107 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0005_text_document arxiv
0.00031628781276576474 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0006_text_document arxiv
0.0003083776568189157 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0007_text_document arxiv
0.0003176359471472902 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0008_text_document arxiv
0.0002536009369131698 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0009_text_document arxiv
0.0003067491424681363 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0010_text_document arxiv
0.0002597217257557784 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0011_text_document arxiv
0.0003788556450109768 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0012_text_document arxiv
0.0002796563272052598 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0013_text_document arxiv
0.00033573826524290287 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0014_text_document arxiv
0.00030523658022800287 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0015_text_document arxiv
0.00032211552192240096 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0016_text_document arxiv
0.0003329295675164247 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0017_text_document arxiv
0.0003101982186639862 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0018_text_document arxiv
0.00032361798234223355 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0019_text_document arxiv
0.0003495541581652915 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0020_text_document arxiv
0.0002821637448858042 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0021_text_document arxiv
0.00030399523537629673 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0022_text_document arxiv
0.0002955658968247219 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0023_text_document arxiv
0.00028942158502924254 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0024_text_document arxiv
0.00028769546171490733 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0025_text_document arxiv
0.0002938111057234182 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0026_text_document arxiv
0.0002711150403010948 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0027_text_document arxiv
0.00031130095874747565 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0028_text_document arxiv
0.0003002996118160777 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0029_text_document arxiv
0.0003732757901604459 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0030_text_document arxiv
0.00026784205751795894 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0031_text_document arxiv
0.0002799626521661984 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0032_text_document arxiv
0.00034334276069078164 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0033_text_document arxiv
0.0003582469803674965 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0034_text_document arxiv
0.00031094844818418623 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0035_text_document arxiv
0.0002766228384977191 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0036_text_document arxiv
0.00030297116159471485 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0037_text_document arxiv
0.00027033888377464685 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0038_text_document arxiv
0.00030090862368377933 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0039_text_document arxiv
0.00028543875802490955 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0040_text_document arxiv
0.00027559768459074204 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0041_text_document arxiv
0.0003182185533962886 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0042_text_document arxiv
0.0003311392971435837 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0043_text_document arxiv
0.00028751652060804325 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0044_text_document arxiv
0.000303466863212589 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0045_text_document arxiv
0.00033400462801277524 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0046_text_document arxiv
0.0002589234031777426 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0047_text_document arxiv
0.0002913508598466723 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0048_text_document arxiv
0.0002670572450004856 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0049_text_document arxiv
0.00032027399105647656 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0050_text_document arxiv
0.00032188376258379377 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0051_text_document arxiv
0.0003161585784100882 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0052_text_document arxiv
0.0003184249182974135 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0053_text_document arxiv
0.00030381336664000807 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0054_text_document arxiv
0.0003190437442184283 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0055_text_document arxiv
0.0002537961798200545 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0056_text_document arxiv
0.0003017817117223326 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0057_text_document arxiv
0.00028685268513240224 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0058_text_document arxiv
0.00031265179094451165 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0059_text_document arxiv
0.00034708319096986816 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0060_text_document arxiv
0.00026650837943080664 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0061_text_document arxiv
0.00034588832248507335 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0062_text_document arxiv
0.0002416982248399037 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0063_text_document arxiv
0.0003089296918222243 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0064_text_document arxiv
0.00029137184185700827 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0065_text_document arxiv
0.00026464226846800774 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0066_text_document arxiv
0.00030545397919456627 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0067_text_document arxiv
0.0003206778460448875 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0068_text_document arxiv
0.00030968971641110967 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0069_text_document arxiv
0.00023325653928600864 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0070_text_document arxiv
0.00030526899198338555 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0071_text_document arxiv
0.00035376719076633584 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0072_text_document arxiv
0.000290224385981026 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0073_text_document arxiv
0.000294650083382008 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0074_text_document arxiv
0.00028768858128616436 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0075_text_document arxiv
0.00030856965235527843 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0076_text_document arxiv
0.00030579942447879054 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0077_text_document arxiv
0.0002863101084704357 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0078_text_document arxiv
0.0002870032092492213 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0079_text_document arxiv
0.000264182727569885 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0080_text_document arxiv
0.0002974012367036449 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0081_text_document arxiv
0.00032238412143059203 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0082_text_document arxiv
0.00031683716893819036 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0083_text_document arxiv
0.00031157434937617524 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0084_text_document arxiv
0.0003411742735695989 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0085_text_document arxiv
0.00026778444816570715 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0086_text_document arxiv
0.0003037045797275201 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0087_text_document arxiv
0.00027746114370081314 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0088_text_document arxiv
0.00027148285946862043 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0089_text_document arxiv
0.00028042950114678207 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0090_text_document arxiv
0.0003235607816590721 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0091_text_document arxiv
0.0003086692227306295 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0092_text_document arxiv
0.00033990349455148105 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0093_text_document arxiv
0.00030945053208470265 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0094_text_document arxiv
0.00027309074552265303 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0095_text_document arxiv
0.00028737393506316194 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0096_text_document arxiv
0.0003098868328009879 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0097_text_document arxiv
0.0002614229162588409 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0098_text_document arxiv
0.0002884388407820923 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/arxiv-0099_text_document arxiv
3 changes: 3 additions & 0 deletions ALCF/data-lists/aurora/books.txt
@@ -0,0 +1,3 @@
0.0031025147279277244 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/books-0000_text_document books
0.003102019887362634 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/books-0001_text_document books
0.0009996745994661548 /flare/Aurora_deployment/AuroraGPT/datasets/dolma/data_v1.7_Llama2Tokenizer/books-0002_text_document books