
【BAAI】add MoFlow pretraining std case #397

Merged: 15 commits merged on Mar 6, 2024
Conversation

yuzhou03
Contributor

No description provided.

@yuzhou03
Contributor Author

The code in the following directories was ported from the original repository:

  • moflow/pytorch/runtime
  • moflow/pytorch/data

@yuzhou03
Contributor Author

yuzhou03 commented Mar 2, 2024

1x8 official bs log:

evaluate results: {'loglik': 1.187796175479889, 'nll_x': 0.6530599296092987, 'nll_adj': 0.5347362458705902, 'valid': 80.875, 'unique': 100.0, 'validity': 88.572265625, 'novelty': 100.0, 'uniqueness': 99.8599200437043, 'abs_novelty': 88.572265625, 'abs_uniqueness': 88.4482421875, 'nuv': 88.4482421875}
converged_success. eval_nuv: 88.4482421875, target_nuv: 80
[PerfLog] {"event": "FINISHED", "value": {"e2e_time": 3219.93017244339, "train_time": 2493.0309982299805, "train_no_eval_time": 2448.092131955719, "pure_training_computing_time": 2188.087756736374, "throughput(ips)_raw": 27023.49, "throughput(ips)_no_eval": 27519.55, "throughput(ips)_pure_compute": 30789.62, "converged": true, "final_nuv": 88.4482421875}, "metadata": {"file": "/home/zhouyu/workspace/FlagPerf/training/benchmarks/moflow/pytorch/run_pretraining.py", "lineno": 196, "time_ms": 1709360005975, "rank": -1}}

1x8 log(official bs)
moflow-nvidia-a100-1x8-nuv88.45-converge.zip
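The log above shows the benchmark's convergence check: the run is marked `converged_success` because the evaluated `eval_nuv` (88.448…) reached the `target_nuv` of 80, and the final `[PerfLog]` line carries the result as a JSON payload. A minimal sketch of how such a line could be parsed and checked (the `parse_perflog` helper and the truncated `log_line` below are illustrative, not part of the FlagPerf codebase):

```python
import json

# A PerfLog line as emitted by run_pretraining.py, truncated to the
# fields relevant for the convergence check.
log_line = (
    '[PerfLog] {"event": "FINISHED", '
    '"value": {"converged": true, "final_nuv": 88.4482421875}, '
    '"metadata": {"rank": -1}}'
)

def parse_perflog(line, prefix="[PerfLog] "):
    """Strip the [PerfLog] marker and decode the JSON payload."""
    if not line.startswith(prefix):
        raise ValueError("not a PerfLog line")
    return json.loads(line[len(prefix):])

record = parse_perflog(log_line)
target_nuv = 80  # convergence target used in the run above

# The run counts as converged when the script reports converged=true
# and the final NUV metric meets the target.
converged = (record["value"]["converged"]
             and record["value"]["final_nuv"] >= target_nuv)
print(converged)
```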

@shh2000 shh2000 merged commit c8aed9f into FlagOpen:main Mar 6, 2024
1 check passed
nrikoh pushed a commit to nrikoh/FlagPerf that referenced this pull request Mar 14, 2024
* add MoFlow std case

* update readme

* add case example for test_conf

* change to comment

* rdkit add version

* add jit & cuda_graph to mutable_params, overwritten by vendors are allowed

* rename config_name to dataset_name

* set time statistic variables to 0

* update seed and target_nuv

* update 1x8 result for official bs

* update notice for readme

* Update test_conf.py

---------

Co-authored-by: zhouyu <[email protected]>
Co-authored-by: shh2000 <[email protected]>
shh2000 added a commit that referenced this pull request Apr 9, 2024
* commit

* .

* fix

* fix

* add net

* fix

* Update README.md

* Update README.md

* Update README.md

* add package version

* Update README.md

fix

* add aquila_7b_finetune

* fix

* Update cluster_conf.py

* Update README.md

* Update flagscale_main.sh

* fix

* Update README.md

* fix

* Update README.md

* Update test_conf.py

* update t5small & txl readme (#443)

* update t5small pytorch version & doc

* add training result for 1x1, 2x8

* update readme for txl

---------

Co-authored-by: zhouyu <[email protected]>

* [metax] bert_hf (#456)

* add bert_hf result

* Update README.md

1

* [metax] add efficientnet (#455)

* add efficientnet

* add code

---------

Co-authored-by: xiaofeng guo <[email protected]>

* add nv results for distilbert (#458)

Co-authored-by: zhouyu <[email protected]>

* Update test_conf.py (#461)

[metax] add waveglow case

* Merge Aquila70B and others into main branch (#460)

* Aquila multinode (#349)

* init

* 123123

* rm privacy

* fixnet

* monitor

* 123123

* fix

* add req

* 123123

* 123132

* 23123

* 123123

* 123123

* 123123

* 123123

* 23123

* Aquila 34/70B (#364)

* init

* 123123

* rm privacy

* fixnet

* monitor

* 123123

* fix

* add req

* 123123

* 123132

* 23123

* 123123

* 123123

* 123123

* 123123

* 23123

* try-3470

* sync FlagScale & vendor_shell (#374)

* sync FlagScale & vendor_shell

* fix

* 123

* add vis

* add vis

* add vis

* add vis

* add vis

* [Kunlun] add aquila 7b/34b/70b pretrained for ai platform (#396)

* [Kunlun] add aquila 7b/34b/70b pretrained for ai platform

* [Kunlun] add aquila 7b/34b/70b pretrained for ai platform

* [Kunlun] add monitory.py for xpu

* [Kunlun] add Dockerfile

* [Kunlun] add 2 file: monitor data processing file

* [Kunlun] add singlenode_correctness.sh file

* [Kunlun] rm lr config in singlenode_adapt.sh

* [Kunlun] add 7B mpu config

---------

Co-authored-by: root <[email protected]>

* [Cambricon] support FlagPerf (#398)

* [Cambricon] support FlagPerf

* [Cambricon] fixed vendor name in singlenode_adapt.sh; deleted useless directory in aquila2_7B_container-in_container; fixed standalone_monitor.py&cambricon_monitor

* [Cambricon] revised standalone_monitor.py (#424)

* [mthreads] support Aquila2 7B/34B/70B (#385)

* [mthreads] support Aquila2 7B/34B/70B

* [mthreads] add config and singlenode adapt for 34B/70B

* add singlenode_correctness.sh

* add display_line and scatter_gpu script

* add recompute attention&layernorm

---------

Co-authored-by: yehua.zhang <[email protected]>

* [mthreads] modify recompute argument (#426)

* [mthreads] modify recompute argument

* add 70B of 128&256 gpus' recompute argument

---------

Co-authored-by: yehua.zhang <[email protected]>

* [DCU]  support Aquila2 7B/34B/70B (#427)

* added dcu-aquila2

* added dcu-aquila2

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update singlenode_adapt.sh

* Update config.py

* Update singlenode_run.sh

* Update config.py

* Update singlenode_run.sh

* updata run_benchmarks

* added readme for in_cluster

---------

Co-authored-by: ying zhao <yingzhao27>
Co-authored-by: shh2000 <[email protected]>

* [Iluvatar] support Aquila2 7B/34B/70B. (#435)

* update iluvatar aquila2 7b/34b/70b.

update iluvatar aquila2 7b/34b/70b.

* update iluvatar aquila2 7B/34B/70B configuration parameters.

update iluvatar aquila2 7B/34B/70B configuration parameters.

* update iluvatar Aquila2 7B accuracy testing.

update iluvatar Aquila2 7B accuracy testing.

* Move the location of the iluvatar visual script.

Move the location of the iluvatar visual script.

* [Ascend] Support Aquila2 7B (#433)

* [Ascend] Support Aquila2 34B&70B (#436)

* DCU mini-update (#447)

* [Mthreads] modify 7B test args (#445)

* [Cambricon] add display_line.py, scatter_gpu.py and revise standalone_monitor.py (#449)

* [Ascend] Update training scripts for Aquila2 (#450)

* [Ascend] update monitor (#451)

* Update singlenode_adapt.sh (#452)

* [Ascend] modify monitor (#453)

* [Ascend] Update scripts for Aquila2 (#454)

---------

Co-authored-by: helen88 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: HawkL327 <[email protected]>
Co-authored-by: shang-mt <[email protected]>
Co-authored-by: yehua.zhang <[email protected]>
Co-authored-by: Ying Zhao <[email protected]>
Co-authored-by: forestlee95 <[email protected]>
Co-authored-by: LoomisChen <[email protected]>
Co-authored-by: Haitao Wang <[email protected]>

* [metax] Waveglow pr (#457)

* add t5_small and transformer_xl

* Update README.md

change t5_small readme

* Update README.md

change transformer_xl readme

* Update README.md

fix t5_small readme

* Update README.md

fix transformer_xl readme

* Update README.md

fix t5_small readme requirement.txt path

* first commit

* Update README.md

change metax val loss to -5.7461

---------

Co-authored-by: jiaxing xie <[email protected]>

* update readme (#463)

* format?

* ur

* update readme (#464)

Co-authored-by: zhouyu <[email protected]>

* 【Metax】Add mobilenetv2 (#465)

* add mobilenetv2

* fix

* fix

* Update test_conf.py (#469)

[metax] add bert_hf /glm sample

Co-authored-by: shh2000 <[email protected]>

* [metax] add glm result (#466)

* add bert_hf result

* Update README.md

1

* add glm result

* [metax] Update glm README.md

* 【metax】add model mask_rcnn and detr (#459)

* add model mask_rcnn and detr

* maskrcnn & detr model logs

* [metax] stablediffusion inference pr (#468)

* update

* update inference

* update readme

* update dockerfile

---------

Co-authored-by: Shengchu Zhao <[email protected]>

* [KUNLUN] add llama70B case (#470)

* [KUNLUN] add llama70B case

* [KUNLUN] add llama70B case

* Merge branch 'main' of https://github.com/ZLkanyo009/FlagPerf into main

* Update README.md

---------

Co-authored-by: zhangling21 <[email protected]>

* [metax] swintransformer-inference pr (#473)

* add metax swin-transformer

* mod readme

* mod readme

* mod swin

* Update README.md

* Update config_common.py

* Update requirements.txt

* fix torch_six in swin_transformer

* Update utils.py

* add metax swintrans-infer

---------

Co-authored-by: jingyifa <[email protected]>

* [DCU]Add glm case of dcu in FlagPerf. (#472)

* Add glm case of dcu in Flagperf.

* update 1*1 log

* Update README infos in glm_pytorch of DCU.

---------

Co-authored-by: shh2000 <[email protected]>

* add resnet infer metax (#474)

Co-authored-by: yaguang.wuyaguang <[email protected]>

* [metax] add bert_large inference result (#476)

* add bert_hf result

* Update README.md

1

* add glm result

* [metax] Update glm README.md

* update metax bertlarge inference result

* update metax bert_large inference result

* Update README.md

* 【BAAI】add MoFlow pretraining std case (#397)

* add MoFlow std case

* update readme

* add case example for test_conf

* change to comment

* rdkit add version

* add jit & cuda_graph to mutable_params, overwritten by vendors are allowed

* rename config_name to dataset_name

* set time statistic variables to 0

* update seed and target_nuv

* update 1x8 result for official bs

* update notice for readme

* Update test_conf.py

---------

Co-authored-by: zhouyu <[email protected]>
Co-authored-by: shh2000 <[email protected]>

* ur (#478)

* [metax] add sam/vit inference results (#477)

* add sam/vit inference results

* add hardware information

---------

Co-authored-by: fdeng <[email protected]>

* 【Metax】Add yolov5 infer (#479)

* add yolov5

* add readme

* fix

* Update Dockerfile

* .

add llava case

* v2

.

* .

.

* .

* commit

commit

* readme1

commit

* llava13b

llava13b

* llava1.5_7b

llava1.5_7b

* del

del

* del

del

* fix

fix

* del

del

* add

add

* add

add

---------

Co-authored-by: shh2000 <[email protected]>
Co-authored-by: Zhou Yu <[email protected]>
Co-authored-by: zhouyu <[email protected]>
Co-authored-by: Kathrine <[email protected]>
Co-authored-by: xfguo <[email protected]>
Co-authored-by: xiaofeng guo <[email protected]>
Co-authored-by: sherryxie1 <[email protected]>
Co-authored-by: helen88 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: HawkL327 <[email protected]>
Co-authored-by: shang-mt <[email protected]>
Co-authored-by: yehua.zhang <[email protected]>
Co-authored-by: Ying Zhao <[email protected]>
Co-authored-by: forestlee95 <[email protected]>
Co-authored-by: LoomisChen <[email protected]>
Co-authored-by: Haitao Wang <[email protected]>
Co-authored-by: jiaxing xie <[email protected]>
Co-authored-by: 会意 <[email protected]>
Co-authored-by: happyxuwork <[email protected]>
Co-authored-by: fred1912 <[email protected]>
Co-authored-by: Shengchu Zhao <[email protected]>
Co-authored-by: Ling Zhang <[email protected]>
Co-authored-by: zhangling21 <[email protected]>
Co-authored-by: FaJingyi <[email protected]>
Co-authored-by: jingyifa <[email protected]>
Co-authored-by: Rayyyyy <[email protected]>
Co-authored-by: jsnoc <[email protected]>
Co-authored-by: yaguang.wuyaguang <[email protected]>
Co-authored-by: dfgan <[email protected]>
Co-authored-by: fdeng <[email protected]>