
【BAAI】add MoFlow pretraining std case #397

Merged: 15 commits merged on Mar 6, 2024
Conversation

yuzhou03
Contributor

No description provided.

@yuzhou03
Contributor Author

The code in the following directories was ported from the original repository:

  • moflow/pytorch/runtime
  • moflow/pytorch/data

@yuzhou03
Contributor Author

yuzhou03 commented Mar 2, 2024

1x8 official bs log:

evaluate results: {'loglik': 1.187796175479889, 'nll_x': 0.6530599296092987, 'nll_adj': 0.5347362458705902, 'valid': 80.875, 'unique': 100.0, 'validity': 88.572265625, 'novelty': 100.0, 'uniqueness': 99.8599200437043, 'abs_novelty': 88.572265625, 'abs_uniqueness': 88.4482421875, 'nuv': 88.4482421875}
converged_success. eval_nuv: 88.4482421875, target_nuv: 80
[PerfLog] {"event": "FINISHED", "value": {"e2e_time": 3219.93017244339, "train_time": 2493.0309982299805, "train_no_eval_time": 2448.092131955719, "pure_training_computing_time": 2188.087756736374, "throughput(ips)_raw": 27023.49, "throughput(ips)_no_eval": 27519.55, "throughput(ips)_pure_compute": 30789.62, "converged": true, "final_nuv": 88.4482421875}, "metadata": {"file": "/home/zhouyu/workspace/FlagPerf/training/benchmarks/moflow/pytorch/run_pretraining.py", "lineno": 196, "time_ms": 1709360005975, "rank": -1}}

1x8 log(official bs)
moflow-nvidia-a100-1x8-nuv88.45-converge.zip
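The log above shows the benchmark's convergence check: the run is marked `converged_success` because the evaluated `eval_nuv` (88.448…) reached the `target_nuv` of 80, and the final `[PerfLog]` line carries the result as a JSON payload. A minimal sketch of how such a line could be parsed and checked (the `parse_perflog` helper and the truncated `log_line` below are illustrative, not part of the FlagPerf codebase):

```python
import json

# A PerfLog line as emitted by run_pretraining.py, truncated to the
# fields relevant for the convergence check.
log_line = (
    '[PerfLog] {"event": "FINISHED", '
    '"value": {"converged": true, "final_nuv": 88.4482421875}, '
    '"metadata": {"rank": -1}}'
)

def parse_perflog(line, prefix="[PerfLog] "):
    """Strip the [PerfLog] marker and decode the JSON payload."""
    if not line.startswith(prefix):
        raise ValueError("not a PerfLog line")
    return json.loads(line[len(prefix):])

record = parse_perflog(log_line)
target_nuv = 80  # convergence target used in the run above

# The run counts as converged when the script reports converged=true
# and the final NUV metric meets the target.
converged = (record["value"]["converged"]
             and record["value"]["final_nuv"] >= target_nuv)
print(converged)
```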

@shh2000 shh2000 merged commit c8aed9f into FlagOpen:main Mar 6, 2024
1 check passed
nrikoh pushed a commit to nrikoh/FlagPerf that referenced this pull request Mar 14, 2024
* add MoFlow std case

* update readme

* add case example for test_conf

* change to comment

* rdkit add version

* add jit & cuda_graph to mutable_params, overwritten by vendors are allowed

* rename config_name to dataset_name

* set time statistic variables to 0

* update seed and target_nuv

* update 1x8 result for official bs

* update notice for readme

* Update test_conf.py

---------

Co-authored-by: zhouyu <[email protected]>
Co-authored-by: shh2000 <[email protected]>
shh2000 added a commit that referenced this pull request Apr 9, 2024
* commit

* .

* fix

* fix

* add net

* fix

* Update README.md

* Update README.md

* Update README.md

* add package version

* Update README.md

fix

* add aquila_7b_finetune

* fix

* Update cluster_conf.py

* Update README.md

* Update flagscale_main.sh

* fix

* Update README.md

* fix

* Update README.md

* Update test_conf.py

* update t5small & txl readme (#443)

* update t5small pytorch version & doc

* add training result for 1x1, 2x8

* update readme for txl

---------

Co-authored-by: zhouyu <[email protected]>

* [metax] bert_hf (#456)

* add bert_hf result

* Update README.md

1

* [metax] add efficientnet (#455)

* add efficientnet

* add code

---------

Co-authored-by: xiaofeng guo <[email protected]>

* add nv results for distilbert (#458)

Co-authored-by: zhouyu <[email protected]>

* Update test_conf.py (#461)

[metax] add waveglow case

* Merge Aquila70B and others into main branch (#460)

* Aquila multinode (#349)

* init

* 123123

* rm privacy

* fixnet

* monitor

* 123123

* fix

* add req

* 123123

* 123132

* 23123

* 123123

* 123123

* 123123

* 123123

* 23123

* Aquila 34/70B (#364)

* init

* 123123

* rm privacy

* fixnet

* monitor

* 123123

* fix

* add req

* 123123

* 123132

* 23123

* 123123

* 123123

* 123123

* 123123

* 23123

* try-3470

* sync FlagScale & vendor_shell (#374)

* sync FlagScale & vendor_shell

* fix

* 123

* add vis

* add vis

* add vis

* add vis

* add vis

* [Kunlun] add aquila 7b/34b/70b pretrained for ai platform (#396)

* [Kunlun] add aquila 7b/34b/70b pretrained for ai platform

* [Kunlun] add aquila 7b/34b/70b pretrained for ai platform

* [Kunlun] add monitory.py for xpu

* [Kunlun] add Dockerfile

* [Kunlun] add 2 file: monitor data processing file

* [Kunlun] add singlenode_correctness.sh file

* [Kunlun] rm lr config in singlenode_adapt.sh

* [Kunlun] add 7B mpu config

---------

Co-authored-by: root <[email protected]>

* [Cambricon] support FlagPerf (#398)

* [Cambricon] support FlagPerf

* [Cambricon] fixed vendor name in singlenode_adapt.sh; deleted useless directory in aquila2_7B_container-in_container; fixed standalone_monitor.py&cambricon_monitor

* [Cambricon] revised standalone_monitor.py (#424)

* [mthreads] support Aquila2 7B/34B/70B (#385)

* [mthreads] support Aquila2 7B/34B/70B

* [mthreads] add config and singlenode adapt for 34B/70B

* add singlenode_correctness.sh

* add display_line and scatter_gpu script

* add recompute attention&layernorm

---------

Co-authored-by: yehua.zhang <[email protected]>

* [mthreads] modify recompute argument (#426)

* [mthreads] modify recompute argument

* add 70B of 128&256 gpus' recompute argument

---------

Co-authored-by: yehua.zhang <[email protected]>

* [DCU]  support Aquila2 7B/34B/70B (#427)

* added dcu-aquila2

* added dcu-aquila2

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update singlenode_adapt.sh

* Update config.py

* Update singlenode_run.sh

* Update config.py

* Update singlenode_run.sh

* updata run_benchmarks

* added readme for in_cluster

---------

Co-authored-by: ying zhao <yingzhao27>
Co-authored-by: shh2000 <[email protected]>

* [Iluvatar] support Aquila2 7B/34B/70B. (#435)

* update iluvatar aquila2 7b/34b/70b.

update iluvatar aquila2 7b/34b/70b.

* update iluvatar aquila2 7B/34B/70B configuration parameters.

update iluvatar aquila2 7B/34B/70B configuration parameters.

* update iluvatar Aquila2 7B accuracy testing.

update iluvatar Aquila2 7B accuracy testing.

* Move the location of the iluvatar visual script.

Move the location of the iluvatar visual script.

* [Ascend] Support Aquila2 7B (#433)

* [Ascend] Support Aquila2 34B&70B (#436)

* DCU mini-update (#447)

* [Mthreads] modify 7B test args (#445)

* [Cambricon] add display_line.py, scatter_gpu.py and revise standalone_monitor.py (#449)

* [Ascend] Update training scripts for Aquila2 (#450)

* [Ascend] update monitor (#451)

* Update singlenode_adapt.sh (#452)

* [Ascend] modify monitor (#453)

* [Ascend] Update scripts for Aquila2 (#454)

---------

Co-authored-by: helen88 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: HawkL327 <[email protected]>
Co-authored-by: shang-mt <[email protected]>
Co-authored-by: yehua.zhang <[email protected]>
Co-authored-by: Ying Zhao <[email protected]>
Co-authored-by: forestlee95 <[email protected]>
Co-authored-by: LoomisChen <[email protected]>
Co-authored-by: Haitao Wang <[email protected]>

* [metax] Waveglow pr (#457)

* add t5_small and transformer_xl

* Update README.md

change t5_small readme

* Update README.md

change transformer_xl readme

* Update README.md

fix t5_small readme

* Update README.md

fix transformer_xl readme

* Update README.md

fix t5_small readme requirement.txt path

* first commit

* Update README.md

change metax val loss to -5.7461

---------

Co-authored-by: jiaxing xie <[email protected]>

* update readme (#463)

* format?

* ur

* update readme (#464)

Co-authored-by: zhouyu <[email protected]>

* 【Metax】Add mobilenetv2 (#465)

* add mobilenetv2

* fix

* fix

* Update test_conf.py (#469)

[metax] add bert_hf /glm sample

Co-authored-by: shh2000 <[email protected]>

* [metax] add glm result (#466)

* add bert_hf result

* Update README.md

1

* add glm result

* [metax] Update glm README.md

* 【metax】add model mask_rcnn and detr (#459)

* add model mask_rcnn and detr

* maskrcnn & detr model logs

* [metax] stablediffusion inference pr (#468)

* update

* update inference

* update readme

* update dockerfile

---------

Co-authored-by: Shengchu Zhao <[email protected]>

* [KUNLUN] add llama70B case (#470)

* [KUNLUN] add llama70B case

* [KUNLUN] add llama70B case

* Merge branch 'main' of https://github.com/ZLkanyo009/FlagPerf into main

* Update README.md

---------

Co-authored-by: zhangling21 <[email protected]>

* [metax] swintransformer-inference pr (#473)

* add metax swin-transformer

* mod readme

* mod readme

* mod swin

* Update README.md

* Update config_common.py

* Update requirements.txt

* fix torch_six in swin_transformer

* Update utils.py

* add metax swintrans-infer

---------

Co-authored-by: jingyifa <[email protected]>

* [DCU]Add glm case of dcu in FlagPerf. (#472)

* Add glm case of dcu in Flagperf.

* update 1*1 log

* Update README infos in glm_pytorch of DCU.

---------

Co-authored-by: shh2000 <[email protected]>

* add resnet infer metax (#474)

Co-authored-by: yaguang.wuyaguang <[email protected]>

* [metax] add bert_large inference result (#476)

* add bert_hf result

* Update README.md

1

* add glm result

* [metax] Update glm README.md

* update metax bertlarge inference result

* update metax bert_large inference result

* Update README.md

* 【BAAI】add MoFlow pretraining std case (#397)

* add MoFlow std case

* update readme

* add case example for test_conf

* change to comment

* rdkit add version

* add jit & cuda_graph to mutable_params, overwritten by vendors are allowed

* rename config_name to dataset_name

* set time statistic variables to 0

* update seed and target_nuv

* update 1x8 result for official bs

* update notice for readme

* Update test_conf.py

---------

Co-authored-by: zhouyu <[email protected]>
Co-authored-by: shh2000 <[email protected]>

* ur (#478)

* [metax] add sam/vit inference results (#477)

* add sam/vit inference results

* add hardware information

---------

Co-authored-by: fdeng <[email protected]>

* 【Metax】Add yolov5 infer (#479)

* add yolov5

* add readme

* fix

* Update Dockerfile

* .

add llava case

* v2

.

* .

.

* .

* commit

commit

* readme1

commit

* llava13b

llava13b

* llava1.5_7b

llava1.5_7b

* del

del

* del

del

* fix

fix

* del

del

* add

add

* add

add

---------

Co-authored-by: shh2000 <[email protected]>
Co-authored-by: Zhou Yu <[email protected]>
Co-authored-by: zhouyu <[email protected]>
Co-authored-by: Kathrine <[email protected]>
Co-authored-by: xfguo <[email protected]>
Co-authored-by: xiaofeng guo <[email protected]>
Co-authored-by: sherryxie1 <[email protected]>
Co-authored-by: helen88 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: HawkL327 <[email protected]>
Co-authored-by: shang-mt <[email protected]>
Co-authored-by: yehua.zhang <[email protected]>
Co-authored-by: Ying Zhao <[email protected]>
Co-authored-by: forestlee95 <[email protected]>
Co-authored-by: LoomisChen <[email protected]>
Co-authored-by: Haitao Wang <[email protected]>
Co-authored-by: jiaxing xie <[email protected]>
Co-authored-by: 会意 <[email protected]>
Co-authored-by: happyxuwork <[email protected]>
Co-authored-by: fred1912 <[email protected]>
Co-authored-by: Shengchu Zhao <[email protected]>
Co-authored-by: Ling Zhang <[email protected]>
Co-authored-by: zhangling21 <[email protected]>
Co-authored-by: FaJingyi <[email protected]>
Co-authored-by: jingyifa <[email protected]>
Co-authored-by: Rayyyyy <[email protected]>
Co-authored-by: jsnoc <[email protected]>
Co-authored-by: yaguang.wuyaguang <[email protected]>
Co-authored-by: dfgan <[email protected]>
Co-authored-by: fdeng <[email protected]>