support transfer llama hf weight to megatron weight #246
base: main
@microsoft-github-policy-service agree
parallel_output=True,
pre_process=pre_process,
post_process=post_process)

with deepspeed.zero.Init(sequence_data_parallel_group=mpu.get_sequence_data_parallel_group(),
There must be a better way to initialize the model without initializing the distributed group. Please help.
The distributed initialization only occurs for args.zero_stage==3. Have you tried a different stage value on the command line?
> The distributed initialization only occurs for args.zero_stage==3. Have you tried a different stage value on the command line?

The problem is mpu.get_sequence_data_parallel_group(). How can I solve this problem?
File "/mnt/megatron-deepspeed/pretrain_gpt.py", line 48, in model_provider
with deepspeed.zero.Init(sequence_data_parallel_group=mpu.get_sequence_data_parallel_group(),
File "/mnt/megatron-deepspeed/megatron/core/parallel_state.py", line 369, in get_sequence_data_parallel_group
assert _SEQUENCE_DATA_PARALLEL_GROUP is not None, \
AssertionError: sequence data parallel group is not initialized
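For readers following the traceback: the getter asserts on a module-level variable, so any code path that reaches it before the parallel groups have been set up fails this way. The sketch below reproduces the pattern with stand-in names (it is not the actual megatron/core/parallel_state.py code) and shows one possible guard. `sequence_data_parallel_group_or_none` is a hypothetical helper, not an existing Megatron-DeepSpeed function.

```python
from typing import Optional

# Stand-in for the module-level state that the real getter asserts on.
_SEQUENCE_DATA_PARALLEL_GROUP: Optional[object] = None


def get_sequence_data_parallel_group() -> object:
    """Mirror of the failing getter: raises if the group was never set up."""
    if _SEQUENCE_DATA_PARALLEL_GROUP is None:
        raise AssertionError("sequence data parallel group is not initialized")
    return _SEQUENCE_DATA_PARALLEL_GROUP


def sequence_data_parallel_group_or_none() -> Optional[object]:
    """Hypothetical guard: return None instead of raising when uninitialized."""
    if _SEQUENCE_DATA_PARALLEL_GROUP is None:
        return None
    return get_sequence_data_parallel_group()
```

Whether deepspeed.zero.Init accepts None for this argument is not confirmed here. As suggested above, running with a ZeRO stage other than 3 may be the simpler way to avoid this path in a conversion script.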
[feature]add weight transfer script for llama2
add llama transfer script
… into fy/hf2megatron
86dcc48 to e9191fb
Hi @tjruwase. The code has been completed. Could you please take some time to review this pull request?
How can I convert a Megatron model to DeepSpeed?
@uygnef, thanks for the PR. Will review now.
tools/convert_checkpoint/weights2megatron/weights2megatron_llama.py
95dec64 to 6c71b7d
6c71b7d to fbb2ef9
hello @tjruwase
Hi,
Do you need to convert it to an HF checkpoint? This script can help: https://github.com/epfLLM/Megatron-LLM/blob/main/weights_conversion/megatron_to_hf.py. Some weight names will need to be changed.
megatron/model/transformer.py
self.enable_ds_sequence_parallel = parallel_state.get_sequence_parallel_world_size() > 1 \
    or args.force_ds_sequence_parallel
if hasattr(args, 'ckpt_transfer') and args.ckpt_transfer:
I did not notice --ckpt_transfer in the argument parsing code. How is this attribute added to args?
When the ckpt splitting program loads the model, it actually doesn't initialize the parallel_state, so running parallel_state.get_sequence_parallel_world_size() will cause an error.
File "/mnt/megatron-deepspeed/megatron/core/parallel_state.py", line 362, in get_sequence_parallel_group
assert _SEQUENCE_PARALLEL_GROUP is not None, \
AssertionError: sequence parallel group is not initialized
Therefore, I used ckpt_transfer to skip the call to get_sequence_parallel_world_size.
I agree this modification is not ideal. Do you have any suggestions?
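One way to address both points in this thread, sketched under assumptions: register the flag in the argument parser so it is always present on args, and check it before the parallel state is queried. The function names, the injected `get_world_size` callable, and the choice of False when the flag is set are all hypothetical. The PR's actual code may differ.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: declares the flags explicitly instead of relying on hasattr."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt_transfer", action="store_true",
                        help="Set when loading the model only to convert checkpoints.")
    parser.add_argument("--force_ds_sequence_parallel", action="store_true")
    return parser


def enable_ds_sequence_parallel(args: argparse.Namespace,
                                get_world_size: Callable[[], int]) -> bool:
    """Skip the parallel-state query entirely when converting checkpoints."""
    if getattr(args, "ckpt_transfer", False):
        # Assumption: sequence parallelism is not needed during conversion.
        return False
    return get_world_size() > 1 or args.force_ds_sequence_parallel
```

Checking the flag first means the world-size getter is never called during conversion, so the uninitialized-group assertion cannot fire on that path.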
> I did not notice --ckpt_transfer in the argument parsing code. How is this attribute added to args?

I understand that you are likely busy with many responsibilities, but I would greatly appreciate your feedback on this PR when you get a chance.
> I did not notice --ckpt_transfer in the argument parsing code. How is this attribute added to args?
>
> I understand that you are likely busy with many responsibilities, but I would greatly appreciate your feedback on this PR when you get a chance.

Hi @uygnef, thank you for your great work! I am trying to use this script to convert HF LLaMA to Megatron-DeepSpeed format, and I hit the same error: AssertionError: sequence parallel group is not initialized. Did you solve this issue?
> I did not notice --ckpt_transfer in the argument parsing code. How is this attribute added to args?
>
> I understand that you are likely busy with many responsibilities, but I would greatly appreciate your feedback on this PR when you get a chance.

Hi @uygnef, I changed the ckpt_transfer parameter and now it works. But it seems the output is in Megatron-LM format, not Megatron-DeepSpeed format?
Hi @uygnef, thank you so much for this PR! Would it be possible for you to provide an example launch script (pretrain or finetune) for it?
This PR updates how the enable_cuda_graph param is set depending on the world_size i.e. CUDA graphs should only be enabled when world_size==1.
Hi there,
I hope this message finds you well. I would like to request the availability of the pretrained checkpoint for the pretrain and SFT stages of the project. Currently, there is no corresponding checkpoint available for llama2 in the Megatron repository.
To address this issue, I have modified an existing script to convert checkpoints from HF (Hugging Face) format to Megatron format. This script makes it possible to use llama2's pretrained checkpoint in the Megatron framework.
Please let me know if there are any further steps required or if you need any additional information from my end to proceed with this request.
Thank you for your attention and assistance.
Best regards,