Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add normalizedconfig for baichuan and mixformer-sequential #1394

Closed

Conversation

changwangss
Copy link
Contributor

@changwangss changwangss commented Sep 19, 2023

What does this PR do?

Fixes # (issue)
we want to use the models https://huggingface.co/baichuan-inc/Baichuan2-13B-Base and https://huggingface.co/microsoft/phi-1, but the normalizedconfig doesn't contain.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: changwangss <[email protected]>
Signed-off-by: changwangss <[email protected]>
@changwangss changwangss requested a review from fxmarty September 19, 2023 07:28
@fxmarty
Copy link
Contributor

fxmarty commented Sep 19, 2023

Yes, normalizedconfig does not work well with custom models. WDYT @michaelbenayoun ?

@michaelbenayoun
Copy link
Member

I think the issu is that we can end-up with multiple names potentially pointing to different NormalizedConfigs.

I think what we can do is allow the user to provide a NormalizedConfig on its own:

  1. When exporting the model via the export function it is trivial to do so
  2. When exporting the model via the CLI, it is less trivial: either the desired NormalizedConfig already exists and in this case it can simply be the name of the NormalizedConfig to use, or it does not exist yet, in which case I think we can point the user to use the export function?

@fxmarty
Copy link
Contributor

fxmarty commented Sep 19, 2023

couldn't we just be greedy and have a heuristic of most commonly used key names, and try to use those?

@michaelbenayoun
Copy link
Member

Yes, we could do that.

@changwangss
Copy link
Contributor Author

I think the issu is that we can end-up with multiple names potentially pointing to different NormalizedConfigs.

I think what we can do is allow the user to provide a NormalizedConfig on its own:

  1. When exporting the model via the export function it is trivial to do so
  2. When exporting the model via the CLI, it is less trivial: either the desired NormalizedConfig already exists and in this case it can simply be the name of the NormalizedConfig to use, or it does not exist yet, in which case I think we can point the user to use the export function?

Could you have the plan continue to do this feature? @fxmarty @michaelbenayoun
I close the PR first and hope the feature can be completed soon

@changwangss changwangss closed this Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants