add et export with gguf with test #245
Conversation
Approving with mixed emotions.
model_to_pte = model
model_to_dso = model
else:
    if output_pte_path:
This is very kludgy; I would prefer to export to int4 and then handle it from there. Basing front-end decisions on the backend is very bad practice, because we're going to end up in a world of hurt.
Kimish and I had discussed doing a transform from int4 -> a8w4dq. Right now we just get a dequantized model.
Please plan to land that ASAP, Kimish?
cc: @kimishpatel
with torch.no_grad():
    if output_pte_path:
        output_pte_path = str(os.path.abspath(output_pte_path))
        print(f">{output_pte_path}<")
        if executorch_export_available:
            print(f"Exporting model using Executorch to {output_pte_path}")
-           export_model_et(model, builder_args.device, args.output_pte_path, args)
+           export_model_et(model_to_pte, builder_args.device, args.output_pte_path, args)
I don't like this at all :( But we're out of runway, so I will approve for now.
@@ -68,7 +90,7 @@ def main(args):
    if output_dso_path:
        output_dso_path = str(os.path.abspath(output_dso_path))
        print(f"Exporting model using AOT Inductor to {output_dso_path}")
-       export_model_aoti(model, builder_args.device, output_dso_path, args)
+       export_model_aoti(model_to_dso, builder_args.device, output_dso_path, args)
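The pattern in these two hunks is the same: a pre-selected model variant (model_to_pte or model_to_dso) is routed to the matching backend exporter based on which output path was requested. A minimal, self-contained sketch of that dispatch (exporter bodies are stubbed here; the real ones live in torchchat and take additional arguments):

```python
import os

def export_model_et(model, device, output_path, args=None):
    # Stub standing in for the ExecuTorch (.pte) exporter.
    return f"pte:{output_path}"

def export_model_aoti(model, device, output_path, args=None):
    # Stub standing in for the AOT Inductor (.dso) exporter.
    return f"dso:{output_path}"

def export(model_to_pte, model_to_dso, device,
           output_pte_path=None, output_dso_path=None):
    """Route each requested output format to its backend exporter."""
    results = []
    if output_pte_path:
        output_pte_path = str(os.path.abspath(output_pte_path))
        results.append(export_model_et(model_to_pte, device, output_pte_path))
    if output_dso_path:
        output_dso_path = str(os.path.abspath(output_dso_path))
        results.append(export_model_aoti(model_to_dso, device, output_dso_path))
    return results
```

The reviewer's objection is that model_to_pte and model_to_dso are chosen upstream based on which backend is in play, i.e. a front-end decision driven by backend capabilities.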
Ditto.
Force-pushed f8884e6 to bc92599
Force-pushed bc92599 to 9563191
* add et export with gguf with test
* fix generate too
* add gguf path to generate
ET does not support _weight_int4pack_mm, so this adds gguf_kwargs that can be passed to the builder to control whether the GGUF file should be loaded with load_as_quantized. If load_as_quantized=False, the GGUF weights are converted to floating point.
Also adds a test for torchchat export + generate with a gguf file to et.yml.