
Export to ExecuTorch: Initial Integration #2090

Merged

Conversation

@guangy10 guangy10 commented Nov 6, 2024

What does this PR do?

This PR is the first initiative to create an e2e path for "Export to ExecuTorch".

In this very first revision, I'm focusing on outlining the skeleton of the integration work:

Env setup

./setup.py: Specifies the new dependencies for "Export to ExecuTorch": the new pip package executorch (Beta version) and the latest released transformers (models from older versions may not work with ExecuTorch).
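For illustration, the extras entry in ./setup.py might look roughly like the sketch below. The extra name matches the CI install step later in this thread; the version pins are assumptions, not the PR's exact pins.

# Sketch of the new extras entry in ./setup.py; version pins are assumptions.
EXTRAS_REQUIRE = {
    "exporters-executorch": [
        "executorch>=0.4.0",   # ExecuTorch pip package (Beta)
        "transformers>=4.46",  # latest release; older models may not export
    ],
}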

Export AOT (Ahead-of-Time)

./optimum/commands/export/executorch.py: Supports running the export with the ExecuTorch backend via the CLI.

./optimum/exporters/executorch/: Main entry point to export via ExecuTorch.

optimum/exporters/executorch/
├── __init__.py
├── __main__.py
├── convert.py
├── recipe_registry.py
├── recipes
│   └── xnnpack.py
├── task_registry.py
└── tasks
    └── causal_lm.py
  • It contains convert.py, which defines the common workflow to export a 🤗 Transformers model to ExecuTorch.
  • The export workflow can be called with different recipes (e.g. quantize and delegate to XNNPACK, Core ML, QNN, MPS, etc.).
  • Models performing different tasks may require different patches; this is handled via registered tasks. For example, CausalLM with cache must be loaded with generation_config and exported via transformers.integrations.executorch. A sketch of the registration pattern follows this list.
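A minimal sketch of the registration pattern the recipe/task registries could follow; the function and decorator names below are illustrative assumptions, not the PR's exact API:

# Hypothetical sketch of recipe_registry.py; names are assumptions.
recipe_registry = {}

def register_recipe(recipe_name):
    # Decorator that registers an export function under a recipe name.
    def decorator(func):
        recipe_registry[recipe_name] = func
        return func
    return decorator

@register_recipe("xnnpack")
def export_to_executorch_with_xnnpack(model, **kwargs):
    # Quantize/delegate the exported program to the XNNPACK backend here.
    ...

def get_recipe(recipe_name):
    # Dynamic discovery: look up the recipe registered under this name.
    if recipe_name not in recipe_registry:
        raise ValueError(f"Recipe '{recipe_name}' is not registered.")
    return recipe_registry[recipe_name]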

Run with ExecuTorch model

optimum/executorchruntime/modeling_executorch.py defines the Python classes that wrap the Transformers AutoModelForXXX classes. We start with ExecuTorchModelForCausalLM in this file, which inherits from the base OptimizedModel and overrides all abstract methods. This is where the ExecuTorch pybindings and the runtime get integrated.
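A rough, illustrative skeleton of that wrapper. Only ExecuTorchModelForCausalLM, OptimizedModel, and text_generation appear in this PR; the other hook names and signatures are assumptions for illustration:

# Illustrative skeleton only; hook names beyond the PR's are assumptions.
from optimum.modeling_base import OptimizedModel

class ExecuTorchModelForCausalLM(OptimizedModel):
    # Wraps an ExecuTorch program (.pte) behind a Transformers-like API.

    def forward(self, input_ids, cache_position):
        # Run one decoding step through the ExecuTorch runtime pybindings.
        ...

    def _save_pretrained(self, save_directory):
        # Persist the .pte artifact into save_directory.
        ...

    @classmethod
    def _from_pretrained(cls, model_id, config, **kwargs):
        # Load an already-exported model.pte from a local dir or the Hub.
        ...

    def text_generation(self, tokenizer, prompt, max_seq_len):
        # Token-by-token generation loop built on forward().
        ...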

Tests

Export AOT (Ahead-of-Time)

Export to ExecuTorch via CLI:
optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b"
It generates the ExecuTorch model at meta_llama3_2_1b/model.pte.

Run with ExecuTorch model (model.pte)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model = ExecuTorchModelForCausalLM.from_pretrained("meta_llama3_2_1b/", export=False)
print(model.text_generation(tokenizer=tokenizer, prompt="Simply put, the theory of relativity states that", max_seq_len=100))

And we got:
"Hey, can you tell me any fun things to do in New York? I’m going to be there for a week and I’m not sure what to do. I’m a little bit of a history buff and I’m also a little bit of a foodie. I’m not sure if I should go to the museum or the zoo or the aquarium. I’m not sure if I should go to the theater or the opera or the ballet..."

Full new tests:

RUN_SLOW=1 pytest tests/executorch/*/test_*.py -s -v

================================================================================= slowest durations =================================================================================
260.35s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma2_text_generation_with_xnnpack
251.20s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_gemma2_export_to_executorch
240.18s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_llama3_2_3b_text_generation_with_xnnpack
175.28s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_llama3_2_3b_export_to_executorch
137.61s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_qwen2_5_text_generation_with_xnnpack
126.18s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma_text_generation_with_xnnpack
122.14s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_qwen2_5_export_to_executorch
106.66s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_gemma_export_to_executorch
84.63s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_llama3_2_1b_text_generation_with_xnnpack
83.23s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_load_model_from_local_path
78.27s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_load_model_from_hub
76.22s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_olmo_text_generation_with_xnnpack
72.95s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_llama3_2_1b_export_to_executorch
61.25s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_olmo_export_to_executorch
2.89s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_helps_no_raise

(30 durations < 0.005s hidden.  Use -vv to show these durations.)
=================================================================== 15 passed, 18 warnings in 1892.32s (0:31:32) ====================================================================

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@michaelbenayoun @echarlaix @mergennachin

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch 2 times, most recently from 4ad0b3d to 328069d on November 12, 2024 00:28
@guangy10 guangy10 commented

Fixed the export path. Now optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b" works as expected.

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch 4 times, most recently from 6b38215 to 757f152 on November 14, 2024 00:37
@guangy10 guangy10 changed the title Export to ExecuTorch: Code Skeleton Export to ExecuTorch: Initial Integration Nov 14, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@michaelbenayoun michaelbenayoun (Member) left a comment

Left a few comments. I have to say this is really great work, since it is not easy to generalize from our main ONNX codebase!

Do you think we could also write little tests in the same spirit as what's done for ONNX, where we validate the exported model against a vanilla PyTorch model?

Review threads (resolved):
optimum/commands/export/executorch.py
optimum/executorchruntime/modeling_executorch.py
optimum/exporters/executorch/convert.py
optimum/exporters/executorch/xnnpack.py
optimum/pipelines/pipelines_base.py
test_executorch.py
@guangy10 guangy10 commented Dec 2, 2024

Thanks for the feedback @michaelbenayoun @echarlaix. I was on PTO last week right after the Hackathon. I will prioritize and continue iterating on this PR this week.

@guangy10 guangy10 commented Dec 4, 2024

Do you think we could also write little tests in the same spirit as what's done for ONNX, where we validate the exported model against a vanilla PyTorch model?

I should definitely add tests and make sure they run in CI, for both the export CLI and the runtime. However, note that the content generated by the exported model may not be identical to that of the eager PyTorch model, due to specialization during export. We can aim to test the functionality first in this PR, e.g. the ability to generate using the exported model, then guard correctness later through something like lm_eval.
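For illustration, a functionality-first test along those lines might look like this sketch; the module path and assertions are placeholders, not the PR's actual test code:

# Placeholder sketch of a functionality-first test; not the PR's actual code.
from transformers import AutoTokenizer
from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM

def test_text_generation_with_xnnpack():
    model_id = "meta-llama/Llama-3.2-1B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ExecuTorchModelForCausalLM.from_pretrained(model_id, export=True)
    generated = model.text_generation(
        tokenizer=tokenizer,
        prompt="Simply put, the theory of relativity states that",
        max_seq_len=32,
    )
    # Functionality only: assert the exported model generates *something*.
    # Exact-match against eager PyTorch is deferred, because outputs may
    # diverge due to specialization during export.
    assert isinstance(generated, str) and len(generated) > 0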

Tests are added.
pytest tests/cli/test_cli.py -k test_export_commands -v

tests/cli/test_cli.py::TestCLI::test_export_commands PASSED                                                            [100%]

pytest tests/executorch/test_modeling.py -k test_text_generation_with_xnnpack -v

tests/executorch/test_modeling.py::ExecuTorchModelIntegrationTest::test_text_generation_with_xnnpack PASSED                          [100%]

I don't see any CI running. @michaelbenayoun @echarlaix How can I make sure these tests are running in CI?

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch from 757f152 to 28aae25 on December 4, 2024 04:18
@guangy10 guangy10 commented Dec 4, 2024

@michaelbenayoun @echarlaix Could you guide me on where I should add documentation for the new ExecuTorch backend?

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch from 28aae25 to 4385d25 on December 5, 2024 02:14
@guangy10 guangy10 commented Dec 5, 2024

Several new updates:

  • Support dynamic task & recipe discovery and import
  • Support from_pretrained with export=True, which fetches the model_id from the Hub, exports it to ExecuTorch in a temp dir, and loads the exported model back (a usage sketch follows this list)
  • Clean up and fix code API documentations
  • Add unit tests for optimum-cli export executorch and runtime using ExecuTorchModelForCausalLM.
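For example, the export-on-load path in the second bullet could be exercised like this; the import path is an assumption, while export=True comes from this PR:

# Usage sketch of export-on-load; the import path is an assumption.
from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM

# Fetches the checkpoint from the Hub, exports it to ExecuTorch in a
# temp dir, and loads the resulting model.pte back.
model = ExecuTorchModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", export=True)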

Review thread (resolved): tests/cli/test_cli.py
@michaelbenayoun michaelbenayoun (Member) commented

I don't see any CI running. @michaelbenayoun @echarlaix How can I make sure these tests are running in CI?

The CI must be triggered by an HF member; that is why it is not running. I just triggered it.

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch from 58326c4 to c36e2e2 on December 17, 2024 20:14
@guangy10 guangy10 commented

  • Updated expected texts for llama3.2 3b, gemma, gemma2
  • Fixed linter error

@guangy10 guangy10 commented

I see executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma2_text_generation_with_xnnpack PASSED but

/Users/runner/work/_temp/e6f2ff22-a7d4-4ea4-b0a1-8b1d5a386502.sh: line 1:  4688 Killed: 9               RUN_SLOW=1 pytest executorch/runtime/test_*.py -s -vvvv --durations=0
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma_text_generation_with_xnnpack

in the ExecuTorch Runtime jobs.

I'm trying to figure out whether it's because the previous tests are taking up pretty much all the SSD on the runner. I'm splitting the tests for each model into a separate test file, so each job will only run tests for a single model, which is more scalable as we add more recipes, optimizations, and tests per model. cc: @michaelbenayoun

@guangy10 guangy10 commented

Weird doc-build error, but it doesn't look relevant to this PR:

69.99 ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
69.99 
------
Dockerfile:11
--------------------
   9 |     
  10 |     RUN git clone $clone_url && cd optimum && git checkout $commit_sha
  11 | >>> RUN python3 -m pip install --no-cache-dir ./optimum[onnxruntime,benchmark,quality,exporters-tf,doc-build,diffusers]
  12 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install --no-cache-dir ./optimum[onnxruntime,benchmark,quality,exporters-tf,doc-build,diffusers]" did not complete successfully: exit code: 1

@michaelbenayoun safe to merge?

Comment on lines +18 to +30
python-version: ['3.10', '3.11', '3.12']
os: [macos-15]

runs-on: ${{ matrix.os }}
steps:
  - uses: actions/checkout@v2
  - name: Setup Python ${{ matrix.python-version }}
    uses: actions/setup-python@v2
    with:
      python-version: ${{ matrix.python-version }}
  - name: Install dependencies for ExecuTorch
    run: |
      pip install .[tests,exporters-executorch]

cc: @michaelbenayoun Compatible Python versions and package installation.

@michaelbenayoun michaelbenayoun merged commit d21256c into huggingface:main Dec 20, 2024
57 checks passed