
Export to ExecuTorch: Initial Integration #2090

Merged

Conversation

@guangy10 guangy10 commented Nov 6, 2024

What does this PR do?

This PR is the first initiative to create an e2e path for "Export to ExecuTorch".

In this very first revision, I'm focusing on outlining the skeleton of the integration work:

Env setup

./setup.py: Specifies the new dependencies for "Export to ExecuTorch": the new pip package executorch (Beta version) and the latest released transformers (models from older versions may not work with ExecuTorch).
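For illustration, the extras entry in ./setup.py might look roughly like the sketch below. The extra name matches the CI install step later in this thread; the version pins are assumptions, not the PR's exact pins.

# Sketch of the new extras entry in ./setup.py; version pins are assumptions.
EXTRAS_REQUIRE = {
    "exporters-executorch": [
        "executorch>=0.4.0",   # ExecuTorch pip package (Beta)
        "transformers>=4.46",  # latest release; older models may not export
    ],
}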

Export AOT (Ahead-of-Time)

./optimum/commands/export/executorch.py: Supports running the export with the ExecuTorch backend via the CLI.

./optimum/exporters/executorch/: Main entry point to export via ExecuTorch.

optimum/exporters/executorch/
├── __init__.py
├── __main__.py
├── convert.py
├── recipe_registry.py
├── recipes
│   └── xnnpack.py
├── task_registry.py
└── tasks
    └── causal_lm.py
  • It contains convert.py, which defines the common workflow to export a 🤗 Transformers model to ExecuTorch.
  • The export workflow can be called with different recipes (e.g. quantize and delegate to XNNPACK, Core ML, QNN, MPS, etc.).
  • Models performing different tasks may require different patches; this is handled via registered tasks. For example, CausalLM with cache must be loaded with generation_config and exported via transformers.integrations.executorch. A sketch of the registration pattern follows this list.
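A minimal sketch of the registration pattern the recipe/task registries could follow; the function and decorator names below are illustrative assumptions, not the PR's exact API:

# Hypothetical sketch of recipe_registry.py; names are assumptions.
recipe_registry = {}

def register_recipe(recipe_name):
    # Decorator that registers an export function under a recipe name.
    def decorator(func):
        recipe_registry[recipe_name] = func
        return func
    return decorator

@register_recipe("xnnpack")
def export_to_executorch_with_xnnpack(model, **kwargs):
    # Quantize/delegate the exported program to the XNNPACK backend here.
    ...

def get_recipe(recipe_name):
    # Dynamic discovery: look up the recipe registered under this name.
    if recipe_name not in recipe_registry:
        raise ValueError(f"Recipe '{recipe_name}' is not registered.")
    return recipe_registry[recipe_name]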

Run with ExecuTorch model

optimum/executorchruntime/modeling_executorch.py defines the Python classes that wrap the Transformers AutoModelForXXX classes. We start with ExecuTorchModelForCausalLM in this file, which inherits from the base OptimizedModel and overrides all abstract methods. This is where the ExecuTorch pybindings and the runtime get integrated.
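A rough, illustrative skeleton of that wrapper. Only ExecuTorchModelForCausalLM, OptimizedModel, and text_generation appear in this PR; the other hook names and signatures are assumptions for illustration:

# Illustrative skeleton only; hook names beyond the PR's are assumptions.
from optimum.modeling_base import OptimizedModel

class ExecuTorchModelForCausalLM(OptimizedModel):
    # Wraps an ExecuTorch program (.pte) behind a Transformers-like API.

    def forward(self, input_ids, cache_position):
        # Run one decoding step through the ExecuTorch runtime pybindings.
        ...

    def _save_pretrained(self, save_directory):
        # Persist the .pte artifact into save_directory.
        ...

    @classmethod
    def _from_pretrained(cls, model_id, config, **kwargs):
        # Load an already-exported model.pte from a local dir or the Hub.
        ...

    def text_generation(self, tokenizer, prompt, max_seq_len):
        # Token-by-token generation loop built on forward().
        ...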

Tests

Export AOT (Ahead-of-Time)

Export to ExecuTorch via CLI:
optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b"
It generates the ExecuTorch model at meta_llama3_2_1b/model.pte.

Run with ExecuTorch model (model.pte)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model = ExecuTorchModelForCausalLM.from_pretrained("meta_llama3_2_1b/", export=False)
print(model.text_generation(tokenizer=tokenizer, prompt="Simply put, the theory of relativity states that", max_seq_len=100))

And we got:
"Hey, can you tell me any fun things to do in New York? I’m going to be there for a week and I’m not sure what to do. I’m a little bit of a history buff and I’m also a little bit of a foodie. I’m not sure if I should go to the museum or the zoo or the aquarium. I’m not sure if I should go to the theater or the opera or the ballet..."

Full new tests:

RUN_SLOW=1 pytest tests/executorch/*/test_*.py -s -v

================================================================================= slowest durations =================================================================================
260.35s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma2_text_generation_with_xnnpack
251.20s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_gemma2_export_to_executorch
240.18s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_llama3_2_3b_text_generation_with_xnnpack
175.28s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_llama3_2_3b_export_to_executorch
137.61s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_qwen2_5_text_generation_with_xnnpack
126.18s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma_text_generation_with_xnnpack
122.14s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_qwen2_5_export_to_executorch
106.66s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_gemma_export_to_executorch
84.63s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_llama3_2_1b_text_generation_with_xnnpack
83.23s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_load_model_from_local_path
78.27s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_load_model_from_hub
76.22s call     tests/executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_olmo_text_generation_with_xnnpack
72.95s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_llama3_2_1b_export_to_executorch
61.25s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_olmo_export_to_executorch
2.89s call     tests/executorch/export/test_exporters_executorch.py::TestExportToExecuTorchCLI::test_helps_no_raise

(30 durations < 0.005s hidden.  Use -vv to show these durations.)
=================================================================== 15 passed, 18 warnings in 1892.32s (0:31:32) ====================================================================

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@michaelbenayoun @echarlaix @mergennachin

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch 2 times, most recently from 4ad0b3d to 328069d on November 12, 2024 00:28
@guangy10 guangy10 commented

Fixed the export path. Now optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b" works as expected.

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch 4 times, most recently from 6b38215 to 757f152 on November 14, 2024 00:37
@guangy10 guangy10 changed the title Export to ExecuTorch: Code Skeleton Export to ExecuTorch: Initial Integration Nov 14, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@michaelbenayoun michaelbenayoun (Member) left a comment

Left a few comments. I have to say this is really great work, since it is not easy to generalize from our main ONNX codebase!

Do you think we could also write little tests in the same spirit as what's done for ONNX, where we validate the exported model against a vanilla PyTorch model?

Review threads (resolved):
optimum/commands/export/executorch.py
optimum/executorchruntime/modeling_executorch.py
optimum/exporters/executorch/convert.py
optimum/exporters/executorch/xnnpack.py
optimum/pipelines/pipelines_base.py
test_executorch.py
@guangy10 guangy10 commented Dec 2, 2024

Thanks for the feedback @michaelbenayoun @echarlaix. I was on PTO last week right after the Hackathon. I will prioritize and continue iterating on this PR this week.

@guangy10 guangy10 commented Dec 4, 2024

Do you think we could also write little tests in the same spirit as what's done for ONNX, where we validate the exported model against a vanilla PyTorch model?

I should definitely add tests and make sure they run in CI, for both the export CLI and the runtime. However, note that the content generated by the exported model may not be identical to that of the eager PyTorch model, due to specialization during export. We can aim to test the functionality first in this PR, e.g. the ability to generate using the exported model, then guard correctness later through something like lm_eval.
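For illustration, a functionality-first test along those lines might look like this sketch; the module path and assertions are placeholders, not the PR's actual test code:

# Placeholder sketch of a functionality-first test; not the PR's actual code.
from transformers import AutoTokenizer
from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM

def test_text_generation_with_xnnpack():
    model_id = "meta-llama/Llama-3.2-1B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ExecuTorchModelForCausalLM.from_pretrained(model_id, export=True)
    generated = model.text_generation(
        tokenizer=tokenizer,
        prompt="Simply put, the theory of relativity states that",
        max_seq_len=32,
    )
    # Functionality only: assert the exported model generates *something*.
    # Exact-match against eager PyTorch is deferred, because outputs may
    # diverge due to specialization during export.
    assert isinstance(generated, str) and len(generated) > 0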

Tests are added.
pytest tests/cli/test_cli.py -k test_export_commands -v

tests/cli/test_cli.py::TestCLI::test_export_commands PASSED                                                            [100%]

pytest tests/executorch/test_modeling.py -k test_text_generation_with_xnnpack -v

tests/executorch/test_modeling.py::ExecuTorchModelIntegrationTest::test_text_generation_with_xnnpack PASSED                          [100%]

I don't see any CI running. @michaelbenayoun @echarlaix How can I make sure these tests are running in CI?

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch from 757f152 to 28aae25 on December 4, 2024 04:18
@guangy10 guangy10 commented Dec 4, 2024

@michaelbenayoun @echarlaix Could you guide me on where I should add documentation for the new ExecuTorch backend?

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch from 28aae25 to 4385d25 on December 5, 2024 02:14
@guangy10 guangy10 commented Dec 5, 2024

Several new updates:

  • Support dynamic task & recipe discovery and import
  • Support from_pretrained with export=True, which fetches the model_id from the Hub, exports it to ExecuTorch in a temp dir, and loads the exported model back (a usage sketch follows this list)
  • Clean up and fix code API documentations
  • Add unit tests for optimum-cli export executorch and runtime using ExecuTorchModelForCausalLM.
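For example, the export-on-load path in the second bullet could be exercised like this; the import path is an assumption, while export=True comes from this PR:

# Usage sketch of export-on-load; the import path is an assumption.
from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM

# Fetches the checkpoint from the Hub, exports it to ExecuTorch in a
# temp dir, and loads the resulting model.pte back.
model = ExecuTorchModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", export=True)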

Review thread (resolved): tests/cli/test_cli.py
@michaelbenayoun michaelbenayoun (Member) commented

I don't see any CI running. @michaelbenayoun @echarlaix How can I make sure these tests are running in CI?

The CI must be triggered by an HF member; that is why it is not running. I just triggered it.

@guangy10 guangy10 force-pushed the executorch_integration_skeleton branch from 58326c4 to c36e2e2 on December 17, 2024 20:14
@guangy10 guangy10 commented

  • Updated expected texts for llama3.2 3b, gemma, gemma2
  • Fixed linter error

@guangy10 guangy10 commented

I see executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma2_text_generation_with_xnnpack PASSED but

/Users/runner/work/_temp/e6f2ff22-a7d4-4ea4-b0a1-8b1d5a386502.sh: line 1:  4688 Killed: 9               RUN_SLOW=1 pytest executorch/runtime/test_*.py -s -vvvv --durations=0
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
executorch/runtime/test_modeling.py::ExecuTorchModelIntegrationTest::test_gemma_text_generation_with_xnnpack

in the ExecuTorch Runtime jobs.

I'm trying to figure out whether it's because the previous tests are taking up pretty much all the SSD on the runner. I'm splitting the tests for each model into a separate test file, so each job will only run tests for a single model, which is more scalable as we add more recipes, optimizations, and tests per model. cc: @michaelbenayoun

@guangy10 guangy10 commented

Weird doc-build error, but it doesn't look relevant to this PR:

69.99 ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
69.99 
------
Dockerfile:11
--------------------
   9 |     
  10 |     RUN git clone $clone_url && cd optimum && git checkout $commit_sha
  11 | >>> RUN python3 -m pip install --no-cache-dir ./optimum[onnxruntime,benchmark,quality,exporters-tf,doc-build,diffusers]
  12 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install --no-cache-dir ./optimum[onnxruntime,benchmark,quality,exporters-tf,doc-build,diffusers]" did not complete successfully: exit code: 1

@michaelbenayoun safe to merge?

Comment on lines +18 to +30
python-version: ['3.10', '3.11', '3.12']
os: [macos-15]

runs-on: ${{ matrix.os }}
steps:
  - uses: actions/checkout@v2
  - name: Setup Python ${{ matrix.python-version }}
    uses: actions/setup-python@v2
    with:
      python-version: ${{ matrix.python-version }}
  - name: Install dependencies for ExecuTorch
    run: |
      pip install .[tests,exporters-executorch]

cc: @michaelbenayoun Compatible Python versions and package installation.

@michaelbenayoun michaelbenayoun merged commit d21256c into huggingface:main Dec 20, 2024
57 checks passed