Introducing a generic `ModelConverter` interface. #823
Conversation
torchtitan/config_manager.py (outdated review context):

```python
# self.parser.add_argument(
#     "--float8.enable_float8_linear",
#     action="store_true",
#     help="""
#         If true, swaps `torch.nn.Linear` with `Float8Linear`.
#         This feature requires you to install 'torchao' which can be found
#         here: https://github.com/pytorch/ao
#     """,
# )
```
I somehow think we can keep this as is in this PR, and by default include "float8" in the `model.handlers` config.
My idea is to decouple handler registration from turning handlers on/off, so that it enables your use case but doesn't require anyone else to adapt their code or mental model.
Happy to keep this one as it is, so as not to break the current float8 workflow.
But `model.handlers` is not a registration list; it is the list of handlers to apply to the model. Registration of a new model handler is done in code with `register_model_handler`.
If we don't have `model.handlers`, we are adding complexity for the end user and the codebase: for every new handler, one needs to add a `my_handler.enable` flag in the config, plus `if/else` logic in every handler to check whether it is activated.
That feels like the wrong design pattern to me: it is much simpler for the user to specify a list of handlers to apply (and in the float8 case, I would advocate raising an error, not just a warning, if the hardware does not support it).
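For readers following the design discussion, here is a minimal sketch of the split described above: registration happens once in code, while the `model.handlers` list in the config decides what actually gets applied. Only the names `register_model_handler` and `model.handlers` come from this thread (later renamed to converters); the registry internals and factory signature are assumptions for illustration, not torchtitan code.

```python
# Minimal sketch, not torchtitan code: a dict-backed registry plus a
# config-driven list of handlers to apply.
from typing import Callable, Dict, List

import torch.nn as nn


class ModelHandler:
    """Base interface: a handler transforms the model (e.g. float8 swap)."""

    def convert(self, model: nn.Module) -> nn.Module:
        raise NotImplementedError


# Registration is done in code, independently of whether a handler is enabled.
_MODEL_HANDLER_REGISTRY: Dict[str, Callable[[], ModelHandler]] = {}


def register_model_handler(name: str, factory: Callable[[], ModelHandler]) -> None:
    if name in _MODEL_HANDLER_REGISTRY:
        raise ValueError(f"Handler '{name}' is already registered")
    _MODEL_HANDLER_REGISTRY[name] = factory


def apply_handlers(model: nn.Module, handler_names: List[str]) -> nn.Module:
    """Apply the handlers listed under `model.handlers` in the job config."""
    for name in handler_names:
        if name not in _MODEL_HANDLER_REGISTRY:
            # Unknown handlers are a hard error, no silent skipping.
            raise ValueError(f"Unknown handler '{name}'; registered: {list(_MODEL_HANDLER_REGISTRY)}")
        model = _MODEL_HANDLER_REGISTRY[name]().convert(model)
    return model
```

With this split there is no per-handler `enable` flag: leaving a name out of `model.handlers` is what disables it.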
Yeah, for sure, please don't get me wrong.
As a user of torchtitan, I needed to run a lot of experiments with and without Float8. In the past, to enable or disable Float8 I just had to flip the three flags to True/False in the toml file, so I'd prefer not to have to go to two different places (`model.converters` and `float8.enable...`) to enable/disable the float8 configs.
But I agree it's not a good pattern to keep it as is. So let's follow your original change.
Thanks 👍
I have two general comments:

- `ModelHandler` is too vague. Sorry, I am not good at naming either, so I asked GPT and `ModelConverter` is the recommended name. It is also consistent with the `convert()` method.
- We should not create new optimizer hooks, since `torch.optim.Optimizer` already supports hooks (see the sketch after this list). The major benefit is that other components, like TorchFT, may modify the behavior of `torch.optim.Optimizer.step()`; adding additional hooks is likely to be incompatible with (or make integration hard for) those libraries. I'll submit a new PR to demonstrate how to register optimizer hooks based on Add Dynamic Model Import and ModelSpec Definition #814.
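For context, here is a minimal sketch of the built-in hook mechanism referred to above, using `torch.optim.Optimizer.register_step_post_hook`; the toy model and the hook body are illustrative assumptions, not torchtitan code.

```python
# Sketch of PyTorch's built-in post-step optimizer hook.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)


def post_optimizer_hook(optimizer, args, kwargs):
    # In a converter this is where post-step work would go, e.g. recomputing
    # float8 dynamic scales after the weights have been updated.
    print("optimizer.step() finished")


# Because the hook lives on torch.optim.Optimizer itself, libraries that wrap
# or patch Optimizer.step() still trigger it.
handle = optimizer.register_step_post_hook(post_optimizer_hook)

loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()  # prints "optimizer.step() finished"
handle.remove()   # the hook can be removed if the converter is disabled
```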
* Using `string_list` for the `model.converters` argument (see the sketch below);
* Renaming to `ModelConverter`;
* Basic unit test coverage for `ModelConvertersContainer`.
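As a rough illustration of the first bullet: a `string_list` argument type could parse a comma-separated value into a Python list. The implementation below is an assumption for illustration, not the actual torchtitan helper.

```python
# Hypothetical sketch of a `string_list` argparse type for --model.converters.
import argparse
from typing import List


def string_list(raw: str) -> List[str]:
    """Parse a comma-separated value like "float8,my_converter" into a list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


parser = argparse.ArgumentParser()
parser.add_argument(
    "--model.converters",
    type=string_list,
    default=[],
    help="Comma-separated list of converters to apply to the model, e.g. 'float8'.",
)

args = parser.parse_args(["--model.converters", "float8"])
print(getattr(args, "model.converters"))  # ['float8']
```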
@tianyu-l I have pushed a couple of improvements following your review, reverting the …

@tianyu-l @fegin On the optimizer hook: it feels to me there is no obvious simple solution. The optimizer is created after parallelism is applied to the model, whereas the converter/handler has to be built before that (so it can't receive the optimizer hook). In its current form, this PR adds nothing new on this side; it just does a generic renaming of the float8 logic. I would be in favor of getting this feature merged as it is, following the integration of #814 on main, and then opening a PR to see which solution is the most elegant for integrating the PyTorch optimizer hook with ModelConverter. It is otherwise complicated to understand how these two PRs interact with each other, and it may just delay things without necessarily finding a consensus.
In case you want to integrate with optimizer hooks, I believe the simpler way for now is to pass the `ModelConverter` to the `OptimizersContainer`:

```python
class OptimizersContainer(Stateful):
    """Util for calling step/zero_grad on multiple optimizers needed for virtual pipeline stages
    and saving/loading optimizer state_dict at checkpoint.
    """

    def __init__(
        self,
        model_parts: List[nn.Module],
        model_converter: ModelConverter,
        optimizer_kwargs: Dict[str, Any],
        name: str,
    ) -> None:
        ...
        for model in self.model_parts:
            ...
            optimizer.register_step_post_hook(...)  # lambda calling model_converter.post_optimizer_hook
```

Happy to add that as a current solution, which may still be improved in the future (following the #814 merge). I am not 100% sure myself what the cleanest way is, which is why I feel a dedicated PR may be beneficial.
> I would be in favor of getting this feature merged as it is, following the integration of #814 on main, and then opening a PR to see which solution is the most elegant for integrating the PyTorch optimizer hook with ModelConverter. It is otherwise complicated to understand how these two PRs interact with each other, and it may just delay things without necessarily finding a consensus.

Sounds good to me. We can do the `register_step_post_hook` migration in another PR.
Thanks for adding tests!
Quoted review context:

```python
    job config, and apply them to the model sequentially.
    """

    def __init__(self, job_config: JobConfig, parallel_dims: ParallelDims):
```
Regarding #814 (comment), I think we can call `apply_to_train_specs` to register hooks on the optimizers here.
That could indeed be an option, happy to discuss it in a new PR. The small downside is my hunch that registries should be immutable; I have a bad feeling about modifying an existing entry! But maybe it wouldn't be an issue.
Yeah, the intertwined logic is bad -- can we just specify it in the `TrainSpec` construction and let the spec handle the registration and usage?
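A hypothetical sketch of that suggestion, assuming a dataclass-style `TrainSpec` (the real definition is introduced in #814); the field names below are made up for illustration.

```python
# Hypothetical sketch only; not the actual TrainSpec from #814.
from dataclasses import dataclass
from typing import Callable, Optional

import torch.nn as nn


class ModelConverter:
    """Placeholder for the converter interface discussed in this PR."""

    def convert(self, model: nn.Module) -> None:
        pass


@dataclass
class TrainSpec:
    name: str
    model_builder: Callable[[], nn.Module]
    # The spec carries its own converter factory, so the trainer can build and
    # apply it without a separate global registry lookup.
    model_converter_builder: Optional[Callable[[], ModelConverter]] = None
```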
@tianyu-l I merged the latest `main`. Please tell me if there is any additional improvement in mind. Once merged, I'll look at how to integrate #814.
lgtm! please address final comments :)
Quoted review context:

```python
    def __init__(self, job_config: JobConfig, parallel_dims: ParallelDims):
        self.enabled = False

        float8_config = job_config.float8
        if not float8_config.enable_float8_linear:
```
Please remove this in `config_manager.py` too.
My mistake, forgot to add it to the commit! Now fixed.
Quoted review context (toml config):

```
@@ -54,4 +54,3 @@ async_mode = "disabled"  # ["disabled", "async", "async_with_pinned_mem"]
mode = 'full'

[float8]
```
When float8 is disabled in a config (debugmodel, 8b, 70b), let's add the two float8 options (as `False`) and have a commented-out line `converters = "float8"`. The point is to make it easier to keep using float8, especially for people who are not aware of this PR.
Just added to debug, 8B and 70B models
@tianyu-l Should be good hopefully :) I merged the latest `main`. Thanks for your feedback on the PR!
Looks awesome, thank you for contributing!
Let's work together on the next PR to integrate `ModelConverter` into `TrainSpec`.
This model handler interface should cover most cases in quantization, fused layer optimization, ...
This PR adds:

- A generic `ModelConverter` class, transforming a model (see the sketch below);
- A `model.converters` config entry where the user can add a list of converters to apply to the model (e.g. `float8`);
- Conversion of `Float8Handler` to the `ModelConverter` interface.

Related issue: #790
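A rough sketch of the interface this description refers to, pieced together from the thread (the `convert()` method, `post_optimizer_hook`, and `ModelConvertersContainer` are all mentioned above); the exact signatures in torchtitan may differ.

```python
# Sketch inferred from the PR discussion; signatures are approximate.
from typing import List, Protocol

import torch.nn as nn


class ModelConverter(Protocol):
    """A converter transforms the model, e.g. float8 swap or fused layers."""

    def convert(self, model: nn.Module) -> None:
        """Modify the model in place before training starts."""
        ...

    def post_optimizer_hook(self, model: nn.Module) -> None:
        """Optional work after optimizer.step(), e.g. recomputing float8 scales."""
        ...


class ModelConvertersContainer:
    """Applies the converters listed under `model.converters` sequentially."""

    def __init__(self, converters: List[ModelConverter]) -> None:
        self.converters = converters

    def convert(self, model: nn.Module) -> None:
        for converter in self.converters:
            converter.convert(model)

    def post_optimizer_hook(self, model: nn.Module) -> None:
        for converter in self.converters:
            converter.post_optimizer_hook(model)
```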