-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for LTX-Video model in ImageToVideo Pipeline #394
base: main
Are you sure you want to change the base?
Add support for LTX-Video model in ImageToVideo Pipeline #394
Conversation
self.ldm = StableVideoDiffusionPipeline.from_pretrained(model_id, **kwargs) | ||
except Exception as loading_error: | ||
logger.error("Failed to load %s : %s." %(self.pipeline_name,loading_error)) | ||
# Trying to load the LTXImageToVideoPipeline if the StableVideoDiffusionPipeline fails to load and there is a chance that model name doesn't match the if condition for LTX-Video |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not be retrying the LTXImageToVideoPipeline load again here.
Would switching to use DiffusionPipeline
make the loading generic? We are only passing model_id and kwargs so seems like most of it is setup to use generic loading if its possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ad-astra-video I added the extra try except
codeblock inside the primary except
block for the case when user tries to load a ltx-video model from model hub which they have created new in their own HF repo but the naming standards aren't matched. All the models for LTX-Video on the model hub passes the condition for LTXImageToVideoPipeline
and it is a good practice to name models based on their name so our code's if
condition should be sufficient enough to deal with LTX-video
model loading.
Regarding your second question, DiffusionPipeline
is the base class for most to all of the pipelines in diffusers library. This can be used for generic downloading, loading and inference but using it blocks the ability to utilise pipeline specific features. For ex for LTX-Video
there are two pipeline classes, ie., LTXImageToVideoPipeline
(I2V) and LTXPipeline
(T2V) and when we use generic pipeline class like DiffusionPipeline
to load pipeline with model_id for e.g. Lightricks/LTX-Video
so what DiffusionPipeline
does is that it accesses the model_index.json from the model's folder and loads the config to retrieve the pipeline class for that specific model. The pipeline class is in model_index.json
file in form of this KV pair: "_class_name": "LTXPipeline",
. So, basically if we have to use the LTX-Video
modelID for our I2V pipeline then either we have to create a separate model repo replicating it and modifying the _class_name
or we have to use the specific pipeline class which is LTXImageToVideoPipeline
. And this is just not for this specific model but for other tasks too where specific pipeline is needed to be mentioned.
Yeah if this PR was for the T2V pipeline. then for sure I would have used DiffusionPipeline
class to keep everything generic.
PS I plan to use DiffusionPipeline
class for a generic Diffusers
pipeline in our offerings as I discussed with Rick.
cc @rickstaa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the background. I was hoping we could use DiffusionPipeline to have a simple upgrade path by updating diffusers only to get new models. In this scenario when a model supports image and text input it gets more complicated.
Could we default to using DiffusionPipeline
and for these models that do not specifiy the specific image to video pipelines we can update the pipeline with from_pipe
to be the correct pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah from_pipe
can make DiffusionPipeline
work for specific cases like that of
LTXImageToVideoPipeline
. If we have to keep as DiffusionPipeline
default then let me make changes accordingly.
Update:pushed the necessary commit and thanks for the from_pipe
suggestion
@@ -113,6 +136,14 @@ def __call__( | |||
seed = kwargs.pop("seed", None) | |||
safety_check = kwargs.pop("safety_check", True) | |||
|
|||
if self.pipeline_name == "LTXImageToVideoPipeline": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something more generic that looks at the pipeline class args and del the kwargs keys if not present in the pipeline? If a heavy lift we can do in separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point would love to make all pipelines more generic. We can do this in seperate PR if to heavy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ad-astra-video That's a nice mention. Lets remove this stone age hard coded block and get something more generic.
Pushed the changes in the latest commit in this PR itself. 👍 Now it can work even when we make the whole pipeline generic based on task
like i2i generic pipeline.
logger.error("Failed to load both LTXImageToVideoPipeline and StableVideoDiffusionPipeline: %s. Please ensure the model ID is compatible.", loading_error) | ||
raise loading_error | ||
|
||
|
||
self.ldm.to(get_torch_device()) | ||
|
||
sfast_enabled = os.getenv("SFAST", "").strip().lower() == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to confirm if SFAST works on LTXImageToVideoPipeline. If it does not, only use it if StableVideoDiffusionPipeline
is loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SFAST
doesn't work on LTXImageToVideoPipeline
because it doesn't have unet attribute so can't compile the model. I did a test separately just for the pipeline on colab. And the same goes for DeepCacheSDHelper
with the same reason.
Also, made the commit to handle the case. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good, have couple comments below and left suggestions/questions in the files:
- LTXVideo requires diffusers 0.32.0 right? Suggest we update to diffusers 0.32.1 with this PR.
- If fix the multipart.go suggested change I can build and test tomorrow
Co-authored-by: Brad | ad-astra <[email protected]>
Update: addressed all the suggested changes in the recent commits |
What does this PR do?
This PR adds support for the
LTX-Video
model in theImageToVideoPipeline
. It introduces a mechanism to dynamically load either theLTXImageToVideoPipeline
orStableVideoDiffusionPipeline
, based on the providedmodelID
. This change enhances the flexibility of theImageToVideoPipeline
, allowing it to handle bothLTX-Video
andStableVideoDiffusion
models and with addition ofLTXImageToVideoPipeline
, the video generation can now be prompt text guided.After building the Docker image, the setup has been tested by running it locally on a Uvicorn server.
cc @rickstaa