
Error no file named pytorch_model.bin, model.safetensors found in directory Lightricks/LTX-Video. #10321

Open
nitinmukesh opened this issue Dec 20, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@nitinmukesh

Describe the bug

(venv) C:\ai1\LTX-Video>python inference.py
Traceback (most recent call last):
  File "C:\ai1\LTX-Video\inference.py", line 23, in <module>
    text_encoder = T5EncoderModel.from_pretrained(
  File "C:\ai1\LTX-Video\venv\lib\site-packages\transformers\modeling_utils.py", line 3779, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory Lightricks/LTX-Video.


Reproduction

Install diffusers from source and use the code mentioned here:
https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video

Logs

C:\ai1\LTX-Video\Lightricks>tree /F
Folder PATH listing for volume Windows-SSD
Volume serial number is CE9F-A6AE
C:.
└───LTX-Video
    │   ltx-video-2b-v0.9.1.safetensors
    │   model_index.json
    │
    ├───text_encoder
    │       config.json
    │       model-00001-of-00004.safetensors
    │       model-00002-of-00004.safetensors
    │       model-00003-of-00004.safetensors
    │       model-00004-of-00004.safetensors
    │
    ├───tokenizer
    │       added_tokens.json
    │       special_tokens_map.json
    │       spiece.model
    │       tokenizer_config.json
    │
    ├───transformer
    │       config.json
    │       diffusion_pytorch_model-00001-of-00002.safetensors
    │       diffusion_pytorch_model-00002-of-00002.safetensors
    │       diffusion_pytorch_model.safetensors.index.json
    │
    └───vae
            config.json
            diffusion_pytorch_model.safetensors

System Info

Windows 11 / Python 3.10.11

(venv) C:\ai1\LTX-Video>pip list
Package            Version
------------------ ------------
accelerate         1.2.1
certifi            2024.12.14
charset-normalizer 3.4.0
colorama           0.4.6
diffusers          0.32.0.dev0
einops             0.8.0
filelock           3.16.1
fsspec             2024.12.0
gguf               0.13.0
huggingface-hub    0.25.2
idna               3.10
importlib_metadata 8.5.0
Jinja2             3.1.4
MarkupSafe         3.0.2
mpmath             1.3.0
networkx           3.4.2
numpy              2.2.0
packaging          24.2
pillow             11.0.0
pip                23.0.1
psutil             6.1.1
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
safetensors        0.4.5
sentencepiece      0.2.0
setuptools         65.5.0
sympy              1.13.1
tokenizers         0.21.0
torch              2.5.1+cu124
torchvision        0.20.1+cu124
tqdm               4.67.1
transformers       4.47.1
typing_extensions  4.12.2
urllib3            2.2.3
wheel              0.45.1
zipp               3.21.0

Who can help?

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video
from transformers import T5EncoderModel, T5Tokenizer

single_file_url = "Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors"
text_encoder = T5EncoderModel.from_pretrained(
  "Lightricks/LTX-Video", subfolder="text_encoder", torch_dtype=torch.bfloat16
)
tokenizer = T5Tokenizer.from_pretrained(
  "Lightricks/LTX-Video", subfolder="tokenizer", torch_dtype=torch.bfloat16
)
pipe = LTXPipeline.from_single_file(
  single_file_url, text_encoder=text_encoder, tokenizer=tokenizer, torch_dtype=torch.bfloat16
)


pipe.enable_model_cpu_offload()
prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output_ltx.mp4", fps=24)
@nitinmukesh nitinmukesh added the bug Something isn't working label Dec 20, 2024
@hlky
Collaborator

hlky commented Dec 20, 2024

Your local text_encoder directory seems to be missing model.safetensors.index.json
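
For anyone hitting the same error, a minimal sketch of re-fetching just the missing index file with huggingface_hub; the local_dir path is an assumption based on the tree output above, so adjust it to your own layout:

from huggingface_hub import hf_hub_download

# Re-download only the missing shard index into the local copy of the repo.
# local_dir is an assumption matching the folder layout shown earlier.
hf_hub_download(
    repo_id="Lightricks/LTX-Video",
    filename="text_encoder/model.safetensors.index.json",
    local_dir=r"C:\ai1\LTX-Video\Lightricks\LTX-Video",
)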

@nitinmukesh
Author

@hlky

Thank you. Not sure why it was missing, but now I'm getting:

(venv) C:\ai1\LTX-Video>python inference.py
Loading checkpoint shards: 100%|████████████████████████████████████| 4/4 [00:21<00:00,  5.44s/it]
Loading pipeline components...:  80%|████████████████████████▊      | 4/5 [00:01<00:00,  3.36it/s]
Traceback (most recent call last):
  File "C:\ai1\LTX-Video\inference.py", line 29, in <module>
    pipe = LTXPipeline.from_single_file(
  File "C:\ai1\LTX-Video\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\ai1\LTX-Video\venv\lib\site-packages\diffusers\loaders\single_file.py", line 495, in from_single_file
    loaded_sub_model = load_single_file_sub_model(
  File "C:\ai1\LTX-Video\venv\lib\site-packages\diffusers\loaders\single_file.py", line 102, in load_single_file_sub_model
    loaded_sub_model = load_method(
  File "C:\ai1\LTX-Video\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\ai1\LTX-Video\venv\lib\site-packages\diffusers\loaders\single_file_model.py", line 349, in from_single_file
    unexpected_keys = load_model_dict_into_meta(
  File "C:\ai1\LTX-Video\venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 230, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load  because decoder.conv_in.conv.bias expected shape tensor(..., device='meta', size=(512,)), but got torch.Size([1024]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

@hlky
Collaborator

hlky commented Dec 20, 2024

The VAE https://huggingface.co/Lightricks/LTX-Video/tree/main/vae has the shape expected in the error. Can you check that the contents of config.json match what you have locally, and that the hash of the safetensors matches what's listed on the Hub?
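
If it helps, a minimal Python sketch for computing the SHA256 of the local file so it can be compared against the "Git LFS Details" hash on the Hub file page; the path is an assumption based on the layout reported above:

import hashlib

# Hash the local VAE weights in 1 MiB chunks; compare the result against
# the SHA256 under "Git LFS Details" on the Hub. The path is an assumption.
path = r"C:\ai1\LTX-Video\Lightricks\LTX-Video\vae\diffusion_pytorch_model.safetensors"
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest())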

@nitinmukesh
Author

nitinmukesh commented Dec 20, 2024

They seem the same to me. I am not a developer, just an end user, so sorry if I'm asking something obvious.

PS C:\ai1\LTX-Video\Lightricks\LTX-Video\vae> Get-FileHash diffusion_pytorch_model.safetensors | Format-List
Algorithm : SHA256
Hash      : 265CA87CB5DFF5E37F924286E957324E282FE7710A952A7DAFC0DF43883E2010
Path      : C:\ai1\LTX-Video\Lightricks\LTX-Video\vae\diffusion_pytorch_model.safetensors

https://huggingface.co/Lightricks/LTX-Video/commit/05be24f065a268e3b7881b63a114dbf52ce68fd1

version https://git-lfs.github.com/spec/v1
oid sha256:265ca87cb5dff5e37f924286e957324e282fe7710a952a7dafc0df43883e2010
size 1676798532


and also checked Hash here
https://huggingface.co/Lightricks/LTX-Video/blob/main/vae/diffusion_pytorch_model.safetensors

Git LFS Details
SHA256: 265ca87cb5dff5e37f924286e957324e282fe7710a952a7dafc0df43883e2010
Pointer size: 135 Bytes
Size of remote file: 1.68 GB

Also compared the config files and both are the same:
config.json

@DN6
Collaborator

DN6 commented Dec 20, 2024

Oh, looks like the ltx-video-2b-v0.9.1.safetensors version of the checkpoint has a different dimension for decoder.conv_in.conv.bias. The config we use to automatically set up LTXVideo was based on the 0.9.0 checkpoint. cc: @a-r-r-o-w we may need to set up a different repo for that version.

@nitinmukesh Can you try loading this checkpoint?
https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors
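
For reference, a minimal sketch of the suggested change against the reproduction script above, reusing its text_encoder and tokenizer; only the checkpoint URL changes:

# Point from_single_file at the 0.9 checkpoint instead of 0.9.1.
single_file_url = "https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors"
pipe = LTXPipeline.from_single_file(
    single_file_url, text_encoder=text_encoder, tokenizer=tokenizer, torch_dtype=torch.bfloat16
)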

@nitinmukesh
Author

nitinmukesh commented Dec 20, 2024

I checked the hashes of all files and all seem good.


PS Lightricks\LTX-Video> .\Calculate-Hashes.ps1

FileName                                                                                             Hash
--------                                                                                             ----
Lightricks\LTX-Video\ltx-video-2b-v0.9.1.safetensors                                A23200896C5EDDF215C7CB9517820C5763A2B054EB62BA86CBCE6B871A4577E3
Lightricks\LTX-Video\model_index.json                                               7FEB8DA8AAA6606B5CE8552724B42B74FB09D499E00260B32105DD16EF74D68E
Lightricks\LTX-Video\scheduler\scheduler_config.json                                C8B56AB25A679BAB39C0F92D56FBA09C055A41EFA08745CA7A996D11CA7E76F5
Lightricks\LTX-Video\text_encoder\config.json                                       E8425C7FB4B09361C160EBA83F89F3B3186D69BB189A8D67AD6BEE8A160479A0
Lightricks\LTX-Video\text_encoder\model-00001-of-00004.safetensors                  7A68B2C8C080696A10109612A649BC69330991ECFEA65930CCFDFBDB011F2686
Lightricks\LTX-Video\text_encoder\model-00002-of-00004.safetensors                  B8ED6556D7507E38AF5B428C605FB2A6F2BDB7E80BD481308B865F7A40C551CA
Lightricks\LTX-Video\text_encoder\model-00003-of-00004.safetensors                  C831635F83041F83FAF0024B39C6ECB21B45D70DD38A63EA5BAC6C7C6E5E558C
Lightricks\LTX-Video\text_encoder\model-00004-of-00004.safetensors                  02A5F2D69205BE92AD48FE5D712D38C2FF55627969116AEFFC58BD75A28DA468
Lightricks\LTX-Video\text_encoder\model.safetensors.index.json                      A545BB25DC0F423D84BE7B577311BBA8BB7C6931F1EEFCEA65FC8B0A61A60A76
Lightricks\LTX-Video\tokenizer\added_tokens.json                                    EA5A91A3234F66EA642C8E672D67F0F493759A9BEE6910AE304EA9B9492118B5
Lightricks\LTX-Video\tokenizer\special_tokens_map.json                              7A1985A994C41886DB38C719D2A3D2F40606663CC19D7C5D6A85D349320E06D2
Lightricks\LTX-Video\tokenizer\spiece.model                                         D60ACB128CF7B7F2536E8F38A5B18A05535C9E14C7A355904270E15B0945EA86
Lightricks\LTX-Video\tokenizer\tokenizer_config.json                                E8C727FBDFBC495F9EC68FC4B89EDA5D7D0A800CC33D1A661AE208B20B50E17A
Lightricks\LTX-Video\transformer\config.json                                        F0FBA250EBD4E1C33ED035F1151AF44E069A3125FE1C3620E5196681F5F2B0CD
Lightricks\LTX-Video\transformer\diffusion_pytorch_model-00001-of-00002.safetensors 8ACD3E0BDA74F7434259A4543A324211DDD82580FCC727DF236B2414591EADC8
Lightricks\LTX-Video\transformer\diffusion_pytorch_model-00002-of-00002.safetensors 03B3C822C31E1A9E00F6F575AA1B6F3CC4CC3797F60DCCED537C8600BF1E9019
Lightricks\LTX-Video\transformer\diffusion_pytorch_model.safetensors.index.json     3A99763D8E06F985E6E4323A927B404C8354DADC41E1A693C5C5052DE591A630
Lightricks\LTX-Video\vae\config.json                                                E8F12AD5305B1508F6E456797D48AB8E9FDD091C686CCF814A293EE96CE89DBB
Lightricks\LTX-Video\vae\diffusion_pytorch_model.safetensors                        265CA87CB5DFF5E37F924286E957324E282FE7710A952A7DAFC0DF43883E2010

@nitinmukesh
Author

nitinmukesh commented Dec 20, 2024

@DN6
Sorry I missed your message. Will check now.

@nitinmukesh
Author

@DN6

Yes, it works with the older version.

To make ltx-video-2b-v0.9.1.safetensors work, does the config file need an update, or are model changes required? If it only needs changes in the JSON, please let me know; I will make the changes manually and check.

Thank you all for the support with this issue.

@nitinmukesh
Author

nitinmukesh commented Dec 20, 2024

Please also help with how to use the following in Diffusers for LTX-Video:

  1. Enhanced sampling via Spatiotemporal Skip Guidance (STG), and interpolation with precise frame settings.
  2. Image-to-video sample code.
  3. Support for the 0.9.1 model, as per the comment above.

It will help users who don't come from a developer background.

@a-r-r-o-w
Member

  1. Spatiotemporal guidance will be worked on soon and added to all video pipelines.
  2. An image-to-video example is available in the example docstrings of the LTXImageToVideoPipeline and in the diffusers docs.
  3. We've opened a PR adding support for LTX 0.9.1 in [core] LTX Video 0.9.1 #10330. We're discussing with the Lightricks team about hosting the diffusers-format weights, so it should hopefully be merged soon.

LMK if we can help with anything 🤗

@nitinmukesh
Author

nitinmukesh commented Dec 21, 2024

@a-r-r-o-w

Thank you for the updates.

  1. STGuidance will be supported in due time
    (Support for LTX-Video diffusers pipeline junhahyung/STGuidance#13)

  2. Image to video
    I referred to https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video, which does not explain how to pass an image to the pipeline. I will refer to your link, thank you.

  3. Appreciated. Will wait for the PR to merge; excited to use the new, lightweight, better model.

@a-r-r-o-w
Member

For 2, the same link has a reference to the image-to-video pipeline when you scroll lower: https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video#diffusers.LTXImageToVideoPipeline.__call__.example. Hope you're able to test it out now.
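
For anyone landing here later, a minimal sketch along the lines of that docstring example; the conditioning image URL and the prompt are placeholders, and enable_model_cpu_offload mirrors the low-VRAM setup used earlier in this thread:

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps on low-VRAM GPUs

# Placeholder conditioning image; replace with your own first frame.
image = load_image("https://example.com/first_frame.png")
prompt = "A woman with long brown hair smiles warmly in golden-hour light"

video = pipe(
    image=image,
    prompt=prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output_ltx_i2v.mp4", fps=24)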

@nitinmukesh
Author

Regarding #2,
I think the documentation is incorrect:
https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video#loading-single-files

The example is for LTXVideoPipeline, but the pipe is created using LTXImageToVideoPipeline. I think that's what confused me.

import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel

single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
  single_file_url, torch_dtype=torch.bfloat16
)
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
pipe = LTXImageToVideoPipeline.from_pretrained(
  "Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)

I figured it out using the example you provided and updated the code. Thank you.

@nitinmukesh
Author

nitinmukesh commented Dec 21, 2024

@a-r-r-o-w
And thank you very much for creating this project. Kudos to the team behind it.
Even a non-programmer like me can borrow code from the examples and use different models (Sana and LTX for now) on low VRAM (8 GB). Thank you again.

Just waiting for more magic; I hope you have some up your sleeves, like HunyuanVideo also working on 8 GB. Sorry for being greedy ;)
