Error reports during load_from_checkpoint #114

Open
JackeyLee007 opened this issue Jul 22, 2024 · 3 comments

Comments

@JackeyLee007

JackeyLee007 commented Jul 22, 2024

First, I load the model from a checkpoint:

model.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

While loading, it reports the following error:

ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
@JackeyLee007 JackeyLee007 changed the title Error reports wehen load_from_checkpoint Error reports when load_from_checkpoint Jul 23, 2024
@JackeyLee007 JackeyLee007 changed the title Error reports when load_from_checkpoint Error reports duringload_from_checkpoint Jul 23, 2024
@godcrying

Me too.

@siriuz42
Collaborator

This error is not blocking. Can you wait and see if the jitting succeeds?
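One way to "wait and see" programmatically is to check the captured output for the completion messages. The sketch below is only an illustration: the helper name `load_completed` is hypothetical, the marker strings are taken from the log posted further down this thread, and how the output is captured (for example by redirecting the process output to a file) is left open.

```python
def load_completed(log_text: str) -> bool:
    """Return True if both completion markers appear in the captured output.

    The marker strings are assumptions copied from a log posted in this
    thread; other versions of the library may word them differently.
    """
    markers = ("Restored checkpoint in", "Jitted decoding in")
    return all(marker in log_text for marker in markers)
```

If this returns True even though the ERROR line is present, the load went through and the message can be treated as noise.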

@guiyang882

2024-09-11 14:50:46.515527: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Constructing model weights.
Constructed model weights in 2.43 seconds.
Restoring checkpoint from /disk01/timesfm_repo/timesfm-1.0-200m/checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.91 seconds.
Jitting decoding.
Jitted decoding in 21.69 seconds.

How can I fix the train_state_unpadded_shape_dtype_struct ERROR message?
By the way, could you also release the checkpoints in a PyTorch version?
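Since the message is described above as non-blocking, one option is to hide it rather than fix it. The sketch below is not an official fix and does not change how the checkpoint is restored. It assumes the message is emitted through the standard-library logger named "absl", which the `ERROR:absl:` prefix suggests. The names `DropMessageFilter` and `silence_unpadded_shape_error` are hypothetical.

```python
import logging


class DropMessageFilter(logging.Filter):
    """Drop any log record whose formatted message contains `needle`."""

    def __init__(self, needle: str):
        super().__init__()
        self.needle = needle

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logger to discard the record.
        return self.needle not in record.getMessage()


def silence_unpadded_shape_error() -> DropMessageFilter:
    """Attach the filter to the "absl" logger (assumed source of the message)."""
    flt = DropMessageFilter("train_state_unpadded_shape_dtype_struct")
    logging.getLogger("absl").addFilter(flt)
    return flt
```

Calling `silence_unpadded_shape_error()` after the model libraries have been imported, and before `load_from_checkpoint`, should hide the ERROR line. It would also hide the related WARNING, because both mention the same identifier. To restore the messages, pass the returned filter to `logging.getLogger("absl").removeFilter(...)`.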
