Description
🚀 The feature, motivation and pitch
Sometimes it's beneficial to stream checkpoint data directly to cloud storage (rather than dump it locally and have some background process handle sync/upload/cleanup), or to load checkpoint weights from an `s3://` or `gs://` path. I wonder if this can also be related to the recent native HFStorageReader support #154518.
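To make the ask concrete, here is a minimal sketch of the kind of workflow I mean, assuming fsspec with the s3fs backend is installed; the bucket and key names are made-up placeholders:

```python
import fsspec
import torch
import torch.nn as nn

# Hypothetical path; needs the s3fs backend (`pip install s3fs`) and valid credentials.
CKPT_URI = "s3://my-bucket/checkpoints/model_final.pt"

model = nn.Linear(16, 4)

# Save: stream the serialized checkpoint straight to object storage,
# with no local temp file or background sync/upload process.
with fsspec.open(CKPT_URI, "wb") as f:
    torch.save(model.state_dict(), f)

# Load: read the checkpoint back through a file-like stream.
with fsspec.open(CKPT_URI, "rb") as f:
    state_dict = torch.load(f, map_location="cpu")

model.load_state_dict(state_dict)
```

Something like this generally works already when going through fsspec by hand, since `torch.save`/`torch.load` accept file-like objects; the request is about first-class support for such paths in PyTorch's own checkpointing utilities.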
I also found that the torchsnapshot library (which seems abandoned now - last commit 6 months ago) supports this: https://docs.pytorch.org/torchsnapshot/main/getting_started.html
Also, there exists a fairly popular package, fsspec, which itself wraps several cloud storage libraries and provides caching functionality; there was some discussion in torchsnapshot about supporting it natively:
And some older (before HF) discussions:
- Proxy/cache server option/hooks for downloading model checkpoints and dataset archive files in cloud environment #91965
- Potential race conditions between multiple workers trying to download and cache the same file in torch.hub.load_state_dict_from_url and torch.hub.download_url_to_file <- duplicate dataset/model downloads across DDP workers #68320
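For reference on the caching side, fsspec's cache wrappers already give a local-cache-in-front-of-remote pattern (they don't solve cross-worker races by themselves, but they show the kind of interface available); a small sketch, again with a made-up `s3://` path:

```python
import fsspec
import torch

# "filecache::" chains a persistent local cache in front of the remote filesystem,
# so the object is downloaded once and reused on subsequent opens.
uri = "filecache::s3://my-bucket/checkpoints/model_final.pt"

with fsspec.open(uri, "rb", filecache={"cache_storage": "/tmp/ckpt-cache"}) as f:
    state_dict = torch.load(f, map_location="cpu")
```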
I wonder if some HF utils for checkpointing / the HF Hub blob caching structure could be upstreamed to PyTorch. E.g. for loading pretrained weights, this should be a good fit. And maybe some `hf://` management/caching could be made to plug into the fsspec interface? Then `hf://` could be used in all fsspec-using places.
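A rough sketch of what that could look like, assuming huggingface_hub's HfFileSystem (which, as far as I can tell, already registers the `hf://` protocol with fsspec); the repo and file names are placeholders:

```python
import fsspec
import torch

# Requires huggingface_hub installed; it registers HfFileSystem for the "hf://" protocol,
# so any fsspec-aware code path could open Hub files transparently.
with fsspec.open("hf://some-org/some-model/pytorch_model.bin", "rb") as f:
    state_dict = torch.load(f, map_location="cpu")
```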
Alternatives
No response
Additional context
No response
cc @mruberry @mikaylagawarecki @LucasLLC @pradeepfn @MeetVadakkanchery @mhorowitz @ekr0