FastLanguageModel.from_pretrained fails validate_repo_id in huggingface_hub #1222

Open
AndreBremer opened this issue Oct 30, 2024 · 10 comments
Labels
currently fixing Am fixing now!

Comments

@AndreBremer

This bug appeared after the October-2024 tag. A glob file pattern ends up in Hugging Face's repo ID validator, which causes validation to fail.

Example:

    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
        # ... (remaining arguments elided)
    )

Result:

huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'unsloth/Llama-3.2-1B-Instruct-bnb-4bit\*.json'.

@danielhanchen
Contributor

@Erland366 could this be related to the recent HF upload change?

@Erland366
Contributor

Hmm, I can't seem to reproduce this. I'll investigate further, but has anything changed in your setup recently?

@AndreBremer
Author

Found the issue. It's Windows-specific. HfFileSystem doesn't treat backslashes as path separators, so the whole string, glob pattern included, is taken to be the repo ID. This only occurs on Windows, where os.path.join joins the parts with a backslash.

Changing loader.py:213 from:

files = HfFileSystem(token = token).glob(os.path.join(model_name, "*.json"))

to something like:

files = HfFileSystem(token = token).glob(f"{model_name}/*.json")

fixes the issue.
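
To illustrate the difference without needing a Windows machine, here is a minimal sketch using the standard library's `ntpath` and `posixpath` modules, which are the implementations behind `os.path` on Windows and on POSIX systems respectively. The helper names `windows_style_pattern` and `hub_pattern` are hypothetical, made up for this illustration; they are not part of unsloth or huggingface_hub.

```python
import ntpath      # what os.path resolves to on Windows
import posixpath   # what os.path resolves to on Linux/macOS


def windows_style_pattern(model_name: str) -> str:
    """Reproduce what os.path.join(model_name, "*.json") yields on Windows."""
    return ntpath.join(model_name, "*.json")


def hub_pattern(model_name: str) -> str:
    """Build the glob pattern with a forward slash on every platform."""
    return posixpath.join(model_name, "*.json")


model_name = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"
print(windows_style_pattern(model_name))  # joined with a backslash
print(hub_pattern(model_name))            # joined with a forward slash
```

The first line printed ends in `\*.json`, the same string that appears in the error message above; the second ends in `/*.json`, which matches the f-string version. Using `posixpath.join` would be one alternative to the f-string if an explicit join is preferred.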

@RealOfficialTurf

Has anyone gotten around to fixing this yet? I'm running on Windows and this also happens to me. The fix given by the author worked for me, but I'd like to see it implemented in the repo.

@rajan-fredrick-04

Can anyone tell me where to make this change?

@rajan-fredrick-04

@RealOfficialTurf Can you tell me where to make the needed change?

@Erland366
Contributor

It's here @rajan-fredrick-04

files = HfFileSystem(token = token).glob(os.path.join(model_name, "*.json"))

You can modify this line directly until unsloth officially fixes it .-.
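
For anyone unsure where the installed copy of that file lives, a rough sketch for locating it follows. It assumes the package is importable as `unsloth` and that the file is still named `loader.py`; the line number may differ between versions, so it searches for the text of the call instead. `find_join_calls` is a hypothetical helper written for this illustration, not part of unsloth.

```python
import importlib.util
import os


def find_join_calls(package_dir, needle="os.path.join(model_name"):
    """Return (path, line number, line text) for each matching line in any loader.py."""
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        if "loader.py" not in files:
            continue
        path = os.path.join(root, "loader.py")
        with open(path, encoding="utf-8") as handle:
            for lineno, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumption: the package is installed under the import name "unsloth".
    spec = importlib.util.find_spec("unsloth")
    if spec is None or not spec.submodule_search_locations:
        print("unsloth does not appear to be installed in this environment")
    else:
        for package_dir in spec.submodule_search_locations:
            for path, lineno, text in find_join_calls(package_dir):
                print(f"{path}:{lineno}: {text}")
```

Each reported line is a candidate for the forward-slash change described earlier in the thread.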

@danielhanchen added the "currently fixing" (Am fixing now!) label on Dec 12, 2024
@danielhanchen
Contributor

@Erland366 Oh yes, someone else opened an issue about this. Could you open a PR for it? Thanks a lot!

@Erland366
Contributor

PR #1307 fixes this.

@SergioRubio01

Same problem here. I tried applying the change but still can't get it to work. Has it already been fixed, by any chance?
