Description
Describe the bug
When using a custom Hugging Face Hub endpoint that includes an additional path segment (e.g. /hf),
the function repo_type_and_id_from_hf_id() incorrectly interprets that segment as a repo_type,
causing a ValueError: Unknown repo_type: 'hf'.
This happens because the function does not distinguish between the endpoint path suffix and the actual repository path.
Reproduction
from huggingface_hub.hf_api import repo_type_and_id_from_hf_id
hf_id = "http://localhost:3000/hf/models/test/my-model"
endpoint = "http://localhost:3000/hf"
repo_type_and_id_from_hf_id(hf_id, hub_url=endpoint)
Result
ValueError: Unknown `repo_type`: 'hf' ('http://localhost:3000/hf/models/test/my-model')
Expected Result
The function should correctly extract:
("model", "test", "my-model")
and recognize that /hf is part of the endpoint path, not a repo type.
Root Cause Analysis
Internally, the function does:
hub_url = re.sub(r"https?://", "", hub_url)
is_hf_url = hub_url in hf_id
url_segments = hf_id.split("/")
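repo_type = url_segments[-3]
After the scheme is stripped, hub_url is "localhost:3000/hf". Splitting hf_id on "/" turns the endpoint's /hf suffix into an ordinary path segment, so when the URL has no explicit repo-type segment (e.g. /hf/test/model), url_segments[-3] equals "hf".
The function then misclassifies "hf" as repo_type instead of treating it as part of the base URL.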
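The indexing problem can be seen without the library at all. The sketch below mirrors only the simplified snippet above, not the full implementation in huggingface_hub. The helper name `third_from_last` and the second URL are made up for illustration.

```python
# Standalone illustration: uses only str.split, no huggingface_hub import.
def third_from_last(url: str) -> str:
    """Return the segment the simplified logic above would read as repo_type."""
    return url.split("/")[-3]


# Endpoint with a "/hf" suffix, URL without an explicit repo-type segment.
short_url = "http://localhost:3000/hf/test/my-model"
# Same endpoint, URL with an explicit "models" segment.
long_url = "http://localhost:3000/hf/models/test/my-model"

print(third_from_last(short_url))  # hf
print(third_from_last(long_url))   # models
```

Negative indexing counts from the end of the URL, so whether the endpoint suffix lands in the repo_type slot depends only on how many segments follow it.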
Suggested Fix
Strip the hub_url prefix before splitting into path segments,
so that only the portion after the endpoint is used for repo parsing.
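As a rough, self-contained sketch of that idea (not the library's code): the helper `parse_repo_url`, the mapping `REPO_TYPE_BY_PREFIX`, and the choice to return `None` when no repo-type segment is present are all assumptions made for this example.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping for this sketch; the library maintains its own constants.
REPO_TYPE_BY_PREFIX = {"models": "model", "datasets": "dataset", "spaces": "space"}


def _strip_scheme(url: str) -> str:
    """Drop a leading http:// or https://."""
    return re.sub(r"^https?://", "", url)


def parse_repo_url(hf_id: str, hub_url: str) -> Tuple[Optional[str], Optional[str], str]:
    """Hypothetical helper: return (repo_type, namespace, repo_name)."""
    endpoint = _strip_scheme(hub_url).rstrip("/")
    path = _strip_scheme(hf_id)

    # Remove the whole endpoint, including any path suffix such as "/hf".
    # Requiring a "/" after it avoids matching "localhost:3000/hfoo".
    if path == endpoint or path.startswith(endpoint + "/"):
        path = path[len(endpoint):]

    segments = [s for s in path.split("/") if s]

    # Assumption: an optional leading segment names the repo type.
    repo_type = None
    if segments and segments[0] in REPO_TYPE_BY_PREFIX:
        repo_type = REPO_TYPE_BY_PREFIX[segments.pop(0)]

    if len(segments) == 2:
        namespace, name = segments
        return repo_type, namespace, name
    if len(segments) == 1:
        return repo_type, None, segments[0]
    raise ValueError(f"Cannot parse repo path from {hf_id!r}")


endpoint = "http://localhost:3000/hf"
print(parse_repo_url("http://localhost:3000/hf/models/test/my-model", endpoint))
print(parse_repo_url("http://localhost:3000/hf/test/my-model", endpoint))
```

Because the endpoint is removed before the path is split, the /hf suffix never reaches the segment list, whatever the length of the remaining path.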
Example patch (pseudo-code):
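# hub_url has already had its scheme stripped, so compare against a scheme-less hf_id
hf_id_path = re.sub(r"https?://", "", hf_id)
if hf_id_path.startswith(hub_url):
    hf_id = hf_id_path[len(hub_url):].lstrip("/")
url_segments = hf_id.split("/")
Logs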
System info
- huggingface_hub version: 0.35.3