repo_type_and_id_from_hf_id() misinterprets custom endpoint paths as repo_type (e.g. "/hf") #3494

@pulltheflower

Describe the bug

When using a custom Hugging Face Hub endpoint that includes an additional path segment (e.g. /hf),
the function repo_type_and_id_from_hf_id() incorrectly interprets that segment as a repo_type,
causing a ValueError: Unknown repo_type: 'hf'.

This happens because the function does not distinguish between the endpoint path suffix and the actual repository path.

Reproduction

from huggingface_hub import repo_type_and_id_from_hf_id

hf_id = "http://localhost:3000/hf/models/test/my-model"
endpoint = "http://localhost:3000/hf"

repo_type_and_id_from_hf_id(hf_id, hub_url=endpoint)

Result

ValueError: Unknown `repo_type`: 'hf' ('http://localhost:3000/hf/models/test/my-model')

Expected Result

The function should correctly extract:

("model", "test", "my-model")

and recognize that /hf is part of the endpoint path, not a repo type.

Root Cause Analysis

Internally, the function does:

hub_url = re.sub(r"https?://", "", hub_url)
is_hf_url = hub_url in hf_id
url_segments = hf_id.split("/")
repo_type = url_segments[-3]

After the scheme is stripped, hub_url is "localhost:3000/hf". When the URL has no explicit type segment (e.g. http://localhost:3000/hf/test/model), url_segments[-3] is "hf",
so the function misclassifies "hf" as the repo_type instead of treating it as part of the base URL.
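
The indexing can be checked without installing the library. The sketch below mirrors only the lines quoted above, not the full function; third_from_last_segment is a hypothetical helper name used for illustration.

```python
import re


def third_from_last_segment(hf_id: str) -> str:
    """Return the segment that the quoted logic reads as the candidate repo_type."""
    return hf_id.split("/")[-3]


endpoint = "http://localhost:3000/hf"
# Same substitution as in the quoted snippet: the scheme is dropped, the path is kept.
hub_url = re.sub(r"https?://", "", endpoint)

short_url = "http://localhost:3000/hf/test/model"
print(hub_url)                             # localhost:3000/hf
print(hub_url in short_url)                # True, so the id is treated as a Hub URL
print(short_url.split("/"))                # ['http:', '', 'localhost:3000', 'hf', 'test', 'model']
print(third_from_last_segment(short_url))  # hf
```

The endpoint's path component ends up at index -3 because the split runs over the whole URL, host and endpoint path included.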

Suggested Fix

Strip the hub_url prefix before splitting into path segments,
so that only the portion after the endpoint is used for repo parsing.

Example patch (pseudo-code):

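# hub_url has already had its scheme removed, so compare against a scheme-less hf_id
hf_id_no_scheme = re.sub(r"https?://", "", hf_id)
if hf_id_no_scheme.startswith(hub_url):
    hf_id = hf_id_no_scheme[len(hub_url):].lstrip("/")
url_segments = hf_id.split("/")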
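
A standalone version of the same idea may make the intended behavior easier to test. This is a minimal sketch under stated assumptions, not the library's implementation: strip_endpoint_prefix is a hypothetical helper, and it covers only the prefix-stripping step, not the repo_type mapping that follows it.

```python
import re
from typing import List


def strip_endpoint_prefix(hf_id: str, endpoint: str) -> List[str]:
    """Return the path segments of hf_id that follow the endpoint, if it is a prefix.

    Hypothetical helper for illustration; not part of huggingface_hub.
    """
    scheme = r"^https?://"
    bare_id = re.sub(scheme, "", hf_id)
    bare_endpoint = re.sub(scheme, "", endpoint).rstrip("/")
    # Match on a segment boundary so "host/hf" does not match "host/hfoo/...".
    if bare_id == bare_endpoint or bare_id.startswith(bare_endpoint + "/"):
        bare_id = bare_id[len(bare_endpoint):].lstrip("/")
    # Drop empty segments left over from leading or doubled slashes.
    return [segment for segment in bare_id.split("/") if segment]


endpoint = "http://localhost:3000/hf"
print(strip_endpoint_prefix("http://localhost:3000/hf/models/test/my-model", endpoint))
# ['models', 'test', 'my-model']
print(strip_endpoint_prefix("http://localhost:3000/hf/test/my-model", endpoint))
# ['test', 'my-model']
print(strip_endpoint_prefix("test/my-model", endpoint))
# ['test', 'my-model']
```

Matching on "endpoint + /" rather than a bare startswith is a design choice of this sketch: it avoids stripping a prefix from an unrelated path that merely begins with the same characters. Inputs that do not start with the endpoint are returned split but otherwise unchanged.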

System info

- huggingface_hub version: 0.35.3
