Automatically infer the PyTorch index via --torch-backend=auto #12070

Open · charliermarsh wants to merge 2 commits into main

Conversation

charliermarsh (Member)

Summary

This is a prototype that I'm considering shipping under --preview, based on light-the-torch.

light-the-torch patches pip to pull PyTorch packages from the PyTorch indexes automatically. And, in particular, light-the-torch will query the installed CUDA drivers to determine which indexes are compatible with your system.

This PR implements equivalent behavior under --torch-backend auto, though you can also set --torch-backend cpu, etc. for convenience. When enabled, the registry client will fetch from the appropriate PyTorch index when it sees a package from the PyTorch ecosystem (and ignore any other configured indexes, unless the package is explicitly pinned to a different index).

Right now, this is only implemented in the uv pip CLI, since it doesn't quite fit into the lockfile APIs given that it relies on feature detection on the currently-running machine.
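
For illustration (the exact spelling may still change, since this is a preview prototype), usage would look something like `uv pip install torch --torch-backend=auto`, with `--torch-backend=cpu` available as an explicit override.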

charliermarsh requested review from zanieb and konstin on March 10, 2025
Comment on lines +411 to +420
result = subprocess.run(
[
"nvidia-smi",
"--query-gpu=driver_version",
"--format=csv",
],
check=True,
capture_output=True,
text=True,
)
Member:

This ties machine-wide information (nvidia-smi output) to each interpreter, even though this information changes through different operations than the interpreter does (updating CUDA vs. updating the Python interpreter binary).

geofft (Collaborator):

I think we should get this via /sys/module/nvidia/version, and also not cache it - it ought to be pretty cheap to do one file read whenever we need it, much faster than actually loading the CUDA libraries and doing stuff. (I don't think there's a good way to get proactively notified if it changes; you can of course invalidate on a reboot but you can also unload/load drivers without a reboot.) I think I've also run into cases where nvidia-smi isn't installed right but the actual kernel driver is fine.

That is to say, I think this logic should move out of the Python interpreter discovery code and into the Rust code at the point where we need it.

Minor point but I want to mention it because the terminology is confusing: the information we're specifically getting here is the driver version, not the CUDA version. PyTorch ships the relevant CUDA runtime (libcudart.so.12 or .11 or whatever) and it doesn't have to match the CUDA version installed systemwide (if any). libcudart, in turn, requires a libcuda.so.1 from either the systemwide driver installation or a "cuda-compat" package if libcudart.so.N is sufficiently newer than the system version of libcuda.so.1. (So you could come up with a scheme where libcuda.so.1 itself is also distributed via e.g. a wheel and so everything is decoupled from the system except the kernel driver, though I don't remember off hand whether NVIDIA's license allows redistributing it. This sort of setup is particularly helpful for containerized environments, where it's annoying that the "driver" installation is split between a kernel driver, which is trivially accessible in the container, and the userspace libcuda.so.1, which requires more effort to bind mount into the container.)
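
A minimal sketch of the sysfs approach suggested above, assuming the standard Linux path and that the kernel module is loaded; the function name and error handling are illustrative, not part of this PR:

from pathlib import Path
from typing import Optional


def nvidia_driver_version() -> Optional[str]:
    """Read the NVIDIA kernel driver version directly from sysfs (Linux only)."""
    # One cheap file read; no nvidia-smi subprocess and no CUDA library loading.
    try:
        return Path("/sys/module/nvidia/version").read_text().strip()
    except OSError:
        return None  # driver module not loaded, or not a Linux/NVIDIA system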

capture_output=True,
text=True,
)
return result.stdout.splitlines()[-1]
Member:

This could raise an `IndexError`.
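
A hedged sketch of a more defensive version of the line above (illustrative only, not the PR's code):

lines = result.stdout.splitlines()
# Guard against empty output instead of indexing blindly with [-1].
driver_version = lines[-1] if lines else None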

DEKHTIARJonathan commented on Mar 10, 2025

@charliermarsh: Adapted from a few different sources, namely conda.

I hope this illustrates my point better: why you need a plugin interface, and why you don't want to be the person responsible for maintaining it 👍

# Copyright (C) 2012 Anaconda, Inc
# SPDX-License-Identifier: BSD-3-Clause
"""Detect CUDA version."""

import ctypes
import functools
import itertools
import multiprocessing
import os
import platform
from contextlib import suppress
from dataclasses import dataclass
from typing import Optional


@dataclass()
class CudaVersion:
    version: str
    architectures: list[str]


def cuda_version() -> Optional[CudaVersion]:
    # Do not inherit file descriptors and handles from the parent process.
    # The `fork` start method should be considered unsafe as it can lead to
    # crashes of the subprocess. The `spawn` start method is preferred.
    context = multiprocessing.get_context("spawn")
    queue = context.SimpleQueue()
    # Spawn a subprocess to detect the CUDA version
    detector = context.Process(
        target=_cuda_detector_target,
        args=(queue,),
        name="CUDA driver version detector",
        daemon=True,
    )
    try:
        detector.start()
        detector.join(timeout=60.0)
    finally:
        # Always cleanup the subprocess
        detector.kill()  # requires Python 3.7+

    if queue.empty():
        return None

    result = queue.get()
    if result:
        driver_version, architectures = result.split(";")
        result = CudaVersion(driver_version, architectures.split(","))
    return result


@functools.lru_cache(maxsize=None)
def cached_cuda_version():
    return cuda_version()


def _cuda_detector_target(queue):
    """
    Attempt to detect the version of CUDA present in the operating system in a
    subprocess.

    On Windows and Linux, the CUDA library is installed by the NVIDIA
    driver package, and is typically found in the standard library path,
    rather than with the CUDA SDK (which is optional for running CUDA apps).

    On macOS, the CUDA library is only installed with the CUDA SDK, and
    might not be in the library path.

    Returns: version string with CUDA version first, then a set of unique SM's for the GPUs present in the system
             (e.g., '12.4;8.6,9.0') or None if CUDA is not found.
             The result is put in the queue rather than a return value.
    """
    # Platform-specific libcuda location
    system = platform.system()
    if system == "Darwin":
        lib_filenames = [
            "libcuda.1.dylib",  # check library path first
            "libcuda.dylib",
            "/usr/local/cuda/lib/libcuda.1.dylib",
            "/usr/local/cuda/lib/libcuda.dylib",
        ]
    elif system == "Linux":
        lib_filenames = [
            "libcuda.so",  # check library path first
            "/usr/lib64/nvidia/libcuda.so",  # RHEL/Centos/Fedora
            "/usr/lib/x86_64-linux-gnu/libcuda.so",  # Ubuntu
            "/usr/lib/wsl/lib/libcuda.so",  # WSL
        ]
        # Also add libraries with version suffix `.1`
        lib_filenames = list(
            itertools.chain.from_iterable((f"{lib}.1", lib) for lib in lib_filenames)
        )
    elif system == "Windows":
        bits = platform.architecture()[0].replace("bit", "")  # e.g. "64" or "32"
        lib_filenames = [f"nvcuda{bits}.dll", "nvcuda.dll"]
    else:
        queue.put(None)  # CUDA not available for other operating systems
        return

    # Open library
    if system == "Windows":
        dll = ctypes.windll
    else:
        dll = ctypes.cdll
    for lib_filename in lib_filenames:
        with suppress(Exception):
            libcuda = dll.LoadLibrary(lib_filename)
            break
    else:
        queue.put(None)
        return

    # An empty `CUDA_VISIBLE_DEVICES` can cause `cuInit()` to return `CUDA_ERROR_NO_DEVICE`,
    # and an invalid value can cause it to return `CUDA_ERROR_INVALID_DEVICE`.
    # Unset the environment variable to avoid these errors.
    os.environ.pop("CUDA_VISIBLE_DEVICES", None)

    # Get CUDA version
    try:
        cuInit = libcuda.cuInit
        flags = ctypes.c_uint(0)
        ret = cuInit(flags)
        if ret != 0:
            queue.put(None)
            return

        cuDriverGetVersion = libcuda.cuDriverGetVersion
        version_int = ctypes.c_int(0)
        ret = cuDriverGetVersion(ctypes.byref(version_int))
        if ret != 0:
            queue.put(None)
            return

        # Convert version integer to version string
        value = version_int.value
        version_value = f"{value // 1000}.{(value % 1000) // 10}"

        count = ctypes.c_int(0)
        libcuda.cuDeviceGetCount(ctypes.pointer(count))

        architectures = set()
        for device in range(count.value):
            major = ctypes.c_int(0)
            minor = ctypes.c_int(0)
            libcuda.cuDeviceComputeCapability(
                ctypes.pointer(major),
                ctypes.pointer(minor),
                device)
            architectures.add(f"{major.value}.{minor.value}")
        queue.put(f"{version_value};{','.join(architectures)}")
    except Exception:
        queue.put(None)
        return

if __name__ == "__main__":
    print(cuda_version())

| "torchserve"
| "torchtext"
| "torchvision"
| "pytorch-triton"
Member:

Can we add this list to some documentation? Reading the high-level overview I didn't realize we were hardcoding a package list.

geofft (Collaborator):

Can we generate this by querying the PyTorch indices to see what they have? (Maybe a manually-run script that queries them and updates this list, or an automatically-run integration test that makes sure this list is in sync with what's currently on their indices?)

Along those lines it would be helpful to have this list somewhere declarative. It might also be helpful to allow user-controlled overrides of this list if the set of packages changes.
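
A rough sketch of the kind of manually-run script suggested here: it lists the project names served by a PyTorch index so the hardcoded list can be diffed against it. The index URL and the HTML parsing are assumptions, not part of this PR:

import re
import urllib.request


def pytorch_index_projects(url: str = "https://download.pytorch.org/whl/cu124/") -> set:
    """Return the project names advertised by a flat PyTorch package index."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    # The index root is assumed to be a flat HTML listing of <a>project-name</a> links.
    return {m.group(1).strip().lower() for m in re.finditer(r"<a[^>]*>([^<]+)</a>", html)}


if __name__ == "__main__":
    print(sorted(pytorch_index_projects()))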

geofft (Collaborator) left a comment:

I think this is a great idea.

Would it be worth naming this feature something like uv-specialized-index instead of uv-torch, with an eye to extending it to other libraries in the future? (jaxlib and tensorflow, for instance, have current/popular versions on PyPI, but I think they also have their own indexes.)

╰─▶ Because anyio was not found in the provided package locations and your project depends on anyio, we can conclude that your project's requirements are unsatisfiable.

hint: Packages were unavailable because index lookups were disabled and no additional package locations were provided (try: `--find-links <uri>`)
╰─▶ Because anyio was not found in the package registry and your project depends on anyio, we can conclude that your project's requirements are unsatisfiable.
geofft (Collaborator):

Is this change fine? (As in, are there real-world users who would have benefited from the hint and are losing it?)


<p>The <code>auto</code> mode will attempt to detect the appropriate <code>PyTorch</code> index based on the currently installed CUDA drivers.</p>

<p>Possible values:</p>
geofft (Collaborator):

I wonder if it would be helpful to put the full giant list behind a <summary>...</summary>.

Member:

(Generally non-trivial because this is generated and then rendered via mkdocs)

| "torchserve"
| "torchtext"
| "torchvision"
| "pytorch-triton"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generate this by querying the PyTorch indices to see what they have? (Maybe a manually-run script that queries them and updates this list, or an automatically-run integration tests that makes sure this list is in sync with what's currently on their indices?)

Along those lines it would be helpful to have this list somewhere declarative. It might also be helpful to allow user-controlled overrides of this list if the set of packages changes.

samypr100 (Collaborator):

> I think this is a great idea.
>
> Would it be worth naming this feature something like uv-specialized-index instead of uv-torch, with an eye to extending it to other libraries in the future? (jaxlib and tensorflow, for instance, have current/popular versions on PyPI, but I think they also have their own indexes.)

I had a similar thought; I think this is one of many such cases, especially when these indexes are mirrored or vendored internally. I was thinking about what the right naming would be. I know some circles refer to this as a suffixed index, so maybe uv-suffixed-index? Same with --torch-backend: maybe something more generic about its intent would be more future-proof, such as --index-suffix.

samypr100 (Collaborator) commented on Mar 12, 2025

> though I don't remember off hand whether NVIDIA's license allows redistributing it

IIRC this is no longer an issue with the new open-source drivers (e.g. nvidia-driver-{ver}-open).

Never mind, I didn't notice you were referring to CUDA.

> I think we should get this via /sys/module/nvidia/version

💯 In my experience nvidia-smi can also take a long time, depending on GPU load.

That said, there are multiple possible locations depending on how the driver is installed (e.g. DKMS) and the environment (Windows, macOS); on WSL 2 it's even weirder, because the drivers are shared with the host. So nvidia-smi might be the most sure-fire, low-risk way (assuming the install isn't broken).
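
A hedged sketch of the fallback order being discussed, assuming the sysfs path mentioned above and mirroring the nvidia-smi invocation from the diff; this is illustrative, not what the PR implements:

import subprocess
from pathlib import Path
from typing import Optional


def detect_driver_version() -> Optional[str]:
    # Prefer the cheap sysfs read where it is exposed (typical native Linux).
    sysfs = Path("/sys/module/nvidia/version")
    if sysfs.exists():
        return sysfs.read_text().strip()
    # Fall back to nvidia-smi (e.g. WSL 2, or setups without that sysfs node).
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv"],
            check=True,
            capture_output=True,
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.splitlines()
    # The CSV output includes a header row, so take the last line defensively.
    return lines[-1].strip() if len(lines) > 1 else None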

charliermarsh (Member, Author):

Definitely agree with moving this out of the interpreter query (and possibly reading it from outside nvidia-smi -- I need to do some research).

I'm a little wary of trying to brand this as something more general than torch, because I'll likely want to reconsider the mechanism and design entirely as we generalize it. So it seems nice to keep it as an experimental torch-specific feature, then modify it as we generalize.

Labels: no-build (Disable building binaries in CI)
6 participants