Hello,
I'd like to express my appreciation for the xmanager tool! However, I've noticed a few issues with how the A100 80GB GPU and its associated machine types are specified for the Vertex API, and I'd like to bring them to your attention:
GPU Naming Discrepancy:
![Documentation Screenshot](https://private-user-images.githubusercontent.com/26931037/289958111-87f90633-208e-4b1e-a2b9-32f2c92e8773.png)
According to the Google Cloud resource documentation, the correct name for the A100 GPU with 80GB of memory is `A100_80GB`, not `A100_80GIB`. This naming inconsistency leads to an error when requesting this resource. Reference: Google Cloud Documentation (see the screenshot from that page attached above).
Incorrect API Call Formation:

When `A100_80GIB` is referenced in the Vertex API, it results in a string like `'NVIDIA_TESLA_A100_80GIB'`, whereas it should be `NVIDIA_A100_80GB`. I believe this error stems from the line `accelerator_type = 'NVIDIA_TESLA_' + str(resource).upper()` in the vertex.py script.
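To illustrate the kind of fix I have in mind, here is a minimal sketch; the helper name, the plain-string resource argument, and the set of special-cased accelerators are my own assumptions rather than xmanager's actual internals:

```python
# Minimal sketch (not xmanager's actual code): build the Vertex AI accelerator
# type string, special-casing accelerators whose Vertex names drop the
# historical 'TESLA' infix. Assumes the resource has already been renamed
# from 'A100_80GIB' to 'A100_80GB'.
_NO_TESLA_INFIX = {'A100_80GB'}  # other newer accelerators could be added here


def vertex_accelerator_type(resource: str) -> str:
  """Returns the Vertex AI accelerator type string for a GPU resource name."""
  name = str(resource).upper()
  if name in _NO_TESLA_INFIX:
    return 'NVIDIA_' + name        # e.g. 'A100_80GB' -> 'NVIDIA_A100_80GB'
  return 'NVIDIA_TESLA_' + name    # e.g. 'V100'      -> 'NVIDIA_TESLA_V100'


assert vertex_accelerator_type('a100_80gb') == 'NVIDIA_A100_80GB'
assert vertex_accelerator_type('V100') == 'NVIDIA_TESLA_V100'
```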
Machine Type Mismatch:

The A100_80GB GPU should be associated with machine types such as `a2-ultragpu-1g`, `a2-ultragpu-2g`, `a2-ultragpu-4g`, and `a2-ultragpu-8g`. However, the current specification does not include these machine types in its mapping for A100 GPUs.
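For what it's worth, here is a rough sketch of what an extended lookup could look like; the dictionary layout and function name are hypothetical and only meant to illustrate the a2-ultragpu association described above:

```python
# Hypothetical sketch (not xmanager's actual mapping): pick the smallest
# a2-ultragpu machine type that can hold the requested number of A100 80GB GPUs.
_A100_80GB_MACHINE_TYPES = {
    1: 'a2-ultragpu-1g',
    2: 'a2-ultragpu-2g',
    4: 'a2-ultragpu-4g',
    8: 'a2-ultragpu-8g',
}


def a100_80gb_machine_type(gpu_count: int) -> str:
  """Returns the smallest a2-ultragpu machine type fitting `gpu_count` GPUs."""
  for count, machine_type in sorted(_A100_80GB_MACHINE_TYPES.items()):
    if gpu_count <= count:
      return machine_type
  raise ValueError(f'No a2-ultragpu machine type provides {gpu_count} A100 80GB GPUs.')


assert a100_80gb_machine_type(3) == 'a2-ultragpu-4g'
```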
Thank you for your attention to this matter.