Hello,
I'd like to express my appreciation for the xmanager tool! However, I've noticed a few issues with how the A100 80GB GPU and its associated machine types are specified for the Vertex API, and I'd like to bring them to your attention:
GPU Naming Discrepancy:
![Documentation Screenshot](https://private-user-images.githubusercontent.com/26931037/289958111-87f90633-208e-4b1e-a2b9-32f2c92e8773.png)
According to the Google Cloud resource documentation, the correct name for the A100 GPU with 80GB of memory is `A100_80GB`, not `A100_80GIB`. This naming inconsistency leads to an error when requesting this resource. Reference: Google Cloud Documentation (see the screenshot from that page attached above).
Incorrect API Call Formation:

When `A100_80GIB` is referenced in the Vertex API, it results in a string like `'NVIDIA_TESLA_A100_80GIB'`, whereas it should be `NVIDIA_A100_80GB`. I believe this error stems from the line `accelerator_type = 'NVIDIA_TESLA_' + str(resource).upper()` in the vertex.py script.
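To illustrate the kind of fix I have in mind, here is a minimal sketch; the helper name, the plain-string resource argument, and the set of special-cased accelerators are my own assumptions rather than xmanager's actual internals:

```python
# Minimal sketch (not xmanager's actual code): build the Vertex AI accelerator
# type string, special-casing accelerators whose Vertex names drop the
# historical 'TESLA' infix. Assumes the resource has already been renamed
# from 'A100_80GIB' to 'A100_80GB'.
_NO_TESLA_INFIX = {'A100_80GB'}  # other newer accelerators could be added here


def vertex_accelerator_type(resource: str) -> str:
  """Returns the Vertex AI accelerator type string for a GPU resource name."""
  name = str(resource).upper()
  if name in _NO_TESLA_INFIX:
    return 'NVIDIA_' + name        # e.g. 'A100_80GB' -> 'NVIDIA_A100_80GB'
  return 'NVIDIA_TESLA_' + name    # e.g. 'V100'      -> 'NVIDIA_TESLA_V100'


assert vertex_accelerator_type('a100_80gb') == 'NVIDIA_A100_80GB'
assert vertex_accelerator_type('V100') == 'NVIDIA_TESLA_V100'
```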
Machine Type Mismatch:

The A100_80GB GPU should be associated with machine types such as `a2-ultragpu-1g`, `a2-ultragpu-2g`, `a2-ultragpu-4g`, and `a2-ultragpu-8g`. However, the current specification does not include these machine types in its mapping for A100 GPUs.
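For what it's worth, here is a rough sketch of what an extended lookup could look like; the dictionary layout and function name are hypothetical and only meant to illustrate the a2-ultragpu association described above:

```python
# Hypothetical sketch (not xmanager's actual mapping): pick the smallest
# a2-ultragpu machine type that can hold the requested number of A100 80GB GPUs.
_A100_80GB_MACHINE_TYPES = {
    1: 'a2-ultragpu-1g',
    2: 'a2-ultragpu-2g',
    4: 'a2-ultragpu-4g',
    8: 'a2-ultragpu-8g',
}


def a100_80gb_machine_type(gpu_count: int) -> str:
  """Returns the smallest a2-ultragpu machine type fitting `gpu_count` GPUs."""
  for count, machine_type in sorted(_A100_80GB_MACHINE_TYPES.items()):
    if gpu_count <= count:
      return machine_type
  raise ValueError(f'No a2-ultragpu machine type provides {gpu_count} A100 80GB GPUs.')


assert a100_80gb_machine_type(3) == 'a2-ultragpu-4g'
```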
Thank you for your attention to this matter.