Add device memory #2565
Changes from 53 commits
```diff
@@ -58,10 +58,10 @@ def get_regions() -> List[str]:
 # We have to manually remove it.
 DEPRECATED_FAMILIES = ['standardNVSv2Family']

-USEFUL_COLUMNS = [
+USEFUL_COLUMNS = {
     'InstanceType', 'AcceleratorName', 'AcceleratorCount', 'vCPUs', 'MemoryGiB',
-    'GpuInfo', 'Price', 'SpotPrice', 'Region', 'Generation'
-]
+    'GpuInfo', 'Price', 'SpotPrice', 'Region', 'Generation', 'DeviceMemory'
+}


 def get_pricing_url(region: Optional[str] = None) -> str:
```
```diff
@@ -244,11 +244,79 @@ def get_additional_columns(row):
         axis='columns',
     )

+    def create_gpu_map(df):
+        # Map of Azure's machine with GPU to their corresponding memory
+        # Result is hard-coded since Azure's API to not return such info
+        # may be outdated so need to be maintained
+        gpu_map = {
```

Review comment: Could we try to map instance type -> GPU name first and then calculate the resulting device memory later? This two-level approach might be cleaner.

Reply: I agree; this approach is much cleaner, as it uses much less hard-coding and utilizes already-fetched info. I will change the script to use this approach.

```diff
+            'Standard_NC6': 12,
+            'Standard_NC12': 24,
+            'Standard_NC24': 48,
+            'Standard_NC24r*': 48,
+            'Standard_NC6s_v2': 16,
+            'Standard_NC12s_v2': 32,
+            'Standard_NC24s_v2': 64,
+            'Standard_NC24rs_v2*': 64,
+            'Standard_NC6s_v3': 16,
+            'Standard_NC12s_v3': 32,
+            'Standard_NC24s_v3': 32,
+            'Standard_NC4as_T4_v3': 16,
+            'Standard_NC8as_T4_v3': 16,
+            'Standard_NC16as_T4_v3': 16,
+            'Standard_NC64as_T4_v3': 64,
+            'Standard_NC24ads_A100_v4': 80,
+            'Standard_NC48ads_A100_v4': 160,
+            'Standard_NC96ads_A100_v4': 320,
+            'Standard_ND96asr_v4': 40,
+            'Standard_ND96amsr_A100_v4': 80,
+            'Standard_ND6s': 24,
+            'Standard_ND12s': 48,
+            'Standard_ND24s': 96,
+            'Standard_ND24rs*': 96,
```

Review comment: Why is there a …

Reply: It's one of the instance types offered by Azure.

```diff
+            'Standard_ND40rs_v2': 32,
+            'Standard_NG8ads_V620_v1': 8,
+            'Standard_NG16ads_V620_v1': 16,
+            'Standard_NG32ads_V620_v1': 32,
+            'Standard_NG32adms_V620_v1': 32,
+            'Standard_NV6': 8,
+            'Standard_NV12': 16,
+            'Standard_NV24': 32,
+            'Standard_NV12s_v3': 8,
+            'Standard_NV24s_v3': 16,
+            'Standard_NV48s_v3': 32,
+            'Standard_NV4as_v4': 2,
+            'Standard_NV8as_v4': 4,
+            'Standard_NV16as_v4': 8,
+            'Standard_NV32as_v4': 16,
+            'Standard_NV6ads_A10_v5': 4,
+            'Standard_NV12ads_A10_v5': 8,
+            'Standard_NV18ads_A10_v5': 12,
+            'Standard_NV36ads_A10_v5': 24,
+            'Standard_NV36adms_A10_v5': 24,
+            'Standard_NV72ads_A10_v5': 48,
+            'Standard_NV6_Promo': 16,
+            'Standard_NV12_Promo': 32,
+            'Standard_NV24_Promo': 48
+        }
+
+        all_instance = df.InstanceType.unique()
+
+        for instance in all_instance:
+            if instance not in gpu_map:
+                gpu_map[instance] = ''
+        return gpu_map
+
+    def map_device_memory(row, dic):
+        return dic[row]
+
     before_drop_len = len(df_ret)
     df_ret.dropna(subset=['InstanceType'], inplace=True, how='all')
     after_drop_len = len(df_ret)
     print(f'Dropped {before_drop_len - after_drop_len} duplicated rows')

+    df_ret['DeviceMemory'] = df_ret.InstanceType.apply(
+        map_device_memory, args=(create_gpu_map(df_ret),))
+
     # Filter out deprecated families
     df_ret = df_ret.loc[~df_ret['family'].isin(DEPRECATED_FAMILIES)]
     df_ret = df_ret[USEFUL_COLUMNS]
```
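In the Azure hunk, `create_gpu_map` pads the map with `''` for every instance type it does not know about, and `map_device_memory` then indexes into it. A minimal sketch of the same lookup using `dict.get` with a NaN default is shown here; the table is an illustrative subset of the diff's entries, and `azure_device_memory_gib` is a hypothetical name, not part of the PR.

```python
import math
from typing import Dict

# Illustrative subset of the hard-coded Azure table from the diff
# (instance type -> total GPU memory in GiB). Not the full table.
AZURE_DEVICE_MEMORY_GIB: Dict[str, float] = {
    'Standard_NC6': 12,
    'Standard_NC6s_v3': 16,
    'Standard_NC64as_T4_v3': 64,
    'Standard_NV72ads_A10_v5': 48,
}


def azure_device_memory_gib(instance_type: str) -> float:
    """Returns total device memory for an instance type, or NaN if unknown.

    dict.get with a NaN default stands in for padding the map with '' for
    every unknown instance type and then indexing it.
    """
    return AZURE_DEVICE_MEMORY_GIB.get(instance_type, math.nan)


for name in ['Standard_NC6', 'Standard_D2s_v3']:
    print(name, azure_device_memory_gib(name))
```

On a DataFrame the same table could be applied as `df_ret['InstanceType'].map(AZURE_DEVICE_MEMORY_GIB)`, since pandas `Series.map` with a dict yields NaN for keys the dict does not contain. Keeping missing values as NaN also keeps the column numeric, whereas mixing integers with `''` produces an object-dtype column.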
```diff
@@ -513,6 +513,23 @@ def get_catalog_df(region_prefix: str) -> pd.DataFrame:
     # Round the prices.
     df['Price'] = df['Price'].round(PRICE_ROUNDING)
     df['SpotPrice'] = df['SpotPrice'].round(PRICE_ROUNDING)
+    gpu_map = {
```

Review comment: Could you add a reference link as a comment above this?

```diff
+        'L4': 24,
+        'A100': 40,
+        'A100-80GB': 80,
+        'A100-40GB': 40,
+        'T4': 16,
+        'P4': 8,
+        'V100': 16,
+        'P100': 16,
+        'K80': 12,
+        '': ''
```

Review comment: Sorry, why this …

Reply: This was a mistake on my end; I will have this removed.

```diff
+    }
+
+    df['DeviceMemory'] = df.apply(
+        lambda row: gpu_map[row['AcceleratorName']] * row['AcceleratorCount']
+        if pd.notnull(row['AcceleratorName']) else np.nan,
+        axis=1)
     return df
```
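In the GCP hunk, device memory is the per-GPU memory multiplied by `AcceleratorCount`. The sketch below computes that per row without the `'': ''` entry (which the author agreed to remove), returning NaN for a missing or unrecognized accelerator name instead of raising `KeyError` as `gpu_map[...]` would. `gcp_device_memory_gib` and `_is_missing` are hypothetical helper names, not part of the PR.

```python
import math
from typing import Dict, Optional, Union

# Per-GPU memory in GiB, copied from the GCP table in the diff,
# with the '' -> '' sentinel entry intentionally left out.
GCP_GPU_MEMORY_GIB: Dict[str, int] = {
    'L4': 24,
    'A100': 40,
    'A100-80GB': 80,
    'A100-40GB': 40,
    'T4': 16,
    'P4': 8,
    'V100': 16,
    'P100': 16,
    'K80': 12,
}


def _is_missing(value: Optional[Union[str, float]]) -> bool:
    # A missing cell in a pandas row usually arrives as None or float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def gcp_device_memory_gib(accelerator_name: Optional[Union[str, float]],
                          accelerator_count: Optional[float]) -> float:
    """Total device memory = per-GPU memory x GPU count, NaN if unknown."""
    if _is_missing(accelerator_name) or _is_missing(accelerator_count):
        return math.nan
    per_gpu = GCP_GPU_MEMORY_GIB.get(accelerator_name)
    if per_gpu is None:
        # Accelerator not in the table: report NaN rather than failing.
        return math.nan
    return per_gpu * accelerator_count


print(gcp_device_memory_gib('A100-80GB', 8))
print(gcp_device_memory_gib(None, 0))
```

In the catalog function this could replace the lambda, e.g. `df.apply(lambda row: gcp_device_memory_gib(row['AcceleratorName'], row['AcceleratorCount']), axis=1)`.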
Review comment: Could you add a reference link showing how this information was found? Also, how did we make sure we cover all the instance types?
Reply: I looked through the Azure documentation to ensure that all instance types have been included, and also ran the script with `--all-regions`; the result looked fine to me.
However, I think the approach you suggested makes more sense: we map instance type -> GPU name, then GPU name -> GPU memory. There is already a mapping from instance type -> GPU name in the script. Assuming this mapping is complete, we can easily map each GPU name to its corresponding memory.
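The two-level approach agreed on above could look roughly like the following sketch. It assumes each row already carries `InstanceType`, `AcceleratorName` and `AcceleratorCount` (the author notes the script already maps instance type to GPU name); the sample rows, the deliberately omitted `K80` entry, and the function name `device_memory_by_instance` are hypothetical. Besides computing per-GPU memory times GPU count, it returns the GPU names that have no entry in the second-level table, which is one way to check coverage instead of silently filling gaps.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[str, Optional[str], int]

# Level 1 (hypothetical sample rows): (InstanceType, AcceleratorName,
# AcceleratorCount), i.e. columns the fetcher is assumed to derive already.
ROWS: List[Row] = [
    ('Standard_NC6s_v3', 'V100', 1),
    ('Standard_NC64as_T4_v3', 'T4', 4),
    ('Standard_NC48ads_A100_v4', 'A100-80GB', 2),
    ('Standard_NC6', 'K80', 1),      # K80 is deliberately absent below
    ('Standard_D2s_v3', None, 0),    # CPU-only instance
]

# Level 2: memory per GPU in GiB (illustrative subset).
GPU_MEMORY_GIB: Dict[str, int] = {'V100': 16, 'T4': 16, 'A100-80GB': 80}


def device_memory_by_instance(
        rows: List[Row],
        gpu_memory: Dict[str, int]) -> Tuple[Dict[str, float], List[str]]:
    """Computes total device memory per instance type via its GPU name.

    Returns the per-instance memory (NaN for CPU-only instances or GPUs
    missing from gpu_memory) and the sorted GPU names lacking an entry.
    """
    memory: Dict[str, float] = {}
    missing: Set[str] = set()
    for instance_type, gpu_name, gpu_count in rows:
        if gpu_name is None:
            memory[instance_type] = math.nan
            continue
        per_gpu = gpu_memory.get(gpu_name)
        if per_gpu is None:
            missing.add(gpu_name)
            memory[instance_type] = math.nan
        else:
            memory[instance_type] = per_gpu * gpu_count
    return memory, sorted(missing)


memory, missing = device_memory_by_instance(ROWS, GPU_MEMORY_GIB)
print(memory)
print('GPU names without a memory entry:', missing)
```

With this split, only the short per-GPU table is hard-coded, and a non-empty `missing` list flags exactly which GPU names need to be added.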