Bugfix - nvbandwidth benchmark need to handle N/A value #675
base: main
This reverts commit 3459eac.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #675      +/-   ##
==========================================
- Coverage   85.61%   85.45%   -0.16%
==========================================
  Files          99       99
  Lines        7165     7210      +45
==========================================
+ Hits         6134     6161      +27
- Misses       1031     1049      +18
@polarG, do we also handle the case where some of the requested tests are not valid for the underlying system? Such tests report "waived" in the output.
Please add test cases for the "N/A" and "Waived" cases.
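For reference, a minimal sketch of the parsing behaviour such test cases could exercise. It assumes that each bandwidth matrix row starts with a row index followed by whitespace-separated values, and that the markers appear as the literal strings "N/A" and "Waived"; the helper names `parse_matrix_row` and `is_waived` are hypothetical and are not taken from this PR.

```python
from typing import List, Optional


def parse_matrix_row(line: str) -> List[Optional[float]]:
    """Parse one matrix row, mapping 'N/A' cells to None.

    Assumes the first token is the row index and the rest are values.
    """
    cells = line.split()[1:]  # drop the leading row index
    return [None if cell.upper() == 'N/A' else float(cell) for cell in cells]


def is_waived(line: str) -> bool:
    """Return True if the line only reports that a test was waived."""
    return line.strip().lower() == 'waived'
```

Mapping "N/A" to `None` rather than to 0 keeps a missing measurement distinguishable from a real zero, which matters if the values are later aggregated.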
'Specify the test case(s) to run, either by name or index. By default, all test cases are executed. '
'Example: --test_cases 0,1,2,19,20'
'Specify the test case(s) to execute, either by name or index. '
'To view the available test case names or indices, run the command nvbandwidth on the host. '
Can you provide the names directly, or link to the nvbandwidth documentation instead?
It seems that the test cases are not listed in the doc.
./nvbandwidth -l
nvbandwidth Version: v0.6
Built from Git version: v0.6
Index, Name:
Description
=======================
0, host_to_device_memcpy_ce:
Host to device CE memcpy using cuMemcpyAsync
1, device_to_host_memcpy_ce:
Device to host CE memcpy using cuMemcpyAsync
2, host_to_device_bidirectional_memcpy_ce:
A host to device copy is measured while a device to host copy is run simultaneously.
Only the host to device copy bandwidth is reported.
3, device_to_host_bidirectional_memcpy_ce:
A device to host copy is measured while a host to device copy is run simultaneously.
Only the device to host copy bandwidth is reported.
4, device_to_device_memcpy_read_ce:
Measures bandwidth of cuMemcpyAsync between each pair of accessible peers.
Read tests launch a copy from the peer device to the target using the target's context.
5, device_to_device_memcpy_write_ce:
Measures bandwidth of cuMemcpyAsync between each pair of accessible peers.
Write tests launch a copy from the target device to the peer using the target's context.
6, device_to_device_bidirectional_memcpy_read_ce:
Measures bandwidth of cuMemcpyAsync between each pair of accessible peers.
A copy in the opposite direction of the measured copy is run simultaneously but not measured.
Read tests launch a copy from the peer device to the target using the target's context.
7, device_to_device_bidirectional_memcpy_write_ce:
Measures bandwidth of cuMemcpyAsync between each pair of accessible peers.
A copy in the opposite direction of the measured copy is run simultaneously but not measured.
Write tests launch a copy from the target device to the peer using the target's context.
8, all_to_host_memcpy_ce:
Measures bandwidth of cuMemcpyAsync between a single device and the host while simultaneously
running copies from all other devices to the host.
9, all_to_host_bidirectional_memcpy_ce:
A device to host copy is measured while a host to device copy is run simultaneously.
Only the device to host copy bandwidth is reported.
All other devices generate simultaneous host to device and device to host interferring traffic.
10, host_to_all_memcpy_ce:
Measures bandwidth of cuMemcpyAsync between the host to a single device while simultaneously
running copies from the host to all other devices.
11, host_to_all_bidirectional_memcpy_ce:
A host to device copy is measured while a device to host copy is run simultaneously.
Only the host to device copy bandwidth is reported.
All other devices generate simultaneous host to device and device to host interferring traffic.
12, all_to_one_write_ce:
Measures the total bandwidth of copies from all accessible peers to a single device, for each
device. Bandwidth is reported as the total inbound bandwidth for each device.
Write tests launch a copy from the target device to the peer using the target's context.
13, all_to_one_read_ce:
Measures the total bandwidth of copies from all accessible peers to a single device, for each
device. Bandwidth is reported as the total outbound bandwidth for each device.
Read tests launch a copy from the peer device to the target using the target's context.
14, one_to_all_write_ce:
Measures the total bandwidth of copies from a single device to all accessible peers, for each
device. Bandwidth is reported as the total outbound bandwidth for each device.
Write tests launch a copy from the target device to the peer using the target's context.
15, one_to_all_read_ce:
Measures the total bandwidth of copies from a single device to all accessible peers, for each
device. Bandwidth is reported as the total inbound bandwidth for each device.
Read tests launch a copy from the peer device to the target using the target's context.
16, host_to_device_memcpy_sm:
Host to device SM memcpy using a copy kernel
17, device_to_host_memcpy_sm:
Device to host SM memcpy using a copy kernel
18, device_to_device_memcpy_read_sm:
Measures bandwidth of a copy kernel between each pair of accessible peers.
Read tests launch a copy from the peer device to the target using the target's context.
19, device_to_device_memcpy_write_sm:
Measures bandwidth of a copy kernel between each pair of accessible peers.
Write tests launch a copy from the target device to the peer using the target's context.
20, device_to_device_bidirectional_memcpy_read_sm:
Measures bandwidth of a copy kernel between each pair of accessible peers. Copies are run
in both directions between each pair, and the sum is reported.
Read tests launch a copy from the peer device to the target using the target's context.
21, device_to_device_bidirectional_memcpy_write_sm:
Measures bandwidth of a copy kernel between each pair of accessible peers. Copies are run
in both directions between each pair, and the sum is reported.
Write tests launch a copy from the target device to the peer using the target's context.
22, all_to_host_memcpy_sm:
Measures bandwidth of a copy kernel between a single device and the host while simultaneously
running copies from all other devices to the host.
23, all_to_host_bidirectional_memcpy_sm:
A device to host bandwidth of a copy kernel is measured while a host to device copy is run simultaneously.
Only the device to host copy bandwidth is reported.
All other devices generate simultaneous host to device and device to host interferring traffic using copy kernels.
24, host_to_all_memcpy_sm:
Measures bandwidth of a copy kernel between the host to a single device while simultaneously
running copies from the host to all other devices.
25, host_to_all_bidirectional_memcpy_sm:
A host to device bandwidth of a copy kernel is measured while a device to host copy is run simultaneously.
Only the host to device copy bandwidth is reported.
All other devices generate simultaneous host to device and device to host interferring traffic using copy kernels.
26, all_to_one_write_sm:
Measures the total bandwidth of copies from all accessible peers to a single device, for each
device. Bandwidth is reported as the total inbound bandwidth for each device.
Write tests launch a copy from the target device to the peer using the target's context.
27, all_to_one_read_sm:
Measures the total bandwidth of copies from all accessible peers to a single device, for each
device. Bandwidth is reported as the total outbound bandwidth for each device.
Read tests launch a copy from the peer device to the target using the target's context.
28, one_to_all_write_sm:
Measures the total bandwidth of copies from a single device to all accessible peers, for each
device. Bandwidth is reported as the total outbound bandwidth for each device.
Write tests launch a copy from the target device to the peer using the target's context.
29, one_to_all_read_sm:
Measures the total bandwidth of copies from a single device to all accessible peers, for each
device. Bandwidth is reported as the total inbound bandwidth for each device.
Read tests launch a copy from the peer device to the target using the target's context.
30, host_device_latency_sm:
Host - device SM copy latency using a ptr chase kernel
31, device_to_device_latency_sm:
Measures latency of a pointer derefernce operation between each pair of accessible peers.
Memory is allocated on a GPU and is accessed by the peer GPU to determine latency.
This is from v0.6. The list might change, though, since the next version adds multinode test cases, e.g. multinode_device_to_device_...
Since the list might change in the future, selecting test cases by index is not reliable.
I will make the benchmark accept test case names only.
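One way to validate names without hard-coding the list is to read them from the `./nvbandwidth -l` listing quoted above. The following is only a sketch: it assumes the "index, name:" row format shown for v0.6, and the function names `available_test_cases` and `validate_test_cases` are hypothetical, not part of this PR.

```python
import re
from typing import List, Set

# Matches listing rows such as "0, host_to_device_memcpy_ce:".
LISTING_ROW = re.compile(r'^\s*(\d+),\s*(\w+):\s*$')


def available_test_cases(listing: str) -> Set[str]:
    """Collect test case names from the text printed by `nvbandwidth -l`."""
    names = set()
    for line in listing.splitlines():
        match = LISTING_ROW.match(line)
        if match:
            names.add(match.group(2))
    return names


def validate_test_cases(requested: List[str], listing: str) -> List[str]:
    """Return the requested names unchanged, or raise if any are unknown."""
    known = available_test_cases(listing)
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError('Unknown test case(s): ' + ', '.join(unknown))
    return requested
```

Because the names come from the installed binary, new entries such as the multinode cases would be accepted without a code change, while indices (which may shift between versions) are rejected.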
Good point! I will try to catch this in the code.
It's better to show waived tests in the report, in line with how other failed benchmarks are treated.
Looks good to me. One more question about docs: do any documents need to be updated to align with this change?