fabtests/rdm_atomic, ubertest: move atomic verification to common code and use in functional test #10155

aingerson · 2024-07-08T20:51:18Z

Ubertest copies the software implementation of atomics from libfabric in order to add atomic operation data validation. This is done by saving the sent data and performing a local atomic in order to validate against the actual data calculated by the libfabric provider.
This patch set moves that implementation from ubertest to the common code for other tests (specifically the fi_rdm_atomic test) to leverage it.
Ubertest atomics with validation is not run in the CI by default but he rdm_atomic test is. Because of the lack of validation in the functional fabtest, an incorrect fetched buffer pointer was never caught. This patch set also enables running rdm_atomic with validation in the standard and short test sets in runfabtests.sh so we can catch this in the future.
During testing of these changes, it properly caught the previously missed bug and properly passes when it was fixed.

j-xiong · 2024-07-08T22:58:32Z

fabtests/common/shared.c

+		ret = ft_hmem_copy_from(opts.iface, opts.device, dev_host_comp,
+					cmp, count * datatype_to_size(type));
+		if (ret) {
+			FT_ERR("Failed to copy from atomic buffer\n");
+			return ret;
+		}
+
+		check_comp = dev_host_comp;


Is check_comp needed when atomic != FI_ATOMIC_COMPARE?

No, I think I left it because coverity gets really confused by the unused variables with the atomic cases. But I'm going to refactor this whole check which is looking really messy. I'll push an update when I figure out the other failures.

j-xiong · 2024-07-10T05:04:02Z

Looks like MS compiler doesn't like the complex types:

 dgram_pingpong.c
C:\projects\libfabric\fabtests\include\ofi_atomic.h(44): error C2061: syntax error: identifier 'ofi_complex_float' [C:\projects\libfabric\fabtests\fabtests.vcxproj]
C:\projects\libfabric\fabtests\include\ofi_atomic.h(44): error C2059: syntax error: ';' [C:\projects\libfabric\fabtests\fabtests.vcxproj]
C:\projects\libfabric\fabtests\include\ofi_atomic.h(45): error C2061: syntax error: identifier 'ofi_complex_double' [C:\projects\libfabric\fabtests\fabtests.vcxproj]
C:\projects\libfabric\fabtests\include\ofi_atomic.h(45): error C2059: syntax error: ';' [C:\projects\libfabric\fabtests\fabtests.vcxproj]
C:\projects\libfabric\fabtests\include\ofi_atomic.h(46): error C2061: syntax error: identifier 'ofi_complex_long_double' [C:\projects\libfabric\fabtests\fabtests.vcxproj]
C:\projects\libfabric\fabtests\include\ofi_atomic.h(46): error C2059: syntax error: ';'

j-xiong · 2024-09-30T17:39:38Z

fabtests/common/ofi_atomic.c

-		OFI_UNUSED(src);					\
-		for (i = 0; i < cnt; i++)				\
-			r[i] = d[i];					\
+		memcpy(res, dst, sizeof(type) * cnt);			\


should use ofi_complex_##type here.

j-xiong · 2024-09-30T17:40:28Z

fabtests/common/ofi_atomic.c

-		ofi_complex_##type *r = (res);				\
-		for (i = 0; i < cnt; i++) {				\
-			r[i] = d[i];					\
+		memcpy(res, dst, sizeof(type) * cnt);			\


ofi_complex_##type

j-xiong · 2024-09-30T17:40:56Z

fabtests/common/ofi_atomic.c

-		ofi_complex_##type *r = (res);				\
-		for (i = 0; i < cnt; i++) {				\
-			r[i] = d[i];					\
+		memcpy(res, dst, sizeof(type) * cnt);			\


aingerson · 2024-10-04T04:37:13Z

@acgoldma This PR is adding atomic verification to the regular fabtests runs. psm3 was failing a lot of them so I've added a few patches to fix most of them. The last patch disables the remaining failing combinations instead of fixing them because I couldn't completely root cause - long double and complex long double reads/swaps are failing and I'm not sure why. Switching the sets to a memcpy of the datatype size also fixes the issue but it takes a performance hit (and also affects all other data types for those ops) so I opted to disable instead of adding that fix. I did this by having all the reads and compares go through the PSM_CUDA | PSM_ONEAPI implementation which does the psmx3_memcpy (which internally switches to a memcpy for the same ifdef). Overall, something is off when doing r[i] = d[i] with those two datatypes but I'm not sure why. Let me know if you want me to take a different approach!

acgoldma · 2024-10-07T16:46:40Z

@acgoldma This PR is adding atomic verification to the regular fabtests runs. psm3 was failing a lot of them so I've added a few patches to fix most of them. The last patch disables the remaining failing combinations instead of fixing them because I couldn't completely root cause - long double and complex long double reads/swaps are failing and I'm not sure why. Switching the sets to a memcpy of the datatype size also fixes the issue but it takes a performance hit (and also affects all other data types for those ops) so I opted to disable instead of adding that fix. I did this by having all the reads and compares go through the PSM_CUDA | PSM_ONEAPI implementation which does the psmx3_memcpy (which internally switches to a memcpy for the same ifdef). Overall, something is off when doing r[i] = d[i] with those two datatypes but I'm not sure why. Let me know if you want me to take a different approach!

We will take a look, but if you have time, it might be better to also adjust psmx3 to use memcpy() for complex types?

Also, can you give the exact commands to replicate this so we can reproduce internally.

brooksmi · 2024-10-09T18:34:24Z

@aingerson Can you provide some details on the failing PSM test cases, such as the specific commands executed and any additional details on the failure symptom?

For example, it seems like fi_rdm_atomic -p psm3 -o read -z long_double would fail according to your workaround commit, but it's passing for me so far (ice lake, e810, non-gpu path).

aingerson · 2024-10-09T18:48:52Z

@acgoldma @brooksmi Hello! Sorry for the late response, our clusters have all been down for the last few days so I haven't had any systems to test things on.
The failing test cases are with the new test changes in this PR and with the -v (data verification) flag.
I can switch the PR to use the memcpy instead as soon as I get systems available. Sorry about the delay.

brooksmi · 2024-10-09T18:57:29Z

The failing test cases are with the new test changes in this PR and with the -v (data verification) flag.

Ah! -v reproduces it... sorry for missing that.

NOTE: I don't see -v mentioned in the --help output.

aingerson · 2024-10-09T19:00:30Z

@brooksmi Ah shoot, missed adding that. Will add, thank you!

brooksmi · 2024-10-09T19:51:09Z

@aingerson long double floating point is only 80 bits, and at least experimentally, I see that long double CPU instructions only touch 80 bits, leaving the other 48 bits as they were. If you initialize memory with some pattern, such as all 1's, then do long double assignment, then do mcmcmp() to validate, that will fail (NOTE: haven't actually looked at what -v does yet...).

I'm curious if this is compliant or not, i.e. whether any spec says whether the remaining 48 bits are undefined or must be zero'd.

But I also notice that, at least in trivial benchmarking, memcpy() seems to be (very slightly) more performant than floating point loads/stores, so it's probably fine to just use memcpy() in all cases.

aingerson · 2024-10-09T20:27:33Z

@brooksmi Thanks for that context, very helpful. Where do you see that a long double is 8 bits? (Sorry I'm not familiar with long double CPU operations).
The failure I'm seeing is similar to what you're finding in regards to the remaining 48 bits - the verification code fills 128 bits and then checks sizeof(long double) which is 128 bits but psm3 only filled 80 so the check fails. That's why the memcpy fixes it because it does a memcpy(dst, src, sizeof(long double) so it copies all 16 bytes instead of the 10 that the set does. It seems like the verification code needs to only check 10 but I'm wondering how we come to that conclusion if sizeof(long double) returns 16?

brooksmi · 2024-10-09T21:46:54Z

@brooksmi Where do you see that a long double is 80 [corrected] bits?

Both language standards and compiler documentation are flexible on the actual implementation. The C language simply states that long double must be conformant to IEC 60559 extended.

GCC documentation has quite a few mentions of x86_64 using x87 80-bit in several places (as defined by Intel SDM vol 1), but ultimately it's target-specific, and could change if/when a CPU is released with native 128-bit FPU registers.

I'm wondering how we come to that conclusion if sizeof(long double) returns 16?

You could check LDBL_MAX or LDBL_DECIMAL_DIG.

If you're using GCC, you can do __builtin_types_compatible_p(long double, __float80) (or similarly with _Float128). But even then, I'm not sure that's sufficient to know which bytes to compare on any given target.

I suspect a language lawyer would argue that you should not be relying on any sort of memcmp() operation which assumes you've inferred how a long double is persisted. Rather, you should read the value back into a long double and use logical operators.

The memcpy() is safe though, as it assumes only that preserving all sizeof bytes is sufficient to copy (but not interpret) a value.

brooksmi · 2024-10-10T14:56:00Z

I suspect a language lawyer would argue that you should not be relying on any sort of memcmp() operation which assumes you've inferred how a long double is persisted. Rather, you should read the value back into a long double and use logical operators.

Note that logic would also imply avoiding memset() on fill, or at least, avoiding memset() to a nonsense pattern that would decode as NAN, since NAN is always != to NAN, so a typed floating point buffer check would always fail and the atomic ops would be operating on NANs (valid, but... perhaps not the intention).

It might be better to iterate elementwise, and write random valid typed floating point values in the fill, then iterate on a typed != in the check.

aingerson · 2024-10-10T15:04:13Z

@brooksmi Yeah, that's what I'm going to move to. It was originally that way but I moved to the memset/memcmp because it was more performant and simplified the implementation for complex types. Back to the original method it is... thanks for your help!

aingerson · 2024-10-10T22:41:06Z

@brooksmi Thanks for the context with the long double issues, Michael! I've repushed a version that does an element by element set and check to be explicit with the datatype. I've kept the three psm3 patches though to resolve a few actual psm3 issues regarding complex compares, missing ops, and missing error checks. Let me know if those look ok to you. Thanks!

brooksmi · 2024-10-11T13:55:41Z

Let me know if those look ok to you.

@aingerson The other PSM changes look fine.

Fix incorrect atomic LOR on complex numbers. The values were incorrectly getting ANDed together instead of ORed. This went unnoticed because the code was very difficult to read. This also refactors the logical checks with a helper function to make it more readible and less prone to errors. Signed-off-by: Alexia Ingerson <[email protected]>

This allows fabtests to make use of atomic validation code There were many Windows atomics bugs, inconsistencies, and missing definitions. This patch also cleans up the entire ofi_atomic.c implementation for unix and windows The following changes are included: - Separate fill and check based on real or complex types as setting and reading complexes on windows is not allowed (not native datatype, abstracted). Complex versions use eq and set functions specific for complexes defined in osd.h - Remove duplicated ofi_complex definitions in ofi_atomic (already in osd.h file) - Add general check_atomic and fill_atomic calls and use them in ubertest - Add EXPAND ( x ) x define to work nicely with windows VA_ARGS handling - Fix inconsistency with ofi_complex_type/or naming ('complex' always should come first) - Fix inconsistency with op names "equ" and "mul" -> "eq" and "prod" - Add missing lxor complex op definitions on Windows Signed-off-by: Alexia Ingerson <[email protected]>

To properly validate atomic data, we need host bounce buffers for the result and compare buffers in addition to the regular bounce buffer for the tx/rx bufs. This adds two extra bufs allocated only for atomic purposes and adds hmem support to the common atomic validation path. It also renames the alloc/free_tx_buf calls to generic alloc/free_host_bufs which allocates all three buffers at once. Signed-off-by: Alexia Ingerson <[email protected]>

ft_post_atomic posted "buf" which is the base address for the entire send and recv buffer allocation. The first half of the allocation is the receive buffer and the second half is the send buffer. Posting just "buf" meant it was sending the receive buffer. This changes it to send the tx buf and do an atomic on the rx buf which allows us to properly do atomic validation Signed-off-by: Alexia Ingerson <[email protected]>

This allows us to post the rx buf without corrupting memory in case its needed for validation Signed-off-by: Alexia Ingerson <[email protected]>

Match the behavior of memset() where the value passed in is an int, but it is interpreted as a char. While ZE can technically handle this scenario, others may not so we need to standardize across ifaces Signed-off-by: Alexia Ingerson <[email protected]>

Add data validation to the atomic test by using the newly added atomic fill and check support imported from ubertest. This code uses a macro that switches on datatype for filling and checking the buffer contents. The atomic validation path requires an extra buffer to copy the contents of the original atomic buffer in order to recreate the atomic function locally and check the buffer against the simulated atomic operation. This patch also refactors the entire test to remove the extremely confusing macros used for the base/fetch/compare operations. The macros made the code extremely difficult to read and debug and also made it difficult to add data validation. Separating it into three explicit functions is about the same amount of code and significantly more readable Synchronization messages are added in the validation case to ensure the atomic operation completed on both sides before validation occurs. This requires the addition of the FI_ORDER_SAW and FI_ORDER_SAR message ordering to ensure that we get the completion for the send/recv sync after the atomic message is processed Signed-off-by: Alexia Ingerson <[email protected]>

Run fi_rdm_atomic with data validation in standard and short test suites Signed-off-by: Alexia Ingerson <[email protected]>

Signed-off-by: Alexia Ingerson <[email protected]>

Comparison of complex numbers is undefined and not a valid combination of atomic ops. Disable in psm3 Signed-off-by: Alexia Ingerson <[email protected]>

Report atomic op errors back to the application. Some datatype/op combinations were falsely being reported to the application but failing when the atomic was being performed. These failures were silently treated as successful because the errors were not passed back. Check the error code to catch future issues Signed-off-by: Alexia Ingerson <[email protected]>

psm3 advertises support for logical ops (lor, land, lxor) with all datatypes but the functions are only defined for integer types. When the atomic op is called with a non-integer type, it drops down to the default case and returns an error (FI_ENOTSUPP) Signed-off-by: Alexia Ingerson <[email protected]>

aingerson · 2024-10-15T15:55:00Z

@j-xiong This is finally good to go! I added testing for all the atomics on windows to make sure the implementation was correct on both platforms. Intel CI failure is a known issue Zach is fixing soon and unrelated (MPI rxm IMB slurm timeout). Had to add a few more patches because of psm3 and windows bugs. Review or merge if it looks good to you, thanks!

aingerson requested a review from j-xiong July 8, 2024 20:53

j-xiong reviewed Jul 8, 2024

View reviewed changes

j-xiong added ⚠️ Do not merge and removed ⚠️ Do not merge labels Jul 11, 2024

aingerson force-pushed the fabtests branch 2 times, most recently from 39730b5 to be4ef93 Compare September 19, 2024 15:26

aingerson force-pushed the fabtests branch from be4ef93 to daf4b58 Compare September 30, 2024 14:24

j-xiong reviewed Sep 30, 2024

View reviewed changes

aingerson force-pushed the fabtests branch from daf4b58 to fc9cf4e Compare October 4, 2024 04:23

aingerson force-pushed the fabtests branch from fc9cf4e to 1a4f1f7 Compare October 10, 2024 22:27

aingerson force-pushed the fabtests branch 4 times, most recently from c4f8c71 to 9e5b90d Compare October 11, 2024 18:50

aingerson added 2 commits October 14, 2024 12:05

aingerson added 10 commits October 14, 2024 12:05

fabtests/common: change sync message to be 0 bytes instead of 1 byte

5a0f865

This allows us to post the rx buf without corrupting memory in case its needed for validation Signed-off-by: Alexia Ingerson <[email protected]>

fabtests/runfabtests.sh: add rdm_atomic validation tests

2e47ddd

Run fi_rdm_atomic with data validation in standard and short test suites Signed-off-by: Alexia Ingerson <[email protected]>

fabtests/runfabtests.cmd: add atomic tests to windows testing

9aec0fc

Signed-off-by: Alexia Ingerson <[email protected]>

prov/psm3: disable complex comparison combinations

50aa464

Comparison of complex numbers is undefined and not a valid combination of atomic ops. Disable in psm3 Signed-off-by: Alexia Ingerson <[email protected]>

aingerson force-pushed the fabtests branch from 9e5b90d to 09a6a50 Compare October 14, 2024 19:05

j-xiong approved these changes Oct 15, 2024

View reviewed changes

j-xiong merged commit 1c75d4b into ofiwg:main Oct 15, 2024
12 of 13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fabtests/rdm_atomic, ubertest: move atomic verification to common code and use in functional test #10155

fabtests/rdm_atomic, ubertest: move atomic verification to common code and use in functional test #10155

aingerson commented Jul 8, 2024

j-xiong Jul 8, 2024

aingerson Jul 9, 2024

j-xiong commented Jul 10, 2024

j-xiong Sep 30, 2024

j-xiong Sep 30, 2024

j-xiong Sep 30, 2024

aingerson commented Oct 4, 2024

acgoldma commented Oct 7, 2024 •

edited

Loading

brooksmi commented Oct 9, 2024

aingerson commented Oct 9, 2024

brooksmi commented Oct 9, 2024

aingerson commented Oct 9, 2024

brooksmi commented Oct 9, 2024

aingerson commented Oct 9, 2024

brooksmi commented Oct 9, 2024

brooksmi commented Oct 10, 2024

aingerson commented Oct 10, 2024

aingerson commented Oct 10, 2024

brooksmi commented Oct 11, 2024

aingerson commented Oct 15, 2024

fabtests/rdm_atomic, ubertest: move atomic verification to common code and use in functional test #10155

fabtests/rdm_atomic, ubertest: move atomic verification to common code and use in functional test #10155

Conversation

aingerson commented Jul 8, 2024

j-xiong Jul 8, 2024

Choose a reason for hiding this comment

aingerson Jul 9, 2024

Choose a reason for hiding this comment

j-xiong commented Jul 10, 2024

j-xiong Sep 30, 2024

Choose a reason for hiding this comment

j-xiong Sep 30, 2024

Choose a reason for hiding this comment

j-xiong Sep 30, 2024

Choose a reason for hiding this comment

aingerson commented Oct 4, 2024

acgoldma commented Oct 7, 2024 • edited Loading

brooksmi commented Oct 9, 2024

aingerson commented Oct 9, 2024

brooksmi commented Oct 9, 2024

aingerson commented Oct 9, 2024

brooksmi commented Oct 9, 2024

aingerson commented Oct 9, 2024

brooksmi commented Oct 9, 2024

brooksmi commented Oct 10, 2024

aingerson commented Oct 10, 2024

aingerson commented Oct 10, 2024

brooksmi commented Oct 11, 2024

aingerson commented Oct 15, 2024

acgoldma commented Oct 7, 2024 •

edited

Loading