Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Assertion on final ack message with RNDV #10529

Merged
merged 1 commit into from
Mar 7, 2025

Conversation

iyastreb
Copy link
Contributor

@iyastreb iyastreb commented Mar 4, 2025

What?

When ucx_perftest -t ucp_am_bw is executed in debug mode, then we hit an assertion that verifies the received RNDV message length = configured message size.

Why?

This check fails on final sync message of size 1

How?

Adjust assertion depending on the role: receiver asserts on the payload size, sender may receive only 1 byte final ack

@iyastreb iyastreb requested review from yosefe and brminich March 4, 2025 09:37
@iyastreb iyastreb force-pushed the perftest-final-ack-rndv-assert branch from 23c6fa1 to 5d82ad2 Compare March 4, 2025 09:46
ucs_assert(length == ucx_perf_get_message_size(&m_perf.params));
} else if (my_index == 1) {
/* Sender may only receive final ack */
ucs_assert(length == 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you pls remind why we send an ack with rndv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure
When we discussed the ACK message we took a decision to reuse existing communication channel and buffers, and just send the message in the opposite direction (receiver -> sender). So apparently the protocol is not fixed, it matches the test communication pattern. In this particular case with ucp_am_bw it happens to be RNDV

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we convert it to ucs_assertv that prints the actual, expected length and my_index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with forced push due to tiny size

@iyastreb iyastreb force-pushed the perftest-final-ack-rndv-assert branch 2 times, most recently from 9f35d0a to abc0782 Compare March 4, 2025 14:45
Comment on lines 282 to 287
ucs_assertv(length == expected_length, "length=%zu, expected=%zu,"
" index=%u", length, expected_length, my_index);
} else if (my_index == 1) {
/* Sender may only receive final ack */
ucs_assertv(length == 1, "length=%zu, expected=1, index=%u", length,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove "," from the format strings
1st assert: "length=%zu expected_length=%zu my_index=%zu"
2nd assert: "length=%zu my_index=%zu" (no need to print 1, it already would appear in the assert message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@iyastreb iyastreb force-pushed the perftest-final-ack-rndv-assert branch 2 times, most recently from d64c8c1 to a6a3a2f Compare March 4, 2025 15:07
@@ -275,11 +275,20 @@ class ucp_perf_test_runner {
const ucp_am_recv_param_t *rx_params)
{
ucs_assert(!(rx_params->recv_attr & UCP_AM_RECV_ATTR_FLAG_DATA));
ucs_assert(length == ucx_perf_get_message_size(&m_perf.params));
unsigned my_index = rte_call(&m_perf, group_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid calling it on data-path?
e. g. can do a single assert like
(length == expected_length || length == 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That call is not expensive, but ok. We can make it static, so that it's evaluated just once:

static unsigned my_index = rte_call(&m_perf, group_index);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we define a single assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's less strict, but not a big deal. Ok, we can also do with a single assert

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also cache group_index as member var, or define 2 different am callbacks (on sender and on receiver)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, caching as member var is easier option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I implemented it with a single assertion like Mikhail proposed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but seems a single assertion does not really cover all the possible errors, for example if server gets length of 1 instead of actual data length?

@iyastreb iyastreb force-pushed the perftest-final-ack-rndv-assert branch from a6a3a2f to dfd13d8 Compare March 4, 2025 16:44
brminich
brminich previously approved these changes Mar 4, 2025
ucs_assert(length == ucx_perf_get_message_size(&m_perf.params));
/* Data length can be either 1 (on sender side when receive a final ack)
* or expected payload size on the receiver side. */
ucs_assertv((length == 1) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we can change it like this

Suggested change
ucs_assertv((length == 1) ||
ucs_assertv((length == 1) && (rte_call(&m_perf, group_index) == 0) ||

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we cannot avoid rte_call for every data packet to determine that we are the server and the payload size should be >1. unless we cache it in the class or provide different AM callback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be inside assert as i mentioned in this comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it is hidden in release build?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go this way, then I would just cache this index inside a member var

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally I implemented it in a slightly different manner: I adjust m_am_rx_length on the sender side, and use this existing variable for assertions. Btw, there was one more place where similar assertion was failing, so this change fixes both of them

@iyastreb iyastreb force-pushed the perftest-final-ack-rndv-assert branch from dfd13d8 to 478d780 Compare March 5, 2025 07:41
@iyastreb iyastreb force-pushed the perftest-final-ack-rndv-assert branch from 478d780 to 7608569 Compare March 5, 2025 07:44
@yosefe yosefe enabled auto-merge March 6, 2025 15:55
@yosefe yosefe merged commit 52026fd into openucx:master Mar 7, 2025
151 checks passed
@iyastreb iyastreb deleted the perftest-final-ack-rndv-assert branch March 7, 2025 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants