-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/RNDV: Fixed assertion when scaled end offset > max_frag #10554
UCP/RNDV: Fixed assertion when scaled end offset > max_frag #10554
Conversation
virtual void init() override | ||
{ | ||
/* Device with high BW and lower latency */ | ||
add_mock_iface("mock_1:1", [](uct_iface_attr_t &iface_attr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any generic way to abstract numbers and have a set of parameters that would allow:
add_mock_iface("moc1", iface_high_bw_low_lat));
add_mock_iface("moc1", iface_mid_bw_high_lat));
...
...
this would be used in all tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, good idea, we can think of something like that
But for this particular test what matters most is very specific iface_attr.cap.get.max_zcopy = 16384
setting, which I think we don't want to encapsulate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then below, or augment add_mock_iface
to also support vector of lambda?
add_mock_iface("mock1", [](uct_iface_attr &iface_attr) {
iface_high_bw_low_lat(iface_attr);
iface_attr.cap.get_max_zcopy = 16384;
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done
But we don't save anything in terms of lines of code, because now we have 2 opposite cases:
test_ucp_proto_mock_rcx: 1 dev with high BW and high lat + 1 dev with low BW and low lat
test_ucp_proto_mock_rcx2: 1 dev with high BW and low lat + 1 dev with low BW and high lat
So currently we don't reuse much if we add these 4 new functions.
Maybe we can add those wrappers if we have more similar use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regardless of line count of if we want to do it in this pr, it could be helpful if we can have "profiles" that we can use instead of direct numbers that will augment when we add tests. for instance we already have the numbers below plus the ones this PR is adding.
grep -r mock test/gtest/ucp/ | grep mock.*\} | cut -d: -f 2
{0, 200, "short", "rc_mlx5/mock_1
{201, 6650, "copy-in", "rc_mlx5/mock_1
{6651, 8246, "zero-copy", "rc_mlx5/mock_1
{8247, 21991, "multi-frag zero-copy", "rc_mlx5/mock_1
"rc_mlx5/mock_0
{0, 200, "short", "rc_mlx5/mock_1
{201, 6650, "copy-in", "rc_mlx5/mock_1
{6651, 8246, "zero-copy", "rc_mlx5/mock_1
{8247, 22502, "multi-frag zero-copy", "rc_mlx5/mock_1
"rc_mlx5/mock_0
{0, 200, "short", "rc_mlx5/mock_1
{201, 6650, "copy-in", "rc_mlx5/mock_1
{6651, 8246, "zero-copy", "rc_mlx5/mock_1
{8247, 19883, "multi-frag zero-copy", "rc_mlx5/mock_1
"47% on rc_mlx5/mock_1
{5346, INF, "rendezvous zero-copy read from remote", "cma/mock"},
{0, 8184, "short", "tcp/mock"},
{8185, 65528, "zero-copy", "tcp/mock"},
{65529, 366864, "multi-frag zero-copy", "tcp/mock"},
{366865, INF, "rendezvous zero-copy fenced write to remote", "tcp/mock"},
{0, 2284, "short", "self/mock"},
{2285, INF, "rendezvous copy from mapped remote memory", "self/mock"},
{0, 0, "short", "rc_mlx5/mock"},
{1, 8246, "copy-in", "rc_mlx5/mock"},
"rc_mlx5/mock"},
"rc_mlx5/mock"},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, so you are proposing to generalize not only config part, but also verification?
I don't mind to do this refactoring later, it might not be that straightforward - especially on the verification side.
And this PR I would prefer to keep focused on the bugfix
UCS_TEST_P(test_ucp_proto_mock_rcx2, rndv_send_recv_small_frag, | ||
"IB_NUM_PATHS?=2", "MAX_RNDV_LANES=2", "RNDV_THRESH=0") | ||
{ | ||
for (size_t i = 1024; i <= 65536; i += 1024) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i -> msg_size
also can use UCS_KBYTE constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
for (size_t msg_size = UCS_KBYTE; msg_size <= 64 * UCS_KBYTE; | ||
msg_size += UCS_KBYTE) { | ||
send_recv_am(msg_size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add helper function for this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, added
9fe2922
to
1bc4bb5
Compare
What?
The max_frag assertion that was fixed with #10407 is still possible to happen with slightly different combination:
when ucp_proto_multi_scaled_length > lpriv->min_end_offset = max_frag
The root cause of this issue is weight calculation.
We calculate weights according to the lane BW/total BW. And we expect that max_frag ratio is the same, but it's not always the case. For example, there is a corner case when
max_frag
of the maximal BW lane is smaller thanmax_frag
of the slow lane.In theory we should calculate lane weight according to lane max_frag/total max_frag, but this requires quite some changes in proto_multi and might be risky (tried that).