verbs: Query QP data in order on non x86 platforms #1606

Open: dkkranz wants to merge 1 commit into master from query_data_in_order_fix

@dkkranz (Contributor) commented May 4, 2025

EFA can support in-order writes of 128-byte blocks on some Grace platforms. Move the x86 architecture check into the mlx5 provider implementation.

Reviewed-by: Michael Margolin [email protected]
Reviewed-by: Yonatan Nachum [email protected]
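
For readers unfamiliar with why in-order 128-byte block writes matter, here is a minimal sketch of the polling pattern such a guarantee is meant to enable. It is not EFA or libibverbs code: `struct block`, `simulate_in_order_write()` and `try_consume()` are hypothetical names, and the device write is simulated by a plain CPU function. The assumption is that the last byte of each 128-byte aligned block lands after the bytes before it, so a poller that sees the trailing flag can treat the rest of the block as complete.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   128u
#define PAYLOAD_SIZE (BLOCK_SIZE - 1u)

/* Hypothetical layout: 127 payload bytes followed by a one-byte flag,
 * all inside a single 128-byte aligned block. */
struct block {
    _Alignas(128) uint8_t payload[PAYLOAD_SIZE];
    _Atomic uint8_t ready; /* last byte of the block */
};

/* Stand-in for the device: fills the payload first and the flag last.
 * A real in-order device would do this through DMA, not a CPU store. */
static void simulate_in_order_write(struct block *dst, const uint8_t *src,
                                    uint8_t seq)
{
    memcpy(dst->payload, src, PAYLOAD_SIZE);
    atomic_store_explicit(&dst->ready, seq, memory_order_release);
}

/* Poller: copies the payload out only once the trailing flag carries the
 * expected sequence number. Returns false if the block is not ready yet. */
static bool try_consume(struct block *b, uint8_t expected_seq, uint8_t *out)
{
    if (atomic_load_explicit(&b->ready, memory_order_acquire) != expected_seq)
        return false;
    memcpy(out, b->payload, PAYLOAD_SIZE);
    return true;
}
```

Whether the poller can actually rely on this depends on both the device and on whoever is reading the memory, which is the question the rest of this thread is about.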

@dkkranz dkkranz force-pushed the query_data_in_order_fix branch from 70679e1 to b4b0368 Compare May 5, 2025 07:46
@jgunthorpe (Member) commented

I don't think this is right; it would break things like PPC. The platform check is supposed to be in the common code, and the driver check is only about how the device works on PCI.

You should probably just add ARM to the inclusion list. I imagine there are some wonky ARMs that don't work, though they may not matter in practice.
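
The split described above (platform check in common code, device check in the driver) could be sketched roughly as follows. This is an illustration only, not the libibverbs source: the capability bits, `host_arch_in_inclusion_list()`, `common_query_data_in_order()` and the two example device callbacks are all hypothetical names, and the `__aarch64__` branch simply models the suggestion of adding ARM to the list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CAP_IN_ORDER_WHOLE_MSG   (1u << 0)
#define CAP_IN_ORDER_128B_BLOCKS (1u << 1)

/* Hypothetical driver callback: answers only for how the device
 * writes data over PCI, with no knowledge of the host CPU. */
typedef uint32_t (*device_query_fn)(void);

/* Common code: the host architecture inclusion list lives here. */
static bool host_arch_in_inclusion_list(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return true;
#elif defined(__aarch64__)
    /* Models the suggestion of adding ARM to the list. */
    return true;
#else
    /* PPC and anything else not studied: report no ordering. */
    return false;
#endif
}

/* Common entry point: gate on the host first, then ask the device. */
static uint32_t common_query_data_in_order(device_query_fn device_query)
{
    if (!host_arch_in_inclusion_list())
        return 0;
    return device_query();
}

/* Two example devices for demonstration. */
static uint32_t example_device_none(void)
{
    return 0;
}

static uint32_t example_device_128b(void)
{
    return CAP_IN_ORDER_128B_BLOCKS;
}
```

With this shape, a driver never needs to know which CPU it runs on, and an architecture that is not on the list reports no ordering guarantee regardless of the device.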

@dkkranz dkkranz force-pushed the query_data_in_order_fix branch 2 times, most recently from 3e995d5 to fc236b7 Compare May 7, 2025 11:56
@dkkranz (Contributor, Author) commented May 7, 2025

Applied your suggestion, thanks

@dkkranz (Contributor, Author) commented May 20, 2025

@jgunthorpe Kind reminder

@jgunthorpe (Member) commented

I don't know. I just feel uneasy about this. ARM is such a wide and varied architecture that I don't know if we can really say it is reliably in order. Grace might work just because it has 128-byte lines, but that doesn't mean a different ARM chip will. This could break some of the MPIs running on ARM supercomputers. Do you think you could limit it to just Grace? Did someone from NVIDIA confirm that this is actually true on Grace?

@jgunthorpe (Member) commented

I consulted with people here and unfortunately I don't think using the ARM platform as a condition is sufficient. Even looking at NVIDIA CPUs, this only works on certain CPUs, with certain configurations.

I guess EFA could be special since it really only exists in AWS instances and you guys will do the validation. I'd still strongly recommend that you query your device FW to see if it is supported, as I'm not sure all future systems will work this way.

So maybe the right thing is to add a new op, query_qp_data_in_order_force() or something like that, which skips the architecture check in the core code. I don't really want to have drivers doing architecture checks..
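
One way to picture the op proposed above is the sketch below. Nothing here exists in rdma-core as far as this thread shows: `struct provider_ops`, `core_query_data_in_order()` and `report_128b_blocks()` are hypothetical, and the `host_arch_ok` parameter stands in for the compile-time architecture check in the core code so that the example can be exercised on any machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit, for illustration only. */
#define CAP_IN_ORDER_128B_BLOCKS (1u << 1)

/* Hypothetical provider op table. The "force" op is optional and is
 * filled in only by a provider that vouches for its platforms. */
struct provider_ops {
    uint32_t (*query_data_in_order)(void);
    uint32_t (*query_data_in_order_force)(void); /* may be NULL */
};

/* Core code: the architecture decision stays here. A provider that
 * supplies the force op bypasses the check; every other provider is
 * gated on the host architecture as before. */
static uint32_t core_query_data_in_order(const struct provider_ops *ops,
                                         bool host_arch_ok)
{
    if (ops->query_data_in_order_force != NULL)
        return ops->query_data_in_order_force();
    if (!host_arch_ok || ops->query_data_in_order == NULL)
        return 0;
    return ops->query_data_in_order();
}

/* Example device answer used for demonstration. */
static uint32_t report_128b_blocks(void)
{
    return CAP_IN_ORDER_128B_BLOCKS;
}
```

The point of this shape is that the provider contains no `#if` on the CPU architecture. It only declares, by filling in the second op, that the core should trust its answer.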

@dkkranz dkkranz force-pushed the query_data_in_order_fix branch from fc236b7 to 4ced708 Compare May 28, 2025 07:42
Add a flag to directly query device support for QP data-in-order
semantics without enforcing host CPU architecture restrictions. This is
particularly useful in scenarios where the GPU performs data polling
directly.

Reviewed-by: Michael Margolin <[email protected]>
Reviewed-by: Yonatan Nachum <[email protected]>
Signed-off-by: Daniel Kranzdorf <[email protected]>
@dkkranz dkkranz force-pushed the query_data_in_order_fix branch from 4ced708 to a7f0291 Compare May 28, 2025 07:46
@dkkranz (Contributor, Author) commented May 28, 2025

@jgunthorpe Please review the last revision that adds a flag to skip the architecture check.

@jgunthorpe (Member) commented

I didn't mean a flag to userspace; there is nothing userspace can do with this. I meant some kind of flag from EFA to the core code to disable the arch checks in the core code just for EFA.

@mrgolin (Contributor) commented Jun 3, 2025

> I don't really want to have drivers doing architecture checks..

But this is exactly what you are suggesting now, and I'm not sure why we need to have architecture checks in libibverbs either.

I think there is a use for a common way to query whether data is written in order from the device's perspective only. For instance, data polling from a GPU isn't necessarily affected by the CPU architecture. Don't you agree?
