@biddisco just to make sure: were you getting -FI_EAVAIL back from fi_cq_read or an equivalent call? The GNI provider should normally set something in the flags field,
although there appears to be a path through the provider for receives that may end up being posted to the receive error CQ with flags set to 0. Do you use the same CQ for both send/write and receive operations?
We are using separate CQs for Tx and RX, so there should not be a possibility of a receive completion event getting through.
The code has been heavily changed over the last few weeks by several people, so I'm not 100% certain of my facts. But there may have been a chance that an EAGAIN got through and triggered this error when in fact it was not an EAVAIL. I'm tempted to suggest that you close this issue as "not a bug", and I will reopen it if we get further errors of the same kind.
[Additionally, we've implemented a resend of our message in the event that an FI_MSG SEND error occurs, so we should be able to recover from that. We have not implemented recovery of RMA, though we can and should if we get errors of that kind.]
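To make the EAGAIN versus EAVAIL distinction above concrete, here is a minimal sketch of how the return value of a CQ read could be classified before any error entry is looked at. It is not taken from our code: `classify_cq_read` and the `cq_poll_result` enum are hypothetical names, and the `#define` fallbacks are stand-ins so the snippet builds without libfabric (the value used for `FI_EAVAIL` is an assumption; real code should include `<rdma/fi_errno.h>` and use the header's definitions).

```c
#include <errno.h>
#include <sys/types.h>

/* Stand-ins so this sketch compiles without libfabric installed.
 * Real code would include <rdma/fi_errno.h> instead. */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif
#ifndef FI_EAVAIL
#define FI_EAVAIL 259 /* assumed value, take it from the header */
#endif

/* Hypothetical result categories for one poll of a completion queue. */
enum cq_poll_result {
    CQ_COMPLETIONS, /* ret > 0: that many completion entries were read */
    CQ_EMPTY,       /* nothing to process right now, poll again later  */
    CQ_ERROR_ENTRY, /* an error entry is waiting on the error queue    */
    CQ_FAILURE      /* any other negative code                         */
};

/* Classify the value returned by a CQ read call.
 * Only CQ_ERROR_ENTRY should lead to reading the error entry;
 * treating -FI_EAGAIN the same way would report a bogus error. */
static enum cq_poll_result classify_cq_read(ssize_t ret)
{
    if (ret > 0)
        return CQ_COMPLETIONS;
    if (ret == 0 || ret == -FI_EAGAIN)
        return CQ_EMPTY;
    if (ret == -FI_EAVAIL)
        return CQ_ERROR_ENTRY;
    return CQ_FAILURE;
}
```

With a check like this in the polling loop, the error-queue read would only be reached for `CQ_ERROR_ENTRY`, so an entry with flags of 0 could no longer be explained by an EAGAIN slipping through.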
A number of large runs of our code failed due to errors when polling the send queue.
When an error is reported we use
the error returned reported
so we do not have much information to use to debug.
Should the flags be set to SEND or RMA, or is a flags value of 0 valid?