better error message for rdma connection error #1474
Comments
In the client, we wouldn't necessarily know that the other end only allows root, right? I think if the connection fails for an authentication reason, what we really should have is something like an "Authentication failed" message, rather than just assuming things about the reason for the authentication failure. Knowing that the connection was denied for authentication reasons would be much more helpful than "Connection failed/rejected", and would likely put most people on the right path to addressing the problem.
The root cause here is that the local device cannot be opened by the transport on the client, rather than it being a server-end issue.
Are you certain that the problem is entirely on the client side? Maybe it is on your system, I don't know. But non-root processes absolutely can connect to remote root-owned daemons via rdma. And as long as your non-root user matches the uid or gid of the metric set, you'll be able to see the metric set. So since that does work, putting out a blanket statement about using root doesn't seem appropriate. If you are correct that your issue is entirely client side, then I would agree that some better error message is in order, but maybe not the one you are suggesting.
@morrone if I elevate first on the client side, then all goes smoothly with the rdma connection. If I configure/use sock instead of rdma, then all goes smoothly. This is about not even being able to start the connection; auth and perms on the server end aren't yet involved when it fails.
@morrone, @baallan I think the problem is that the ulimit set for the user isn't large enough for the requested ibv_reg_mr, which is going to pin the memory. When you changed to root, it changed the ulimit and it worked. The limit in question is "max locked memory". @baallan, perhaps you should confirm my thesis and then submit a pull request with a "better" error message.
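One way to check the "max locked memory" thesis from the client side is to read the limit that applies to the connecting process and compare it to the amount of memory that has to be pinned. The following is only a minimal sketch in Python (not the project's C code); `memlock_hint` and its `requested_bytes` parameter are hypothetical names, and the real registration size would have to come from the transport.

```python
import resource

def memlock_hint(requested_bytes, limits=None):
    """Return a hint string if the soft RLIMIT_MEMLOCK is too small, else None.

    `limits` is an optional (soft, hard) pair, used here only to make the
    function easy to exercise; by default the current process limits are read.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    soft, _hard = limits
    # RLIM_INFINITY means "unlimited", so no hint is needed in that case.
    if soft == resource.RLIM_INFINITY or soft >= requested_bytes:
        return None
    return (f"max locked memory is {soft} bytes, but {requested_bytes} bytes "
            "must be pinned; consider raising it with 'ulimit -l'")
```

If the thesis holds, a message like this would point at the actual cause rather than at root privileges in general.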
@tom95858 some testing results as a regular user:
I'll see if I can better test your thesis when I can get in as root and lower the ulimit to provoke the same error.
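A test along these lines could be sketched as follows, again in Python and only as an illustration. `run_with_memlock` is a hypothetical helper that runs an arbitrary command with a lowered soft "max locked memory" limit; the command to run (for example, whatever client invocation reproduces the failure) is left to the caller.

```python
import resource
import subprocess

def run_with_memlock(cmd, soft_bytes):
    """Run cmd with a lowered soft RLIMIT_MEMLOCK; the hard limit is unchanged."""
    _, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if hard != resource.RLIM_INFINITY:
        # The soft limit may never exceed the hard limit.
        soft_bytes = min(soft_bytes, hard)

    def lower_limit():
        # Runs in the child between fork and exec, so the parent keeps its limit.
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (soft_bytes, hard))

    return subprocess.run(cmd, preexec_fn=lower_limit,
                          capture_output=True, text=True)
```

Lowering the soft limit does not itself require root. On Linux, processes with CAP_IPC_LOCK (typically root) are exempt from this limit, so the experiment is likely to be more informative when run as a regular user with a deliberately small limit.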
Attempting to connect as a user other than root to an rdma ldmsd port needs a better error message. Currently:
When handling the failed/rejected event, it should check the current UID and, if uid != 0, print a hint that the user must have privileges/be root to connect over rdma.
If there actually is a way for a non-root process to connect to a remote root-owned daemon via rdma, then we need to make it work by default. (I'm guessing there isn't).
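The proposed check could look roughly like the sketch below. It is written in Python for brevity, not in the project's C, and `rdma_connect_hint` is a hypothetical helper rather than an existing ldms function. The wording of the hint is an assumption; it deliberately avoids claiming that root is required.

```python
import os

def rdma_connect_hint(transport, uid=None):
    """Return extra text for a failed/rejected connect message, or '' if none applies."""
    if uid is None:
        uid = os.getuid()
    # Only add the hint for rdma connections attempted by a non-root user.
    if transport != "rdma" or uid == 0:
        return ""
    return (" (hint: rdma connect failed as non-root uid %d; "
            "check privileges and the 'max locked memory' limit)" % uid)
```

The returned string would be appended to the existing failed/rejected message, so sock connections and root users see no change.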