KRCore kernel module throws an error #6

Open
snowzjx opened this issue Jul 24, 2024 · 6 comments


snowzjx commented Jul 24, 2024

Dear authors,

I am using commit 7ba3bf6, as indicated in your paper, for evaluation.

I have successfully installed the patched OFED driver.

I have also compiled KRdmaKitSyscall.ko successfully.

However, when I tried to insert the module, I got the following error:

insmod: ERROR: could not insert module KRdmaKitSyscall.ko: Operation not permitted

dmesg shows the following output:

[ 5760.285891] KRdma kernel module init start
[ 5760.287625] Fail to start KRdma kernel module

Moreover, I am confused by the gid setting in the Makefile in the KRdmaKit-syscall folder, and I do not know which gid to fill in.

Could you help me with the above problems?

Thanks!

Contributor

wxdwfc commented Jul 24, 2024

It seems that the RPC client failed to initialize (https://github.com/SJTU-IPADS/krcore-artifacts/blob/develop/KRdmaKit-syscall/src/client.rs#L143). Can you add some logging to see which step goes wrong?

The meta gid can be set to the same gid as that of the RNIC you use when inserting the krcore module.
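
For readers looking for that value: on Linux the GID table of an RDMA device is exposed under /sys/class/infiniband. The following is a minimal user-space sketch, assuming the standard sysfs layout. The helper names `parse_sysfs_gid` and `read_gid` are hypothetical and not part of KRCore, and the device name, port and index must be chosen for your own machine.

```rust
use std::fs;

/// Parse a GID as printed by sysfs, e.g.
/// "fe80:0000:0000:0000:248a:0703:009c:7c90", into its 16 raw bytes.
/// Returns None unless the input has exactly eight 4-digit hex groups.
fn parse_sysfs_gid(raw: &str) -> Option<[u8; 16]> {
    let mut gid = [0u8; 16];
    let mut groups = 0;
    for (i, group) in raw.trim().split(':').enumerate() {
        if i >= 8 || group.len() != 4 {
            return None;
        }
        let value = u16::from_str_radix(group, 16).ok()?;
        // Each group holds two bytes in network (big-endian) order.
        gid[2 * i..2 * i + 2].copy_from_slice(&value.to_be_bytes());
        groups += 1;
    }
    (groups == 8).then_some(gid)
}

/// Read one GID table entry from sysfs (hypothetical helper).
/// `device` is an assumption supplied by the caller, e.g. "mlx5_0".
fn read_gid(device: &str, port: u8, index: u32) -> Option<[u8; 16]> {
    let path = format!("/sys/class/infiniband/{device}/ports/{port}/gids/{index}");
    parse_sysfs_gid(&fs::read_to_string(path).ok()?)
}
```

Calling `read_gid("mlx5_0", 1, 0)` would return the first GID of port 1 on a device named mlx5_0, or `None` if that device or entry does not exist.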

Author

snowzjx commented Jul 24, 2024

It seems the function returns here.

Contributor

wxdwfc commented Jul 26, 2024

It seems that no available RNIC was found on your machine. Can you check that the NIC is available, and tell me what hardware configuration you are using? You can run ibstatus to check.
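
As a point of reference, the device, port state and link layer that ibstatus reports can also be read from /sys/class/infiniband. Below is a minimal sketch of that check, assuming the standard sysfs layout. The function names are hypothetical and not part of KRCore.

```rust
use std::fs;
use std::path::Path;

/// Parse the contents of a sysfs port `state` file, e.g. "4: ACTIVE\n".
/// Returns the numeric state and its name, or None if the format is unexpected.
fn parse_port_state(raw: &str) -> Option<(u8, String)> {
    let (num, name) = raw.trim().split_once(':')?;
    let num = num.trim().parse::<u8>().ok()?;
    Some((num, name.trim().to_string()))
}

/// List (device, port, state, link_layer) for every port found under `root`.
/// `root` is normally "/sys/class/infiniband"; an empty result means that
/// no RDMA device is registered with the kernel.
fn list_ports(root: &Path) -> Vec<(String, String, String, String)> {
    let mut out = Vec::new();
    let Ok(devices) = fs::read_dir(root) else { return out };
    for dev in devices.flatten() {
        let Ok(ports) = fs::read_dir(dev.path().join("ports")) else { continue };
        for port in ports.flatten() {
            // Missing files are treated as empty strings rather than errors.
            let read = |f: &str| fs::read_to_string(port.path().join(f)).unwrap_or_default();
            let state = parse_port_state(&read("state"))
                .map(|(_, name)| name)
                .unwrap_or_else(|| "unknown".to_string());
            out.push((
                dev.file_name().to_string_lossy().into_owned(),
                port.file_name().to_string_lossy().into_owned(),
                state,
                read("link_layer").trim().to_string(),
            ));
        }
    }
    out
}
```

A link layer of "Ethernet" indicates a RoCE port, while "InfiniBand" indicates a native IB port, which matters for the gid handling discussed later in this thread.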

Author

snowzjx commented Jul 26, 2024

Hey, I have done some checks and found that the error is raised by the query_gid function at:

let err = unsafe { hca.query_gid(hca_ptr, port as u8, 0, &mut gid as *mut ib_gid) };

Since I am using RoCEv2, the gid does not seem to be available from the query_gid function, so I tried setting the gid manually. With that change, the client is created successfully.

However, I now get a new error:
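
The thread does not show how the gid was set manually. For context, a RoCE v2 GID that corresponds to an IPv4 address is the IPv4-mapped IPv6 form of that address (::ffff:a.b.c.d). The sketch below builds such a value in user space so it can be compared with the entries under sysfs. The helper names are hypothetical, and which GID index holds the RoCE v2 entry depends on the machine.

```rust
use std::net::Ipv4Addr;

/// Build the 16-byte GID that RoCE v2 uses for an IPv4 address:
/// the IPv4-mapped IPv6 form ::ffff:a.b.c.d.
fn rocev2_gid_from_ipv4(ip: Ipv4Addr) -> [u8; 16] {
    ip.to_ipv6_mapped().octets()
}

/// Render a GID in the colon-separated form that sysfs prints,
/// i.e. eight groups of four lowercase hex digits.
fn format_gid(gid: &[u8; 16]) -> String {
    gid.chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(":")
}
```

For example, the address 192.168.1.10 (an arbitrary illustration) formats as `0000:0000:0000:0000:0000:ffff:c0a8:010a`.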
[35236.576172] [dct] err to re-bring to rtr.
[35236.576173] [dct] err to bring to ready.

I checked the code and found that the error comes from the ib_modify_qp function, which returns error code -22 (-EINVAL). It seems that the parameters passed to it are not correct.
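
When adding logs around calls such as ib_modify_qp, printing the errno name next to the raw number makes dmesg easier to read. Here is a minimal sketch using Linux errno numbering. The function `errno_name` is hypothetical and lists only a few codes.

```rust
/// Map a negative kernel return value to its errno name (Linux numbering).
/// Only a handful of codes are covered; anything else yields "unknown".
fn errno_name(ret: i32) -> &'static str {
    match ret {
        0 => "OK",
        -1 => "EPERM",
        -12 => "ENOMEM",
        -19 => "ENODEV",
        -22 => "EINVAL",
        -95 => "EOPNOTSUPP",
        _ => "unknown",
    }
}
```

Here `errno_name(-22)` yields "EINVAL", meaning that an argument was rejected; the number alone does not say which attribute was at fault. Similarly, the earlier insmod message "Operation not permitted" is the text for EPERM (1), which would be consistent with the module's init function returning -1.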

Could you help me with the new problem? Thanks!

Contributor

wxdwfc commented Jul 26, 2024

Hi, this error means that the DCT QP has not been created successfully. Have you verified that your NIC supports DCT?

Besides, I strongly recommend using an IB NIC; I have not tested on RoCE NICs.

Author

snowzjx commented Jul 26, 2024

The strange thing is that the function bring_dc_to_init does not report any error.

I am using a Mellanox ConnectX-5 (CX5) RNIC.
