control/server.py: Enable DSA to offload and accelerate CRC calculation #704

CongMinYin · 2024-06-06T08:20:13Z

Intel® DSA can generate and test CRC checksum. This feature has already been implemented in SPDK. Enabling this feature allows for offloading and accelerating CRC calculations in NVMe-oF.

CongMinYin · 2024-06-06T08:32:05Z

Hi, this is a PR that enables DSA to offload and accelerate CRC calculation. I have previously tested performance data using DSA acceleration in an SPDK+ceph environment. My manager and colleagues should have discussions with IBM colleagues about using DSA acceleration and show the performance data. In the container environment, I will test the performance of the gateway in the future.

Since this feature has been implemented in SPDK, I just called the interface in the gateway. But how to map DSA devices into the container, I wrote it in README. If the users want to enable DSA, they need to manually change docker-compose.yaml. I'm not sure if this is appropriate.

CongMinYin · 2024-06-18T03:20:53Z

Hi @caroav , this one.

caroav · 2024-06-18T09:13:49Z

It looks ok to me. Is this feature supported on any "x86-64-v2" architecture? Do you know what is the performance benefit of it? @pcuzner @oritwas @gbregman can you review and say if you have any concern here, or we can approve. thx.

gbregman · 2024-06-18T09:20:11Z

@caroav I don't really know that code in spdk. So, I can't say whether or not it's OK.

pcuzner · 2024-06-19T03:02:34Z

@caroav I like the idea of offloading where we can, but I'd prefer to see performance comparison results before a merge is considered. We'd also need to consider the impact this has in the orchestration layer in ceph since any potential change to container CAPS or bindings would need corresponding changes planned there too.

caroav · 2024-06-20T12:17:08Z

@CongMinYin the default is that this feature is Disabled right? If that's true, then @pcuzner why do we need to be concerned? @CongMinYin do you have any comments on Paul's concerns?

CongMinYin · 2024-06-21T08:33:19Z

@CongMinYin the default is that this feature is Disabled right? If that's true, then @pcuzner why do we need to be concerned? @CongMinYin do you have any comments on Paul's concerns?

Yes, this feature is disabled by default. I know that changing container's CAP need to be careful. So I just wrote a update example in README. We can manually update docker-compose.yaml when need to enable it. But this will add much README content.

CongMinYin · 2024-06-21T09:56:55Z

It looks ok to me. Is this feature supported on any "x86-64-v2" architecture? Do you know what is the performance benefit of it?

Sorry for the late reply. I was asking my manager earlier if it's possible to display performance data on GitHub. Firstly, to answer the first question, DSA is a kind of DMA engine. It is an accelerator integrated into the CPU. Currently only supported on the 4th Gen. It is not an instruction set.
Abouting performance data, I have previously tested SPDK+ceph with DSA on bare metal. I can only show some relative performance data. This is a randread test that compares performance data with and without enabling DSA. SPDK started with one core, and the IO pressure was sufficient to fill this SPDK core. At this point, the offload CRC calculation will allow the CPU to free up some resources to process more IO. When the block size of IO is larger(16KB~128KB), the effect is better.

block size(kb)	Performance Gain(%) with DSA
128	52.96%
64	41.68%
32	25.21%
16	17.45%

oritwas · 2024-06-23T11:00:18Z

.env

@@ -50,7 +50,7 @@ SPDK_URL="https://spdk.io"

 SPDK_PKGDEP_ARGS="--rbd"
 # check spdk/configure --help
-SPDK_CONFIGURE_ARGS="--with-rbd --disable-tests --disable-unit-tests --disable-examples --enable-debug"
+SPDK_CONFIGURE_ARGS="--with-rbd --with-idxd --disable-tests --disable-unit-tests --disable-examples --enable-debug"


This won't work on older CPU, please detect it and only is supported enable it

As we build right now, it might not work on any CPU. To eliminate crashes on some machine we build the SPDK using x86-64-v2 architecture. This instructs the compiler not to use instructions which are not available on v2. We need to verify that the instructions used here are supported on v2 CPUs.

@CongMinYin @gbregman the question is what happens if one one hand we build to x86-64-v2, but on the other hand we specify this flag (--with-idxd). I assume that the SPDK build will ignore the --with-idxd because it is not supported on x86-64-v2?

@caroav this can easily verified. We can just disassemble the SPDK binary and see if this instruction was used or not. This is what I did when we had the issue with RDSEED.

@caroav I think we shuold check enable_dsa in ceph-nvmeof.conf, then configure with --with-idxd flag. But I don't how to write something like "if else" in .env or Dockerfile. It seems like you're saying the x86-64-v2 instruction set. But in reality, DSA is a DMA engine, an accelerator only integrated into the new 4th Gen CPU now.

idxd is a kernel driver. I have tried to configure idxd on server with AMD CPU, it will report an error and exit. So it's best to check the enable_dsa option in the ceph-nvmeof.conf. Provide different parameters in .env or Dockerfile. But I couldn't find a similar code block. Do you know what to do?
Run configure in a server with AMD CPU:

root@amd-server4:~/spdk# ./configure --with-idxd ERROR: IDXD cannot be used due to CPU incompatibility.

SPDK configure code to detect CPU:
https://github.com/spdk/spdk/blob/dfb2950fd5b13ba0a2a2b533e3e534202a89de74/configure#L741

I think the IDXD code itself is safe for all hardware platforms:

It is scanning for a specific PCIe device (PCI_DEVICE_ID_INTEL_IDXD) and will not initialize if this device is not found. Older Intel CPUs and AMD CPUs will not have this device.

It needs to use the movdir64b instruction to pass work to the hardware, this is only supported on new Intel CPUs. However this instruction is only ever executed if the PCIe device is found. Due to lack of compiler support this instruction is being generated using a sequence of bytes in assembly code:

static inline void movdir64b(void *dst, const void *src) { asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02" : "=m"(*(char *)dst) : "d"(src), "a"(dst)); }

So building the code with --with-idxd will include a new assembly instruction, but it will only be executed on CPUs that support it because of the PCIe device check. Its very hard to prove this other than by code review.

The configure script for SPDK is only permitting the code to be compiled with --with-idxd if it is compiled on an Intel CPU. This means you can't use an AMD server as the build machine. The script is not checking the CPU version - so you can build the code with support for IDXD on an old Intel CPU.

I suggest that the check in the configure script is overly restrictive, and perhaps should be reduce to a warning rather than an error as the code may be built on a system with different capabilities to the system it is run on. It might also be better to check for the presence of the PCIe device rather than the CPU vendor to determine whether the build server supports idxd.

Good catch, we need support for the code to be compiled for non-Intel CPU

@caroav Is our solution to try submitting a PR to modify the SPDK's config script, or check the enable_dsa parameter in this ceph-nvmeof project?

Hi @caroav @oritwas, I have submit a pr to solve the platforms errors: https://review.spdk.io/gerrit/c/spdk/spdk/+/24172. It merged. SPDK master branch and later versions won't met this platform errors.

oritwas · 2024-06-23T12:05:35Z

@bill-scales FYI

CongMinYin · 2024-08-09T08:41:21Z

Rebase to solve confilct. Hi @caroav, cherrypick spdk commit to fix CI error?

caroav · 2024-08-09T15:10:19Z

yes we will take care soon.

baum · 2024-08-11T14:52:43Z

@CongMinYin thankl you for your help!

pushed spdk configure patch to https://github.com/ceph/spdk/tree/gerrit-24172
please update the PR spdk submodule to include gerrit-24172 patch.
regarding enable_dsa conf option, I would use run time capability detection instead. as far as I understand the requirements are (a) running in privileged mode and (b) presence of idxd kernel module so the code could be:

def is_privileged_and_idxd_loaded():
    # Check if the container is running in privileged mode
    privileged = False
    try:
        with open('/proc/1/status', 'r') as f:
            for line in f:
                if line.startswith('CapEff'):
                    # CapEff shows the effective capabilities of the process
                    # If CapEff is all Fs, it indicates privileged mode
                    if 'ffffffffffffffff' in line.split()[1]:
                        privileged = True
                    break
    except FileNotFoundError:
        pass  # This might indicate the code is not running in a container
    
    # Check if the idxd module is loaded
    idxd_loaded = False
    try:
        with open('/proc/modules', 'r') as f:
            for line in f:
                if line.startswith('idxd'):
                    idxd_loaded = True
                    break
    except FileNotFoundError:
        pass  # This could indicate an issue with the file system or running environment
    
    return privileged and idxd_loaded

wdyt?

CongMinYin · 2024-08-12T10:10:46Z

Hi @baum , thank you for your feedback. I have replied below. I will delete the section #### Enable DSA in containers using privileged mode, just keep #### Enable DSA in containers using vfio mode. privileged mode and kernel idxd module is not needed.

pushed spdk configure patch to https://github.com/ceph/spdk/tree/gerrit-24172
please update the PR spdk submodule to include gerrit-24172 patch.

Sorry， I don't understand what operation I need to do. How to update this branch or patch to the current submodule?

regarding enable_dsa conf option, I would use run time capability detection instead. as far as I understand the requirements are (a) running in privileged mode and (b) presence of idxd kernel module so the code could be:

Regarding the privilege mode, it may have been a misunderstanding caused by README I worte. The relationship between using privileged mode and using vfio mode is OR, and choosing one of them is sufficient. I didn't make it clear in the privilege section of the README that it needs to be paired with the default uio_pci_generic mode to be used in the container. But we recommend to use vfio mode. Perhaps I can delete the section privilege later, as it may cause misunderstandings and is not the preferred recommendation. We recommend using vfio-pci mode(driver), which is our most commonly used method and it is dedicated. It is userspace driver and does not require privilege mode. So it also answered the second question, there is no need to check the idxd kernel module. SPDK prefer to uses userspace libraries and drivers of DSA. It is vfio-pci. Especially in containers, the vfio driver should be used, the kernel's idxd driver can not run in container which I tried. And the doc is https://doc.dpdk.org/guides/dmadevs/idxd.html?highlight=driver. Chapter 5.3.1 and 5.3.2 show that idxd kernel driver and vfio driver. Vfio is the first choice, especially in containers.

DSA 3 drivers:

DMA devices using DPDK-compatible driver
========================================
0000:6a:01.0 'Device 0b25' drv=uio_pci_generic unused=idxd,vfio-pci
0000:e7:01.0 'Device 0b25' drv=uio_pci_generic unused=idxd,vfio-pci

CongMinYin · 2024-08-23T06:42:04Z

ping @baum

baum · 2024-08-25T07:16:06Z

Sorry @CongMinYin missed your message.

In short:

Navigate to the ceph-nvmeof repository.
Update the spdk submodule to the gerrit-24172 branch.
Commit and push the changes to your PR branch.
Verify the update, and the PR will reflect the changes on GitHub.

More details:

Navigate to Your Local Repository
First, make sure you're in the root directory of your ceph-nvmeof repository.

cd /path/to/ceph-nvmeof

Update the Submodule URL
Change the spdk submodule to point to the new branch gerrit-24172 in the ceph/spdk repository:

cd spdk
git fetch origin
git checkout gerrit-24172
cd ..

Commit the Submodule Update
After updating the submodule, you need to commit this change in the ceph-nvmeof repository:

git add spdk
git commit -m "Update spdk submodule to gerrit-24172"

Push the Changes to Your PR Branch
Push the changes to the branch associated with your PR:

git push your_fork wip-dsa

Verify the Submodule Update
You can verify that the submodule is correctly pointing to the desired commit/branch by checking the .gitmodules and the submodule reference:

git submodule status

This should show the updated commit hash corresponding to the gerrit-24172 branch in the spdk submodule.

CongMinYin · 2024-08-27T07:46:00Z

@baum done. But CI failed in building nvmeof, ".645 Error: Unable to find a match: ceph-mon-client-nvmeof". Is this related to my changes? Did I rerun the CI command?

caroav · 2024-08-27T07:49:07Z

@CongMinYin we will fix this issue probably in few hours.

baum · 2024-09-01T10:13:44Z

@CongMinYin could you please rebase, using latest .env ceph image? 🙏

CongMinYin · 2024-09-02T12:10:22Z

@baum Rebased. 1 failed in CI / pytest (cli_change_lb).
FAILED tests/test_cli_change_lb.py::test_change_namespace_lb_group - assert '...
Is this related to the submodule spdk branch? I noticed that the spdk submodule branch has been switched. I cannot directly rebase spdk, so I continued to execute:git checkout gerrit-24172. Is this failure related to spdk?

Signed-off-by: Yin Congmin <[email protected]>

CongMinYin · 2024-09-03T09:59:27Z

ping @baum already rebase to the latest code.

baum · 2024-09-09T11:33:11Z

@CongMinYin 🖖

Created an integration branch for CongMinYin:wip-dsa + gerrit 24172 - see #857. The branch includes the two required changes, based on the current ceph-nvmeof and ceph/spdk.

spdk: https://review.spdk.io/gerrit/c/spdk/spdk/+/24172 on top of ceph-nvmeof - https://github.com/ceph/spdk/tree/ceph-nvmeof-v24.01-gerrit-24172
ceph-nvmeof: CongMinYin:wip-dsa - control/server.py: Enable DSA to offload and accelerate CRC calculation

Let me know if it looks OK to you. In case everything is as expected, please approve, so I could merge it.

baum · 2024-09-09T12:40:15Z

@CongMinYin 🖖

it seems the arm spdk build is not happy - see here, wdyt?
since a new conf option spdk.enable_dsa was introduced, a patch for NvmeofServiceSpec is required to allow this option usage in cephadm env.

CongMinYin · 2024-09-10T16:22:52Z

@baum #857 It looks good to me.

Hi @bill-scales , sorry, I haven't actually tried compiling idxd on an ARM platform without an ARM machine. As you said IDXD is code safe itself, and instruction is being generated using a sequence of bytes in assembly code. Maybe you konw some about this error about 'movdir64b'?

#19 1327.3 In file included from idxd.c:16:
#19 1327.3 idxd_internal.h: In function 'movdir64b':
#19 1327.3 idxd_internal.h:24:9: error: impossible constraint in 'asm'
#19 1327.3    24 |         asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
#19 1327.3       |         ^~~

CongMinYin force-pushed the wip-dsa branch from 340787d to c78d9ce Compare June 7, 2024 08:01

oritwas reviewed Jun 23, 2024

View reviewed changes

CongMinYin force-pushed the wip-dsa branch from c78d9ce to 5cf1fa9 Compare August 9, 2024 08:10

CongMinYin force-pushed the wip-dsa branch 2 times, most recently from 676683e to d94eaa9 Compare August 26, 2024 11:05

CongMinYin force-pushed the wip-dsa branch from d94eaa9 to 1deabab Compare September 2, 2024 08:58

CongMinYin added 2 commits September 3, 2024 15:59

control/server.py: Enable DSA to offload and accelerate CRC calculation

1ec57bf

Signed-off-by: Yin Congmin <[email protected]>

Update spdk submodule to gerrit-24172

610e744

Signed-off-by: Yin Congmin <[email protected]>

CongMinYin force-pushed the wip-dsa branch from 1deabab to 610e744 Compare September 3, 2024 08:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

control/server.py: Enable DSA to offload and accelerate CRC calculation #704

control/server.py: Enable DSA to offload and accelerate CRC calculation #704

CongMinYin commented Jun 6, 2024

CongMinYin commented Jun 6, 2024 •

edited

Loading

CongMinYin commented Jun 18, 2024

caroav commented Jun 18, 2024

gbregman commented Jun 18, 2024

pcuzner commented Jun 19, 2024

caroav commented Jun 20, 2024

CongMinYin commented Jun 21, 2024 •

edited

Loading

CongMinYin commented Jun 21, 2024

oritwas Jun 23, 2024

gbregman Jun 23, 2024

caroav Jun 23, 2024

gbregman Jun 23, 2024

CongMinYin Jun 24, 2024 •

edited

Loading

bill-scales Jun 24, 2024

oritwas Jun 24, 2024

CongMinYin Jun 25, 2024

CongMinYin Jul 30, 2024

oritwas commented Jun 23, 2024

CongMinYin commented Aug 9, 2024

caroav commented Aug 9, 2024

baum commented Aug 11, 2024

CongMinYin commented Aug 12, 2024

CongMinYin commented Aug 23, 2024

baum commented Aug 25, 2024

CongMinYin commented Aug 27, 2024 •

edited

Loading

caroav commented Aug 27, 2024

baum commented Sep 1, 2024

CongMinYin commented Sep 2, 2024 •

edited

Loading

CongMinYin commented Sep 3, 2024

baum commented Sep 9, 2024

baum commented Sep 9, 2024

CongMinYin commented Sep 10, 2024

control/server.py: Enable DSA to offload and accelerate CRC calculation #704

Are you sure you want to change the base?

control/server.py: Enable DSA to offload and accelerate CRC calculation #704

Conversation

CongMinYin commented Jun 6, 2024

CongMinYin commented Jun 6, 2024 • edited Loading

CongMinYin commented Jun 18, 2024

caroav commented Jun 18, 2024

gbregman commented Jun 18, 2024

pcuzner commented Jun 19, 2024

caroav commented Jun 20, 2024

CongMinYin commented Jun 21, 2024 • edited Loading

CongMinYin commented Jun 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CongMinYin Jun 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oritwas commented Jun 23, 2024

CongMinYin commented Aug 9, 2024

caroav commented Aug 9, 2024

baum commented Aug 11, 2024

CongMinYin commented Aug 12, 2024

CongMinYin commented Aug 23, 2024

baum commented Aug 25, 2024

CongMinYin commented Aug 27, 2024 • edited Loading

caroav commented Aug 27, 2024

baum commented Sep 1, 2024

CongMinYin commented Sep 2, 2024 • edited Loading

CongMinYin commented Sep 3, 2024

baum commented Sep 9, 2024

baum commented Sep 9, 2024

CongMinYin commented Sep 10, 2024

CongMinYin commented Jun 6, 2024 •

edited

Loading

CongMinYin commented Jun 21, 2024 •

edited

Loading

CongMinYin Jun 24, 2024 •

edited

Loading

CongMinYin commented Aug 27, 2024 •

edited

Loading

CongMinYin commented Sep 2, 2024 •

edited

Loading