Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

control/server.py: Enable DSA to offload and accelerate CRC calculation #704

Open
wants to merge 2 commits into
base: devel
Choose a base branch
from

Conversation

CongMinYin
Copy link
Contributor

Intel® DSA can generate and test CRC checksum. This feature has already been implemented in SPDK. Enabling this feature allows for offloading and accelerating CRC calculations in NVMe-oF.


@CongMinYin
Copy link
Contributor Author

CongMinYin commented Jun 6, 2024

Hi, this is a PR that enables DSA to offload and accelerate CRC calculation. I have previously tested performance data using DSA acceleration in an SPDK+ceph environment. My manager and colleagues should have discussions with IBM colleagues about using DSA acceleration and show the performance data. In the container environment, I will test the performance of the gateway in the future.

Since this feature has been implemented in SPDK, I just called the interface in the gateway. But how to map DSA devices into the container, I wrote it in README. If the users want to enable DSA, they need to manually change docker-compose.yaml. I'm not sure if this is appropriate.

@CongMinYin
Copy link
Contributor Author

Hi @caroav , this one.

@caroav
Copy link
Collaborator

caroav commented Jun 18, 2024

It looks ok to me. Is this feature supported on any "x86-64-v2" architecture? Do you know what is the performance benefit of it? @pcuzner @oritwas @gbregman can you review and say if you have any concern here, or we can approve. thx.

@gbregman
Copy link
Contributor

@caroav I don't really know that code in spdk. So, I can't say whether or not it's OK.

@pcuzner
Copy link
Contributor

pcuzner commented Jun 19, 2024

@caroav I like the idea of offloading where we can, but I'd prefer to see performance comparison results before a merge is considered. We'd also need to consider the impact this has in the orchestration layer in ceph since any potential change to container CAPS or bindings would need corresponding changes planned there too.

@caroav
Copy link
Collaborator

caroav commented Jun 20, 2024

@CongMinYin the default is that this feature is Disabled right? If that's true, then @pcuzner why do we need to be concerned? @CongMinYin do you have any comments on Paul's concerns?

@CongMinYin
Copy link
Contributor Author

CongMinYin commented Jun 21, 2024

@CongMinYin the default is that this feature is Disabled right? If that's true, then @pcuzner why do we need to be concerned? @CongMinYin do you have any comments on Paul's concerns?

Yes, this feature is disabled by default. I know that changing container's CAP need to be careful. So I just wrote a update example in README. We can manually update docker-compose.yaml when need to enable it. But this will add much README content.

@CongMinYin
Copy link
Contributor Author

It looks ok to me. Is this feature supported on any "x86-64-v2" architecture? Do you know what is the performance benefit of it?

Sorry for the late reply. I was asking my manager earlier if it's possible to display performance data on GitHub. Firstly, to answer the first question, DSA is a kind of DMA engine. It is an accelerator integrated into the CPU. Currently only supported on the 4th Gen. It is not an instruction set.
Abouting performance data, I have previously tested SPDK+ceph with DSA on bare metal. I can only show some relative performance data. This is a randread test that compares performance data with and without enabling DSA. SPDK started with one core, and the IO pressure was sufficient to fill this SPDK core. At this point, the offload CRC calculation will allow the CPU to free up some resources to process more IO. When the block size of IO is larger(16KB~128KB), the effect is better.

block size(kb) Performance Gain(%) with DSA
128 52.96%
64 41.68%
32 25.21%
16 17.45%

@@ -50,7 +50,7 @@ SPDK_URL="https://spdk.io"

SPDK_PKGDEP_ARGS="--rbd"
# check spdk/configure --help
SPDK_CONFIGURE_ARGS="--with-rbd --disable-tests --disable-unit-tests --disable-examples --enable-debug"
SPDK_CONFIGURE_ARGS="--with-rbd --with-idxd --disable-tests --disable-unit-tests --disable-examples --enable-debug"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work on older CPU, please detect it and only is supported enable it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we build right now, it might not work on any CPU. To eliminate crashes on some machine we build the SPDK using x86-64-v2 architecture. This instructs the compiler not to use instructions which are not available on v2. We need to verify that the instructions used here are supported on v2 CPUs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CongMinYin @gbregman the question is what happens if one one hand we build to x86-64-v2, but on the other hand we specify this flag (--with-idxd). I assume that the SPDK build will ignore the --with-idxd because it is not supported on x86-64-v2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caroav this can easily verified. We can just disassemble the SPDK binary and see if this instruction was used or not. This is what I did when we had the issue with RDSEED.

Copy link
Contributor Author

@CongMinYin CongMinYin Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caroav I think we shuold check enable_dsa in ceph-nvmeof.conf, then configure with --with-idxd flag. But I don't how to write something like "if else" in .env or Dockerfile. It seems like you're saying the x86-64-v2 instruction set. But in reality, DSA is a DMA engine, an accelerator only integrated into the new 4th Gen CPU now.

idxd is a kernel driver. I have tried to configure idxd on server with AMD CPU, it will report an error and exit. So it's best to check the enable_dsa option in the ceph-nvmeof.conf. Provide different parameters in .env or Dockerfile. But I couldn't find a similar code block. Do you know what to do?
Run configure in a server with AMD CPU:

root@amd-server4:~/spdk# ./configure  --with-idxd
ERROR: IDXD cannot be used due to CPU incompatibility.

SPDK configure code to detect CPU:
https://github.com/spdk/spdk/blob/dfb2950fd5b13ba0a2a2b533e3e534202a89de74/configure#L741

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the IDXD code itself is safe for all hardware platforms:

  1. It is scanning for a specific PCIe device (PCI_DEVICE_ID_INTEL_IDXD) and will not initialize if this device is not found. Older Intel CPUs and AMD CPUs will not have this device.
  2. It needs to use the movdir64b instruction to pass work to the hardware, this is only supported on new Intel CPUs. However this instruction is only ever executed if the PCIe device is found. Due to lack of compiler support this instruction is being generated using a sequence of bytes in assembly code:
static inline void movdir64b(void *dst, const void *src)
{
        asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
                     : "=m"(*(char *)dst)
                     : "d"(src), "a"(dst));
}

So building the code with --with-idxd will include a new assembly instruction, but it will only be executed on CPUs that support it because of the PCIe device check. Its very hard to prove this other than by code review.

The configure script for SPDK is only permitting the code to be compiled with --with-idxd if it is compiled on an Intel CPU. This means you can't use an AMD server as the build machine. The script is not checking the CPU version - so you can build the code with support for IDXD on an old Intel CPU.

I suggest that the check in the configure script is overly restrictive, and perhaps should be reduce to a warning rather than an error as the code may be built on a system with different capabilities to the system it is run on. It might also be better to check for the presence of the PCIe device rather than the CPU vendor to determine whether the build server supports idxd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, we need support for the code to be compiled for non-Intel CPU

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caroav Is our solution to try submitting a PR to modify the SPDK's config script, or check the enable_dsa parameter in this ceph-nvmeof project?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @caroav @oritwas, I have submit a pr to solve the platforms errors: https://review.spdk.io/gerrit/c/spdk/spdk/+/24172. It merged. SPDK master branch and later versions won't met this platform errors.

@oritwas
Copy link
Member

oritwas commented Jun 23, 2024

@bill-scales FYI

@CongMinYin
Copy link
Contributor Author

Rebase to solve confilct. Hi @caroav, cherrypick spdk commit to fix CI error?

@caroav
Copy link
Collaborator

caroav commented Aug 9, 2024

yes we will take care soon.

@baum
Copy link
Collaborator

baum commented Aug 11, 2024

@CongMinYin thankl you for your help!

  1. pushed spdk configure patch to https://github.com/ceph/spdk/tree/gerrit-24172
    please update the PR spdk submodule to include gerrit-24172 patch.

  2. regarding enable_dsa conf option, I would use run time capability detection instead. as far as I understand the requirements are (a) running in privileged mode and (b) presence of idxd kernel module so the code could be:

def is_privileged_and_idxd_loaded():
    # Check if the container is running in privileged mode
    privileged = False
    try:
        with open('/proc/1/status', 'r') as f:
            for line in f:
                if line.startswith('CapEff'):
                    # CapEff shows the effective capabilities of the process
                    # If CapEff is all Fs, it indicates privileged mode
                    if 'ffffffffffffffff' in line.split()[1]:
                        privileged = True
                    break
    except FileNotFoundError:
        pass  # This might indicate the code is not running in a container
    
    # Check if the idxd module is loaded
    idxd_loaded = False
    try:
        with open('/proc/modules', 'r') as f:
            for line in f:
                if line.startswith('idxd'):
                    idxd_loaded = True
                    break
    except FileNotFoundError:
        pass  # This could indicate an issue with the file system or running environment
    
    return privileged and idxd_loaded

wdyt?

@CongMinYin
Copy link
Contributor Author

Hi @baum , thank you for your feedback. I have replied below. I will delete the section #### Enable DSA in containers using privileged mode, just keep #### Enable DSA in containers using vfio mode. privileged mode and kernel idxd module is not needed.

  1. pushed spdk configure patch to https://github.com/ceph/spdk/tree/gerrit-24172
    please update the PR spdk submodule to include gerrit-24172 patch.

Sorry, I don't understand what operation I need to do. How to update this branch or patch to the current submodule?

  1. regarding enable_dsa conf option, I would use run time capability detection instead. as far as I understand the requirements are (a) running in privileged mode and (b) presence of idxd kernel module so the code could be:

Regarding the privilege mode, it may have been a misunderstanding caused by README I worte. The relationship between using privileged mode and using vfio mode is OR, and choosing one of them is sufficient. I didn't make it clear in the privilege section of the README that it needs to be paired with the default uio_pci_generic mode to be used in the container. But we recommend to use vfio mode. Perhaps I can delete the section privilege later, as it may cause misunderstandings and is not the preferred recommendation. We recommend using vfio-pci mode(driver), which is our most commonly used method and it is dedicated. It is userspace driver and does not require privilege mode. So it also answered the second question, there is no need to check the idxd kernel module. SPDK prefer to uses userspace libraries and drivers of DSA. It is vfio-pci. Especially in containers, the vfio driver should be used, the kernel's idxd driver can not run in container which I tried. And the doc is https://doc.dpdk.org/guides/dmadevs/idxd.html?highlight=driver. Chapter 5.3.1 and 5.3.2 show that idxd kernel driver and vfio driver. Vfio is the first choice, especially in containers.

DSA 3 drivers:

DMA devices using DPDK-compatible driver
========================================
0000:6a:01.0 'Device 0b25' drv=uio_pci_generic unused=idxd,vfio-pci
0000:e7:01.0 'Device 0b25' drv=uio_pci_generic unused=idxd,vfio-pci

@CongMinYin
Copy link
Contributor Author

ping @baum

@baum
Copy link
Collaborator

baum commented Aug 25, 2024

Sorry @CongMinYin missed your message.

In short:

  • Navigate to the ceph-nvmeof repository.
  • Update the spdk submodule to the gerrit-24172 branch.
  • Commit and push the changes to your PR branch.
  • Verify the update, and the PR will reflect the changes on GitHub.

More details:

  1. Navigate to Your Local Repository
    First, make sure you're in the root directory of your ceph-nvmeof repository.
cd /path/to/ceph-nvmeof
  1. Update the Submodule URL
    Change the spdk submodule to point to the new branch gerrit-24172 in the ceph/spdk repository:
cd spdk
git fetch origin
git checkout gerrit-24172
cd ..
  1. Commit the Submodule Update
    After updating the submodule, you need to commit this change in the ceph-nvmeof repository:
git add spdk
git commit -m "Update spdk submodule to gerrit-24172"
  1. Push the Changes to Your PR Branch
    Push the changes to the branch associated with your PR:
git push your_fork wip-dsa
  1. Verify the Submodule Update
    You can verify that the submodule is correctly pointing to the desired commit/branch by checking the .gitmodules and the submodule reference:
git submodule status

This should show the updated commit hash corresponding to the gerrit-24172 branch in the spdk submodule.

@CongMinYin CongMinYin force-pushed the wip-dsa branch 2 times, most recently from 676683e to d94eaa9 Compare August 26, 2024 11:05
@CongMinYin
Copy link
Contributor Author

CongMinYin commented Aug 27, 2024

@baum done. But CI failed in building nvmeof, ".645 Error: Unable to find a match: ceph-mon-client-nvmeof". Is this related to my changes? Did I rerun the CI command?

@caroav
Copy link
Collaborator

caroav commented Aug 27, 2024

@CongMinYin we will fix this issue probably in few hours.

@baum
Copy link
Collaborator

baum commented Sep 1, 2024

@CongMinYin could you please rebase, using latest .env ceph image? 🙏

@CongMinYin
Copy link
Contributor Author

CongMinYin commented Sep 2, 2024

@baum Rebased. 1 failed in CI / pytest (cli_change_lb).
FAILED tests/test_cli_change_lb.py::test_change_namespace_lb_group - assert '...
Is this related to the submodule spdk branch? I noticed that the spdk submodule branch has been switched. I cannot directly rebase spdk, so I continued to execute:git checkout gerrit-24172. Is this failure related to spdk?

@CongMinYin
Copy link
Contributor Author

ping @baum already rebase to the latest code.

@baum
Copy link
Collaborator

baum commented Sep 9, 2024

@CongMinYin 🖖

Created an integration branch for CongMinYin:wip-dsa + gerrit 24172 - see #857. The branch includes the two required changes, based on the current ceph-nvmeof and ceph/spdk.

Let me know if it looks OK to you. In case everything is as expected, please approve, so I could merge it.

@baum
Copy link
Collaborator

baum commented Sep 9, 2024

@CongMinYin 🖖

  • it seems the arm spdk build is not happy - see here, wdyt?

  • since a new conf option spdk.enable_dsa was introduced, a patch for NvmeofServiceSpec is required to allow this option usage in cephadm env.

@CongMinYin
Copy link
Contributor Author

@baum #857 It looks good to me.

Hi @bill-scales , sorry, I haven't actually tried compiling idxd on an ARM platform without an ARM machine. As you said IDXD is code safe itself, and instruction is being generated using a sequence of bytes in assembly code. Maybe you konw some about this error about 'movdir64b'?

#19 1327.3 In file included from idxd.c:16:
#19 1327.3 idxd_internal.h: In function 'movdir64b':
#19 1327.3 idxd_internal.h:24:9: error: impossible constraint in 'asm'
#19 1327.3    24 |         asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
#19 1327.3       |         ^~~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🆕 New
Development

Successfully merging this pull request may close these issues.

7 participants