rapidyenc on RISC-V with RVV 1.0 (Armbian Ubuntu Noble, GCC 14) #5

Open
sanderjo opened this issue May 18, 2024 · 5 comments

@sanderjo

Following your advice on sabnzbd/sabctools#116 (comment) ... work with rapidyenc ... and ...

Bingo! Vector (v-) instructions show up in the disassembled library:

sander@bananapif3:~/git/rapidyenc/build$ objdump -d librapidyenc.so | awk '{ print $3 }' | sort -u | grep -E "^v"
vadd.vi
vadd.vv
vcompress.vm
vcpop.m
viota.m
vle8.v
vmadc.vi
vmadc.vv
vmand.mm
vmandn.mm
vmerge.vxm
vmnor.mm
vmnot.m
vmor.mm
vmsbf.m
vmseq.vi
vmseq.vv
vmseq.vx
vmsltu.vx
vmv1r.v
vmv2r.v
vmv.s.x
vmv.v.i
vmv.v.x
vmv.x.s
vmxnor.mm
vmxor.mm
vor.vx
vrgather.vv
vse8.v
vsetivli
vsetvli
vslide1down.vx
vslide1up.vx
vslidedown.vx
vsll.vi
vsrl.vi
vsrl.vv
vsub.vv
vsub.vx
vwmulu.vx
vzext.vf2
sander@bananapif3:~/git/rapidyenc/build$ cmake ..
-- The C compiler identification is GNU 14.0.1
-- The CXX compiler identification is GNU 14.0.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for posix_memalign
-- Looking for posix_memalign - found
-- Performing Test COMPILER_SUPPORTS_RVV
-- Performing Test COMPILER_SUPPORTS_RVV - Success
-- Performing Test COMPILER_SUPPORTS_ZBKC
-- Performing Test COMPILER_SUPPORTS_ZBKC - Success
-- Configuring done (5.9s)
-- Generating done (0.1s)
-- Build files have been written to: /home/sander/git/rapidyenc/build


sander@bananapif3:~/git/rapidyenc/build$ cmake --build . --config Release
[  2%] Building CXX object CMakeFiles/rapidyenc.dir/src/platform.cc.o
[  5%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder.cc.o
[  7%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_sse2.cc.o
[ 10%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_ssse3.cc.o
[ 13%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_avx.cc.o
[ 15%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_avx2.cc.o
[ 18%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_vbmi2.cc.o
[ 21%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_neon.cc.o
[ 23%] Building CXX object CMakeFiles/rapidyenc.dir/src/encoder_rvv.cc.o
[ 26%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder.cc.o
[ 28%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_sse2.cc.o
[ 31%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_ssse3.cc.o
[ 34%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_avx.cc.o
[ 36%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_avx2.cc.o
[ 39%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_vbmi2.cc.o
[ 42%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_neon.cc.o
[ 44%] Building CXX object CMakeFiles/rapidyenc.dir/src/decoder_rvv.cc.o
[ 47%] Building CXX object CMakeFiles/rapidyenc.dir/src/crc.cc.o
[ 50%] Building CXX object CMakeFiles/rapidyenc.dir/src/crc_folding.cc.o
[ 52%] Building CXX object CMakeFiles/rapidyenc.dir/src/crc_folding_256.cc.o
[ 55%] Building CXX object CMakeFiles/rapidyenc.dir/src/crc_arm.cc.o
[ 57%] Building CXX object CMakeFiles/rapidyenc.dir/src/crc_arm_pmull.cc.o
[ 60%] Building CXX object CMakeFiles/rapidyenc.dir/src/crc_riscv.cc.o
[ 63%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/code/crc32c_sse4.cc.o
[ 65%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/code/multiword_64_64_cl_i386_mmx.cc.o
[ 68%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/code/multiword_64_64_gcc_amd64_asm.cc.o
[ 71%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/code/multiword_64_64_gcc_i386_mmx.cc.o
[ 73%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/code/multiword_64_64_intrinsic_i386_mmx.cc.o
[ 76%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/code/multiword_128_64_gcc_amd64_sse2.cc.o
[ 78%] Building CXX object CMakeFiles/rapidyenc.dir/crcutil-1.0/examples/interface.cc.o
/home/sander/git/rapidyenc/crcutil-1.0/examples/interface.cc: In static member function ‘static crcutil_interface::CRC* crcutil_interface::CRC::Create(crcutil_interface::UINT64, crcutil_interface::UINT64, size_t, bool, crcutil_interface::UINT64, crcutil_interface::UINT64, size_t, bool, const void**)’:
/home/sander/git/rapidyenc/crcutil-1.0/examples/interface.cc:232:23: warning: unused parameter ‘use_sse4_2’ [-Wunused-parameter]
  232 |                  bool use_sse4_2,
      |                  ~~~~~^~~~~~~~~~
[ 78%] Built target rapidyenc
[ 81%] Building CXX object CMakeFiles/rapidyenc_shared.dir/rapidyenc.cc.o
[ 84%] Linking CXX shared library librapidyenc.so
[ 84%] Built target rapidyenc_shared
[ 86%] Building CXX object CMakeFiles/rapidyenc_static.dir/rapidyenc.cc.o
[ 89%] Linking CXX static library rapidyenc_static/librapidyenc.a
[ 89%] Built target rapidyenc_static
[ 92%] Building C object CMakeFiles/rapidyenc_cli.dir/tool/cli.c.o
[ 94%] Linking CXX executable rapidyenc_cli
[ 94%] Built target rapidyenc_cli
[ 97%] Building CXX object CMakeFiles/rapidyenc_bench.dir/tool/bench.cc.o
[100%] Linking CXX executable rapidyenc_bench
[100%] Built target rapidyenc_bench
sander@bananapif3:~/git/rapidyenc/build$ 
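
For anyone who wants to repeat the `objdump | awk | sort -u | grep` check above without the shell pipeline, here is a minimal Python sketch of the same idea. It assumes the usual `objdump -d` line layout (address ending in a colon, encoding, mnemonic, operands); the function name `vector_mnemonics` and the sample lines are made up for illustration, and the encodings in the sample are placeholders, not real instruction words.

```python
def vector_mnemonics(disassembly: str) -> list[str]:
    """Return sorted, de-duplicated mnemonics that start with 'v'.

    Mirrors `awk '{ print $3 }' | sort -u | grep -E "^v"`: the third
    whitespace-separated field of an instruction line is the mnemonic.
    """
    found = set()
    for line in disassembly.splitlines():
        fields = line.split()
        # Instruction lines start with an address such as "1a2b4:";
        # this extra check skips headers like "Disassembly of section .text:".
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        mnemonic = fields[2]
        if mnemonic.startswith("v"):
            found.add(mnemonic)
    return sorted(found)


# Hypothetical sample input; the "00000000" encodings are placeholders.
sample = (
    "Disassembly of section .text:\n"
    "   1000:\t00000000\tvsetvli\tt0,a0,e8,m1,ta,ma\n"
    "   1004:\t00000000\tvle8.v\tv8,(a1)\n"
    "   1008:\t00000000\tadd\ta1,a1,t0\n"
    "   100c:\t00000000\tvle8.v\tv9,(a2)\n"
)
print(vector_mnemonics(sample))
```

On real data the input would be the captured output of `objdump -d librapidyenc.so`. Like the original pipeline, this only shows that vector instructions were emitted somewhere in the library, not that they are executed at runtime.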


@sanderjo
Author

sander@bananapif3:~/git/rapidyenc/build$ ./rapidyenc_bench 
Encode (unknown): 608.665 MB/s
Decode (unknown): 778.601 MB/s
CRC32 (generic): 418.045 MB/s
CRC32 256^n: 0.413567 Mop/s

For reference: on my laptop with 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz:

(base) sander@zwart2204:~/git/rapidyenc/build$ ./rapidyenc_bench 
Encode (VBMI2): 8118.54 MB/s
Decode (VBMI2): 14876.9 MB/s
CRC32 (VPCLMUL): 20696.9 MB/s
CRC32 256^n: 63.5324 Mop/s

@animetosho
Owner

animetosho commented May 18, 2024

Thanks for testing.

I think they compare the K1 against a Cortex A55 - so that seems to be respectable. This page says it should be around a 1.3GHz A55.

Testing on a 1.8GHz Kryo Silver (A55 derivative):

Encode (NEON): 837.907 MB/s
Decode (NEON): 850.552 MB/s
CRC32 (generic): 5196.66 MB/s
CRC32 256^n: 9.89022 Mop/s

So roughly in the same ballpark. Not quite their "2x NEON" claim, though the RVV code probably has some optimisation opportunities.
(also, do you know the clockspeed? they don't seem to advertise that)

I think the CRC32 kernel displayed above is incorrect as it should be using ARM-CRC acceleration.
I don't think your CPU supports scalar crypto (Zbc/Zbkc), so your CRC32 result is likely for the generic implementation.

@sanderjo
Author

> (also, do you know the clockspeed? they don't seem to advertise that)

CPU: Spacemit X60 (8) @ 1.600GHz


sander@bananapif3:~$ neofetch 
                                 sander@bananapif3 
                                 ----------------- 
      █ █ █ █ █ █ █ █ █ █ █      OS: Armbian (24.5.0-trunk) riscv64 
     ███████████████████████     Host: spacemit k1-x deb1 board 
   ▄▄██                   ██▄▄   Kernel: 6.1.15-legacy-k1 
   ▄▄██    ███████████    ██▄▄   Uptime: 2 hours, 13 mins 
   ▄▄██   ██         ██   ██▄▄   Packages: 1321 (dpkg) 
   ▄▄██   ██         ██   ██▄▄   Shell: bash 5.2.21 
   ▄▄██   ██         ██   ██▄▄   Resolution: 1920x1080 
   ▄▄██   █████████████   ██▄▄   Terminal: /dev/pts/1 
   ▄▄██   ██         ██   ██▄▄   CPU: Spacemit X60 (8) @ 1.600GHz 
   ▄▄██   ██         ██   ██▄▄   Memory: 224MiB / 3809MiB 
   ▄▄██   ██         ██   ██▄▄
   ▄▄██                   ██▄▄                           
     ███████████████████████                             
      █ █ █ █ █ █ █ █ █ █ █

sander@bananapif3:~$ 
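
With the 1.6GHz figure from neofetch and the 1.8GHz Kryo Silver numbers quoted earlier in the thread, the "same ballpark" comparison can be roughly normalised per GHz. The sketch below is only back-of-the-envelope arithmetic: it assumes both cores actually ran at their reported clocks during the benchmark and that throughput scales linearly with frequency, neither of which was verified here. The helper name `per_ghz` is made up for illustration.

```python
# Throughput (MB/s) and clock speed (GHz) as quoted in this thread.
results = {
    "Spacemit X60": {"clock_ghz": 1.6, "encode": 608.665, "decode": 778.601},
    "Kryo Silver": {"clock_ghz": 1.8, "encode": 837.907, "decode": 850.552},
}


def per_ghz(mb_per_s: float, clock_ghz: float) -> float:
    """Normalise a throughput figure by clock speed (assumes linear scaling)."""
    return mb_per_s / clock_ghz


for name, r in results.items():
    enc = per_ghz(r["encode"], r["clock_ghz"])
    dec = per_ghz(r["decode"], r["clock_ghz"])
    print(f"{name}: encode {enc:.1f} MB/s/GHz, decode {dec:.1f} MB/s/GHz")
```

Under those assumptions, encode comes out at roughly 380 vs 466 MB/s per GHz (X60 vs Kryo Silver) and decode at roughly 487 vs 473, so decode is about even per clock while encode trails somewhat.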

@sanderjo
Author

> I think the CRC32 kernel displayed above is incorrect as it should be using ARM-CRC acceleration. I don't think your CPU supports scalar crypto (Zbc/Zbkc), so your CRC32 result is likely for the generic implementation.

Some other z-options, but not zbc nor zbkc

sander@bananapif3:~$ cat /proc/cpuinfo 
processor	: 0
hart		: 0
model name	: Spacemit(R) X60
isa		: rv64imafdcv_sscofpmf_sstc_svpbmt_zicbom_zicboz_zicbop_zihintpause
mmu		: sv39
mvendorid	: 0x710
marchid		: 0x8000000058000001
mimpid		: 0x1000000049772200
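
The same check can be done programmatically. The following is a minimal sketch that splits the `isa` string shown above into single-letter and multi-letter extensions and looks for `v`, `zbc` and `zbkc`. The function name `parse_isa` is hypothetical, and the parsing is deliberately simple: it assumes the first underscore-separated token is `rv32`/`rv64` followed by one letter per extension, with no version numbers attached. Note that the `COMPILER_SUPPORTS_ZBKC` check in the CMake output presumably only tests the compiler, which is why it can succeed even though the CPU does not list the extension.

```python
def parse_isa(isa: str) -> tuple[set[str], set[str]]:
    """Split a /proc/cpuinfo 'isa' string into (single-letter, multi-letter) extensions."""
    base, *multi = isa.strip().lower().split("_")
    if not base.startswith("rv"):
        raise ValueError(f"unexpected isa string: {isa!r}")
    # Drop "rv" and the register width ("32"/"64"); the rest is one letter each.
    letters = base[2:].lstrip("0123456789")
    return set(letters), set(multi)


# The isa line reported by the Spacemit X60 in this thread.
ISA = "rv64imafdcv_sscofpmf_sstc_svpbmt_zicbom_zicboz_zicbop_zihintpause"

single, multi = parse_isa(ISA)
print("RVV (v):", "v" in single)
for ext in ("zbc", "zbkc"):
    print(f"{ext}:", ext in multi)
```

On a live system the string would be read from the `isa` line of `/proc/cpuinfo` instead of being hard-coded.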

sanderjo changed the title from "rapidyenc on RISC-V with RVV 1.0" to "rapidyenc on RISC-V with RVV 1.0 (Armbian Ubuntu Noble, GCC 14)" on May 18, 2024
animetosho added a commit that referenced this issue May 18, 2024
@animetosho
Owner

Thanks for the info!

By the way, if there's something else you want to test with your new board, there is RVV code in ParPar as well (which is also used as a base for par2cmdline-turbo).
If you want to test, you'll need the dev branch of the code. Run cmake in the test/bench folder, then run ./bench-gf16 -fmuladdmp
