RVV Bench on ARA #17
Comments
Yes, I tried about half a year ago, but ara still had some bugs, which broke most of the benchmarks. |
@camel-cdr I do have a docker container for the ARA. Here is the repo for it. https://github.com/rseac/pulp-ara-docker |
@rseac Thanks a lot! I'll try to use it. |
@rseac Your docker worked great, ara unfortunately didn't. I tried running my instruction throughput benchmark, but about 5% of instructions hang the simulation with certain valid vtype configurations. The most problematic examples were the integer comparisons (vmseq, ...), except for vmsgtu/vmsgt, which hung with SEW=16, and the logical mask instructions. Others are vrgather.vx, vcompress.vm, vredsum.vs, vmadc, and zext. vrgather.vv works, but is extremely slow at 6-8 cycles per element. It's still measuring the last handful of instructions, but if you want, I can share the results of the instructions that worked once it has completed. |
@camel-cdr Yes, I'd be happy to take a look at the results. I could even try to post some of these problems you notice on the ARA issues once I understand them myself. |
@camel-cdr Did you use the same scripts as linked earlier? Or specific ones to target the ARA core? |
@rseac Sorry for the late response. Here are the measurements I managed to run: log.txt
I used your build script with small modifications:
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt -y update && apt -y upgrade && apt-get --no-install-recommends -y install \
build-essential git zlib1g zlib1g-dev pkg-config cmake vim \
ninja-build python3 texinfo device-tree-compiler \
autoconf automake bc bison clang flex \
ca-certificates ccache libfl2 libfl-dev help2man \
curl libelf-dev python3-numpy \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/pulp-platform/ara.git
WORKDIR /ara
RUN git config --global url."https://github.com/".insteadOf "git@github.com:";
RUN git submodule update --init --recursive
RUN make toolchain-llvm
RUN make riscv-isa-sim
RUN make verilator
RUN /usr/bin/install -c /ara/install/verilator/bin/verilator_bin /ara/install/verilator/share/verilator/
RUN cd hardware && make checkout && make apply-patches && make verilate
WORKDIR /ara/apps
RUN git clone https://github.com/camel-cdr/rvv-bench \
&& cp rvv-bench/nolibc.h . \
&& mkdir rvv \
&& cp rvv-bench/instructions/rvv/gen.S . \
&& cp rvv-bench/instructions/rvv/config.h rvv-bench/instructions/rvv/main.c rvv \
&& sed -e '2a#define CUSTOM_HOST 1' -e '2a#include "printf.h"' -e '2a#include <string.h>' -i nolibc.h \
&& sed 's/main/nolibc_main/g;s/_start/main/g;s/nolibc_main();/\0\n#define main nolibc_main/g' -i nolibc.h \
&& sed 's/\(memwrite(.*\)}/\1printf("%.*s",len,ptr);}/g' -i nolibc.h \
&& sed 's/WARMUP.*$/WARMUP 1/g;s/UNROLL.*$/UNROLL 4/g;s/LOOP.*$/LOOP 8/1;s/RUNS.*$/RUNS 1/g' -i rvv/config.h \
&& sed 's/\.\.\/nolibc/nolibc/g' -i rvv/main.c
RUN echo 'echo "vim gen.S && m4 gen.S > rvv/gen.S && ( make clean; make bin/rvv ) 2>&1 >/dev/null && app=rvv make -C /ara/hardware simv"' > ~/.bashrc
You can just execute the command printed once you run the container.
The following were the ones that failed to execute correctly:
m_mask($1, bench_vrgathervx, T_A, m_mod_t0_vl, vrgather.vx, v8, v16, t0)
m_mask($1, bench_vrgathervi, T_A, m_nop, vrgather.vi, v8, v16, 3)
m_mask($1, bench_vredsumvs, T_A, m_nop, vredsum.vs, v8, v16, v24)
m_mask($1, bench_vredandvs, T_A, m_nop, vredand.vs, v8, v16, v24)
m_mask($1, bench_vredorvs, T_A, m_nop, vredor.vs, v8, v16, v24)
m_mask($1, bench_vredxorvs, T_A, m_nop, vredxor.vs, v8, v16, v24)
m_mask($1, bench_vredminuvs, T_A, m_nop, vredminu.vs, v8, v16, v24)
m_mask($1, bench_vredminvs, T_A, m_nop, vredmin.vs, v8, v16, v24)
m_mask($1, bench_vredmaxuvs, T_A, m_nop, vredmaxu.vs, v8, v16, v24)
m_mask($1, bench_vredmaxvs, T_A, m_nop, vredmax.vs, v8, v16, v24)
m_bench_vxim($1, T_A, vmadc)
m_bench_vxm($1, T_A, vmsbc)
m_bench_vxi($1, T_A, vmseq)
m_bench_vxi($1, T_A, vmsne)
m_bench_vx($1, T_A, vmsltu)
m_bench_vx($1, T_A, vmslt)
m_bench_vxi($1, T_A, vmsleu)
m_bench_vxi($1, T_A, vmsle)
m_$1(bench_vcompressvm, T_A, m_nop, vcompress.vm, v8, v16, v24)
m_$1(bench_vmandnmm, T_m1, m_nop, vmandn.mm, v8, v16, v24)
m_$1(bench_vmandmm, T_m1, m_nop, vmand.mm, v8, v16, v24)
m_$1(bench_vmormm, T_m1, m_nop, vmor.mm, v8, v16, v24)
m_$1(bench_vmxormm, T_m1, m_nop, vmxor.mm, v8, v16, v24)
m_$1(bench_vmornmm, T_m1, m_nop, vmorn.mm, v8, v16, v24)
m_$1(bench_vmnandmm, T_m1, m_nop, vmnand.mm, v8, v16, v24)
m_$1(bench_vmnormm, T_m1, m_nop, vmnor.mm, v8, v16, v24)
m_$1(bench_vmxnormm, T_m1, m_nop, vmxnor.mm, v8, v16, v24)
m_mask($1, bench_vfredosumvs, T_F, m_nop, vfredosum.vs, v8, v16, v24)
m_mask($1, bench_vwredsumuvs, T_WR, m_nop, vwredsumu.vs, v8, v16, v24)
m_mask($1, bench_vwredsumvs, T_WR, m_nop, vwredsum.vs, v8, v16, v24)
m_mask($1, bench_vfwredosumvs, T_FWR, m_nop, vfwredosum.vs, v8, v16, v24)
m_mask($1, bench_vfwredusumvs, T_FWR, m_nop, vfwredusum.vs, v8, v16, v24)
m_mask($1, bench_vfirstm, T_m1, m_1bit, vfirst.m, t0, v8)
m_mask($1, bench_vzextvf2, T_E2, m_1bit, vzext.vf2, v8, v16)
m_mask($1, bench_vsextvf2, T_E2, m_1bit, vsext.vf2, v8, v16)
m_mask($1, bench_vzextvf4, T_E4, m_1bit, vzext.vf4, v8, v16)
m_mask($1, bench_vsextvf4, T_E4, m_1bit, vsext.vf4, v8, v16)
m_mask($1, bench_vzextvf8, T_E8, m_1bit, vzext.vf8, v8, v16)
m_mask($1, bench_vsextvf8, T_E8, m_1bit, vsext.vf8, v8, v16) |
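For readers unfamiliar with the sed one-liners in the build script above, here is a minimal sketch of what the `rvv/config.h` substitution does. The sample file contents and starting values are hypothetical stand-ins (the real `config.h` in rvv-bench may look different), and GNU sed is assumed, as in the Ubuntu image:

```shell
#!/bin/sh
# Hypothetical stand-in for rvv/config.h; the real file's contents and values may differ.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
#define WARMUP 3
#define UNROLL 8
#define LOOP 16
#define RUNS 4
EOF

# Same substitution as in the build script (GNU sed assumed for -i):
# everything from the keyword to the end of the line is replaced, which
# shrinks the benchmark so the slow RTL simulation finishes sooner.
sed 's/WARMUP.*$/WARMUP 1/g;s/UNROLL.*$/UNROLL 4/g;s/LOOP.*$/LOOP 8/1;s/RUNS.*$/RUNS 1/g' -i "$cfg"

# Capture and print the rewritten file, then clean up.
out=$(cat "$cfg")
printf '%s\n' "$out"
rm -f "$cfg"
```

Note that the patterns are not anchored to `#define`, so any other line containing one of these keywords would be rewritten as well.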
Hey @camel-cdr, @rseac, I should have fixed many of the old bugs you initially reported and open-sourced the missing RVV instruction support. I am almost done with basic Linux support and my next priority is verification. I will start from these instructions, thanks for reporting! |
@mp-17 Thanks, that's great to hear, I closed the old issue. Is the vrgather performance I measured expected, and how does the current implementation work, or how is it supposed to work? I saw the new AraXL paper, is that a fork of Ara or further development? |
@camel-cdr Thanks for providing this. These are single instruction tests, is that correct? I'm assuming that you'd want these single instruction tests to pass before trying the rvv-bench benchmarks themselves? Do any of the benchmarks in bench/ go through? Or has that not been tried yet? |
@rseac Yes, I prefer to get the instructions themselves working. I tried a few of the benchmarks, here are results from a 4-lane configuration, for the things that worked:
It's interesting to see that alignment matters a lot. |
Is there interest in trying to get these benchmarks to work on the ARA vector engine? Has this already been started?