Question about RVV instruction throughput #13

Open
zhongjuzhe opened this issue May 22, 2024 · 6 comments

@zhongjuzhe

Hi, I saw each RVV instruction throughput result here:
https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html

If I want to test the execution throughput of each RVV instruction on another RISC-V board, could you give me a guide?

I also wonder how you measure the execution throughput.

Thanks,

@camel-cdr
Owner

Hi @zhongjuzhe, if you click on "Example measurement code for vadd.vx" you can see an example of what code I use to measure throughput.
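
As a rough illustration of the general idea (not the repo's actual measurement code, which is the one linked on the results page), throughput in cycles per instruction can be estimated by reading a cycle counter around a block of repeated instructions and dividing the delta by the instruction count. The names `counter_fn`, `kernel_fn`, `cycles_per_insn` and `measure` are hypothetical, and the sketch assumes loop and call overhead is small compared to the unrolled block:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback types, not part of rvv-bench. */
typedef uint64_t (*counter_fn)(void); /* reads a cycle counter */
typedef void (*kernel_fn)(void);      /* runs the unrolled instruction block */

/* Average cycles per instruction for a given counter delta. */
double
cycles_per_insn(uint64_t start, uint64_t end, uint64_t insns)
{
	assert(insns > 0);
	return (double)(end - start) / (double)insns;
}

/* Times `reps` calls of `kernel`, each assumed to execute `insns_per_call`
 * copies of the instruction under test. Loop and call overhead is included
 * in the result, so the block should be unrolled enough to dominate it. */
double
measure(counter_fn now, kernel_fn kernel, uint64_t insns_per_call, uint64_t reps)
{
	kernel(); /* warm-up run, not timed */
	uint64_t start = now();
	for (uint64_t i = 0; i < reps; i++)
		kernel();
	uint64_t end = now();
	return cycles_per_insn(start, end, insns_per_call * reps);
}
```

For example, a counter delta of 800 cycles over 400 instructions gives 2.0 cycles per instruction.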

To use this repo yourself, you need to:

  • clone the repo
  • modify ./config.mk to fit your platform (there are a few suggestions commented out)
  • modify ./run.sh to fit your platform (there are a few suggestions commented out)
  • cd instructions/rvv
  • run make to build the executable; you can then either execute it yourself or run make run to execute it automatically with run.sh

Since Linux disabled user-level performance counter access in later kernel versions, you need to re-enable it:

  • kernel version >=v6.5-rc1: enable the sysctl perf_user_access, see this article
  • kernel version <v6.5-rc1: add -DENABLE_RDCYCLE_HACK to the CFLAGS in ./config.mk. This works by using perf_event_open to measure the cycle count, because this somehow also enables user level access to the performance counter.
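
To illustrate the perf_event_open route described above, here is a minimal sketch of opening a hardware cycle counter on Linux. It is not the code behind -DENABLE_RDCYCLE_HACK; the function name `open_cycle_counter` is hypothetical, and keeping the descriptor open for the duration of the measurement is an assumption:

```c
#define _GNU_SOURCE
#ifdef __linux__
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Opens a hardware cycle counter for the calling thread.
 * Returns a file descriptor, or -1 on failure or on non-Linux systems. */
int
open_cycle_counter(void)
{
#ifdef __linux__
	struct perf_event_attr attr;
	memset(&attr, 0, sizeof attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof attr;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_kernel = 1; /* count user-mode cycles only */
	attr.exclude_hv = 1;
	/* pid = 0, cpu = -1: this thread on any CPU; no group leader, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
#else
	return -1;
#endif
}
```

A return value of -1 usually means the kernel or its perf settings do not allow the event, in which case errno has the details.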

If you are on a more obscure platform you may need to modify ./nolibc.h to work for it.

I'll try to update the README soon, and add a wiki page for instructions on different configurations.

Please tell me if you still run into problems.

@zhongjuzhe
Author

Is it possible to run instructions/rvv on bare metal?

I tried the following command:
clang -march=rv64gcv -O3 main.c

but it failed to build, with several undefined references:

undefined reference to 'bench_types'.
....

etc

@camel-cdr
Owner

Yes it is, you'll have to replace the rdcycle rd instructions with csrr rd, mcycle and implement memwrite and the proper entry to main in nolibc.h.
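
A minimal C sketch of that counter swap, assuming it is selected with a READ_MCYCLE define like the one in the config further down. The helper name `read_cycles` is hypothetical, the repo may do this differently (for example directly in its assembly), and the `clock()` fallback exists only so the sketch also compiles on non-RISC-V hosts:

```c
#include <stdint.h>
#include <time.h>

/* Reads a cycle counter. On RV32 this returns only the low 32 bits;
 * the upper half lives in mcycleh / rdcycleh. */
static inline uint64_t
read_cycles(void)
{
#if defined(__riscv) && defined(READ_MCYCLE)
	unsigned long c;
	__asm__ volatile("csrr %0, mcycle" : "=r"(c)); /* machine mode only */
	return c;
#elif defined(__riscv)
	unsigned long c;
	__asm__ volatile("rdcycle %0" : "=r"(c)); /* needs user-level counter access */
	return c;
#else
	return (uint64_t)clock(); /* host stand-in, not a real cycle count */
#endif
}
```

Reading mcycle requires machine mode, which is why it fits a bare-metal setup but not a Linux user-space process.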

Your command doesn't work, because you also need to preprocess (with m4) and build main.S, just look at how the Makefile does it.

I'll add some examples this weekend, including one for running baremetal on the t1 rtl simulation. That should help.

@camel-cdr
Owner

I've updated the README, but didn't get to writing the wiki, because the new t1 image doesn't work as expected. I'll create it once that has been fixed.

For now, here is how I built the bare-metal benchmark for it before.
You will probably need a different compiler configuration and memwrite implementation, but this should be roughly what you need to modify for a bare-metal system.

You should already have a linker configuration and entry point if you run on bare metal, so use those instead of the t1 specific ones here.

# config.mk
WARN=-Wall -Wextra -Wno-unused-function -Wno-unused-parameter
CC=clang
CFLAGS=--target=riscv32 -march=rv32gc_zve32f -mabi=ilp32 -mno-relax -static -mcmodel=medany -fvisibility=hidden -nostdlib -fno-builtin -ffreestanding -fno-PIC ${WARN} -T /t1.ld /t1_main.S -DCUSTOM_HOST  -DREAD_MCYCLE
# t1_main.S
# from: https://github.com/chipsalliance/t1/blob/master/tests/t1_main.S
.globl _start
_start:
    li a0, 0x2200 # VS&FS
    csrs mstatus, a0
    csrwi vcsr, 0
    #csrwi mcounteren,7
    li a0, -8
    csrw  mcountinhibit,a0
    #csrr a0, mcycle

    la sp, __stacktop

    // no ra to save
    call nolibc_start

    // exit
    li a0, 0x10000000
    li a1, -1
    sw a1, 4(a0)
    csrwi 0x7cc, 0

    .p2align 2
// t1.ld
// from https://github.com/chipsalliance/t1/blob/master/tests/t1.ld
OUTPUT_ARCH(riscv)
ENTRY(_start)

MEMORY {
  SCALAR (RWX) : ORIGIN = 0x20000000, LENGTH = 512M /* put first to set it as default */
  MMIO   (RW)  : ORIGIN = 0x00000000, LENGTH = 512M
  DDR    (RW)  : ORIGIN = 0x40000000, LENGTH = 2048M
  SRAM   (RW)  : ORIGIN = 0xc0000000, LENGTH = 4M /* TODO: read from config */
}

SECTIONS {
  . = ORIGIN(SCALAR);
  .text           : { *(.text .text.*) }
  . = ALIGN(0x1000);

  .data           : { *(.data .data.*) }
  . = ALIGN(0x1000);

  .sdata          : { *(.sdata .sdata.*) }
  . = ALIGN(0x1000);

  .srodata          : { *(.srodata .srodata.*) }
  . = ALIGN(0x1000);

  .bss            : { *(.bss .bss.*) }
  _end = .; PROVIDE (end = .);

  . = ORIGIN(SRAM);
  .vdata : { *(.vdata .vdata.*) } >SRAM

  .vbss (TYPE = SHT_NOBITS) : { *(.vbss .vbss.*) } >SRAM

  __stacktop = ORIGIN(SCALAR) + LENGTH(SCALAR);  /* put stack on the top of SCALAR */
  __heapbegin = ORIGIN(DDR);  /* put heap on the begin of DDR */
}
// nolibc.h
...
#ifdef CUSTOM_HOST

#define IFHOSTED(...)
#define EXIT_FAILURE 1
#define EXIT_SUCCESS 0

/* customize me */

// output to t1 uart
static void
memwrite(void const *ptr, size_t len) {
	struct uartlite_regs {
		unsigned int rx_fifo;
		unsigned int tx_fifo;
		unsigned int status;
		unsigned int control;
	};
	volatile struct uartlite_regs *const ttyUL0 = (struct uartlite_regs *)0x10000000;
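	unsigned char const *p = ptr; /* keep the const qualifier of ptr */
	while (len--) {
		/* busy-wait while the TX FIFO is full (status bit 3) */
		while (ttyUL0->status & (1<<3));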
		ttyUL0->tx_fifo = *p++;
	}
}

// static size_t /* only needed for vector-utf/bench.c */
// memread(void *ptr, size_t len) { }

static void
exit(int x) { __asm volatile("unimp\n"); }

int main(void);
void nolibc_start(void) {
	int x = main();
	flush();
}

#elif __STDC_HOSTED__
...

@zhongjuzhe
Author

Is it possible to disable the FP16 vector test cases?

@camel-cdr
Owner

Yes, they shouldn't be enabled by default.

rvv/config.h should exclude them with the mask by default, but maybe I've missed something. Can you share after which instruction you get an illegal instruction/where the problem is?
