Add vector load
jiegec committed Dec 11, 2023
1 parent d2aaf99 commit 6e89431
Showing 3 changed files with 155 additions and 16 deletions.
12 changes: 0 additions & 12 deletions README.md
@@ -6,26 +6,14 @@ Arranged from QEMU implementation and [GCC Intrinsics](https://gcc.gnu.org/onlin

 TODO List:

-### vld
-
-Vector Load
-
 ### vst

 Vector Store

-### vldrepl.d/w/h/b
-
-Vector Load Replicate
-
 ### vstelm.d/w/h/b

 Vector Store Element

-### vldx
-
-Vector Load with Register Offset
-
 ### vstx

 Vector Store with Register Offset
139 changes: 139 additions & 0 deletions docs/lsx_memory/vld.md
@@ -0,0 +1,139 @@
# Memory Load

## __m128i __lsx_vld (void * addr, imm_n2048_2047 offset)

### Synopsis

```c++
__m128i __lsx_vld (void * addr, imm_n2048_2047 offset)
#include <lsxintrin.h>
Instruction: vld vr, r, imm
CPU Flags: LSX
```
### Description
Load 128 bits from memory address `addr + offset` into `dst`. `offset` is an immediate in the range -2048 to 2047.
### Operation
```c++
dst = memory_load(128, addr + offset);
```
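
For readers without LSX hardware, the operation above can be modelled in portable C++. This is only a sketch under stated assumptions: `V128` and `emulate_vld` are hypothetical names that do not exist in `<lsxintrin.h>`, and `std::memcpy` stands in for `memory_load`. On a real LSX toolchain one would call `__lsx_vld(addr, offset)` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __m128i: 16 raw bytes.
struct V128 {
    std::uint8_t byte[16];
};

// Hypothetical scalar model of `dst = memory_load(128, addr + offset)`.
// The assert mirrors the immediate range implied by imm_n2048_2047.
inline V128 emulate_vld(const void *addr, int offset) {
    assert(offset >= -2048 && offset <= 2047);
    V128 dst;
    const std::uint8_t *src = static_cast<const std::uint8_t *>(addr) + offset;
    std::memcpy(dst.byte, src, sizeof dst.byte);  // copy 16 bytes, no alignment assumed
    return dst;
}
```

The model copies bytes in memory order, so `dst.byte[0]` holds the byte at `addr + offset`.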

## __m128i __lsx_vldx (void * addr, long int offset)

### Synopsis

```c++
__m128i __lsx_vldx (void * addr, long int offset);
#include <lsxintrin.h>
Instruction: vldx vr, r, r
CPU Flags: LSX
```
### Description
Load 128 bits from memory address `addr + offset` into `dst`. Unlike `__lsx_vld`, `offset` is a run-time value passed in a register.
### Operation
```c++
dst = memory_load(128, addr + offset);
```
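
Because the offset of `vldx` is a run-time value, it suits loops that step through a buffer. The following portable sketch illustrates that pattern; `V128`, `emulate_vldx` and `sum_bytes` are hypothetical names introduced here for illustration, not part of `<lsxintrin.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __m128i: 16 raw bytes.
struct V128 {
    std::uint8_t byte[16];
};

// Hypothetical scalar model of vldx: the same 128-bit load as vld,
// but the offset is an ordinary run-time integer.
inline V128 emulate_vldx(const void *addr, long offset) {
    V128 dst;
    std::memcpy(dst.byte, static_cast<const std::uint8_t *>(addr) + offset, sizeof dst.byte);
    return dst;
}

// Sum every byte of a buffer whose length is a multiple of 16,
// advancing the run-time offset by one vector per iteration.
inline std::uint64_t sum_bytes(const std::uint8_t *buf, std::size_t len) {
    assert(len % 16 == 0);
    std::uint64_t total = 0;
    for (std::size_t off = 0; off < len; off += 16) {
        V128 v = emulate_vldx(buf, static_cast<long>(off));
        for (int i = 0; i < 16; i++) {
            total += v.byte[i];
        }
    }
    return total;
}
```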

## __m128i __lsx_vldrepl_b (void * addr, imm_n2048_2047 offset)

### Synopsis

```c++
__m128i __lsx_vldrepl_b (void * addr, imm_n2048_2047 offset)
#include <lsxintrin.h>
Instruction: vldrepl.b vr, r, imm
CPU Flags: LSX
```
### Description
Read an 8-bit value from memory address `addr + offset`, replicate it to all 16 byte lanes, and save the result into `dst`.
### Operation
```c++
u8 data = memory_load(8, addr + offset);
for (int i = 0;i < 16;i++) {
  dst.byte[i] = data;
}
```

## __m128i __lsx_vldrepl_h (void * addr, imm_n1024_1023 offset)

### Synopsis

```c++
__m128i __lsx_vldrepl_h (void * addr, imm_n1024_1023 offset)
#include <lsxintrin.h>
Instruction: vldrepl.h vr, r, imm
CPU Flags: LSX
```
### Description
Read a 16-bit value from memory address `addr + (offset << 1)`, replicate it to all 8 half-word lanes, and save the result into `dst`.
### Operation
```c++
u16 data = memory_load(16, addr + (offset << 1));
for (int i = 0;i < 8;i++) {
  dst.half[i] = data;
}
```

## __m128i __lsx_vldrepl_w (void * addr, imm_n512_511 offset)

### Synopsis

```c++
__m128i __lsx_vldrepl_w (void * addr, imm_n512_511 offset)
#include <lsxintrin.h>
Instruction: vldrepl.w vr, r, imm
CPU Flags: LSX
```
### Description
Read a 32-bit value from memory address `addr + (offset << 2)`, replicate it to all 4 word lanes, and save the result into `dst`.
### Operation
```c++
u32 data = memory_load(32, addr + (offset << 2));
for (int i = 0;i < 4;i++) {
  dst.word[i] = data;
}
```

## __m128i __lsx_vldrepl_d (void * addr, imm_n256_255 offset)

### Synopsis

```c++
__m128i __lsx_vldrepl_d (void * addr, imm_n256_255 offset)
#include <lsxintrin.h>
Instruction: vldrepl.d vr, r, imm
CPU Flags: LSX
```
### Description
Read a 64-bit value from memory address `addr + (offset << 3)`, replicate it to all 2 double-word lanes, and save the result into `dst`.
### Operation
```c++
u64 data = memory_load(64, addr + (offset << 3));
for (int i = 0;i < 2;i++) {
  dst.dword[i] = data;
}
```
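
The four `vldrepl` variants differ only in element width, so their operations can be summarised by one template. The sketch below is a portable model, not the intrinsics themselves: `Lanes` and `emulate_vldrepl` are hypothetical names, and the offset is treated as an element count, as the operations above describe. It multiplies by the element size instead of shifting, because left-shifting a negative offset is not well defined before C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __m128i, viewed as 16 / sizeof(T) lanes of type T.
template <typename T>
struct Lanes {
    static constexpr std::size_t count = 16 / sizeof(T);
    T lane[count];
};

// Hypothetical scalar model of vldrepl.{b,h,w,d}: read one element at
// addr + offset * sizeof(T) and copy it into every lane.
template <typename T>
Lanes<T> emulate_vldrepl(const void *addr, long offset) {
    assert(addr != nullptr);
    const std::uint8_t *src =
        static_cast<const std::uint8_t *>(addr) + offset * static_cast<long>(sizeof(T));
    T data;
    std::memcpy(&data, src, sizeof data);
    Lanes<T> dst;
    for (std::size_t i = 0; i < Lanes<T>::count; i++) {
        dst.lane[i] = data;  // replicate into every lane
    }
    return dst;
}
```

For example, `emulate_vldrepl<std::uint16_t>(p, 2)` models `vldrepl.h` reading the half word at byte address `p + 4`.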
20 changes: 16 additions & 4 deletions docs/lsx_misc/vshuf.md
@@ -21,7 +21,10 @@ Caveat: the indices are placed in `c`, while in other `vshuf` intrinsics they ar
```c++
 for (int i = 0;i < 16;i++) {
-  if ((c.byte[i] % 32) < 16) {
+  if (c.byte[i] >= 64) {
+    // Caveat: observed in 3C5000, but not in QEMU
+    dst.byte[i] = 0;
+  } else if ((c.byte[i] % 32) < 16) {
     dst.byte[i] = b.byte[c.byte[i] % 16];
   } else {
     dst.byte[i] = a.byte[c.byte[i] % 16];
@@ -48,7 +51,10 @@ Shuffle half words from `b` and `c` with indices from `a`.
```c++
 for (int i = 0;i < 8;i++) {
-  if ((a.half[i] % 16) < 8) {
+  if (a.half[i] >= 64) {
+    // Caveat: observed in 3C5000, but not in QEMU
+    dst.half[i] = 0;
+  } else if ((a.half[i] % 16) < 8) {
     dst.half[i] = c.half[a.half[i] % 8];
   } else {
     dst.half[i] = b.half[a.half[i] % 8];
@@ -75,7 +81,10 @@ Shuffle words from `b` and `c` with indices from `a`.
```c++
 for (int i = 0;i < 4;i++) {
-  if ((a.word[i] % 8) < 4) {
+  if (a.word[i] >= 64) {
+    // Caveat: observed in 3C5000, but not in QEMU
+    dst.word[i] = 0;
+  } else if ((a.word[i] % 8) < 4) {
     dst.word[i] = c.word[a.word[i] % 4];
   } else {
     dst.word[i] = b.word[a.word[i] % 4];
@@ -102,7 +111,10 @@ Shuffle words from `b` and `c` with indices from `a`.
```c++
 for (int i = 0;i < 2;i++) {
-  if ((a.dword[i] % 4) < 2) {
+  if (a.dword[i] >= 64) {
+    // Caveat: observed in 3C5000, but not in QEMU
+    dst.dword[i] = 0;
+  } else if ((a.dword[i] % 4) < 2) {
     dst.dword[i] = c.dword[a.dword[i] % 2];
   } else {
     dst.dword[i] = b.dword[a.dword[i] % 2];
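
The updated `vshuf.b` operation from the first hunk can be checked with a scalar model. This is a sketch under assumptions: `V128`, `emulate_vshuf_b` and `vshuf_b_example` are hypothetical names, and the `zero_on_large_index` flag is only an illustration device for switching between the behavior the commit reports for 3C5000 and the one it reports for QEMU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for __m128i: 16 raw bytes.
struct V128 {
    std::uint8_t byte[16];
};

// Hypothetical scalar model of vshuf.b; the indices live in `c`.
// zero_on_large_index = true follows the 3C5000 observation,
// false follows the QEMU behavior.
inline V128 emulate_vshuf_b(const V128 &a, const V128 &b, const V128 &c,
                            bool zero_on_large_index) {
    V128 dst;
    for (int i = 0; i < 16; i++) {
        std::uint8_t idx = c.byte[i];
        if (zero_on_large_index && idx >= 64) {
            dst.byte[i] = 0;
        } else if ((idx % 32) < 16) {
            dst.byte[i] = b.byte[idx % 16];
        } else {
            dst.byte[i] = a.byte[idx % 16];
        }
    }
    return dst;
}

// Usage sketch: index 64 is zeroed under the 3C5000 behavior,
// but wraps around to b.byte[0] under the QEMU behavior.
inline void vshuf_b_example() {
    V128 a{}, b{}, c{};
    b.byte[0] = 7;
    c.byte[0] = 64;
    assert(emulate_vshuf_b(a, b, c, true).byte[0] == 0);
    assert(emulate_vshuf_b(a, b, c, false).byte[0] == 7);
}
```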