Add a README
gatesn committed Mar 1, 2024
1 parent 4f71524 commit 4e692ed
Showing 2 changed files with 22 additions and 4 deletions.
18 changes: 18 additions & 0 deletions README.md
@@ -0,0 +1,18 @@
# ZIMD

Additional cross-platform SIMD support for Zig.

Based loosely on [Google Highway](https://github.com/google/highway)

## Why?

Zig has built-in support for SIMD operations using `@Vector`. However, this only covers a few
basic operations. This library aims to fill in some of the blanks.

## Operators

### [TableLookupBytesOr0](https://google.github.io/highway/en/master/quick_reference.html#blockwise)

Architectures: Scalar, X86_SSE3, Arm_Neon

Similar to Zig's `@shuffle` builtin, except that it doesn't require the shuffle mask to be comptime-known.
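
To make the semantics concrete, here is a minimal scalar sketch of the lookup, not the library's actual implementation. The name `tableLookupBytesOr0Scalar` is hypothetical. The sketch assumes that a negative index (high bit set) yields 0, and it masks non-negative indices to the low four bits so that every lane stays in bounds; Highway leaves indices in 16..127 implementation-defined, so that masking is an illustrative choice only.

```zig
const std = @import("std");

/// Hypothetical scalar reference for a runtime byte-table lookup.
/// Each output lane is bytes[indices[lane]], or 0 when the index is negative.
fn tableLookupBytesOr0Scalar(bytes: @Vector(16, u8), indices: @Vector(16, i8)) @Vector(16, u8) {
    // Vectors coerce to arrays of the same length, which allows runtime indexing.
    const table: [16]u8 = bytes;
    const idx: [16]i8 = indices;
    var out: [16]u8 = undefined;
    for (idx, 0..) |i, lane| {
        // Assumption: indices in 16..127 are masked to 0..15 to stay in bounds.
        out[lane] = if (i < 0) 0 else table[@as(usize, @intCast(i)) & 0x0F];
    }
    // Arrays coerce back to vectors of the same length.
    return out;
}
```

Unlike `@shuffle`, `indices` here can be an ordinary runtime value, for example one loaded from memory.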
8 changes: 4 additions & 4 deletions src/tblz.zig
@@ -21,9 +21,11 @@ const builtin = @import("builtin");
 const std = @import("std");
 const zimd = @import("zimd.zig");
 
-const TableLookupBytesOr0 = fn (bytes: @Vector(16, u8), indices: @Vector(16, i8)) callconv(.Inline) @Vector(16, u8);
+pub const TableLookupBytesOr0 = fn (bytes: @Vector(16, u8), indices: @Vector(16, i8)) callconv(.Inline) @Vector(16, u8);
 
-pub fn GetTableLookupBytesOr0(comptime cpu: std.Target.Cpu) TableLookupBytesOr0 {
+pub const tableLookupBytesOr0 = GetTableLookupBytesOr0(builtin.cpu);
+
+fn GetTableLookupBytesOr0(comptime cpu: std.Target.Cpu) TableLookupBytesOr0 {
     if (comptime cpu.arch.isAARCH64() and std.Target.aarch64.featureSetHas(cpu.features, .neon)) {
         return Aarch64_Neon;
     }
@@ -33,8 +35,6 @@ pub fn GetTableLookupBytesOr0(comptime cpu: std.Target.Cpu) TableLookupBytesOr0
     return Scalar;
 }
 
-pub const tableLookupBytesOr0 = GetTableLookupBytesOr0(builtin.cpu);
-
 // For all vector widths; Arm anyway zeroes if >= 0x10.
 inline fn Aarch64_Neon(bytes: @Vector(16, u8), indices: @Vector(16, i8)) @Vector(16, u8) {
     return asm ("tbl.16b %[ret], { %[v0] }, %[v1]"