Zero masked arithmetic operations #7

Open: wants to merge 115 commits into base: master

115 commits
216a960
Allow compilation of AVX2 on x86
malaterre May 30, 2024
620fec7
Added MinMagnitude and MaxMagnitude ops
johnplatts Oct 11, 2024
57a6c9b
add Get/Set for vectors and use them to implement Concat* operators
lsrcz Oct 23, 2024
bea0da5
Bump step-security/harden-runner from 2.10.1 to 2.10.2
dependabot[bot] Nov 19, 2024
5c7a693
Merge pull request #2381 from google:dependabot/github_actions/step-s…
copybara-github Nov 19, 2024
3da3328
Revert to previous logic for GatherIndexN
Mousius Nov 19, 2024
3cb3a91
detect cache parameters
jan-wassenberg Nov 20, 2024
8a0602d
replace non-test/trace fprintf with new hwy::Warn/HWY_WARN
jan-wassenberg Nov 20, 2024
89b2678
topology fixes for M3
jan-wassenberg Nov 21, 2024
d22ccd0
Add topology support for Windows and Apple
jan-wassenberg Nov 25, 2024
7e01a07
add LSX/LASX targets. Refs #2386
jan-wassenberg Nov 25, 2024
2b565e8
Merge pull request #2383 from Mousius:revert-gatherindexn
copybara-github Nov 25, 2024
02253c8
Remove VQSORT_SKIP workaround for compiler bug
ZequanWu Nov 26, 2024
68b0fde
Add BitsFromMask, promoting from detail::.
jan-wassenberg Nov 27, 2024
bcf564e
Add BitsFromMask, promoting from detail::.
jan-wassenberg Nov 28, 2024
9b39ef2
Added AVX10_2 and AVX10_2_512 targets
johnplatts Dec 1, 2024
b13a46f
Merge pull request #2395 from johnplatts:hwy_avx10_120124
copybara-github Dec 2, 2024
62c0a79
(v2 of) Add BitsFromMask, promoting from detail::.
jan-wassenberg Dec 2, 2024
07396f9
Fix for RVV CMake detection if cross-compiling with Clang
johnplatts Dec 2, 2024
914cb69
Made changes to RVV Concat, Combine, ZeroExtendVector, and UpperHalf ops
johnplatts Dec 2, 2024
9689e2c
Merge pull request #2396 from johnplatts:hwy_rvv_cmake_fix_120224
copybara-github Dec 2, 2024
80839b5
Enable tuples on RVV with Clang 17 or later
johnplatts Dec 2, 2024
fca5363
Merge pull request #2397 from johnplatts:hwy_rvv_tuple_120224
copybara-github Dec 3, 2024
fccc82d
Bump actions/cache from 4.0.2 to 4.2.0
dependabot[bot] Dec 6, 2024
b9fa960
Merge pull request #2401 from google:dependabot/github_actions/action…
copybara-github Dec 6, 2024
83b81ab
add perf_counters
jan-wassenberg Dec 11, 2024
a076ade
fix topology detection for some CPUs being offline (e.g. SMT off)
jan-wassenberg Dec 12, 2024
eb4dc59
Fixes for ZSeries with GCC 9 or earlier or Clang 18 or earlier
johnplatts Dec 13, 2024
065050e
Merge pull request #2411 from johnplatts:jep_s390x_fix_121324
copybara-github Dec 14, 2024
49674e1
update thresholds for test failure
jan-wassenberg Dec 16, 2024
06d80cf
Merge branch 'master' into hwy_min_max_mag_101024
johnplatts Dec 16, 2024
5cde138
fix build in case building for loongarch already (not yet supported)
jan-wassenberg Dec 16, 2024
09f8b6e
Make tests runnable with Bazel8
eugeneo Dec 19, 2024
e8b3825
Unroller: allow const input
eugeneo Dec 19, 2024
6aee0bc
Merge pull request #2415 from eugeneo:test-bazel-module
copybara-github Dec 19, 2024
d21b729
Merge branch 'google:master' into unroller-const-input
eugeneo Dec 19, 2024
6e6a429
Make the intention behind test more explicit
eugeneo Dec 19, 2024
571e9b2
Merge pull request #2416 from eugeneo:unroller-const-input
copybara-github Dec 19, 2024
9aa447e
update test thresholds
jan-wassenberg Dec 20, 2024
a7ee535
Merge branch 'google:master' into hwy_min_max_mag_101024
johnplatts Dec 29, 2024
f754bd6
Merge pull request #2353 from johnplatts:hwy_min_max_mag_101024
copybara-github Dec 29, 2024
306e46d
Merge pull request #2362 from lsrcz:concat
copybara-github Dec 31, 2024
e892ab4
Fixes to RVV Concat/Combine ops
johnplatts Dec 31, 2024
3a28dcb
Add VQSORT_COMPILER_COMPATIBLE, split from VQSORT_ENABLED
Mousius Jan 2, 2025
ac31826
Merge pull request #2213 from malaterre:allow-avx2-x86
copybara-github Jan 6, 2025
a219783
Merge pull request #2421 from Mousius:vqsort-compiler-compatible-check
copybara-github Jan 6, 2025
2056a41
Merge pull request #2420 from johnplatts:hwy_rvv_concat_fix_123124
copybara-github Jan 6, 2025
fdfce1f
fix warnings "unused parameter 'd'"
eustas Jan 7, 2025
fecd465
No longer require highway.h for profiler.h
jan-wassenberg Jan 7, 2025
c8c3f5e
update thresholds to account for a possible L4. Thanks @miladfarca, f…
jan-wassenberg Jan 9, 2025
bcf0155
Bump step-security/harden-runner from 2.10.2 to 2.10.3
dependabot[bot] Jan 10, 2025
4a0a5b5
fix emu128 reduction with infinities. thanks @yohanchatelain, fixes #…
jan-wassenberg Jan 13, 2025
84b9f61
Merge pull request #2438 from google:dependabot/github_actions/step-s…
copybara-github Jan 14, 2025
cd56bbc
Defer the call to get timer resolution until needed
kccqzy Jan 13, 2025
87848c4
add warning if TSC is not invariant
jan-wassenberg Jan 14, 2025
7cccd1b
Merge pull request #2440 from kccqzy:timer-resolution-defer-2
copybara-github Jan 15, 2025
fdf177d
hwy-contrib/thread_pool: Replace size check assert with skip.
a-googler Jan 17, 2025
a811732
add HWY_UNREACHABLE and add documentation for related macros
jan-wassenberg Jan 17, 2025
dcc0ca1
Fix for GCC 15 compiler error on PPC8/PPC9/PPC10
johnplatts Jan 17, 2025
070bc1f
Added PositiveInfOrHighestValue and NegativeInfOrLowestValue
johnplatts Jan 17, 2025
758ec70
Fix for compiler error with GCC 9 or earlier
johnplatts Jan 17, 2025
a372d95
Merge pull request #2447 from johnplatts:hwy_gcc9_fix_011725
copybara-github Jan 20, 2025
cf6a122
Bump step-security/harden-runner from 2.10.3 to 2.10.4
dependabot[bot] Jan 20, 2025
6c6b289
Merge pull request #2445 from johnplatts:hwy_ppc_gcc15_fix_011725
copybara-github Jan 20, 2025
f2209b9
Merge pull request #2448 from google:dependabot/github_actions/step-s…
copybara-github Jan 20, 2025
fc384ee
warning fix (unused param)
jan-wassenberg Jan 20, 2025
21a6bb0
Copybara import of the project:
scuzqy Jan 21, 2025
aec1978
Merge branch 'master' into hwy_reduce_enh_011725
johnplatts Jan 21, 2025
b5dd1c4
Merge pull request #2446 from johnplatts:hwy_reduce_enh_011725
copybara-github Jan 22, 2025
0b69663
fix incompatibility with Windows macro, fixes #2450, thanks @scuzqy
jan-wassenberg Jan 22, 2025
9481efb
SVE can load a uint8 pointer directly into a mask through casting
wbb-ccl Jan 27, 2025
6f92a4f
Restrict compiler versions
wbb-ccl Jan 27, 2025
9c8e963
no longer require opt-in for AVX3_DL
jan-wassenberg Jan 28, 2025
960f74d
no longer require opt-in for AVX3_DL
Jan 28, 2025
ce464bc
Simplify compiler check
wbb-ccl Jan 28, 2025
9e30869
Promote and round operations
mazimkhan Nov 10, 2024
ade9ee9
Add quick reference for MaskedConvertToOrZero
mazimkhan Nov 22, 2024
5c9c45b
MaskedConvertToOrZero implementation for Arm SVE
mazimkhan Nov 22, 2024
7d9079b
Fix review comments
wbb-ccl Jan 24, 2025
f3facdd
Expand condition to avoid unreachable statements
wbb-ccl Jan 28, 2025
bb045cc
Float operations SqrtLower, MulSubAdd, GetExponent etc.
mazimkhan Nov 15, 2024
e3c6c3b
Fix review comments
wbb-ccl Jan 24, 2025
83f183d
Remove new ops that only have a generic implementation
wbb-ccl Jan 28, 2025
d547f91
Merge pull request #2425 from cambridgeconsultants:cc_up_float_operat…
copybara-github Jan 28, 2025
acd4f09
Fix bool_lanes typing
wbb-ccl Jan 28, 2025
4d81fed
Handle unused parameter
wbb-ccl Jan 29, 2025
9cb44bc
Merge pull request #2424 from cambridgeconsultants:cc_up_convert_prom…
copybara-github Jan 29, 2025
cdd64d2
MulRound, MulLower and MulAddLower ops
mazimkhan Nov 16, 2024
c264695
Fix review comments
wbb-ccl Jan 28, 2025
ec5b0aa
Remove redundant macro
wbb-ccl Jan 29, 2025
b18b71d
Merge pull request #2429 from cambridgeconsultants:cc_up_mul_and_arit…
copybara-github Jan 29, 2025
42feb2f
Made changes to SVE_256 and SVE2_128 target detection
johnplatts Jan 29, 2025
df90905
Masked compare and floating point classifications
mazimkhan Nov 15, 2024
41ff090
Fix review comments
wbb-ccl Jan 28, 2025
e8a840c
Remove MaskedIsInf & MaskedIsFinite
wbb-ccl Jan 29, 2025
ecb2f36
Load/Store, masked set and counting operations
mazimkhan Nov 15, 2024
2965981
Merge pull request #2456 from johnplatts:hwy_sve_detect_enh_012925
copybara-github Jan 30, 2025
6b90d90
Fix review comments
wbb-ccl Jan 30, 2025
4ef10e1
Improve handling float_16 in TestInsertIntoUpper
wbb-ccl Jan 30, 2025
a74a04d
Remove OrZero suffix
wbb-ccl Jan 30, 2025
6fe29f9
Clarify AllBits0/1
wbb-ccl Jan 30, 2025
ec0da10
Correct typo
wbb-ccl Jan 30, 2025
8e1476f
Add fake MulOdd to scalar
eustas Jan 30, 2025
e96d4d3
Merge pull request #2430 from cambridgeconsultants:cc_up_set_load_sto…
copybara-github Jan 30, 2025
a74db0c
Merge pull request #2427 from cambridgeconsultants:cc_up_masked_compare
copybara-github Jan 31, 2025
45a3804
Added GetBiasedExponent op and AVX3 implementation of GetExponent
johnplatts Jan 31, 2025
140f307
Merge pull request #2453 from cambridgeconsultants:cc_up_LoadMaskBits…
copybara-github Jan 31, 2025
e99df68
Merge pull request #2458 from johnplatts:hwy_get_exp_enh_013125
copybara-github Jan 31, 2025
046dee2
Zero masked arithmetic operations
mazimkhan Nov 15, 2024
329e50a
Fix review comments
wbb-ccl Jan 24, 2025
2ed7aab
Fix bool_lanes typing
wbb-ccl Jan 29, 2025
5c590e9
Remove unused code
wbb-ccl Jan 30, 2025
2fbf397
Fix broken rebase
wbb-ccl Jan 30, 2025
b4c66f7
Correct missing doc args
wbb-ccl Feb 3, 2025
3df6b16
Rename variable to avoid shadowing
wbb-ccl Feb 4, 2025
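The PR title and the commit "Zero masked arithmetic operations" refer to ops that, by the usual meaning of zero-masking, set inactive lanes to zero instead of taking them from a separate fallback vector. The commit "Remove OrZero suffix" suggests the naming changed during review, and this page does not show the final op names or signatures. The sketch below is therefore a hypothetical scalar model that uses `std::array` as a stand-in for a vector. `ZeroMaskedBinary`, `ZeroMaskedAdd` and `MaskedBinaryOr` are illustrative names, not Highway functions. Conceptually, a zero-masked add behaves like Highway's `IfThenElseZero(m, Add(a, b))`.

```cpp
#include <array>
#include <cassert>  // used by the checks in the usage example
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not Highway's API) of a zero-masked binary op:
// lanes where m[i] is true get op(a[i], b[i]); all other lanes become zero.
template <typename T, std::size_t N, typename Op>
std::array<T, N> ZeroMaskedBinary(const std::array<bool, N>& m,
                                  const std::array<T, N>& a,
                                  const std::array<T, N>& b, Op op) {
  std::array<T, N> out{};  // value-initialized, so every lane starts at zero
  for (std::size_t i = 0; i < N; ++i) {
    if (m[i]) out[i] = static_cast<T>(op(a[i], b[i]));
  }
  return out;
}

// Hypothetical "...Or" counterpart for comparison: inactive lanes take no[i].
template <typename T, std::size_t N, typename Op>
std::array<T, N> MaskedBinaryOr(const std::array<T, N>& no,
                                const std::array<bool, N>& m,
                                const std::array<T, N>& a,
                                const std::array<T, N>& b, Op op) {
  std::array<T, N> out = no;  // start from the fallback vector
  for (std::size_t i = 0; i < N; ++i) {
    if (m[i]) out[i] = static_cast<T>(op(a[i], b[i]));
  }
  return out;
}

// Hypothetical zero-masked addition built on the generic helper.
template <typename T, std::size_t N>
std::array<T, N> ZeroMaskedAdd(const std::array<bool, N>& m,
                               const std::array<T, N>& a,
                               const std::array<T, N>& b) {
  return ZeroMaskedBinary(m, a, b, [](T x, T y) { return x + y; });
}
```

With a mask of {true, false, true, false}, a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, `ZeroMaskedAdd` yields {11, 0, 33, 0}, whereas the "Or" form keeps `no[i]` in the inactive lanes. On targets with native zero-masking (for example AVX-512 `maskz` intrinsics or SVE `_z` predicated forms) such ops can plausibly map to a single instruction; this page does not show how the PR implements them.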
10 changes: 5 additions & 5 deletions .github/workflows/build_test.yml
@@ -135,7 +135,7 @@ jobs:

steps:
- name: Harden Runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
uses: step-security/harden-runner@cb605e52c26070c328afc4562f0b4ada7618a84e # v2.10.4
with:
egress-policy: audit # cannot be block - runner does git checkout

@@ -230,7 +230,7 @@ jobs:

steps:
- name: Harden Runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
uses: step-security/harden-runner@cb605e52c26070c328afc4562f0b4ada7618a84e # v2.10.4
with:
egress-policy: audit # cannot be block - runner does git checkout

@@ -313,7 +313,7 @@ jobs:

steps:
- name: Harden Runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
uses: step-security/harden-runner@cb605e52c26070c328afc4562f0b4ada7618a84e # v2.10.4
with:
egress-policy: audit # cannot be block - runner does git checkout

@@ -334,15 +334,15 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Harden Runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
uses: step-security/harden-runner@cb605e52c26070c328afc4562f0b4ada7618a84e # v2.10.4
with:
egress-policy: audit # cannot be block - runner does git checkout

- uses: actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 # v4.0.0

- uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3.0.0

- uses: actions/cache@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9 # v4.0.2
- uses: actions/cache@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
path: ~/.cache/bazel
key: bazel-${{ runner.os }}
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,3 +1,9 @@
build
bazel-bin
bazel-highway
bazel-out
bazel-testlogs
MODULE.bazel.lock
docs/g3doc/*
docs/html/*
docs/md/*
17 changes: 17 additions & 0 deletions BUILD
@@ -199,6 +199,7 @@ cc_library(
"hwy/ops/x86_128-inl.h",
"hwy/ops/x86_256-inl.h",
"hwy/ops/x86_512-inl.h",
"hwy/ops/x86_avx3-inl.h",
# Select avoids recompiling native arch if only non-native changed
] + select({
":compiler_emscripten": [
@@ -255,6 +256,19 @@ cc_library(
],
)

cc_library(
name = "perf_counters",
srcs = ["hwy/perf_counters.cc"],
hdrs = ["hwy/perf_counters.h"],
compatible_with = [],
copts = COPTS,
deps = [
":bit_set",
":hwy",
":nanobenchmark",
],
)

cc_library(
name = "profiler",
hdrs = [
@@ -485,6 +499,7 @@ HWY_TESTS = [
("hwy/", "bit_set_test"),
("hwy/", "highway_test"),
("hwy/", "nanobenchmark_test"),
("hwy/", "perf_counters_test"),
("hwy/", "targets_test"),
("hwy/tests/", "arithmetic_test"),
("hwy/tests/", "bit_permute_test"),
@@ -513,6 +528,7 @@ HWY_TESTS = [
("hwy/tests/", "mask_combine_test"),
("hwy/tests/", "mask_convert_test"),
("hwy/tests/", "mask_mem_test"),
("hwy/tests/", "mask_set_test"),
("hwy/tests/", "mask_slide_test"),
("hwy/tests/", "mask_test"),
("hwy/tests/", "masked_arithmetic_test"),
@@ -562,6 +578,7 @@ HWY_TEST_DEPS = [
":math",
":matvec",
":nanobenchmark",
":perf_counters",
":random",
":skeleton",
":thread_pool",
58 changes: 31 additions & 27 deletions CMakeLists.txt
@@ -59,33 +59,6 @@ if(CHECK_PIE_SUPPORTED)
endif()
endif()

if (CMAKE_CXX_COMPILER_ARCHITECTURE_ID MATCHES "RISCV32|RISCV64|RISCV128" OR CMAKE_SYSTEM_PROCESSOR MATCHES "riscv32|riscv64|riscv128")
include(CheckCSourceCompiles)
check_c_source_compiles("
#if __riscv_xlen == 64
int main() { return 0; }
#else
#error Not RISCV-64
#endif
" IS_RISCV_XLEN_64)

check_c_source_compiles("
#if __riscv_xlen == 32
int main() { return 0; }
#else
#error Not RISCV-32
#endif
" IS_RISCV_XLEN_32)

if(IS_RISCV_XLEN_32)
set(RISCV_XLEN 32)
elseif(IS_RISCV_XLEN_64)
set(RISCV_XLEN 64)
else()
message(WARNING "Unable to determine RISC-V XLEN")
endif()
endif()

include(GNUInstallDirs)

if (NOT CMAKE_BUILD_TYPE)
@@ -163,6 +136,33 @@ check_cxx_source_compiles(
HWY_RISCV
)

if (HWY_RISCV OR CMAKE_CXX_COMPILER_ARCHITECTURE_ID MATCHES "RISCV32|RISCV64|RISCV128" OR CMAKE_SYSTEM_PROCESSOR MATCHES "riscv32|riscv64|riscv128")
include(CheckCSourceCompiles)
check_c_source_compiles("
#if __riscv_xlen == 64
int main() { return 0; }
#else
#error Not RISCV-64
#endif
" IS_RISCV_XLEN_64)

check_c_source_compiles("
#if __riscv_xlen == 32
int main() { return 0; }
#else
#error Not RISCV-32
#endif
" IS_RISCV_XLEN_32)

if(IS_RISCV_XLEN_32)
set(RISCV_XLEN 32)
elseif(IS_RISCV_XLEN_64)
set(RISCV_XLEN 64)
else()
message(WARNING "Unable to determine RISC-V XLEN")
endif()
endif()

if (HWY_ENABLE_CONTRIB)
# Glob all the traits so we don't need to modify this file when adding
# additional special cases.
@@ -219,6 +219,7 @@ set(HWY_SOURCES
hwy/ops/x86_128-inl.h
hwy/ops/x86_256-inl.h
hwy/ops/x86_512-inl.h
hwy/ops/x86_avx3-inl.h
hwy/per_target.h
hwy/print-inl.h
hwy/print.h
@@ -235,6 +236,7 @@ if (NOT HWY_CMAKE_HEADER_ONLY)
hwy/aligned_allocator.cc
hwy/nanobenchmark.cc
hwy/per_target.cc
hwy/perf_counters.cc
hwy/print.cc
hwy/targets.cc
hwy/timer.cc
@@ -717,6 +719,7 @@ set(HWY_TEST_FILES
hwy/bit_set_test.cc
hwy/highway_test.cc
hwy/nanobenchmark_test.cc
hwy/perf_counters_test.cc
hwy/targets_test.cc
hwy/examples/skeleton_test.cc
hwy/tests/arithmetic_test.cc
@@ -746,6 +749,7 @@ set(HWY_TEST_FILES
hwy/tests/mask_combine_test.cc
hwy/tests/mask_convert_test.cc
hwy/tests/mask_mem_test.cc
hwy/tests/mask_set_test.cc
hwy/tests/mask_slide_test.cc
hwy/tests/mask_test.cc
hwy/tests/masked_arithmetic_test.cc
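The CMakeLists.txt hunks move the RISC-V XLEN probe below the `HWY_RISCV` compile check and add `HWY_RISCV OR` to its condition, so the probe also runs when the compiler itself targets RISC-V even if neither `CMAKE_CXX_COMPILER_ARCHITECTURE_ID` nor `CMAKE_SYSTEM_PROCESSOR` matches (presumably the cross-compiling case addressed by the commit "Fix for RVV CMake detection if cross-compiling with Clang"). Each probe only asks the preprocessor about `__riscv_xlen`. The constexpr function below is a hypothetical C++ rendering of the same test, not code from the PR; `DetectedRiscvXlen` is an illustrative name.

```cpp
#include <cassert>  // used by the checks in the usage example

// Hypothetical C++ equivalent of the two check_c_source_compiles() probes in
// the CMakeLists.txt hunk: GCC and Clang predefine __riscv_xlen when targeting
// RISC-V, so the preprocessor alone can tell RV32 from RV64.
constexpr int DetectedRiscvXlen() {
#if defined(__riscv_xlen) && __riscv_xlen == 64
  return 64;  // same condition as the IS_RISCV_XLEN_64 probe
#elif defined(__riscv_xlen) && __riscv_xlen == 32
  return 32;  // same condition as the IS_RISCV_XLEN_32 probe
#else
  return 0;  // neither probe would compile; CMake warns only inside its RISC-V branch
#endif
}
```

On a non-RISC-V host such as x86-64 this returns 0, and when compiling for riscv64 it returns 64. Because the value is `constexpr`, it can also drive `static_assert` or `if constexpr` checks, whereas the CMake version records the result in `RISCV_XLEN` for the build scripts.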
2 changes: 1 addition & 1 deletion MODULE.bazel
@@ -4,7 +4,7 @@ module(
)

bazel_dep(name = "bazel_skylib", version = "1.6.1")
bazel_dep(name = "googletest", version = "1.15.2")
bazel_dep(name = "googletest", version = "1.15.2", repo_name = "com_google_googletest")
bazel_dep(name = "rules_cc", version = "0.0.9")
bazel_dep(name = "rules_license", version = "0.0.7")
bazel_dep(name = "platforms", version = "0.0.10")
5 changes: 5 additions & 0 deletions g3doc/impl_details.md
@@ -249,6 +249,11 @@ For ZVector targets `HWY_Z14`, `HWY_Z15`, `HWY_Z16`, there is the
(requires IBMid login), plus a
[searchable reference](https://www.ibm.com/docs/en/zos/2.5.0?topic=topics-using-vector-programming-support).

For LoongArch, there is a
[list of intrinsics](https://jia.je/unofficial-loongarch-intrinsics-guide/lsx/integer_computation/)
and
[ISA reference](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html).

## Why scalar target

There can be various reasons to avoid using vector intrinsics: