From 9485897ccb6da955a3efeba84e552e85d4efaa20 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 17 Oct 2024 06:45:37 -0400
Subject: [PATCH] Minor: Document SIMD rationale and tips (#6554)

* Minor: Document SIMD rationale and tips

* Apply suggestions from code review

Co-authored-by: Ed Seidl
Co-authored-by: Piotr Findeisen

* More review feedback

* tweak

* Update arrow/CONTRIBUTING.md

* Update arrow/CONTRIBUTING.md

* clarify inlining more

* formating

---------

Co-authored-by: Ed Seidl
Co-authored-by: Piotr Findeisen
---
 arrow/CONTRIBUTING.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arrow/CONTRIBUTING.md b/arrow/CONTRIBUTING.md
index 0c795d6b9cbd..a9a9426a42a5 100644
--- a/arrow/CONTRIBUTING.md
+++ b/arrow/CONTRIBUTING.md
@@ -109,6 +109,42 @@ specific JIRA issues and reference them in these code comments. For example:
 // This is not sound because .... see https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage of SIMD / auto vectorization
+
+This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
+instead relies on the Rust compiler's auto-vectorization capabilities, which are
+built on LLVM.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces kernels that are
+faster than using hand-written SIMD intrinsics. This crate used to contain
+several kernels with hand-written SIMD instructions, which were removed after
+discovering the auto-vectorized code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+#### Tips for auto vectorization
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body (e.g no checking for nulls on each row)
+2. Not too much inlining (judicious use of `#[inline]` and `#[inline(never)]`) as the vectorizer gives up if the code is too complex
+3. No [horizontal reductions] or data dependencies
+4. Suitable SIMD instructions available in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag)
+
+[horizontal reductions]: https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
+
+The last point is especially important as the default `target-cpu` doesn't
+support many SIMD instructions. See the Performance Tips section at the
+end of
+
+To ensure your code is fully vectorized, we recommend using tools like
+ (again being sure `RUSTFLAGS` is set appropriately)
+to analyze the resulting code, and only once you've exhausted auto vectorization
+think of reaching for manual SIMD. Generally the hard part of vectorizing code
+is structuring the algorithm in such a way that it can be vectorized, regardless
+of what generates those instructions.
+
 # Releases and publishing to crates.io
 
 Please see the [release](../dev/release/README.md) for details on how to create arrow releases
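
As a rough illustration of the auto-vectorization tips added by this patch, the sketch below shows one way a vertical kernel might be structured so that the hot loop has no conditionals and no cross-iteration dependencies. The function names `add_values` and `and_validity`, and the layout of validity as packed `u64` words, are assumptions made for this example and are not taken from the arrow crate. Whether LLVM actually vectorizes these loops depends on the compiler version and on the target features enabled (for example `RUSTFLAGS="-C target-cpu=native"`), so the generated assembly should still be inspected.

```rust
/// Hypothetical kernel: element-wise addition over the value buffers.
///
/// The loop body has no branch on nullness and no dependency between
/// iterations, so each output element can be computed independently.
/// Null slots are computed like any other slot; the validity words
/// produced by `and_validity` are what decide whether a result is used.
pub fn add_values(a: &[i32], b: &[i32], out: &mut [i32]) {
    // Zipping the slices lets the compiler see a single trip count and
    // avoids per-element bounds checks inside the loop.
    for ((o, x), y) in out.iter_mut().zip(a.iter()).zip(b.iter()) {
        // Wrapping arithmetic avoids an overflow check (a hidden branch)
        // in debug builds.
        *o = x.wrapping_add(*y);
    }
}

/// Hypothetical helper: combines two validity bitmaps, 64 rows per word.
///
/// Handling nulls here, in a separate pass over packed words, keeps the
/// value loop above free of per-row null checks.
pub fn and_validity(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b.iter()).map(|(x, y)| x & y).collect()
}
```

Splitting value computation from validity handling is a design choice made for this sketch: it trades some wasted work on null slots for a loop shape that is simple enough for the vectorizer to handle.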