Minor: Document SIMD rationale and tips (#6554)
* Minor: Document SIMD rationale and tips

* Apply suggestions from code review

Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>

* More review feedback

* tweak

* Update arrow/CONTRIBUTING.md

* Update arrow/CONTRIBUTING.md

* clarify inlining more

* formating

---------

Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
3 people authored Oct 17, 2024
1 parent 9d06019 commit 9485897
Showing 1 changed file with 36 additions and 0 deletions.
36 changes: 36 additions & 0 deletions arrow/CONTRIBUTING.md
Expand Up @@ -109,6 +109,42 @@ specific JIRA issues and reference them in these code comments. For example:
// This is not sound because .... see https://issues.apache.org/jira/browse/ARROW-nnnnn
```

### Usage of SIMD / auto vectorization

This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
instead relies on the Rust compiler's auto-vectorization capabilities, which are
built on LLVM.

Hand-written SIMD code is difficult to maintain and difficult to reason about.
The auto-vectorizer in LLVM is quite good and often produces kernels that are
faster than hand-written ones. This crate used to contain several kernels
written with SIMD intrinsics, which were removed after the auto-vectorized
code was found to be faster.

[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html

#### Tips for auto vectorization

LLVM is relatively good at vectorizing vertical operations provided:

1. No conditionals within the loop body (e.g. no checking for nulls on each row)
2. Not too much inlining (judicious use of `#[inline]` and `#[inline(never)]`) as the vectorizer gives up if the code is too complex
3. No [horizontal reductions] or data dependencies
4. Suitable SIMD instructions available in the target ISA (e.g. enabled via the `target-cpu` flag in `RUSTFLAGS`)

[horizontal reductions]: https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
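
As a minimal sketch of the first point, the loop below computes every output
slot unconditionally and handles validity in a separate pass over packed
bitmaps, so neither loop body contains a branch. The function names
`add_ignoring_nulls` and `combine_validity` are hypothetical helpers for
illustration and are not part of this crate's API.

```rust
/// Adds two slices element-wise without looking at validity.
/// Hypothetical helper for illustration only.
fn add_ignoring_nulls(a: &[i32], b: &[i32], out: &mut [i32]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());
    // No conditionals in the loop body: every slot is computed,
    // whether or not the corresponding row is null.
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// Combines two validity bitmaps (one bit per row, packed into u64 words).
/// A row is valid in the result only if it is valid in both inputs.
/// Hypothetical helper for illustration only.
fn combine_validity(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x & y).collect()
}
```

Values computed for null rows are simply ignored by readers that consult the
combined validity bitmap. Whether a given loop is actually vectorized still
depends on the compiler version and target, so inspect the generated code.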

The last point is especially important, as the default `target-cpu` doesn't
support many SIMD instructions. See the Performance Tips section at the
end of <https://crates.io/crates/arrow>.
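
One way to check which instructions the compiler may use is to inspect the
target features enabled at compile time. The sketch below uses the standard
`cfg!(target_feature = "...")` macro. The function `enabled_simd_features` is
a hypothetical helper, and the features listed are only a small sample.

```rust
/// Returns the names of a few SIMD target features that were enabled
/// when this code was compiled. Hypothetical helper for illustration only.
fn enabled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` is evaluated at compile time, so this reflects the
    // `target-cpu` / `target-feature` settings, not the machine
    // the binary later runs on.
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```

Building with, for example, `RUSTFLAGS="-C target-cpu=native"` should change
the returned list on machines that support more than the baseline features.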

To check that your code is fully vectorized, we recommend using tools like
<https://rust.godbolt.org/> (again being sure `RUSTFLAGS` is set appropriately)
to analyze the generated code. Only consider manual SIMD once you have
exhausted auto-vectorization. Generally the hard part of vectorizing code
is structuring the algorithm in such a way that it can be vectorized, regardless
of what generates those instructions.
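
As an illustration of restructuring an algorithm, the sketch below rewrites a
floating-point sum, which is a horizontal reduction with a loop-carried
dependency, so that most of the work becomes vertical additions across
independent accumulators. The names `sum_scalar`, `sum_lanes` and `LANES` are
hypothetical, and the lane count of 8 is an arbitrary assumption rather than
a tuned value.

```rust
/// Sums `values` with a single accumulator. Each addition depends on the
/// previous one, and the compiler may not reorder float additions, so this
/// loop is typically not vectorized.
fn sum_scalar(values: &[f32]) -> f32 {
    let mut acc = 0.0;
    for v in values {
        acc += *v;
    }
    acc
}

/// Sums `values` using several independent accumulators, one per lane.
/// The inner loop is a vertical add over a fixed-size chunk, which gives
/// the auto-vectorizer a straightforward pattern to work with.
fn sum_lanes(values: &[f32]) -> f32 {
    const LANES: usize = 8; // assumed lane count, not tuned
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();
    for chunk in chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    // Single horizontal reduction at the end, plus the leftover elements.
    let mut total: f32 = acc.iter().sum();
    for v in remainder {
        total += *v;
    }
    total
}
```

Because the two functions add the values in a different order, their results
can differ slightly due to floating-point rounding. Checking the generated
assembly remains the only way to confirm what the compiler actually produced.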

# Releases and publishing to crates.io

Please see the [release README](../dev/release/README.md) for details on how to create arrow releases.
