Research on aarch64 support #23

Closed

chadbrewbaker

I did some initial research on adding aarch64 support. Many NEON intrinsics seem to be missing from core::arch::aarch64 and will need to be implemented locally until they are upstreamed.

@hkratz
Contributor

hkratz commented Apr 26, 2021

Yeah, I was afraid of that. Since the support of aarch64 intrinsics is nightly-only anyway we might be able to get them upstreamed quickly though.

Related issues are #2 and #3.

@hkratz hkratz marked this pull request as draft April 26, 2021 09:04
@ArniDagur

Since the ARM intrinsics are not yet available outside of nightly, wouldn't it make sense to use C FFI? It could be an on-by-default feature to enable support for build environments without a C compiler.

@hkratz
Contributor

hkratz commented Apr 27, 2021

> Since the ARM intrinsics are not yet available outside of nightly, wouldn't it make sense to use C FFI? It could be an on-by-default feature to enable support for build environments without a C compiler.

Interesting idea, though I'm not sure it should be on by default on ARM. For now we could call into simdjson behind an experimental_aarch64_simdjson feature flag that depends on a to-be-created crate, simdjson-utf8-sys. We could also add a second entry point to the simdjson code to support compat checking.

@chadbrewbaker
Author

After playing with it yesterday, I think C FFI is the best route forward to a working implementation for aarch64. I'm working on updating the PR now with a proof of concept. What we really need is a tool that consumes <arm_neon.h> and auto-generates all Rust intrinsics from it.

@hkratz
Contributor

hkratz commented Apr 27, 2021

@chadbrewbaker Looking forward to it!

Meanwhile I have benchmarked Rust std vs simdjson validation on a Pi 4:
[image: benchmark results]
It is interesting that the scalar implementation in the std library is faster than simdjson on the Pi 4 for ASCII. Though the results will most likely be very different for the Apple M1.

@chadbrewbaker
Author

For sse4 and neon, if I am writing a C FFI version, what would be a good Rust driver API? The goal is to keep as many operating system calls as possible out of the FFI. That might mean having the C code embed some inline Rust around those OS calls. The code would be useful for benchmarking and testing even after Rust fully supports aarch64 intrinsics, as every release of Apple's M chip for the foreseeable future will ship with C-optimized intrinsics.

@hkratz
Contributor

hkratz commented Apr 27, 2021

> For sse4 and neon, if I am writing a C FFI version, what would be a good Rust driver API?

Not sure how you want to do this. For a C implementation there should be just two main entry points, both taking a pointer and the length of the string: e.g. validate_utf8_native_basic(), returning a bool, and validate_utf8_native_compat(), returning a u64 with the start of the failing SIMD chunk (success encoded as u64::MAX).

Calling into the C just for the intrinsics would be too slow.
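
A minimal sketch of how the Rust side of those two entry points might look. The function names and the `u64::MAX` success sentinel come from the comment above; everything else is an assumption for illustration: the 64-byte chunk size, the wrapper names `validate_basic` and `failing_chunk_start`, and above all the function bodies, which are Rust stand-ins built on `std::str::from_utf8` so that the sketch compiles without a C library. In a real build the two `extern "C"` functions would be declarations implemented in C.

```rust
/// Sentinel returned by the compat entry point when the input is valid.
pub const VALID: u64 = u64::MAX;

/// Assumed SIMD chunk size, used only by the stand-in below.
const CHUNK: usize = 64;

// Stand-in for the C implementation (assumption): a real build would only
// declare this in an `extern "C"` block and link against the C code.
pub unsafe extern "C" fn validate_utf8_native_basic(ptr: *const u8, len: usize) -> bool {
    // The caller promises that `ptr` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    std::str::from_utf8(bytes).is_ok()
}

// Stand-in for the C implementation (assumption): returns the start offset of
// the 64-byte chunk containing the first invalid sequence, or `VALID`.
pub unsafe extern "C" fn validate_utf8_native_compat(ptr: *const u8, len: usize) -> u64 {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    match std::str::from_utf8(bytes) {
        Ok(_) => VALID,
        Err(e) => (e.valid_up_to() / CHUNK * CHUNK) as u64,
    }
}

/// Safe wrapper (hypothetical name): one FFI call per input buffer.
pub fn validate_basic(input: &[u8]) -> bool {
    // SAFETY: pointer and length are taken from a valid slice.
    unsafe { validate_utf8_native_basic(input.as_ptr(), input.len()) }
}

/// Safe wrapper (hypothetical name): decodes the `u64::MAX` sentinel.
pub fn failing_chunk_start(input: &[u8]) -> Option<usize> {
    // SAFETY: pointer and length are taken from a valid slice.
    let pos = unsafe { validate_utf8_native_compat(input.as_ptr(), input.len()) };
    if pos == VALID { None } else { Some(pos as usize) }
}
```

The wrappers cross the FFI boundary once per buffer rather than once per intrinsic, which is the design point made above.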

@chadbrewbaker
Author

(enum, u64)? {VALID, IO_ERROR, VALID_PREFIX (when the end of the buffer is a valid UTF8 prefix), INVALID}

@ArniDagur

Since the SIMD code doesn't do IO, there wouldn't be any IO errors. Here is the compat error type:

```rust
/// UTF-8 error information compatible with [`std::str::Utf8Error`].
///
/// Contains information on the location of the encountered validation error and the length of the
/// invalid UTF-8 sequence.
#[derive(Copy, Eq, PartialEq, Clone, Debug)]
pub struct Utf8Error {
    pub(crate) valid_up_to: usize,
    pub(crate) error_len: Option<u8>,
}
```

and here is the basic error type:

```rust
/// Simple zero-sized UTF-8 error.
///
/// No information is provided where the error occurred or how long the invalid
/// byte sequence is.
#[derive(Copy, Eq, PartialEq, Clone, Debug)]
pub struct Utf8Error;
```
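
For reference, a small sketch of the two pieces of information that the standard library's `std::str::Utf8Error` exposes, which is what the compat type mirrors. The helper name `describe_error` is an assumption for illustration; `valid_up_to()` and `error_len()` are the standard library accessors.

```rust
/// Hypothetical helper: returns `(valid_up_to, error_len)` for invalid input,
/// or `None` if the input is valid UTF-8.
pub fn describe_error(input: &[u8]) -> Option<(usize, Option<usize>)> {
    std::str::from_utf8(input)
        .err()
        // `valid_up_to` is the length of the valid prefix; `error_len` is the
        // length of the invalid sequence, or `None` if the input ended in the
        // middle of a multi-byte sequence.
        .map(|e| (e.valid_up_to(), e.error_len()))
}
```

An `error_len` of `None` corresponds to the "valid prefix at the end of the buffer" case mentioned earlier in the thread.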

@hkratz
Contributor

hkratz commented Apr 27, 2021

Also, the C implementation does not have to figure out the exact error location. If the index of the failing 64-byte block is known, implementation::get_compat_error() can be used to derive the compat error.
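
A rough sketch of how an exact error could be derived from the start of the failing block. This is not the crate's `get_compat_error()`; the function name `compat_error_from_block`, its signature, and the use of `std::str::from_utf8` for the scalar re-check are assumptions for illustration. The idea is that all earlier blocks are valid except for a multi-byte sequence that may straddle the block boundary, so the scalar re-check starts at most three bytes before the failing block.

```rust
/// Hypothetical helper: given the byte offset at which the failing block
/// starts, returns `(valid_up_to, error_len)` for the first error, or `None`
/// if the input turns out to be valid.
///
/// Assumes `failing_block_pos <= input.len()` and that every byte before the
/// failing block is valid UTF-8 apart from a sequence crossing the boundary.
pub fn compat_error_from_block(
    input: &[u8],
    failing_block_pos: usize,
) -> Option<(usize, Option<usize>)> {
    // Walk back up to three bytes to the nearest byte that is not a
    // continuation byte (0b10xxxxxx); a sequence crossing the block boundary
    // must start there.
    let offset = (1..=3usize)
        .take_while(|&i| i <= failing_block_pos)
        .find(|&i| input[failing_block_pos - i] >> 6 != 0b10)
        .map_or(failing_block_pos, |i| failing_block_pos - i);

    // Scalar re-validation from that offset gives the exact position.
    std::str::from_utf8(&input[offset..])
        .err()
        .map(|e| (offset + e.valid_up_to(), e.error_len()))
}
```

With this split, the native code only needs to report a block offset, and the scalar pass touches just the tail of the input.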

@hkratz
Contributor

hkratz commented Apr 28, 2021

Just a quick heads up: I have totally refactored the code so that one only has to implement SIMD primitives for each architecture. This should cut down the work porting it to arm massively... once the intrinsics are available.

@hkratz
Contributor

hkratz commented Apr 28, 2021

I have a prototype implementation now in pure Rust, see #31.

@hkratz hkratz closed this Apr 28, 2021