mapping a zipped iterator to produce tuples #40
That's definitely on the roadmap. I was working on something similar around the 0.4.0 release, actually. As a stopgap, have you tried using the standard iterator adapters?
Ah! I think it's kind of surprising that the SIMD iterators implement the standard `Iterator` trait. Eventually I managed to write this:

```rust
pub fn dot<K: AsKetRef>(self, other: K) -> Rect {
    use ::faster::prelude::*;

    let other = other.as_ket_ref();
    assert_eq!(self.real.len(), other.real.len());
    assert_eq!(self.real.len(), other.imag.len());
    assert_eq!(self.real.len(), self.imag.len());

    let zero = (f64s(0.0), f64s(0.0));
    let add = |(ar, ai): (f64s, f64s), (br, bi): (f64s, f64s)| {
        (ar + br, ai + bi)
    };

    // Performs `a.conj * b` on simd vectors
    let simd_map_func = |(ar, ai, br, bi): (f64s, f64s, f64s, f64s)| {
        let real = ar * br + ai * bi;
        let imag = ar * bi - ai * br;
        (real, imag)
    };

    let mut iter = (
        // (the defaults of zero here may show up in the unaligned remainder,
        //  where they will harmlessly "contribute" to the sum)
        self.real.simd_iter(f64s(0.0)),
        self.imag.simd_iter(f64s(0.0)),
        other.real.simd_iter(f64s(0.0)),
        other.imag.simd_iter(f64s(0.0)),
    ).zip();

    let partial_sums = {
        // deal with aligned elements
        iter.by_ref()
            .map(simd_map_func)
            .fold(zero, add)
    };

    // deal with unaligned remainder
    let leftovers = iter.end().map(|(vs, _)| vs).map(simd_map_func).unwrap_or(zero);

    let (real, imag) = add(partial_sums, leftovers);
    let (real, imag) = (real.sum(), imag.sum());
    Rect { real, imag }
}
```

It is pretty ugly (considering the original code was once 5 lines!), but it does gain back that much-needed performance:
I am excited to hear that something to this effect is planned! I'm curious, does the design include a trait that could be implemented by external types besides tuples? e.g. I'm picturing making my own struct-of-arrays types pluggable into it.

(That said, I also recall from earlier design attempts of my own that these sorts of "structure-of-array" style traits are difficult to design, so I'm really just wondering whether you managed to crack that nut 😉)
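To make the question concrete, here is a purely hypothetical sketch of the kind of trait being asked about. The names (`FromSimdVectors`, `RectChunk`) are invented for illustration and nothing like this exists in faster; the `f64s` vector type is assumed to come from the prelude, as in the snippet above.

```rust
use faster::prelude::*; // assumption: provides the packed `f64s` vector type

/// Hypothetical trait: assemble a user-defined type from a tuple of SIMD vectors.
trait FromSimdVectors {
    /// The tuple of vectors this type is built from, e.g. (f64s, f64s).
    type Vectors;
    fn from_vectors(v: Self::Vectors) -> Self;
}

/// Hypothetical SIMD-width chunk of complex values stored as (real, imag) lanes.
struct RectChunk {
    real: f64s,
    imag: f64s,
}

impl FromSimdVectors for RectChunk {
    type Vectors = (f64s, f64s);

    fn from_vectors((real, imag): (f64s, f64s)) -> Self {
        RectChunk { real, imag }
    }
}
```

With something along these lines, a zipped iterator could hand values like `RectChunk` in and out of the mapping closure instead of bare tuples, which is roughly the kind of extensibility the question is about.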
Eugh, sincerest apologies for abandoning that feature. Faster is really good at computations which spit out a single array right now, and the regular iterator functions are a stopgap until I figure out an intuitive way to do n outputs. It's possible that returning a user-definable trait might be the missing piece of the puzzle, though. I haven't approached it from that angle before, and it should let me shift the burden of interpreting the data coming in and out of the closures to the user.
Fair warning: more or less every time I've tried something like this (not for SIMD, obviously, but really just any sort of "SoA <-> AoS" conversion), I've found myself journeying down a rabbit hole of design, as I'd keep finding more and more things that needed trait methods with no easy way to derive them or define them in terms of the others. (But don't let that stop you! It could be that you have a few pieces of the puzzle that I lacked. 🙂)
I prototyped an idea using an HList-like type to reduce boilerplate in custom user impls: https://github.com/ExpHP/rust-zip-simd-idea. An example of a custom user type can be seen in the tests there. My approach was to make the HList-like type absorb most of the boilerplate so that custom impls stay small.

Unfortunately, I stopped once I got to the iterators, as I was having trouble deciding how to separate the functionality there. It's like I said before; designing a feature like this is a terrible time and energy sink. Ideally one might dream of it leading to a more orthogonal and cleanly-separated API, but if you don't draw the lines correctly, you end up needing to patch it up with type-level programming and the like, which ends up making it more confusing and difficult to learn.

But maybe it will give you some ideas.
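For readers unfamiliar with the term, here is a tiny, illustrative sketch of what "HList-like" means in this context. It is not taken from the linked repo (whose actual design surely differs), and it uses plain `f64` as the leaf type just to stay self-contained:

```rust
// A heterogeneous list built from two building blocks, instead of one impl
// per tuple arity.
struct HNil;
struct HCons<H, T>(H, T);

// An operation written once, by recursion over the list structure.
trait Splat {
    fn splat(x: f64) -> Self;
}

impl Splat for HNil {
    fn splat(_: f64) -> Self {
        HNil
    }
}

impl Splat for f64 {
    fn splat(x: f64) -> Self {
        x
    }
}

impl<H: Splat, T: Splat> Splat for HCons<H, T> {
    fn splat(x: f64) -> Self {
        HCons(H::splat(x), T::splat(x))
    }
}

// e.g. a 4-tuple (a, b, c, d) maps onto
// HCons(a, HCons(b, HCons(c, HCons(d, HNil)))),
// so every arity gets the operation for free.
```

The point is that the per-arity boilerplate collapses into the two `HNil`/`HCons` impls, which is the sense in which such a type can reduce boilerplate in custom user impls.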
I have a function which looks vaguely like this:
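(The original snippet did not survive in this excerpt. Judging from the SIMD rewrite quoted earlier in the thread, it was presumably something along these lines; the free-function form and slice names here are illustrative, not the author's actual code.)

```rust
/// Complex number in rectangular form.
struct Rect {
    real: f64,
    imag: f64,
}

/// Scalar sketch of the dot product: the sum of conj(a) * b over parallel
/// real/imag slices, matching the math in the SIMD version above.
fn dot(a_re: &[f64], a_im: &[f64], b_re: &[f64], b_im: &[f64]) -> Rect {
    let mut real = 0.0;
    let mut imag = 0.0;
    for i in 0..a_re.len() {
        real += a_re[i] * b_re[i] + a_im[i] * b_im[i];
        imag += a_re[i] * b_im[i] - a_im[i] * b_re[i];
    }
    Rect { real, imag }
}
```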
Converting it to use `faster` requires two passes over the arrays; I am unable to produce both `real` and `imag` in one pass because `simd_map` requires the function output to be a single vector.

So is it faster? Well, actually, yes! It is plenty faster... up to a point:
Yikes! Once we hit 16384 elements there is almost no speedup!
I suspect it is because at this point, memory has become the bottleneck, and most of what was gained by using SIMD was lost by making two passes over the arrays. It would be nice to have an API that allowed this to be done in one pass, by allowing a mapping function to return a tuple (producing a new `PackedZippedIterator` or similar).
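To make the request concrete, here is a purely hypothetical sketch of the kind of API being asked for. Everything in it (`SimdMapTuple`, `PackedZippedIter`, `simd_map_tuple`) is invented for illustration and does not exist in faster; only the `f64s` type is assumed from the prelude as in the snippets above.

```rust
use faster::prelude::*; // assumption: provides f64s, as in the snippets above

/// Hypothetical: map a zipped SIMD iterator through a closure that returns a
/// *tuple* of vectors, producing one logical output stream per tuple element.
trait SimdMapTuple: Sized {
    /// The tuple of input vectors yielded per step, e.g. (f64s, f64s, f64s, f64s).
    type Input;

    fn simd_map_tuple<F>(self, f: F) -> PackedZippedIter<Self, F>
    where
        F: FnMut(Self::Input) -> (f64s, f64s);
}

/// Stand-in for the "PackedZippedIterator or similar" mentioned above.
struct PackedZippedIter<I, F> {
    inner: I,
    func: F,
}

// With something like this, the dot product could stay a single pass:
//
//     let (real, imag) = (a_re, a_im, b_re, b_im)      // four zipped streams in
//         .zip()
//         .simd_map_tuple(|(ar, ai, br, bi)| {
//             (ar * br + ai * bi, ar * bi - ai * br)   // two streams out
//         })
//         .fold(...);                                  // reduce both at once
```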