Portable version #1

Open
nemequ opened this issue Feb 17, 2016 · 9 comments

Comments

@nemequ
Contributor

nemequ commented Feb 17, 2016

I would love to see a portable version instead of relying directly on SSE. Perhaps using OpenMP 4's SIMD extensions (http://primeurmagazine.com/repository/PrimeurMagazine-AE-PR-12-14-32.pdf is a decent introduction)?

Even if it's just a slow portable fallback for platforms without SSE, it could still be useful.

@ConorStokes
Owner

I'll have to look into the extensions to see if they offer the needed functionality for a fallback. Otherwise, I might at some stage port to other platforms directly if they're suitable.

@nemequ
Contributor Author

nemequ commented Feb 18, 2016

Based on my (very) brief look at the code, I'm pretty confident they do. To be clear, though, the main point is that I would very much like an implementation that will run everywhere, even if it's slow; OpenMP 4 just (hopefully) provides a way to make it reasonably fast.

@nemequ
Contributor Author

nemequ commented Apr 22, 2017

I just created a new LZSSE-SIMDe (friendly) fork which uses SIMDe to let LZSSE run where SSE4.1 isn't available.

SIMDe is still under heavy development; I haven't even started working on optimizing it, and I've only tested a few compilers (recent versions of GCC, clang, and PGI). It will probably be a while before I'm ready to make a PR, but I wanted to make you aware of the work.

@ConorStokes
Owner

Thanks for making me aware, will be interested to see the results.

@nemequ
Contributor Author

nemequ commented Apr 28, 2017

I just noticed that LZSSE doesn't work on 32-bit, even if the CPU supports SSE4.1. Is that intentional? Just replacing the calls to _mm_cvtsi64_si128 would be enough to get it compiling (I haven't actually tested that, but it works fine in SIMDe, where we emulate that call, as well as some other 64-bit-specific functions, on 32-bit CPUs).
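A minimal sketch of one way such a replacement could look, assuming an x86 target with SSE2. The helper names `lzsse_cvtsi64_si128` and `lzsse_store128` are hypothetical and appear in neither LZSSE nor SIMDe; the sketch relies on `_mm_loadl_epi64`, which loads 64 bits into the low half of the register and zeroes the upper half, and which is also available in 32-bit builds.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not part of LZSSE or SIMDe): produces the same
 * register contents as _mm_cvtsi64_si128(value) without needing the
 * 64-bit-only intrinsic. */
static inline __m128i lzsse_cvtsi64_si128(int64_t value) {
    /* Load 64 bits from memory into the low half; the upper half is zeroed. */
    return _mm_loadl_epi64((const __m128i *)&value);
}

/* Hypothetical helper for inspecting the result: copies all 128 bits out. */
static inline void lzsse_store128(uint64_t out[2], __m128i v) {
    _mm_storeu_si128((__m128i *)out, v);
}
```

Going through memory may cost a little compared with the native 64-bit move, so a real port would probably keep `_mm_cvtsi64_si128` on x86-64 and use this path only in 32-bit builds.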

The default block size would also need to be reduced, otherwise malloc will fail. It would probably also be a good idea to verify that bufferSize * sizeof(Arrival) doesn't overflow size_t…
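A minimal sketch of that overflow check, under stated assumptions: the `Arrival` struct below is only a placeholder with a made-up layout, and `checked_array_bytes` and `alloc_arrivals` are hypothetical names, not functions from LZSSE.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder for LZSSE's per-position parse record; the fields here are
 * invented for illustration and the real layout differs. */
typedef struct {
    size_t  cost;
    int32_t from;
    int32_t offset;
} Arrival;

/* Hypothetical helper: writes count * elemSize to *bytes and returns 1 if
 * the product fits in size_t; returns 0 (leaving *bytes untouched) if not. */
static int checked_array_bytes(size_t count, size_t elemSize, size_t *bytes) {
    if (elemSize != 0 && count > SIZE_MAX / elemSize) {
        return 0; /* the multiplication would wrap around */
    }
    *bytes = count * elemSize;
    return 1;
}

/* Hypothetical allocation wrapper: refuses sizes that would overflow
 * instead of silently allocating a too-small buffer. */
static Arrival *alloc_arrivals(size_t bufferSize) {
    size_t bytes;
    if (!checked_array_bytes(bufferSize, sizeof(Arrival), &bytes)) {
        return NULL;
    }
    return (Arrival *)malloc(bytes);
}
```

The division-based test avoids performing the multiplication before it is known to be safe, which matters most on 32-bit targets where size_t is only 32 bits wide.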

FWIW, with a reduced block size LZSSE-SIMDe works on ARM (a Raspberry Pi 2).

@ConorStokes
Owner

Yes, it's intentional that it doesn't work on 32-bit; it was a conscious decision to exploit the wider and more numerous registers. It sounds like the SIMDe version is a good path to supporting 32-bit as well.

It's a good point about the default block size in the example and about verifying that we don't overflow size_t. At some stage I was considering using a more limited-size arrivals array and incremental output to reduce memory overhead (although I think the best use case for LZSSE's optimal parse is offline compression for fast decompression that will happen many times, and that would run slightly counter to it).

It's fantastic progress to get something working on ARM!

@nemequ
Contributor Author

nemequ commented Jun 30, 2020

I kind of forgot about this for a while, but SIMDe has been plugging along, particularly lately. It should work better now, and be much faster on non-SSE4.1 CPUs. There are lots more NEON implementations now, plus quite a few AltiVec and WebAssembly implementations. The README also has some updated benchmarking figures which are mildly interesting.

The "native aliases" support has also improved to the point where I'm comfortable using it, which has reduced the diff to practically nothing, and with the new simde-no-tests the submodule is a much more reasonable size.

If you're interested in merging this into LZSSE I can submit a PR (or, of course, you can simply pull from my repo). I kept the README patch separate since I'm guessing you wouldn't want that. If you don't like submodules we do also have an amalgamated header.

If you don't want to use SIMDe, the LZSSE-SIMDe repo is still around for people who need it. Either way, LZSSE has been a great test for SIMDe, so thanks :)

@ConorStokes
Owner

That sounds good, I'm excited to have a look. How hard would it be to keep SIMDe as an optional dependency so that it is only required on those platforms?

I must admit I haven't had a chance to look much at LZSSE for a while either, although I do have an idea for a new version.

@nemequ
Contributor Author

nemequ commented Jun 30, 2020

> That sounds good, I'm excited to have a look. How hard would it be to keep SIMDe as an optional dependency so that it was only required for those platforms?

Not hard. You could just use an ifdef with something like:

#if defined(LZSSE_USE_SIMDE)
  #define SIMDE_ENABLE_NATIVE_ALIASES
  #include <simde/x86/sse4.1.h>
#else
  #include <smmintrin.h>
#endif

It would likely require people to use -I to specify the include directory, unless they have SIMDe installed system-wide (there is a simde-dev package on Debian, and a simde-devel package on Fedora 33).

To be clear, the only real advantage here is you don't have to include a copy of SIMDe. There is no penalty for using SIMDe if you don't need it; it will just call the native functions so it doesn't make the code slower, just more portable.

> I must admit I haven't had a chance to look much at LZSSE for a while either, although I do have an idea for a new version.

Nice. Hopefully you have some time to implement it soon :)
