proof of concept auto avx512 vectorization #1593

Open
wants to merge 1 commit into
base: main

@arturbac (Contributor) commented Feb 4, 2025

DON'T merge.
This is just an example proof of concept for #1591, showing the output of json_performance on a Ryzen 9 9950X with -march=znver5.

Normal code with the AVX2 code block enabled on znver5:

Glaze json write: 0.338783 s, 1886.05 MB/s
Glaze json write: 0.337576 s, 1912.27 MB/s
Glaze json write: 0.231021 s, 3690.51 MB/s

Auto-vectorized 512-bit code (except movemask_64):
Glaze json write: 0.318704 s, 2004.88 MB/s
Glaze json write: 0.343197 s, 1880.95 MB/s
Glaze json write: 0.219827 s, 3878.44 MB/s
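
For readers unfamiliar with the operation excluded above, here is a minimal scalar sketch of what a 64-lane "movemask" generally computes: one result bit per input byte, taken from that byte's top bit. The definition of `movemask_64` in this PR is not shown in the conversation, so `movemask_64_scalar` below is a hypothetical stand-in for the general idea, not the PR's code. On AVX-512BW hardware the same result is available from the `_mm512_movepi8_mask` intrinsic; the thread does not say why this function was left out of the auto-vectorized path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: bit i of the result is the most
// significant bit of bytes[i], for i in [0, 64).
inline std::uint64_t movemask_64_scalar(const std::uint8_t* bytes) {
    assert(bytes != nullptr);
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        // Shift the top bit of the byte down to bit 0, then up to position i.
        mask |= static_cast<std::uint64_t>(bytes[i] >> 7) << i;
    }
    return mask;
}
```

A reference like this is mainly useful for checking a vectorized version against known inputs.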

@stephenberry (Owner)

The vector extensions only work for Clang and GCC, right? How would you recommend supporting MSVC?

@arturbac (Contributor Author) commented Feb 5, 2025

> The vector extensions only work for Clang and GCC, right? How would you recommend supporting MSVC?

As I mentioned earlier: wrap and simulate.

https://devblogs.microsoft.com/cppblog/avx-512-auto-vectorization-in-msvc/

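#include <cstdint>
#include <immintrin.h>

#if defined(__clang__)
    // Clang: native vector extension. The implicit conversion is kept from the
    // original; an explicit cast as in the GCC branch also works.
    using uint64x8_t = uint64_t __attribute__((__vector_size__(64)));

    inline auto as_uint64x8( __m512i v ) -> uint64x8_t { return v; }
    inline auto as_m512i( uint64x8_t v ) -> __m512i { return v; }

#elif defined(__GNUC__)
    // GCC: same vector extension, converted with an explicit cast.
    using uint64x8_t = uint64_t __attribute__((__vector_size__(64)));

    inline auto as_uint64x8( __m512i v ) -> uint64x8_t { return (uint64x8_t)v; }
    inline auto as_m512i( uint64x8_t v ) -> __m512i { return (__m512i)v; }

#else
    // Other compilers (MSVC): no vector extensions, so wrap __m512i and
    // supply the operators by hand.
    union uint64x8_t // or memcpy
    {
        alignas(64) uint64_t value[8];
        __m512i v512;
    };

    [[msvc::forceinline]]
    inline auto as_uint64x8( __m512i v ) -> uint64x8_t
    {
        uint64x8_t res;
        res.v512 = v;
        return res;
    }

    [[msvc::forceinline]]
    inline auto as_m512i( uint64x8_t v ) -> __m512i { return v.v512; }

    // Marked inline so a header-defined operator does not break the ODR.
    [[msvc::forceinline]]
    inline auto operator &( uint64x8_t a, uint64x8_t b ) -> uint64x8_t
    {
        uint64x8_t res;
        res.v512 = _mm512_and_epi64(a.v512, b.v512);
        return res;
    }
#endif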
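
As a rough illustration of the "wrap and simulate" idea, the sketch below uses the GCC/Clang vector extension where available and otherwise falls back to a plain array wrapper with a loop that the compiler is free to auto-vectorize. It deliberately uses no intrinsics, so it should build without AVX-512 flags. The names `u64x8` and `and_block` are hypothetical and are not taken from the PR or from Glaze.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__clang__) || defined(__GNUC__)
// Native vector extension: operators such as & work element-wise.
using u64x8 = std::uint64_t __attribute__((__vector_size__(64)));
#else
// Hypothetical fallback: an aligned array plus a hand-written operator.
struct u64x8 {
    alignas(64) std::uint64_t lane[8];
};
inline u64x8 operator&(const u64x8& a, const u64x8& b) {
    u64x8 r;
    for (std::size_t i = 0; i < 8; ++i) r.lane[i] = a.lane[i] & b.lane[i];
    return r;
}
#endif

// ANDs two blocks of eight 64-bit words and writes the result to `out`.
// memcpy is used for loads and stores to avoid alignment and aliasing issues.
inline void and_block(const std::uint64_t* a, const std::uint64_t* b,
                      std::uint64_t* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    u64x8 va, vb;
    std::memcpy(&va, a, sizeof va);
    std::memcpy(&vb, b, sizeof vb);
    const u64x8 vr = va & vb;  // same expression on both code paths
    std::memcpy(out, &vr, sizeof vr);
}
```

The calling code (`va & vb`) is identical on every compiler, and only the type definition changes per platform. Vectors are kept as locals rather than passed by value, which sidesteps ABI warnings GCC can emit for 64-byte vector arguments when AVX-512 is not enabled.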

@arturbac (Contributor Author) commented Feb 8, 2025

I am not sure it is worth doing.
When I checked the generic uint64 version on znver5, its performance was almost the same as the optimized versions: better in some cases, worse in others.

Glaze json write: 0.336473 s, 1899 MB/s
Glaze json write: 0.390721 s, 1650.43 MB/s
Glaze json write: 0.200108 s, 4260.62 MB/s
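
To put numbers on "better in some cases, worse in others", a small helper like the one below computes the percent change in throughput between two runs. It assumes that the three result lines in each run correspond row for row, which the thread does not state explicitly, and `relative_change_percent` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>

// Percent change of `candidate` throughput relative to `baseline`
// (both in MB/s). Positive means the candidate is faster.
inline double relative_change_percent(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

Under that row-matching assumption, comparing the generic uint64 run against the AVX2 run from the first comment gives roughly +0.7% (1886.05 to 1899 MB/s), -13.7% (1912.27 to 1650.43 MB/s), and +15.4% (3690.51 to 4260.62 MB/s). These are single runs, so some of the difference may be measurement noise.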
