proof of concept auto avx512 vectorization #1593

arturbac · 2025-02-04T21:42:20Z

DON'T merge.
This is just an example proof of concept for #1591. Below is the output of json_performance on a Ryzen 9 9950X built with -march=znver5.

Normal code with the AVX2 code block enabled on znver5:

Glaze json write: 0.338783 s, 1886.05 MB/s
Glaze json write: 0.337576 s, 1912.27 MB/s
Glaze json write: 0.231021 s, 3690.51 MB/s

Auto-vectorized 512-bit version (except movemask_64):
Glaze json write: 0.318704 s, 2004.88 MB/s
Glaze json write: 0.343197 s, 1880.95 MB/s
Glaze json write: 0.219827 s, 3878.44 MB/s
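
For readers unfamiliar with the approach, here is a minimal sketch of what "auto-vectorized 512" code can look like, assuming it is written with the GCC/Clang vector extensions that the discussion refers to. The helper name `any_quote_64` and the choice of searching for a `"` byte are illustrative assumptions, not code from this PR. The compiler is free to lower the lane-wise operators to AVX-512 instructions when built with a suitable `-march`, and to narrower instructions otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension: eight 64-bit lanes, 64 bytes in total.
using u64x8 = std::uint64_t __attribute__((__vector_size__(64)));

// Hypothetical helper: true if any of the 64 bytes at `p` is a '"' character.
inline bool any_quote_64(const char* p)
{
    assert(p != nullptr);

    u64x8 v;
    std::memcpy(&v, p, sizeof(v)); // unaligned 64-byte load

    const std::uint64_t k1 = 0x0101010101010101ull;
    const std::uint64_t k80 = 0x8080808080808080ull;
    const std::uint64_t kq = k1 * static_cast<std::uint64_t>('"');

    // Broadcast the constants to all eight lanes by hand.
    const u64x8 ones = {k1, k1, k1, k1, k1, k1, k1, k1};
    const u64x8 high = {k80, k80, k80, k80, k80, k80, k80, k80};
    const u64x8 quote = {kq, kq, kq, kq, kq, kq, kq, kq};

    // XOR turns every '"' byte into 0x00; the classic zero-byte test
    // then runs on all lanes with ordinary operators.
    const u64x8 x = v ^ quote;
    const u64x8 hit = (x - ones) & ~x & high;

    // Horizontal OR of the lanes.
    std::uint64_t acc = 0;
    for (int i = 0; i < 8; ++i) acc |= hit[i];
    return acc != 0;
}
```

The point of the sketch is that no intrinsics appear in the source, so the same code compiles for any target and the instruction selection is left to the compiler.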

stephenberry · 2025-02-05T18:07:29Z

The vector extensions only work for Clang and GCC, right? How would you recommend supporting MSVC?

arturbac · 2025-02-05T19:33:48Z

> The vector extensions only work for Clang and GCC, right? How would you recommend supporting MSVC?

As I mentioned earlier: wrap and simulate.

https://devblogs.microsoft.com/cppblog/avx-512-auto-vectorization-in-msvc/

#include <cstdint>
#include <immintrin.h>

#if defined(__clang__)
// Relies on Clang's implicit (lax) conversion between same-sized vector types.
using uint64x8_t = uint64_t __attribute__((__vector_size__(64)));
inline auto as_uint64x8(__m512i v) -> uint64x8_t { return v; }
inline auto as_m512i(uint64x8_t v) -> __m512i { return v; }

#elif defined(__GNUC__)
// GCC wants an explicit cast between vector types.
using uint64x8_t = uint64_t __attribute__((__vector_size__(64)));
inline auto as_uint64x8(__m512i v) -> uint64x8_t { return (uint64x8_t)v; }
inline auto as_m512i(uint64x8_t v) -> __m512i { return (__m512i)v; }

#else
// No vector extensions (MSVC): wrap __m512i and simulate the operators.
union uint64x8_t // or memcpy
{
    alignas(64) uint64_t value[8];
    __m512i v512;
};

[[msvc::forceinline]]
inline auto as_uint64x8(__m512i v) -> uint64x8_t
{
    uint64x8_t res;
    res.v512 = v;
    return res;
}

[[msvc::forceinline]]
inline auto as_m512i(uint64x8_t v) -> __m512i { return v.v512; }

// Each operator the vector code uses needs a wrapper like this one.
[[msvc::forceinline]]
inline auto operator&(uint64x8_t a, uint64x8_t b) -> uint64x8_t
{
    uint64x8_t res;
    res.v512 = _mm512_and_epi64(a.v512, b.v512);
    return res;
}
#endif
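
A variant of the same "wrap and simulate" idea that avoids intrinsics entirely might look like the sketch below. It is an assumption-laden illustration, not code from this PR: the names `u64x8`, `lane` and `any_common_bits` are hypothetical, and the fallback branch uses a plain loop in the hope that the compiler's auto-vectorizer handles it, which has not been verified here for MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__clang__) || defined(__GNUC__)
// Native vector extension: operators such as & come for free.
using u64x8 = std::uint64_t __attribute__((__vector_size__(64)));

inline std::uint64_t lane(const u64x8& v, std::size_t i)
{
    assert(i < 8);
    return v[i];
}
#else
// Simulated vector for compilers without the extension (e.g. MSVC).
struct u64x8
{
    alignas(64) std::uint64_t value[8];
};

// Plain loop; left to the compiler to vectorize.
inline u64x8 operator&(const u64x8& a, const u64x8& b)
{
    u64x8 r{};
    for (std::size_t i = 0; i < 8; ++i) r.value[i] = a.value[i] & b.value[i];
    return r;
}

inline std::uint64_t lane(const u64x8& v, std::size_t i)
{
    assert(i < 8);
    return v.value[i];
}
#endif

// Hypothetical caller written once against the wrapper:
// true if any lane of `a` shares at least one set bit with the same lane of `b`.
inline bool any_common_bits(const u64x8& a, const u64x8& b)
{
    const u64x8 r = a & b;
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < 8; ++i) acc |= lane(r, i);
    return acc != 0;
}
```

Code above the `#endif` differs per compiler, while callers such as `any_common_bits` stay identical, which is the property the wrapper is meant to provide.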

arturbac · 2025-02-08T18:21:52Z

I am not sure this is worth doing.
When I checked the generic uint64 version on znver5, its performance was almost the same as the optimized versions: better in some cases, worse in others.

Glaze json write: 0.336473 s, 1899 MB/s
Glaze json write: 0.390721 s, 1650.43 MB/s
Glaze json write: 0.200108 s, 4260.62 MB/s
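
For context, a "generic uint64" check usually means processing 8 bytes at a time in an ordinary 64-bit integer (SWAR). The sketch below shows that technique for detecting bytes that need escaping when writing a JSON string. It assumes this is the kind of work the generic path does; the helper name `needs_escape_8` and the exact set of bytes tested are illustrative, not taken from the Glaze sources.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if any of the 8 bytes at `p` is '"', '\\',
// or a control character below 0x20.
inline bool needs_escape_8(const char* p)
{
    assert(p != nullptr);

    std::uint64_t w;
    std::memcpy(&w, p, sizeof(w)); // unaligned 8-byte load

    const std::uint64_t ones = 0x0101010101010101ull;
    const std::uint64_t high = 0x8080808080808080ull;

    // Nonzero if some byte of w equals c: XOR maps matches to 0x00,
    // then the zero-byte test flags them.
    auto has_byte = [&](unsigned char c) {
        const std::uint64_t x = w ^ (ones * c);
        return (x - ones) & ~x & high;
    };

    // Nonzero if some byte of w is below 0x20.
    const std::uint64_t ctrl = (w - ones * 0x20) & ~w & high;

    return (has_byte('"') | has_byte('\\') | ctrl) != 0;
}
```

Because this runs on plain integer registers, it needs no target-specific flags, which is consistent with the generic build being usable as a baseline in the numbers above.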

prof of concept auto avx512 vectorization

585b1e6
