Consolidate NaN/invalid handling in min/max/nanmin/nanmax. #120

jack-pappas · 2022-03-22T18:25:35Z

Fixes rtosholdings/riptable#87 and rtosholdings/riptable#175.

This change fixes the max/nanmax/min/nanmin reduction kernels so they propagate NaNs (max, min) or ignore NaNs (nanmax, nanmin). The changes also consolidate and streamline some logic; for example, the vector register type is now derived from the element type provided in the template argument rather than being passed as a separate argument.
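
A minimal scalar sketch of the intended semantics, assuming plain std::vector input. The names (reduce_max, reduce_nanmin, etc.) are hypothetical, and this is not the AVX2 kernel from the PR. max/min return NaN as soon as one is seen; nanmax/nanmin skip NaNs and, as an assumption matching NumPy, return NaN only when every element is NaN.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// NaN-propagating reduction (max/min semantics): any NaN poisons the result.
// `should_replace(x, acc)` returns true when element x should replace the accumulator.
template <typename T, typename Replace>
T reduce_propagate(const std::vector<T>& v, Replace should_replace)
{
    T acc = std::numeric_limits<T>::quiet_NaN();  // empty input -> NaN (assumption)
    bool have = false;
    for (T x : v)
    {
        if (std::isnan(x)) return x;  // propagate the first NaN immediately
        if (!have || should_replace(x, acc)) { acc = x; have = true; }
    }
    return acc;
}

// NaN-ignoring reduction (nanmax/nanmin semantics): NaNs are skipped.
template <typename T, typename Replace>
T reduce_ignore(const std::vector<T>& v, Replace should_replace)
{
    T acc = std::numeric_limits<T>::quiet_NaN();  // stays NaN if every element is NaN
    bool have = false;
    for (T x : v)
    {
        if (std::isnan(x)) continue;  // ignore NaNs entirely
        if (!have || should_replace(x, acc)) { acc = x; have = true; }
    }
    return acc;
}

template <typename T> T reduce_max(const std::vector<T>& v)    { return reduce_propagate(v, [](T x, T acc) { return x > acc; }); }
template <typename T> T reduce_min(const std::vector<T>& v)    { return reduce_propagate(v, [](T x, T acc) { return x < acc; }); }
template <typename T> T reduce_nanmax(const std::vector<T>& v) { return reduce_ignore(v, [](T x, T acc) { return x > acc; }); }
template <typename T> T reduce_nanmin(const std::vector<T>& v) { return reduce_ignore(v, [](T x, T acc) { return x < acc; }); }
```

With the six-element pattern used in the setup (nan, 0.0, 0.1, 0.2, nan, 0.3), this yields NaN for max/min and 0.3/0.0 for nanmax/nanmin.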

This change also implements part of the support needed for these functions to (selectively) handle integer invalids, which could be exposed in the future as a keyword argument to these ufuncs. Before that logic works correctly, we'll need to implement invalid-recognizing versions of the MIN_OP, MAX_OP, etc. functions used in the AVX2-vectorized versions of these kernels; their logic should be fairly similar to that of the floating-point MIN_OP, FMIN_OP, etc. after this change.
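
A scalar sketch of what invalid-recognizing min logic could look like for signed integers. It assumes the invalid sentinel is the type's minimum value (consistent with the -128 shown for int8 in the demo below) and uses hypothetical names; it is not the MIN_OP/MAX_OP code referred to above.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Assumption: for signed integer types the invalid sentinel is the minimum value.
template <typename T>
constexpr T invalid_value()
{
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "sketch only covers signed integers");
    return std::numeric_limits<T>::min();
}

// Invalid-propagating min (the `min` analogue): any invalid makes the result invalid.
template <typename T>
T min_propagate_invalid(const std::vector<T>& v)
{
    if (v.empty()) return invalid_value<T>();  // empty input -> invalid (assumption)
    T acc = std::numeric_limits<T>::max();
    for (T x : v)
    {
        if (x == invalid_value<T>()) return x;
        if (x < acc) acc = x;
    }
    return acc;
}

// Invalid-ignoring min (the `nanmin` analogue): invalids are skipped.
template <typename T>
T nanmin_ignore_invalid(const std::vector<T>& v)
{
    bool have = false;
    T acc = invalid_value<T>();  // returned if every element is invalid
    for (T x : v)
    {
        if (x == invalid_value<T>()) continue;
        if (!have || x < acc) { acc = x; have = true; }
    }
    return acc;
}
```

Because this sentinel is also the smallest representable value, a comparison-only min already returns it when present, so only the ignoring variant needs an explicit check to return 0 for data like `intsinv`.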

Before merging, I'll run some before/after benchmarks and try to put together some proper tests to run in this repo.

Testing / demonstration code:

Setup

>>> arr = rt.FA([np.nan, 0.0, 0.1, 0.2, np.nan, 0.3], dtype=np.float64).repeat(100)
>>> arr
FastArray([nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3])

Before

>>> rt.max(arr)
0.3
>>> rt.nanmax(arr)
0.3
>>> rt.min(arr)
0.3
>>> rt.nanmin(arr)
0.0

>>> arr = rt.arange(1000).astype(np.float64)
>>> rt.max(arr)
999.0
>>> rt.nanmax(arr)
999.0
>>> rt.min(arr)
0.0
>>> rt.nanmin(arr)
0.0

After

>>> rt.max(arr)
nan
>>> rt.nanmax(arr)
0.3
>>> rt.min(arr)
nan
>>> rt.nanmin(arr)
0.0

>>> arr = rt.arange(1000).astype(np.float64)
>>> arr[501] = np.nan
>>> rt.max(arr)
nan
>>> rt.nanmax(arr)
999.0
>>> rt.min(arr)
nan
>>> rt.nanmin(arr)
0.0

>>> ints = rt.FA([0, 2, 1, 4, 3, 5], dtype=np.int8)
>>> rt.max(arr)
nan
>>> rt.nanmax(arr)
999.0
>>> rt.max(ints)
5
>>> rt.nanmax(ints)
5
>>> rt.min(ints)
0
>>> rt.nanmin(ints)
0

>>> intsinv = ints.copy()
>>> intsinv[3] = intsinv.inv
>>> rt.max(intsinv)
5
>>> rt.nanmax(intsinv)
5
>>> rt.min(intsinv)
-128
>>> rt.nanmin(intsinv)
-128

jack-pappas · 2022-03-22T18:34:23Z

This work should probably also look at the NaN-propagation behavior of the group-based versions of these functions to ensure consistent behavior (either in this PR or in a follow-up PR).
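
A hypothetical sketch of what consistent NaN propagation could mean for a grouped max. The `ikey` layout (group index 0..ngroups-1 per element) and the NaN result for empty groups are assumptions, not riptable's groupby API.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Grouped max with the same rule as rt.max: a group that sees a NaN reports NaN.
std::vector<double> group_max_propagate(const std::vector<double>& values,
                                        const std::vector<int>& ikey,
                                        int ngroups)
{
    // Empty groups report NaN (assumption).
    std::vector<double> out(ngroups, std::numeric_limits<double>::quiet_NaN());
    std::vector<bool> started(ngroups, false);   // group has a non-NaN value
    std::vector<bool> poisoned(ngroups, false);  // group has seen a NaN

    for (std::size_t i = 0; i < values.size(); ++i)
    {
        const int g = ikey[i];
        const double v = values[i];
        if (poisoned[g]) continue;  // result already fixed at NaN
        if (std::isnan(v))
        {
            out[g] = v;
            poisoned[g] = true;
        }
        else if (!started[g] || v > out[g])
        {
            out[g] = v;
            started[g] = true;
        }
    }
    return out;
}
```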

staffantj · 2022-03-22T18:37:40Z

The failing AVX2 specializations might be better off as constexprs; not sure if that will work out, but it could be worth trying.
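
If the suggestion refers to C++17 `if constexpr`, one possible shape is a single primary template that compiles the NaN checks only for floating-point types, so no explicit specializations (and none of their inline/static rules) are needed. This is a hypothetical scalar sketch: it returns by value rather than the `T const &` used in src/simd/avx2.h, and it does not touch the AVX2 specializations themselves.

```cpp
#include <cmath>
#include <type_traits>

// One primary template, no explicit specializations (requires C++17).
template <typename T>
inline T min_with_nan_passthru(T x, T y)
{
    if constexpr (std::is_floating_point_v<T>)
    {
        // Only instantiated for float/double/long double.
        if (std::isnan(x)) return x;
        if (std::isnan(y)) return y;
    }
    return (y < x) ? y : x;
}

template <typename T>
inline T max_with_nan_passthru(T x, T y)
{
    if constexpr (std::is_floating_point_v<T>)
    {
        if (std::isnan(x)) return x;
        if (std::isnan(y)) return y;
    }
    return (x < y) ? y : x;
}
```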

OrestZborowski-SIG · 2022-03-22T18:43:35Z

src/simd/avx2.h

@@ -1,4 +1,5 @@
-#pragma once


I'm curious about the motivation for removing #pragma once, which seems to be supported by all the compilers we use?

I was trying to diagnose a compilation failure with Clang and thought it might be related to this. I'll change it back.

Longer-term, the project should consolidate on one style -- the codebase currently has a mix of #pragma once and old-style include guards.

There are pathological cases where the pragma fails, which is why it's never been fully standardized.

OrestZborowski-SIG · 2022-03-22T18:44:41Z

src/simd/avx2.h

@@ -64,20 +65,20 @@ namespace riptide
         * @return T const& The result of the operation.
         */
        template <typename T>
-        T const & min_with_nan_passthru(T const & x, T const & y)
+        static T const & min_with_nan_passthru(T const & x, T const & y)


I'm also curious about the motivation for making all these function templates static, as opposed to (implicitly) inline.

Implicit inlining from the template didn't seem to be enough to resolve the ODR errors. I've now forced them to be inlined and marked the templated functions static (but removed static from the explicit template specializations, because that's not allowed).

Yeah, it's a bit of a safeguard. The storage class has to match between the primary template and the specializations of that primary template. So they're only allowed on the primary now.

Per cppreference, explicit specializations do not inherit the inline attribute from the primary function template, so they need to be explicitly marked inline; that is likely the cause of the ODR violation.
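
A single-file sketch of the arrangement described in this thread. It reuses the `min_with_nan_passthru` name from the diff, but the body and the `riptide_sketch` namespace are hypothetical. The primary template carries `static inline`; the explicit specialization for double carries only `inline`, because a storage class is not allowed on an explicit specialization and `inline` is not inherited from the primary template.

```cpp
#include <cmath>

namespace riptide_sketch
{
    // Primary template: storage class and inline are allowed here.
    template <typename T>
    static inline T const & min_with_nan_passthru(T const & x, T const & y)
    {
        return (y < x) ? y : x;
    }

    // Explicit specialization: no `static` (ill-formed here), and `inline`
    // must be written out because it is not inherited from the primary.
    template <>
    inline double const & min_with_nan_passthru<double>(double const & x, double const & y)
    {
        if (std::isnan(x)) return x;
        if (std::isnan(y)) return y;
        return (y < x) ? y : x;
    }
}
```

Returning `T const &` mirrors the diff; callers passing temporaries should copy the result rather than hold on to the reference.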


Consolidate NaN/invalid handling in min/max/nanmin/nanmax.

b2028bb

jack-pappas requested a review from a team March 22, 2022 18:25

OrestZborowski-SIG reviewed Mar 22, 2022


jack-pappas added 2 commits March 22, 2022 16:00

Explicit template specialization can't specify a storage class (e.g. "static").

3a49ed1

Fix typedef based on a dependent name -- needs to use typename.

80ce0c3
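
A sketch of the kind of typedef that needs `typename`, tied to the PR's idea of deriving the vector register type from the element type. `vec256_traits`, `max_reducer`, and the `std::array` stand-in for an AVX2 register are all hypothetical; a real AVX2 kernel would use intrinsic types such as `__m256d`. In C++17 and earlier, `vec256_traits<T>::register_type` is a dependent name, so the compiler treats it as a value unless it is prefixed with `typename`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical trait: maps an element type to a 256-bit "register" stand-in.
template <typename T>
struct vec256_traits
{
    static constexpr std::size_t lanes = 32 / sizeof(T);
    typedef std::array<T, lanes> register_type;
};

template <typename T>
struct max_reducer
{
    // `typename` is required: register_type depends on T.
    typedef typename vec256_traits<T>::register_type reg_t;
    static constexpr std::size_t lanes = vec256_traits<T>::lanes;
};
```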

jack-pappas mentioned this pull request Mar 23, 2022

Add compile options for clang-cl to allow it to compile riptide_cpp. #121

Merged
