
Consolidate NaN/invalid handling in min/max/nanmin/nanmax. #120


Draft: wants to merge 3 commits into master

Conversation

jack-pappas (Collaborator)

Fixes rtosholdings/riptable#87 and rtosholdings/riptable#175.

This change fixes the max/nanmax/min/nanmin reduction kernels so they propagate NaNs (max, min) or ignore NaNs (nanmax, nanmin). The changes also consolidate and streamline some logic; for example, the vector register type is now derived from the element type provided in the template argument rather than being passed as a separate argument.
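For illustration, a type trait can map the element type to the matching AVX2 register type so kernels only take one template argument. This is a minimal sketch of the idea; the trait and member names here (vec256, reg, vec256_t) are hypothetical, not the actual riptide identifiers:

#include <cstdint>
#include <immintrin.h>

// Hypothetical trait: maps an element type to its 256-bit AVX2 register type.
template <typename T>
struct vec256;

template <>
struct vec256<float>   { using reg = __m256;  };  // 8 float lanes
template <>
struct vec256<double>  { using reg = __m256d; };  // 4 double lanes
template <>
struct vec256<int32_t> { using reg = __m256i; };  // integer lanes

// A kernel can then derive the register type from the element type alone,
// instead of accepting the register type as a second template argument:
template <typename T>
using vec256_t = typename vec256<T>::reg;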

This change also implements part of the support needed for these functions to (selectively) handle integer invalids; that could be exposed in the future as a keyword argument on these ufuncs. Before that logic can work correctly, we'll need to implement invalid-recognizing versions of the MIN_OP, MAX_OP, etc. functions used in the AVX2-vectorized versions of these kernels; the logic for them should be fairly similar to the logic used by the floating-point MIN_OP, FMIN_OP, etc. after this change.
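As a rough scalar sketch of what such invalid-recognizing operations might look like (the names invalid_value, min_ignore_invalid, and min_propagate_invalid, and the sentinel choice, are assumptions for illustration, not the actual riptide identifiers; the real kernels would also need vectorized AVX2 equivalents):

#include <cstdint>
#include <limits>

// Assumption for this sketch: the invalid sentinel for a signed integer type
// is its minimum representable value (e.g., -128 for int8).
template <typename T>
constexpr T invalid_value = std::numeric_limits<T>::min();

// nanmin-style behavior for integers: skip invalids, mirroring how the
// floating-point FMIN_OP ignores NaNs.
template <typename T>
T min_ignore_invalid(T x, T y)
{
    if (x == invalid_value<T>)
        return y;
    if (y == invalid_value<T>)
        return x;
    return y < x ? y : x;
}

// min-style behavior for integers: propagate the invalid, mirroring how the
// floating-point MIN_OP propagates NaNs.
template <typename T>
T min_propagate_invalid(T x, T y)
{
    if (x == invalid_value<T> || y == invalid_value<T>)
        return invalid_value<T>;
    return y < x ? y : x;
}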

Before merging, I'll run some before/after benchmarks and try to put together some proper tests to run in this repo.

Testing / demonstration code:

Setup

>>> arr = rt.FA([np.nan, 0.0, 0.1, 0.2, np.nan, 0.3], dtype=np.float64).repeat(100)
>>> arr
FastArray([nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3,
           ...,
           nan, 0. , 0.1, 0.2, nan, 0.3, nan, 0. , 0.1, 0.2, nan, 0.3])

Before

>>> rt.max(arr)
0.3
>>> rt.nanmax(arr)
0.3
>>> rt.min(arr)
0.3
>>> rt.nanmin(arr)
0.0
>>> arr = rt.arange(1000).astype(np.float64)
>>> rt.max(arr)
999.0
>>> rt.nanmax(arr)
999.0
>>> rt.min(arr)
0.0
>>> rt.nanmin(arr)
0.0

After

>>> rt.max(arr)
nan
>>> rt.nanmax(arr)
0.3
>>> rt.min(arr)
nan
>>> rt.nanmin(arr)
0.0
>>> arr = rt.arange(1000).astype(np.float64)
>>> arr[501] = np.nan
>>> rt.max(arr)
nan
>>> rt.nanmax(arr)
999.0
>>> rt.min(arr)
nan
>>> rt.nanmin(arr)
0.0
>>> ints = rt.FA([0, 2, 1, 4, 3, 5], dtype=np.int8)
>>> rt.max(ints)
5
>>> rt.nanmax(ints)
5
>>> rt.min(ints)
0
>>> rt.nanmin(ints)
0
>>> intsinv = ints.copy()
>>> intsinv[3] = intsinv.inv
>>> rt.max(intsinv)
5
>>> rt.nanmax(intsinv)
5
>>> rt.min(intsinv)
-128
>>> rt.nanmin(intsinv)
-128
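(In the last four calls, -128 is the int8 invalid sentinel. Because the invalid-recognizing integer MIN_OP/MAX_OP operations described above aren't implemented yet, min and nanmin currently treat the sentinel as an ordinary value.)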

@jack-pappas requested a review from a team on March 22, 2022 at 18:25
@jack-pappas (Collaborator, Author):

This work should maybe also take a look at the nan-propagation behavior in the group-based versions of these functions to ensure consistent behavior. (Either in this PR or in a follow-up PR.)

@staffantj (Contributor):

The failing AVX2 specializations might be better off as constexpr - not sure if that will work out, but it could be worth trying.

@@ -1,4 +1,5 @@
#pragma once
Contributor:

I'm curious as to the motivation for removing pragma once (which seems to be supported in all compilers we use?)

Collaborator (Author):

I was trying to diagnose a compilation failure with Clang and thought it might be related to this. I'll change it back.

Longer-term, the project should consolidate on one style -- the codebase currently has a mix of #pragma once and old-style include guards.
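For reference, the two styles in question (the guard macro name below is illustrative, not taken from the codebase):

// Old-style include guard -- fully standard and portable:
#ifndef RIPTIDE_SIMD_AVX2_H   // illustrative macro name
#define RIPTIDE_SIMD_AVX2_H
// ... header contents ...
#endif // RIPTIDE_SIMD_AVX2_H

// Non-standard (but widely supported) alternative:
#pragma once
// ... header contents ...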

Contributor:

There are pathological cases where the pragma fails, which is why it's never been fully standardized.

src/simd/avx2.h Outdated
@@ -64,20 +65,20 @@ namespace riptide
* @return T const& The result of the operation.
*/
template <typename T>
-T const & min_with_nan_passthru(T const & x, T const & y)
+static T const & min_with_nan_passthru(T const & x, T const & y)
@OrestZborowski-SIG (Contributor), Mar 22, 2022:

I'm also curious as to the motivation of making all these function templates static, as opposed to (implicitly) inline?

Collaborator (Author):

Implicit inlining from the template didn't seem to be enough to resolve the ODR errors. I've now forced them to be inlined and marked the templated functions static (but removed static from the explicit template specializations, because that's not allowed there).

Contributor:

Yeah, it's a bit of a safeguard. The storage class has to match between the primary template and the specializations of that primary template. So they're only allowed on the primary now.

Contributor:

Per cppreference, explicit specializations do not inherit the inline attribute from the primary function template, so they need to be explicitly marked inline, which is likely the cause of the ODR violation.
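A minimal sketch of the rule under discussion (the function body is illustrative, not the actual riptide implementation):

// Primary template: both `static` and `inline` are allowed here.
template <typename T>
static inline T const & min_with_nan_passthru(T const & x, T const & y)
{
    return y < x ? y : x; // generic path; the real code also handles NaN
}

// Explicit specialization: `static` is not allowed here, and `inline` is NOT
// inherited from the primary template, so it must be spelled out explicitly
// to avoid ODR violations when the header is included in multiple
// translation units.
template <>
inline float const & min_with_nan_passthru<float>(float const & x, float const & y)
{
    // x != x detects NaN; propagate it instead of comparing.
    return (x != x) ? x : ((y != y) ? y : (y < x ? y : x));
}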


Successfully merging this pull request may close these issues.

min and max gives arbitrary result (instead of nan) when there is nan