Consolidate NaN/invalid handling in min/max/nanmin/nanmax. #120
base: master
This work might also want to look at the NaN-propagation behavior in the group-based versions of these functions to ensure consistent behavior (either in this PR or in a follow-up PR).
The failing AVX2 specializations might be better off as `constexpr`s; not sure if that will work out, but it could be worth trying.
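One possible reading of the `constexpr` suggestion is C++17 `if constexpr`: keep a single primary template and branch on the element type at compile time, so no explicit specializations (and none of their `inline`/`static` pitfalls) are needed. The following is a minimal scalar sketch, assuming the project can build as C++17; `min_ignoring_nan` is a hypothetical name, not an existing riptide function, and the real AVX2 code would branch between intrinsics rather than scalar comparisons.

```cpp
#include <type_traits>

// Hypothetical helper: a NaN-ignoring min (nanmin semantics) written as one
// primary template. The `if constexpr` branch is only instantiated for
// floating-point element types, taking the place of explicit specializations.
template <typename T>
inline T const & min_ignoring_nan(T const & x, T const & y)
{
    if constexpr (std::is_floating_point_v<T>)
    {
        // NaN is the only value that compares unequal to itself.
        // Skip a NaN operand; if both are NaN, y (also NaN) is returned.
        if (x != x)
            return y;
        if (y != y)
            return x;
    }
    // Integer types, and floating-point values that are not NaN: ordinary min.
    return y < x ? y : x;
}
```

Because the discarded branch of a dependent `if constexpr` is never instantiated, the same pattern can guard type-dependent code that only compiles for some element types.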
@@ -1,4 +1,5 @@
#pragma once
I'm curious about the motivation for removing `#pragma once`, which seems to be supported by all the compilers we use.
I was trying to diagnose a compilation failure with Clang and thought it might be related to this. I'll change it back.
Longer term, the project should standardize on one style; the codebase currently mixes `#pragma once` and old-style include guards.
There are pathological cases where the pragma fails, which is why it's never been fully standardized.
src/simd/avx2.h
@@ -64,20 +65,20 @@ namespace riptide
* @return T const& The result of the operation.
*/
template <typename T>
-T const & min_with_nan_passthru(T const & x, T const & y)
+static T const & min_with_nan_passthru(T const & x, T const & y)
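The hunk only shows the signature of `min_with_nan_passthru`. A plausible scalar body for "NaN passthrough" semantics (if either operand is NaN the result is NaN, otherwise the smaller value) is sketched below. This is an assumption about the intended behavior, not the code in `src/simd/avx2.h`; it keeps the `T const &` return type from the diff and the `static inline` combination described later in the thread.

```cpp
// Sketch (assumed semantics): NaN-propagating min for the `min` reduction.
// x != x is true only when x is NaN; for integer types it is always false,
// so the same template degrades to an ordinary min.
template <typename T>
static inline T const & min_with_nan_passthru(T const & x, T const & y)
{
    // Return x if it is NaN or strictly smaller; otherwise return y, which is
    // either NaN itself or the smaller-or-equal non-NaN value.
    return (x != x || x < y) ? x : y;
}
```

The self-comparison test relies on IEEE 754 semantics; builds that enable `-ffast-math` (or similar "finite math" modes) may optimize it away.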
I'm also curious about the motivation for making all these function templates `static` rather than (implicitly) `inline`.
Implicit inlining from the template didn't seem to be enough to resolve the ODR errors. I've now forced them to be inlined and marked the templated functions `static` (but removed `static` from the explicit template specializations, because that's not allowed).
Yeah, it's a bit of a safeguard: the storage class has to match between the primary template and its specializations, so `static` is only allowed on the primary now.
Per cppreference, an explicit specialization of a function template does not inherit `inline` from the primary template; it is inline only if it is itself declared `inline`. That is the likely cause of the ODR violation, so the specializations need to be marked `inline` explicitly.
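A minimal illustration of the rule, using a hypothetical `is_nan_value` helper that is not part of the PR: the primary template can be defined in a header included from many translation units, but each explicit specialization is an ordinary function definition and needs its own `inline`, or the linker sees multiple definitions. `static` is left off the specializations because, as noted above, a storage-class specifier isn't allowed on them.

```cpp
// Hypothetical header content, e.g. included from several .cpp files.

// Primary template: a function template may be defined in every translation
// unit that uses it, so `inline` is optional here; it is shown for symmetry.
template <typename T>
inline bool is_nan_value(T const &)
{
    return false;  // integer types have no NaN
}

// A full explicit specialization is a function, not a template, so it does
// not inherit `inline` from the primary. Without `inline` here, including
// this header from two translation units would cause duplicate-symbol errors.
template <>
inline bool is_nan_value<float>(float const & x)
{
    return x != x;  // true only for NaN
}

template <>
inline bool is_nan_value<double>(double const & x)
{
    return x != x;  // true only for NaN
}
```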
Fixes rtosholdings/riptable#87 and rtosholdings/riptable#175.
This change fixes the max/nanmax/min/nanmin reduction kernels so they propagate NaNs (max, min) or ignore NaNs (nanmax, nanmin). It also consolidates and streamlines some logic; for example, the vector register type is now derived from the element type given as the template argument rather than being passed as a separate argument.
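The intended semantics can be pinned down with a scalar reference implementation, which could also serve as a test oracle for the vectorized kernels. This is a minimal sketch with hypothetical names (`reduce_max`, `reduce_nanmax`), assuming the caller rejects empty input for `max`; it is not the riptide kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Reference `max`: any NaN in the input makes the result NaN.
// Precondition: n > 0 (max has no identity element for an empty input).
template <typename T>
T reduce_max(T const * data, std::size_t n)
{
    assert(n > 0);
    T result = data[0];
    if (result != result)
        return result;  // first element is NaN
    for (std::size_t i = 1; i < n; ++i)
    {
        T const v = data[i];
        if (v != v)
            return v;  // propagate NaN; no need to scan further
        if (result < v)
            result = v;
    }
    return result;
}

// Reference `nanmax` (floating point only): NaNs are skipped.
// If every element is NaN (or n == 0), the result is NaN.
template <typename T>
T reduce_nanmax(T const * data, std::size_t n)
{
    static_assert(std::is_floating_point<T>::value, "floating-point types only");
    T result = std::numeric_limits<T>::quiet_NaN();
    bool found = false;
    for (std::size_t i = 0; i < n; ++i)
    {
        T const v = data[i];
        if (v != v)
            continue;  // ignore NaN
        if (!found || result < v)
        {
            result = v;
            found = true;
        }
    }
    return result;
}
```

The early return in `reduce_max` is only a scalar convenience; a vectorized kernel has to produce the same result regardless of which lane or position the NaN lands in.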
This change also implements part of the support needed for these functions to (selectively) handle integer invalids, which could be exposed in the future as a keyword argument to these ufuncs. Before that logic works correctly, we'll need invalid-recognizing versions of the `MIN_OP`, `MAX_OP`, etc. functions used in the AVX2-vectorized versions of these kernels; their logic should be fairly similar to that of the floating-point `MIN_OP`, `FMIN_OP`, etc. after this change.
Before merging, I'll run some before/after benchmarks and put together some proper tests to run in this repo.
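The invalid-recognizing ops described above don't exist yet, so the following is only a scalar sketch of what they might compute, by analogy with the NaN-ignoring and NaN-propagating floating-point ops. It assumes (unverified here) that an integer invalid is the type's minimum value for signed types and its maximum for unsigned types, and that the code builds as C++14 or later; `invalid_value`, `min_ignoring_invalid` and `min_propagating_invalid` are hypothetical names, not the `MIN_OP`/`FMIN_OP` signatures.

```cpp
#include <limits>
#include <type_traits>

// ASSUMPTION: the integer "invalid" sentinel is the minimum value for signed
// types and the maximum value for unsigned types. Substitute the project's
// real invalid definitions before relying on this.
template <typename T>
constexpr T invalid_value()
{
    static_assert(std::is_integral<T>::value, "integer types only");
    return std::is_signed<T>::value ? std::numeric_limits<T>::min()
                                    : std::numeric_limits<T>::max();
}

// Invalid-ignoring min (analogue of the NaN-ignoring nanmin op): skip an
// invalid operand; if both are invalid, the result is invalid.
template <typename T>
constexpr T min_ignoring_invalid(T x, T y)
{
    if (x == invalid_value<T>())
        return y;
    if (y == invalid_value<T>())
        return x;
    return y < x ? y : x;
}

// Invalid-propagating min (analogue of the NaN-propagating min op).
template <typename T>
constexpr T min_propagating_invalid(T x, T y)
{
    if (x == invalid_value<T>() || y == invalid_value<T>())
        return invalid_value<T>();
    return y < x ? y : x;
}
```

Under this assumption the ordering alone would give inconsistent results: a plain signed `min` always lets the sentinel win, while a plain unsigned `min` always skips it, so explicit checks are needed for both behaviors.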
Testing / demonstration code:
Setup
Before
After