In our current implementation of `adjust-interval` (see #14), we use the first item in a column to determine the `target-datatype` of the time unit to which we are converting. More specifically, `adjust-interval` takes a user-supplied `->new-time_converter` fn, calls it on the first item in the targeted column to get a value in the new unit, and then uses `tech.v3.datatype/elemwise-datatype` to determine that unit's keyword.
This is all fine, but @cnuernber raised a good point that we overlooked:
> There are some auto-detection routines for datatype that rely on converting the first element. All I might add to that is you may want to convert the first non-missing element; what if your first element is a missing/null value?
We should figure out a way to handle this case. It might also pay to generalize the process of determining the time datatype from the row if this is going to be a more common practice.
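One possible shape for this, as a minimal sketch only: skip missing values before calling the converter, and return `nil` when the column has no values at all. The helper names `first-non-missing` and `infer-target-type` are hypothetical and not part of the library, the sketch assumes missing values are read as `nil`, and it uses plain `class` as a stand-in for `tech.v3.datatype/elemwise-datatype` so that it runs without any dependencies.

```clojure
(ns example.first-non-missing
  (:import [java.time LocalDate YearMonth]))

;; Hypothetical helper: returns the first non-nil element of a sequence,
;; or nil if every element is missing.
;; Assumption: missing values in the column are read as nil.
(defn first-non-missing
  [coll]
  (first (remove nil? coll)))

;; Hypothetical helper: applies the user-supplied converter to the first
;; non-missing element and returns the class of the result, or nil when
;; the column holds no values. `class` is a stand-in here; the issue text
;; says adjust-interval uses tech.v3.datatype/elemwise-datatype instead.
(defn infer-target-type
  [converter coll]
  (when-some [x (first-non-missing coll)]
    (class (converter x))))

;; Example column whose first two entries are missing.
(def dates [nil nil (LocalDate/of 2021 1 15) (LocalDate/of 2021 2 1)])

;; Example converter from a date to its year-month.
(defn ->year-month [d]
  (YearMonth/from d))

(infer-target-type ->year-month dates)     ;; => java.time.YearMonth
(infer-target-type ->year-month [nil nil]) ;; => nil
```

Returning `nil` for an all-missing column leaves the caller to decide whether that is an error or a no-op, which seems safer than guessing a datatype.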
When we added the index structure to tech.ml.dataset (see techascent/tech.ml.dataset#214), we prevented a column from returning an index if the column has any missing values. So this issue may no longer be relevant: a column that returns an index has no missing values, so there should always be an item in the first position.