unify_json_strings in map_utils.cu has two places where it asserts on string column data sizes. The first is this one:
auto const output_size =
  2l +                                            // two extra bracket characters '[' and ']'
  static_cast<int64_t>(chars_size) +
  static_cast<int64_t>(input.size() - 1) +        // append `,` character between input rows
  static_cast<int64_t>(input.null_count()) * 2l;  // replace null with "{}"
CUDF_EXPECTS(output_size <= static_cast<int64_t>(std::numeric_limits<cudf::size_type>::max()),
             "The input json column is too large and causes overflow.");
Eventually cudf will support string columns with more than 2GB of character data, and at that point this assertion will reject inputs that are actually valid.
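To make the size arithmetic concrete, here is a minimal standalone sketch of the same computation with no cudf dependency. The names `unified_output_size` and `fits_in_size_type` are hypothetical helpers for illustration only, and the local `size_type` alias is a stand-in for `cudf::size_type` (a 32-bit signed integer). It is not the code in map_utils.cu.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for cudf::size_type (32-bit signed); an assumption made so the
// sketch compiles without cudf.
using size_type = std::int32_t;

// Hypothetical helper mirroring the arithmetic in the snippet above.
// All terms are widened to 64 bits so the sum itself cannot wrap.
// Assumes num_rows >= 1.
constexpr std::int64_t unified_output_size(std::int64_t chars_size,
                                           size_type num_rows,
                                           size_type null_count)
{
  return std::int64_t{2} +                             // '[' and ']'
         chars_size +                                  // existing character data
         (static_cast<std::int64_t>(num_rows) - 1) +   // ',' between rows
         static_cast<std::int64_t>(null_count) * 2;    // each null becomes "{}"
}

// Hypothetical helper: true when the size is still representable as size_type,
// which is the condition the CUDF_EXPECTS above enforces.
constexpr bool fits_in_size_type(std::int64_t n)
{
  return n <= static_cast<std::int64_t>(std::numeric_limits<size_type>::max());
}

// Small usage example: 3 rows, 1 null, 10 bytes of character data.
inline void example()
{
  assert(unified_output_size(10, 3, 1) == 16);
  assert(fits_in_size_type(unified_output_size(10, 3, 1)));
  // A single row that already holds INT32_MAX bytes overflows once the
  // two brackets are added.
  assert(!fits_in_size_type(unified_output_size(2147483647LL, 1, 0)));
}
```

Once large strings are supported, a check of this shape would need to compare against whatever the new upper bound is, or be dropped entirely.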
Later in the function there is a sanity-check assertion that used to be very cheap: before the recent cudf strings column change, calculating the size of a string column's character data only involved data already on the CPU. Now the assertion requires an extra stream synchronization, which could be avoided by computing the expected size directly from the known output_size.
We could consider removing this assertion if we never see issues with join_strings doing what it's supposed to do, and that would remove a stream synchronization required to access the string column's character data size.