You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The comments in ColumnVector indicate that we should only use the isNull array to determine null values when noNulls is set to false. This helps us reuse ColumnVector, for example, during reuse when noNulls transitions from true to true or false to true, we don't need to set isNull, which can help improve performance.
Unfortunately, there are some counterexamples in the Java impl, such as in BitFieldReader.java, where we directly read isNull without checking noNulls first. To ensure correctness, Java resets noNulls and isNull every time, as seen in TreeReaderFactory.java.
This issue originates from the discussion of ORC-1408, whether the ORC Java version needs to modify all counterexamples to adhere to the contract. I'm inclined to make the changes to be consistent with the C++ impl and to avoid unnecessary isNull setting, although I haven't verified how much performance improvement it could bring.
The text was updated successfully, but these errors were encountered:
IMHO, it is safe to make the change if the writer and reader can promise the contract to the downstream consumers (e.g. Apache Spark and Apache Hive). The ORC java library can break the promise in the internal implementation.
It would be great to standardize this interaction across the project with respect to the contract. Agree with the proposal that as long as the contract with external consumers is not affected we can work on internal standardization/consistency.
Regards,
Pavan
On Apr 15, 2023, at 6:07 AM, Gang Wu ***@***.***> wrote:
IMHO, it is safe to make the change if the writer and reader can promise the contract to the downstream consumers (e.g. Apache Spark and Apache Hive). The ORC java library can break the promise in the internal implementation.
—
Reply to this email directly, view it on GitHub <#1470 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABYDHO3E55QXFBW5YYD2DFTXBKMSFANCNFSM6AAAAAAW7G4FZM>.
You are receiving this because you are subscribed to this thread.
The comments in ColumnVector indicate that we should only use the
isNull
array to determine null values whennoNulls
is set to false. This helps us reuseColumnVector
, for example, during reuse whennoNulls
transitions fromtrue
totrue
orfalse
totrue
, we don't need to setisNull
, which can help improve performance.Unfortunately, there are some counterexamples in the Java impl, such as in BitFieldReader.java, where we directly read
isNull
without checkingnoNulls
first. To ensure correctness, Java resetsnoNulls
andisNull
every time, as seen in TreeReaderFactory.java.This issue originates from the discussion of ORC-1408, whether the ORC Java version needs to modify all counterexamples to adhere to the contract. I'm inclined to make the changes to be consistent with the C++ impl and to avoid unnecessary
isNull
setting, although I haven't verified how much performance improvement it could bring.The text was updated successfully, but these errors were encountered: