[Kernel] Remove the ColumnVector::getStruct API #2131

Merged: 3 commits, Oct 13, 2023

Conversation

allisonport-db
Collaborator

@allisonport-db allisonport-db commented Oct 3, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Removes the getStruct API from ColumnVector. We will use a wrapper to convert to rows only for the ColumnarBatch/FilteredColumnarBatch row-based processing APIs.

How was this patch tested?

Existing tests should suffice.
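
The description above says struct values are now reached through `getChild`, with a wrapper converting to rows only where row-based processing is needed. The following is a minimal sketch of that idea, not the Kernel implementation: `SketchVector`, `StringVector`, `StructVector`, and `StructRowView` are hypothetical, heavily reduced stand-ins introduced only for illustration, and the real `ColumnVector` interface has many more methods.

```java
import java.util.Arrays;
import java.util.List;

// Demo entry point: builds a two-field struct vector and reads it row by row.
class StructRowViewDemo {
    public static void main(String[] args) {
        SketchVector ids = new StringVector(Arrays.asList("a", "b"));
        SketchVector names = new StringVector(Arrays.asList("x", null));
        SketchVector struct = new StructVector(Arrays.asList(ids, names));

        for (int rowId = 0; rowId < struct.getSize(); rowId++) {
            StructRowView row = new StructRowView(struct, rowId);
            String name = row.isNullAt(1) ? "<null>" : row.getString(1);
            System.out.println(row.getString(0) + " -> " + name);
        }
    }
}

// Hypothetical, reduced columnar interface (assumption, not the Kernel API).
interface SketchVector {
    int getSize();

    boolean isNullAt(int rowId);

    default String getString(int rowId) {
        throw new UnsupportedOperationException("not a string vector");
    }

    default SketchVector getChild(int ordinal) {
        throw new UnsupportedOperationException("not a struct vector");
    }
}

// Leaf vector holding nullable strings.
final class StringVector implements SketchVector {
    private final List<String> values;

    StringVector(List<String> values) { this.values = values; }

    public int getSize() { return values.size(); }

    public boolean isNullAt(int rowId) { return values.get(rowId) == null; }

    @Override
    public String getString(int rowId) { return values.get(rowId); }
}

// Struct vector: one child vector per field, all of the same length.
final class StructVector implements SketchVector {
    private final List<SketchVector> children;

    StructVector(List<SketchVector> children) { this.children = children; }

    public int getSize() { return children.isEmpty() ? 0 : children.get(0).getSize(); }

    // Simplification for this sketch: the struct value itself is never null.
    public boolean isNullAt(int rowId) { return false; }

    @Override
    public SketchVector getChild(int ordinal) { return children.get(ordinal); }
}

// Row-shaped view over one position of a struct vector. It stores only the
// vector and the row id; every read is delegated to the matching child.
final class StructRowView {
    private final SketchVector struct;
    private final int rowId;

    StructRowView(SketchVector struct, int rowId) {
        this.struct = struct;
        this.rowId = rowId;
    }

    boolean isNullAt(int ordinal) { return struct.getChild(ordinal).isNullAt(rowId); }

    String getString(int ordinal) { return struct.getChild(ordinal).getString(rowId); }
}
```

In this sketch the row view copies no data, so code that stays columnar never pays for it; only callers that ask for rows create the small view object.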

allisonport-db added a commit that referenced this pull request Oct 10, 2023
## Description

Provides implementations for `getChild` for column vectors that are missing them.

## How was this patch tested?

Adds simple tests for `DefaultViewVector` and `DefaultGenericVector` (used by complex types in the JSON handler).
#2131 is also based on these changes and uses `getChild` instead of `getStruct` everywhere in the code.
/**
* Wrapper around list of {@link Row}s to expose the rows as a column vector
*/
private static class RowBasedVector implements ColumnVector {
Collaborator Author

@allisonport-db allisonport-db Oct 10, 2023

Had to add this until (or unless) we decide to remove JsonHandlerTestImpl.
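
For readers unfamiliar with the pattern, a wrapper of this kind goes in the opposite direction from a row view: it takes a list of rows and answers column-style reads. The sketch below illustrates the general shape only. `SketchRow`, `ArrayRow`, and `RowsAsVector` are hypothetical names invented for this example, with string fields only, and are not the `RowBasedVector` code from this PR.

```java
import java.util.Arrays;
import java.util.List;

// Demo entry point: exposes three rows (one of them null) as a struct vector.
class RowsAsVectorDemo {
    public static void main(String[] args) {
        List<SketchRow> rows = Arrays.asList(
            new ArrayRow("a", "x"),
            null,                      // a null row reads as a null struct entry
            new ArrayRow("c", null));
        RowsAsVector vector = new RowsAsVector(rows);
        RowsAsVector.FieldView names = vector.getChild(1);

        for (int rowId = 0; rowId < vector.getSize(); rowId++) {
            String name = names.isNullAt(rowId) ? "<null>" : names.getString(rowId);
            System.out.println(rowId + ": " + name);
        }
    }
}

// Hypothetical, reduced row interface (assumption, string fields only).
interface SketchRow {
    boolean isNullAt(int ordinal);

    String getString(int ordinal);
}

// Simple row backed by an array of nullable strings.
final class ArrayRow implements SketchRow {
    private final String[] fields;

    ArrayRow(String... fields) { this.fields = fields; }

    public boolean isNullAt(int ordinal) { return fields[ordinal] == null; }

    public String getString(int ordinal) { return fields[ordinal]; }
}

// Presents a list of rows as a struct-typed vector. Each child is a thin
// view that picks one field out of every row; no values are copied.
final class RowsAsVector {
    private final List<SketchRow> rows;

    RowsAsVector(List<SketchRow> rows) { this.rows = rows; }

    int getSize() { return rows.size(); }

    boolean isNullAt(int rowId) { return rows.get(rowId) == null; }

    FieldView getChild(int ordinal) { return new FieldView(ordinal); }

    // Column view over a single field of the wrapped rows.
    final class FieldView {
        private final int ordinal;

        FieldView(int ordinal) { this.ordinal = ordinal; }

        // A field is null when its whole row is null or the field itself is.
        boolean isNullAt(int rowId) {
            return RowsAsVector.this.isNullAt(rowId) || rows.get(rowId).isNullAt(ordinal);
        }

        String getString(int rowId) { return rows.get(rowId).getString(ordinal); }
    }
}
```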

@allisonport-db allisonport-db changed the title from "[WIP][Kernel] Remove the ColumnVector::getStruct API" to "[Kernel] Remove the ColumnVector::getStruct API" Oct 10, 2023
@allisonport-db allisonport-db added this to the 3.0.0 milestone Oct 12, 2023
Format.fromRow(requireNonNull(row, 0, "id").getStruct(3)),
requireNonNull(vector.getChild(0), rowId, "id").getString(rowId),
Optional.ofNullable(vector.getChild(1).isNullAt(rowId) ? null :
vector.getChild(1).getString(rowId)),
Collaborator

What is the cost of getChild? I'm wondering whether this approach creates too many objects.

Collaborator Author

I think it depends on the implementation. Some create wrapper objects on each call; we can update those implementations to create the child vectors only once.

It's no more object creation than the row API (and often less), since we'd create a wrapper row on each getStruct call.
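
One way to "create the child vectors only once", as suggested in the reply above, is to build each child wrapper lazily and cache it by ordinal. This is a minimal sketch under assumptions, not code from the PR: `CachingStructVector` and `ChildView` are hypothetical names, and the factory stands in for whatever an implementation does to build a child wrapper.

```java
import java.util.function.IntFunction;

// Demo entry point: counts how many child wrappers are built over many reads.
class CachedChildrenDemo {
    public static void main(String[] args) {
        int[] created = {0};
        CachingStructVector vector = new CachingStructVector(2, ordinal -> {
            created[0]++;
            return new ChildView(ordinal);
        });

        // Simulate per-row access: getChild is called for every row.
        for (int rowId = 0; rowId < 1000; rowId++) {
            vector.getChild(0);
            vector.getChild(1);
        }
        System.out.println("child wrappers created: " + created[0]);
    }
}

// Placeholder for a child vector wrapper (hypothetical).
final class ChildView {
    final int ordinal;

    ChildView(int ordinal) { this.ordinal = ordinal; }
}

// Struct-like vector that builds each child wrapper on first use and then
// returns the same instance. Not thread-safe; a sketch for single-threaded use.
final class CachingStructVector {
    private final ChildView[] children;
    private final IntFunction<ChildView> factory;

    CachingStructVector(int numChildren, IntFunction<ChildView> factory) {
        this.children = new ChildView[numChildren];
        this.factory = factory;
    }

    ChildView getChild(int ordinal) {
        ChildView child = children[ordinal];
        if (child == null) {
            child = factory.apply(ordinal);
            children[ordinal] = child;
        }
        return child;
    }
}
```

With caching of this kind, the number of wrapper objects is bounded by the number of fields rather than by the number of `getChild` calls.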

Collaborator

@vkorukanti vkorukanti left a comment

lgtm

@allisonport-db allisonport-db merged commit 67d6c5d into delta-io:master Oct 13, 2023
6 checks passed
allisonport-db added a commit to allisonport-db/delta that referenced this pull request Oct 13, 2023
xupefei pushed a commit to xupefei/delta that referenced this pull request Oct 31, 2023
…-io#2133)

xupefei pushed a commit to xupefei/delta that referenced this pull request Oct 31, 2023