Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Implement getChild for a few remaining column vectors #2133

Merged
merged 7 commits into from
Oct 10, 2023

Conversation

allisonport-db
Copy link
Collaborator

@allisonport-db allisonport-db commented Oct 3, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Provides implementations for getChild for column vectors that are missing them.

How was this patch tested?

Adds simple tests for DefaultViewVector and DefaultGenericVector (used by complex types in the JSON handler).
#2131 also is based off these changes and uses getChild instead of getStruct everywhere in the code.

@allisonport-db allisonport-db force-pushed the support-get-child branch 2 times, most recently from c87daed to dd3a925 Compare October 3, 2023 22:45
@@ -120,4 +123,16 @@ public Row getStruct(int rowId) {
public ArrayValue getArray(int rowId) {
return (ArrayValue) value;
}

@Override
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems incomplete to ignore DefaultConstantVector but this should never be used. I think possible options

  1. Restrict DefaultConstantVector to only allow StructType for null values (for NonExistentColumnConverter)
  2. Provide this implementation and add a test somewhere (not sure where to put this?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1 + restrict it for Map and Array types for non-null values only?

@@ -120,4 +123,16 @@ public Row getStruct(int rowId) {
public ArrayValue getArray(int rowId) {
return (ArrayValue) value;
}

@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1 + restrict it for Map and Array types for non-null values only?

@@ -136,6 +137,17 @@ public MapValue getMap(int rowId) {
return (MapValue) values[rowId];
}

@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can DefaultConstantVector be derived from DefaultGenericVector? If yes, then we can reduce the code size bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could combine them and use a function like "value accessor" possibly? Then all the class does really is cast

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't restrict the types then though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the type checks in DefaultConstantVector constructor (this will just have the constructor)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined them here instead

/**
* {@link ColumnVector} wrapper on top of {@link Row} objects. This wrapper allows referencing
* any nested level column vector from a set of rows.
* TODO: We should change the {@link io.delta.kernel.defaults.client.DefaultJsonHandler} to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this TODO still needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think so we're still just wrapping rows in DefaultRowBasedColumnarBatch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the todo to a more appropriate spot

@allisonport-db allisonport-db merged commit cd02359 into delta-io:master Oct 10, 2023
6 checks passed
allisonport-db added a commit that referenced this pull request Oct 10, 2023
## Description

Provides implementations for `getChild` for column vectors that are missing them.

## How was this patch tested?

Adds simple tests for `DefaultViewVector` and `DefaultGenericVector` (used by complex types in the JSON handler).
#2131 also is based off these changes and uses `getChild` instead of `getStruct` everywhere in the code.
xupefei pushed a commit to xupefei/delta that referenced this pull request Oct 31, 2023
…-io#2133)

## Description

Provides implementations for `getChild` for column vectors that are missing them.

## How was this patch tested?

Adds simple tests for `DefaultViewVector` and `DefaultGenericVector` (used by complex types in the JSON handler).
delta-io#2131 also is based off these changes and uses `getChild` instead of `getStruct` everywhere in the code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants