-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify SchemaTransform API #531
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #531 +/- ##
==========================================
- Coverage 80.43% 80.41% -0.02%
==========================================
Files 62 62
Lines 13645 13641 -4
Branches 13645 13641 -4
==========================================
- Hits 10975 10970 -5
- Misses 2114 2116 +2
+ Partials 556 555 -1 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cow
-> ref change LGTM but left a question about the trait-level lifetime
/// Called for each array element encountered during the schema traversal. Implementations can | ||
/// call [`Self::transform`] if they wish to recursively transform the array element type. | ||
fn transform_array_element(&mut self, etype: &'a DataType) -> Option<Cow<'a, DataType>> { | ||
self.transform(etype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious: why do we have these transform_array_element/map_key/map_val that just wrap transform
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're missing hook points this PR adds. The nested data skipping and column mapping PR will use them to improve its error messages (because column mapping transformation needs to recurse through maps and arrays, and if something goes wrong inside them, the error message should make it clear where the problem was.
(there's no recurse_into_xxx
because no additional boilerplate is needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks :)
@@ -647,61 +647,73 @@ impl Display for DataType { | |||
/// The provided `recurse_into_xxx` methods encapsulate the boilerplate work of recursing into the | |||
/// child schema elements of each schema element. Implementations can call these as needed but will | |||
/// generally not need to override them. | |||
pub trait SchemaTransform { | |||
pub trait SchemaTransform<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does sharing the lifetime across methods make sense here? put another way: should the lifetimes of the types we are transforming all be the same? I suppose the answer is yes since each method is invokes as we are transforming some type and possibly recursing which would imply the same lifetime requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an observation by @nicklan that allows the data skipping PR to not clone string field names it stores references to -- everything reachable through the entry point must survive at least as long as the entry point, so we can define a single lifetime for the entire traversal that visitors can take advantage of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay yep makes sense! thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! one very minor nit. Thanks!
&mut self, | ||
_ptype: Cow<'a, PrimitiveType>, | ||
_ptype: &'a PrimitiveType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit:
_ptype: &'a PrimitiveType, | |
_: &'a PrimitiveType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was normal to keep the names of unused params in public method definitions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure if there's "normal" on this one. I don't find it adds much here so suggested removing it, but I don't feel strongly either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick regexp search over the kernel sources shows seven places that use _: Foo
as a function argument and ~100 that use _foo: Foo
. I think I'll leave this one as-is.
What changes are proposed in this pull request?
Change
SchemaTransform
to have a trait-level named lifetime and for its methods to take&'a Foo
instead ofCow<'a, Foo>
. Not only wasCow
unnecessary there, it also caused borrow checker problems (recursing on aCow
consumes it, which fails borrow checks if there are any references to the underlying reference -- we know it's aCow::Borrowed
but the compiler cannot prove that.How was this change tested?
Existing tests.