refactor: Zero-Field Structs and DataFrame with Height Property #19123
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #19123 +/- ##
==========================================
+ Coverage 79.78% 79.79% +0.01%
==========================================
Files 1531 1532 +1
Lines 208445 208649 +204
Branches 2913 2913
==========================================
+ Hits 166301 166498 +197
- Misses 41593 41601 +8
+ Partials 551 550 -1
unsafe fn _mmap_unchecked<T: AsRef<[u8]>>(
fields: &ArrowSchema,
drive-by: this function was unused
let mut out = a
.fields_as_series()
.iter()
.zip(b.fields_as_series().iter())
.map(|(l, r)| op(l, r))
.reduce(reduce)
- .unwrap();
+ .unwrap_or_else(|| BooleanChunked::full(PlSmallStr::EMPTY, !value, a.len()));
If both structs are zero-field structs (ZFSs), they always compare equal: `eq` is always true and `ne` is always false.
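To illustrate why the fallback is needed, here is a minimal sketch that does not use the Polars types. `struct_eq_mask` and the `Vec<Vec<i32>>` field representation are hypothetical stand-ins; the point is only that `Iterator::reduce` returns `None` when there are no fields, so a default mask has to be supplied.

```rust
/// Toy struct equality: each struct is a slice of fields, each field a column of i32.
/// This is NOT the Polars implementation, only a sketch of the reduce-with-fallback idea.
fn struct_eq_mask(a: &[Vec<i32>], b: &[Vec<i32>], len: usize) -> Vec<bool> {
    a.iter()
        .zip(b.iter())
        // Element-wise equality for one pair of fields.
        .map(|(l, r)| {
            l.iter()
                .zip(r.iter())
                .map(|(x, y)| x == y)
                .collect::<Vec<bool>>()
        })
        // Combine the per-field masks with a logical AND.
        .reduce(|acc, m| acc.iter().zip(m.iter()).map(|(x, y)| *x && *y).collect())
        // Zero fields: `reduce` yields `None`, and every row is trivially equal.
        .unwrap_or_else(|| vec![true; len])
}
```

With zero fields the result is an all-true mask of the requested length; a `ne` variant would fall back to all-false instead.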
@@ -192,10 +192,7 @@ impl DataFrame {

match n.get(0) {
Some(n) => self.sample_n_literal(n as usize, with_replacement, shuffle, seed),
None => {
drive-by: this should be the same
@@ -237,10 +234,7 @@ impl DataFrame {
let n = (self.height() as f64 * frac) as usize;
self.sample_n_literal(n, with_replacement, shuffle, seed)
},
None => {
drive-by: this should be the same
@@ -32,6 +32,15 @@ impl DataFrame {
/// - the length of all [`Column`] is equal to the height of this [`DataFrame`]
/// - the columns names are unique
pub unsafe fn hstack_mut_unchecked(&mut self, columns: &[Column]) -> &mut Self {
// If we don't have any columns yet, copy the height from the given columns.
if let Some(fst) = columns.first() {
if self.width() == 0 {
Maybe this should have a `self.height() == 0` check as well, but I want to leave that as future correctness work.
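For readers following along, a rough sketch of the behaviour in question, using a toy `Frame` type rather than the real `DataFrame` (the type, its fields, and `hstack_mut` are assumptions for illustration only). The height is stored explicitly, and a frame with no columns adopts the length of the first column it receives.

```rust
/// Toy stand-in for a DataFrame that tracks its height independently of its columns.
struct Frame {
    height: usize,
    columns: Vec<Vec<i64>>,
}

impl Frame {
    fn width(&self) -> usize {
        self.columns.len()
    }

    /// Append columns. The caller is assumed to pass columns of equal length
    /// that match the current height whenever the frame already has columns.
    fn hstack_mut(&mut self, columns: &[Vec<i64>]) -> &mut Self {
        // If there are no columns yet, copy the height from the given columns.
        if let Some(first) = columns.first() {
            if self.width() == 0 {
                self.height = first.len();
            }
        }
        self.columns.extend_from_slice(columns);
        self
    }
}
```

Note that in this sketch a zero-width frame with a non-zero height has that height silently overwritten by the first stacked column, which is the case an additional `height == 0` check would guard against.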
@@ -1232,6 +1292,10 @@ impl DataFrame {
if let Some(idx) = self.get_column_index(column.name().as_str()) {
self.replace_column(idx, column)?;
} else {
if self.width() == 0 {
idem
@@ -1274,7 +1338,13 @@ impl DataFrame {
debug_assert!(self.width() == 0 || self.height() == column.len());
debug_assert!(self.get_column_index(column.name().as_str()).is_none());

// SAFETY: Invariant of function guarantees for case `width` > 0. We set the height
// properly for `width` == 0.
if self.width() == 0 {
idem
@@ -1288,6 +1358,10 @@ impl DataFrame {
self.replace_column(idx, c)?;
}
} else {
if self.width() == 0 {
idem
@@ -657,13 +657,6 @@ fn any_values_to_list(
DataType::Categorical(Some(Arc::new(RevMapping::default())), *ordering)
},

// Structs don't support empty fields yet.
It is very nice that we can remove these here.
.iter()
.map(|s| s.new_from_index(0, num_rows).into());
drive-by: just convert into scalar column
}

/// The mutable values
pub fn mut_values(&mut self) -> &mut Vec<Box<dyn MutableArray>> {
It feels like this method was removed by accident. I'm not a polars expert, but I don't think there is a way to mutate the data without `mut_values()` or `value()`.
It was not removed by accident. Exposing and mutating this data may break invariants of the `MutableStructArray`. For example, you could change the length of one of the columns, which would make the frozen output invalid. At the moment, the only way to mutate items is to append to the `MutableStructArray`.
Sorry, I'm not able to figure out how to mutate data anymore. Could you please share an example?
This PR refactors a large part of the code to allow for:
Supporting this required quite a few changes all over the codebase.