refactor: Zero-Field Structs and DataFrame with Height Property #19123

Merged on Oct 11, 2024 (17 commits)

@coastalwhite (Collaborator) commented Oct 7, 2024

This PR refactors a large part of the code to allow for:

  • Zero-Field Structs (ZFSs)
  • Zero-Column DataFrame (with non-zero height) (ZCDFs)

Supporting these required quite a few changes throughout the codebase.
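
To illustrate what a zero-column DataFrame with non-zero height implies, here is a minimal std-only sketch, not the polars implementation. The names `Col`, `Frame`, `empty_with_height`, and `select_none` are hypothetical. The assumption is that the frame stores its height as its own field instead of deriving it from the first column, so the row count survives when no columns are left.

```rust
/// Hypothetical stand-in for a column: a name plus its values.
#[derive(Debug, Clone, PartialEq)]
pub struct Col {
    pub name: String,
    pub values: Vec<i64>,
}

/// Hypothetical frame that stores its height explicitly instead of
/// reading it off the first column.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    height: usize,
    columns: Vec<Col>,
}

impl Frame {
    /// A frame with `height` rows and no columns.
    pub fn empty_with_height(height: usize) -> Self {
        Frame { height, columns: Vec::new() }
    }

    /// Build a frame from columns, checking that all lengths agree.
    pub fn new(columns: Vec<Col>) -> Result<Self, String> {
        let height = columns.first().map_or(0, |c| c.values.len());
        if let Some(bad) = columns.iter().find(|c| c.values.len() != height) {
            return Err(format!(
                "column `{}` has length {}, expected {}",
                bad.name,
                bad.values.len(),
                height
            ));
        }
        Ok(Frame { height, columns })
    }

    pub fn height(&self) -> usize {
        self.height
    }

    pub fn width(&self) -> usize {
        self.columns.len()
    }

    /// Dropping every column keeps the row count (an assumption of this sketch).
    pub fn select_none(&self) -> Self {
        Frame::empty_with_height(self.height)
    }
}
```

With this layout, calling `select_none` on a three-row frame gives a frame of width 0 that still reports height 3, which a height derived from the first column could not express.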

@github-actions bot added the labels internal (An internal refactor or improvement), python (Related to Python Polars), and rust (Related to Rust Polars) on Oct 7, 2024
@coastalwhite coastalwhite force-pushed the refactor/zero-field-struct branch from 2445782 to 11dbc10 Compare October 8, 2024 12:39
codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 82.30088% with 120 lines in your changes missing coverage. Please review.

Project coverage is 79.79%. Comparing base (9dada18) to head (d87fbf8).
Report is 40 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-core/src/frame/mod.rs | 81.01% | 30 Missing ⚠️ |
| crates/polars-arrow/src/array/struct_/mutable.rs | 0.00% | 19 Missing ⚠️ |
| ...tream/src/nodes/parquet_source/row_group_decode.rs | 0.00% | 17 Missing ⚠️ |
| crates/polars-core/src/serde/series.rs | 42.85% | 12 Missing ⚠️ |
| crates/polars-core/src/chunked_array/ops/bits.rs | 43.75% | 9 Missing ⚠️ |
| crates/polars-arrow/src/legacy/array/mod.rs | 0.00% | 7 Missing ⚠️ |
| crates/polars-arrow/src/array/struct_/mod.rs | 87.50% | 3 Missing ⚠️ |
| ...rates/polars-core/src/chunked_array/struct_/mod.rs | 94.23% | 3 Missing ⚠️ |
| crates/polars-core/src/frame/row/av_buffer.rs | 82.35% | 3 Missing ⚠️ |
| ...s-pipe/src/executors/sinks/group_by/generic/mod.rs | 0.00% | 3 Missing ⚠️ |

... and 12 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19123      +/-   ##
==========================================
+ Coverage   79.78%   79.79%   +0.01%     
==========================================
  Files        1531     1532       +1     
  Lines      208445   208649     +204     
  Branches     2913     2913              
==========================================
+ Hits       166301   166498     +197     
- Misses      41593    41601       +8     
+ Partials      551      550       -1     

Comment on lines -110 to -111
unsafe fn _mmap_unchecked<T: AsRef<[u8]>>(
fields: &ArrowSchema,
Collaborator Author

drive-by: this function was unused

  let mut out = a
      .fields_as_series()
      .iter()
      .zip(b.fields_as_series().iter())
      .map(|(l, r)| op(l, r))
      .reduce(reduce)
-     .unwrap();
+     .unwrap_or_else(|| BooleanChunked::full(PlSmallStr::EMPTY, !value, a.len()));
Collaborator Author

If both structs are ZFSs, there are no fields to compare, so they always compare equal: eq is always true and ne is always false.
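
The fallback can be sketched without polars. Reducing over an empty iterator of per-field masks yields `None`, so a default mask is needed when there are no fields. The functions `struct_eq` and `struct_ne` below are hypothetical illustrations that model a struct column as one `Vec<i64>` per field, and they assume both sides have the same number of fields and that every field has `len` rows.

```rust
/// Row-wise equality of two struct columns, each given as a list of fields
/// (one `Vec<i64>` per field) plus an explicit row count `len`.
pub fn struct_eq(a: &[Vec<i64>], b: &[Vec<i64>], len: usize) -> Vec<bool> {
    a.iter()
        .zip(b.iter())
        // Per-field comparison mask.
        .map(|(l, r)| l.iter().zip(r.iter()).map(|(x, y)| x == y).collect::<Vec<bool>>())
        // A row is equal only if every field is equal.
        .reduce(|acc, m| acc.iter().zip(m.iter()).map(|(p, q)| *p && *q).collect())
        // No fields: `reduce` returns `None`, so every row counts as equal.
        .unwrap_or_else(|| vec![true; len])
}

/// Row-wise inequality, the mirror image of `struct_eq`.
pub fn struct_ne(a: &[Vec<i64>], b: &[Vec<i64>], len: usize) -> Vec<bool> {
    a.iter()
        .zip(b.iter())
        .map(|(l, r)| l.iter().zip(r.iter()).map(|(x, y)| x != y).collect::<Vec<bool>>())
        // A row differs if any field differs.
        .reduce(|acc, m| acc.iter().zip(m.iter()).map(|(p, q)| *p || *q).collect())
        // No fields: nothing can differ.
        .unwrap_or_else(|| vec![false; len])
}
```

Note that the row count has to be passed in separately: with zero fields there is no field to read the length from, which is the same reason the frame needs its own height.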

@@ -192,10 +192,7 @@ impl DataFrame {

match n.get(0) {
Some(n) => self.sample_n_literal(n as usize, with_replacement, shuffle, seed),
None => {
Collaborator Author

drive-by: this should be the same

@@ -237,10 +234,7 @@ impl DataFrame {
let n = (self.height() as f64 * frac) as usize;
self.sample_n_literal(n, with_replacement, shuffle, seed)
},
None => {
Collaborator Author

drive-by: this should be the same

@@ -32,6 +32,15 @@ impl DataFrame {
/// - the length of all [`Column`] is equal to the height of this [`DataFrame`]
/// - the columns names are unique
pub unsafe fn hstack_mut_unchecked(&mut self, columns: &[Column]) -> &mut Self {
// If we don't have any columns yet, copy the height from the given columns.
if let Some(fst) = columns.first() {
if self.width() == 0 {
Collaborator Author

Maybe this should check self.height() == 0 as well, but I want to leave that as future correctness work.
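
A small sketch of the trade-off mentioned here, using a hypothetical `Frame` type and a hypothetical `strict` flag (neither is polars API). Without the extra check, a zero-width frame adopts the height of the incoming columns even if it already had a non-zero height. With the check, the height is adopted only when the frame has no rows, and a mismatch is reported otherwise.

```rust
/// Hypothetical frame with an explicit height; columns are plain vectors.
pub struct Frame {
    height: usize,
    columns: Vec<Vec<i64>>,
}

impl Frame {
    /// A frame with `height` rows and no columns.
    pub fn with_height(height: usize) -> Self {
        Frame { height, columns: Vec::new() }
    }

    pub fn height(&self) -> usize {
        self.height
    }

    pub fn width(&self) -> usize {
        self.columns.len()
    }

    /// Append columns. A zero-width frame copies its height from the first
    /// incoming column; with `strict`, it does so only if its height is 0.
    pub fn hstack_mut(&mut self, columns: &[Vec<i64>], strict: bool) -> Result<&mut Self, String> {
        if let Some(first) = columns.first() {
            if self.columns.is_empty() && (!strict || self.height == 0) {
                self.height = first.len();
            }
        }
        // Every incoming column must match the (possibly updated) height.
        let height = self.height;
        if let Some(bad) = columns.iter().find(|c| c.len() != height) {
            return Err(format!("column has length {}, expected {}", bad.len(), height));
        }
        self.columns.extend_from_slice(columns);
        Ok(self)
    }
}
```

In the non-strict mode, stacking a two-row column onto a zero-width frame of height 3 silently changes the height to 2. In the strict mode the same call returns an error and leaves the frame untouched.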

@@ -1232,6 +1292,10 @@ impl DataFrame {
if let Some(idx) = self.get_column_index(column.name().as_str()) {
self.replace_column(idx, column)?;
} else {
if self.width() == 0 {
Collaborator Author

idem

@@ -1274,7 +1338,13 @@ impl DataFrame {
debug_assert!(self.width() == 0 || self.height() == column.len());
debug_assert!(self.get_column_index(column.name().as_str()).is_none());

// SAFETY: Invariant of function guarantees for case `width` > 0. We set the height
// properly for `width` == 0.
if self.width() == 0 {
Collaborator Author

idem

@@ -1288,6 +1358,10 @@ impl DataFrame {
self.replace_column(idx, c)?;
}
} else {
if self.width() == 0 {
Collaborator Author

idem

@@ -657,13 +657,6 @@ fn any_values_to_list(
DataType::Categorical(Some(Arc::new(RevMapping::default())), *ordering)
},

// Structs don't support empty fields yet.
Collaborator Author

It is very nice that we can remove these here.

.iter()
.map(|s| s.new_from_index(0, num_rows).into());
Collaborator Author

drive-by: just convert into scalar column

@coastalwhite coastalwhite marked this pull request as ready for review October 9, 2024 07:36
@ritchie46 ritchie46 merged commit dbbd93f into pola-rs:main Oct 11, 2024
25 checks passed
@coastalwhite coastalwhite deleted the refactor/zero-field-struct branch October 11, 2024 07:12
@c-peters c-peters added the accepted Ready for implementation label Oct 14, 2024
}

/// The mutable values
pub fn mut_values(&mut self) -> &mut Vec<Box<dyn MutableArray>> {
Contributor

It feels like this method was removed by accident. I'm not a polars expert, but I don't think there is a way to mutate the data without mut_values() or value().

Collaborator Author

It was not removed by accident. Exposing this data and mutating this data may break invariants of the MutableStructArray. For example, you may change the length of the columns which would make the freeze output invalid. At the moment, the only way to mutate items is to append to the MutableStructArray.
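
The invariant argument can be illustrated with a generic append-only builder. This is a sketch under stated assumptions, not polars' `MutableStructArray`: the type `StructBuilder` and its methods `push` and `freeze` are hypothetical. The field buffers stay private, so callers cannot resize one field independently, and every append touches all fields at once.

```rust
/// Hypothetical append-only builder for a struct column with `i64` fields.
pub struct StructBuilder {
    // Row count is tracked explicitly so it is known even with zero fields.
    len: usize,
    // Private: callers cannot change the length of a single field.
    fields: Vec<Vec<i64>>,
}

impl StructBuilder {
    pub fn new(num_fields: usize) -> Self {
        StructBuilder { len: 0, fields: vec![Vec::new(); num_fields] }
    }

    /// Append one row; the row must supply exactly one value per field.
    pub fn push(&mut self, row: &[i64]) -> Result<(), String> {
        if row.len() != self.fields.len() {
            return Err(format!("expected {} values, got {}", self.fields.len(), row.len()));
        }
        for (field, value) in self.fields.iter_mut().zip(row) {
            field.push(*value);
        }
        self.len += 1;
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Consume the builder; every field is guaranteed to have `len` values.
    pub fn freeze(self) -> (usize, Vec<Vec<i64>>) {
        (self.len, self.fields)
    }
}
```

Had the builder handed out `&mut Vec<Vec<i64>>`, a caller could push to only one field, and the frozen result would no longer have fields of equal length.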

@utkarshgupta137 (Contributor) commented Oct 28, 2024

Sorry, I'm not able to figure out how to mutate data anymore. Could you please share an example?
