fix: Improve histogram bin logic #18761

Merged on Nov 19, 2024 (4 commits)
Fix uniformity check
mcrumiller authored and alexander-beedie committed Nov 13, 2024
commit c653d1626d69c85d89b9ea6ce14cb244332b8d97
18 changes: 13 additions & 5 deletions crates/polars-ops/src/chunked_array/hist.rs
@@ -26,19 +26,27 @@ where
         // User-supplied bins. Note these are actually bin edges. Check for monotonicity.
         // If we only have one edge, we have no bins.
         let bin_len = bins.len();
-        let mut uniform = true;
+        // We also check for uniformity of bins. We declare uniformity if the difference
+        // between the largest and smallest bin is < 0.00001 the average bin size.
         if bin_len > 1 {
-            let diff = bins[1] - bins[0];
+            let mut smallest = bins[1] - bins[0];
+            let mut largest = smallest;
+            let mut avg_bin_size = smallest;
             for i in 1..bins.len() {
-                if bins[i - 1] >= bins[i] {
+                let d = bins[i] - bins[i - 1];
+                if d <= 0.0 {
                     return Err(PolarsError::ComputeError(
                         "bins must increase monotonically".into(),
                     ));
                 }
-                if uniform && (bins[i] - bins[i - 1]) != diff {
-                    uniform = false;
+                if d > largest {
+                    largest = d;
+                } else if d < smallest {
+                    smallest = d;
                 }
+                avg_bin_size += d;
             }
+            let uniform = (largest - smallest) / (avg_bin_size / bin_len as f64) < 0.00001;
             (bins.to_vec(), uniform)
         } else {
             (Vec::<f64>::new(), false) // uniformity doesn't matter here
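
For readers who want to try the idea outside of Polars, here is a minimal standalone sketch of the check this commit introduces: reject non-increasing edges, then call the bins uniform when the spread between the widest and narrowest bin is small relative to the average width. The function names `check_edges` and `exactly_uniform`, the `rel_tol` parameter, and the `String` error type are illustrative assumptions, not part of the Polars code. The sketch is also not a line-for-line copy of the hunk: there, `avg_bin_size` starts at the first width while the loop begins at `i = 1`, so the first width appears to be added twice, and the sum is divided by the number of edges. The sketch uses the plain mean width instead, which is an assumption about the intent.

```rust
/// Hypothetical helper (not the Polars API): validates bin edges and reports
/// whether the bins are evenly spaced within a relative tolerance.
fn check_edges(edges: &[f64], rel_tol: f64) -> Result<bool, String> {
    // A single edge (or none) defines no bins, so uniformity is moot.
    if edges.len() < 2 {
        return Ok(false);
    }
    let mut smallest = f64::INFINITY;
    let mut largest = f64::NEG_INFINITY;
    for pair in edges.windows(2) {
        let d = pair[1] - pair[0];
        // Edges must be strictly increasing; NaN widths are rejected too.
        if d <= 0.0 || d.is_nan() {
            return Err("bins must increase monotonically".to_string());
        }
        smallest = smallest.min(d);
        largest = largest.max(d);
    }
    // Plain mean width: total span divided by the number of bins.
    let n_bins = (edges.len() - 1) as f64;
    let mean_width = (edges[edges.len() - 1] - edges[0]) / n_bins;
    Ok((largest - smallest) / mean_width < rel_tol)
}

/// The exact-equality test that the commit removes, reproduced for comparison.
fn exactly_uniform(edges: &[f64]) -> bool {
    if edges.len() < 2 {
        return false;
    }
    let first = edges[1] - edges[0];
    edges.windows(2).all(|p| p[1] - p[0] == first)
}
```

With edges `[0.0, 0.1, 0.2, 0.3]`, `exactly_uniform` reports `false` because `0.3 - 0.2` is not bit-identical to `0.1` in `f64`, while `check_edges(&edges, 1e-5)` reports `Ok(true)`. That rounding sensitivity is the motivation for replacing the `!= diff` comparison with a relative tolerance.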