Add support for DataFrame sum operation with tests #1148

Merged
merged 10 commits into from
Apr 26, 2025

zaleslaw
Collaborator

No description provided.

Introduced the `sum` operation for DataFrames, supporting aggregation of numerical columns. Updated the relevant tests and added new test cases to verify the functionality. Included schema modifications for handling operations on numerical columns.
Made various internal classes, interfaces, and functions related to aggregation public. This widens their visibility so that other modules and libraries can use them.
…re compatibility and correctness in sum calculations.
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for the DataFrame sum operation along with comprehensive tests covering various summation scenarios and aggregation handlers. Key changes include:

  • Adding a new sum operation test in both generated tests and dedicated test data for verifying correct summation.
  • Extending the aggregation framework with new Sum0 and Sum1 implementations.
  • Changing visibility modifiers from internal to public in several core aggregator components.
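To make the scope of the operation concrete, here is a minimal sketch of the three summation scenarios the overview mentions (all numeric columns, an expression over rows, and grouped data). It uses the library's String API and does not exercise the compiler plugin; the sample data, column names, and the `sampleFrame` helper are hypothetical, and the exact signatures and return types of `sum` and `sumOf` may differ between library versions, so treat this as an illustration rather than the code added in this PR.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: one String column and two numeric columns.
fun sampleFrame(): DataFrame<*> =
    dataFrameOf("city", "age", "weight")(
        "London", 30, 70.5,
        "Paris", 25, 60.0,
        "London", 40, 82.0,
    )

fun main() {
    val df = sampleFrame()

    // Sum over all numeric columns; the result is a single row.
    val totals = df.sum()
    println(totals)

    // Expression-based sum: one value computed per row, then added up.
    val weightPerYear = df.sumOf { (it["weight"] as Double) / (it["age"] as Int) }
    println(weightPerYear)

    // Sum of the numeric columns within each group.
    val perCity = df.groupBy("city").sum()
    println(perCity)
}
```

With the compiler plugin enabled, the same calls would be expected to yield results whose columns are known at compile time, which is what the new `Sum0` and `Sum1` interpreters are described as supporting.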

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| plugins/kotlin-dataframe/tests-gen/org/jetbrains/kotlin/fir/dataframe/DataFrameBlackBoxCodegenTestGenerated.java | Adds a new test method for the sum operation |
| plugins/kotlin-dataframe/testData/box/sum.kt | Introduces test scenarios for sum over all, selective, and expression-based columns |
| plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/loadInterpreter.kt | Updates load interpreter to support new Sum aggregators |
| plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/impl/api/statistics.kt | Adds implementations for summation aggregators |
| core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/statistics.kt | Expands tests for groupBy operations to include new numerical columns |
| core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/sum.kt | Annotates and exposes the new sum APIs |
| Other core aggregator files | Adjusts visibility and typing to support public sum operation API |

Comments suppressed due to low confidence (1)

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/statistics.kt:571

  • In the groupBy maxBy test, the column names are checked using 'res4' while the aggregation result is later accessed from 'res5'. This inconsistency might lead to erroneous test behavior; ensure the same result object is used throughout the test.
res4.columnNames() shouldBe listOf("city", "name", "age", "weight", "height", "yearsToRetirement", "workExperienceYears", "dependentsCount", "annualIncome")

@Jolanrensen Jolanrensen mentioned this pull request Apr 26, 2025
# Conflicts:
#	core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregators.kt
@zaleslaw zaleslaw marked this pull request as ready for review April 26, 2025 15:05
@zaleslaw zaleslaw merged commit 2ac173c into master Apr 26, 2025
4 of 7 checks passed

3 participants