-
Notifications
You must be signed in to change notification settings - Fork 128
Make most pyclasses
frozen
#1252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This and #1253 appear to be duplicates. @kosiew and @ntjohnson1 do you have thoughts about merging or closing one? |
#1253 looks way more comprehensive so should be merged. Would potentially be nice to cherry-pick my 2nd, 3rd, and 4th commit from here to mark every existing class as frozen to make it less likely we add non-frozen classes in the future |
@ntjohnson1
I'll work on this. |
Closed in favor of #1253 |
It's unfortunate that both PRs were opened at the same time. Sorry about that duplicated work @ntjohnson1 |
…rrow errors (#1253) * Refactor schema, config, dataframe, and expression classes to use RwLock and Mutex for interior mutability * Add error handling to CaseBuilder methods to preserve builder state * Refactor to use parking_lot for interior mutability in schema, config, dataframe, and conditional expression modules * Add concurrency tests for SqlSchema, Config, and DataFrame * Add tests for CaseBuilder to ensure builder state is preserved on success * Add test for independent handles in CaseBuilder to verify behavior * Fix CaseBuilder to preserve state correctly in when() method * Refactor to use named constant for boolean literals in test_expr.py * fix ruff errors * Refactor to introduce type aliases for cached batches in dataframe.rs * Cherry pick from #1252 * Add most expr - cherry pick from #1252 * Add source root - cherry pick #1252 * Fix license comment formatting in config.rs * Refactor caching logic to use a local variable for IPython environment check * Add test for ensuring exposed pyclasses default to frozen * Add PyO3 class mutability guidelines reference to contributor guide * Mark boolean expression classes as frozen for immutability * Refactor PyCaseBuilder methods to eliminate redundant take/store logic * Refactor PyConfig methods to improve readability by encapsulating configuration reads * Resolve patch apply conflicts for CaseBuilder concurrency improvements - Added CaseBuilderHandle guard that keeps the underlying CaseBuilder alive while holding the mutex and restores it on drop - Updated when, otherwise, and end methods to operate through the guard and consume the builder explicitly - This prevents transient None states during concurrent access and improves thread safety * Resolve Config optimization conflicts for improved read/write concurrency - Released Config read guard before converting values to Python objects in get and get_all - Ensures locks are held only while collecting scalar entries, not during expensive Python object conversion - Added regression test that runs Config.get_all and Config.set concurrently to guard against read/write contention regressions - Improves overall performance by reducing lock contention in multi-threaded scenarios * Refactor PyConfig get methods for improved readability and performance * Refactor test_expr.py to replace positional boolean literals with named constants for improved linting compliance * fix ruff errors * Add license header to test_pyclass_frozen.py for compliance * Alternate approach to case expression * Replace case builter with keeping the expressions and then applying as required * Update unit tests * Refactor case and when functions to utilize PyCaseBuilder for improved clarity and functionality * Update src/expr/conditional_expr.rs --------- Co-authored-by: ntjohnson1 <[email protected]> Co-authored-by: Tim Saucer <[email protected]>
Which issue does this PR close?
Covers most of #1250
Rationale for this change
Previous PR fixed inability to use python threads around SessionContext #1248
Suggested that we mark all possible classes frozen to be more explicit/better thread support.
What changes are included in this PR?
Arc<Mutex
as a strategy to manageSync
andfrozen
RwLock
is better since we might be reading configs more than writingpyo3[(get,set)]
would require new wrapper code for each member if we think frozen makes senseArc<Mutex
works decently.I don't expect anything here to be a major performance delta. It wasn't clear if there was a good way to check that. The benchmarks look more interested in datafusion vs other tools rather than comparing branches.
Are there any user-facing changes?
There are no user-facing python changes. I'm don't think the wrapper code is considered public apis.