Skip to content

Conversation

ntjohnson1
Copy link
Contributor

Which issue does this PR close?

Covers most of #1250

Rationale for this change

Previous PR fixed inability to use python threads around SessionContext #1248

Suggested that we mark all possible classes frozen to be more explicit/better thread support.

What changes are included in this PR?

  • Fix a typos setting that failed for me locally
  • For all classes that don't mutate members just add frozen tag
  • For two classes apply Arc<Mutex as a strategy to manage Sync and frozen
    • Maybe for PyConfig RwLock is better since we might be reading configs more than writing
  • Left todos for remaining pyclasses
    • everything with pyo3[(get,set)] would require new wrapper code for each member if we think frozen makes sense
    • CaseBuilder was a bit messier. If we derive clone upstream then Arc<Mutex works decently.

I don't expect anything here to be a major performance delta. It wasn't clear if there was a good way to check that. The benchmarks look more interested in datafusion vs other tools rather than comparing branches.

Are there any user-facing changes?

There are no user-facing python changes. I'm don't think the wrapper code is considered public apis.

@ntjohnson1 ntjohnson1 changed the title Make Most pyclasses frozen Make most pyclasses frozen Sep 26, 2025
@timsaucer
Copy link
Member

This and #1253 appear to be duplicates. @kosiew and @ntjohnson1 do you have thoughts about merging or closing one?

@ntjohnson1
Copy link
Contributor Author

This and #1253 appear to be duplicates. @kosiew and @ntjohnson1 do you have thoughts about merging or closing one?

#1253 looks way more comprehensive so should be merged. Would potentially be nice to cherry-pick my 2nd, 3rd, and 4th commit from here to mark every existing class as frozen to make it less likely we add non-frozen classes in the future

@kosiew
Copy link
Contributor

kosiew commented Sep 28, 2025

@ntjohnson1
Sorry, I did not notice this PR when I worked on #1253

Would potentially be nice to cherry-pick my 2nd, 3rd, and 4th commit from here to mark every existing class as frozen to make it less likely we add non-frozen classes in the future

I'll work on this.

kosiew pushed a commit to kosiew/datafusion-python that referenced this pull request Sep 28, 2025
kosiew pushed a commit to kosiew/datafusion-python that referenced this pull request Sep 28, 2025
kosiew pushed a commit to kosiew/datafusion-python that referenced this pull request Sep 28, 2025
@ntjohnson1
Copy link
Contributor Author

Closed in favor of #1253

@ntjohnson1 ntjohnson1 closed this Sep 28, 2025
@kylebarron
Copy link
Member

It's unfortunate that both PRs were opened at the same time. Sorry about that duplicated work @ntjohnson1

kosiew pushed a commit to kosiew/datafusion-python that referenced this pull request Oct 1, 2025
kosiew pushed a commit to kosiew/datafusion-python that referenced this pull request Oct 1, 2025
kosiew pushed a commit to kosiew/datafusion-python that referenced this pull request Oct 1, 2025
timsaucer added a commit that referenced this pull request Oct 7, 2025
…rrow errors (#1253)

* Refactor schema, config, dataframe, and expression classes to use RwLock and Mutex for interior mutability

* Add error handling to CaseBuilder methods to preserve builder state

* Refactor to use parking_lot for interior mutability in schema, config, dataframe, and conditional expression modules

* Add concurrency tests for SqlSchema, Config, and DataFrame

* Add tests for CaseBuilder to ensure builder state is preserved on success

* Add test for independent handles in CaseBuilder to verify behavior

* Fix CaseBuilder to preserve state correctly in when() method

* Refactor to use named constant for boolean literals in test_expr.py

* fix ruff errors

* Refactor to introduce type aliases for cached batches in dataframe.rs

* Cherry pick from #1252

* Add most expr - cherry pick from #1252

* Add source root - cherry pick #1252

* Fix license comment formatting in config.rs

* Refactor caching logic to use a local variable for IPython environment check

* Add test for ensuring exposed pyclasses default to frozen

* Add PyO3 class mutability guidelines reference to contributor guide

* Mark boolean expression classes as frozen for immutability

* Refactor PyCaseBuilder methods to eliminate redundant take/store logic

* Refactor PyConfig methods to improve readability by encapsulating configuration reads

* Resolve patch apply conflicts for CaseBuilder concurrency improvements

- Added CaseBuilderHandle guard that keeps the underlying CaseBuilder alive while holding the mutex and restores it on drop
- Updated when, otherwise, and end methods to operate through the guard and consume the builder explicitly
- This prevents transient None states during concurrent access and improves thread safety

* Resolve Config optimization conflicts for improved read/write concurrency

- Released Config read guard before converting values to Python objects in get and get_all
- Ensures locks are held only while collecting scalar entries, not during expensive Python object conversion
- Added regression test that runs Config.get_all and Config.set concurrently to guard against read/write contention regressions
- Improves overall performance by reducing lock contention in multi-threaded scenarios

* Refactor PyConfig get methods for improved readability and performance

* Refactor test_expr.py to replace positional boolean literals with named constants for improved linting compliance

* fix ruff errors

* Add license header to test_pyclass_frozen.py for compliance

* Alternate approach to case expression

* Replace case builter with keeping the expressions and then applying as required

* Update unit tests

* Refactor case and when functions to utilize PyCaseBuilder for improved clarity and functionality

* Update src/expr/conditional_expr.rs

---------

Co-authored-by: ntjohnson1 <[email protected]>
Co-authored-by: Tim Saucer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants