Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(catalog): Prepare existing catalog APIs for integration [1/3] #3820

Merged
merged 6 commits into from
Feb 20, 2025

Conversation

rchowell
Copy link
Contributor

@rchowell rchowell commented Feb 17, 2025

This is PR [1/3] for integrating #3805 to Daft/main – it's primary purpose is to establish the stable APIs and prepare the cutover to Session for attach/detach and creating temporary tables.

Changes

  1. Define minimal Catalog and Table ABCs.
  2. Internalize Catalog/Table implementations.
  3. Marks deprecated APIs along with their alternatives.
  4. Existing public classes are aliased to the internalized ones for backwards compatibility.

Part [2/3] will integrate the session actions (from #3805) to replace deprecated MetaCatalog APIs.

  • deprecated -> replacement
  • daft.register_python_catalog -> daft.attach_catalog
  • daft.unregister_catalog -> daft.detach_catalog
  • daft.read_table -> daft.read_table (via session)
  • daft.register_table -> daft.attach_table

@rchowell rchowell requested a review from jaychia February 17, 2025 23:53
@github-actions github-actions bot added the feat label Feb 17, 2025
Copy link

codecov bot commented Feb 18, 2025

Codecov Report

Attention: Patch coverage is 62.11180% with 61 lines in your changes missing coverage. Please review.

Project coverage is 78.03%. Comparing base (b34c2bf) to head (1b0ab2d).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
daft/catalog/__unity.py 54.90% 23 Missing ⚠️
daft/catalog/__init__.py 74.54% 14 Missing ⚠️
daft/catalog/__iceberg.py 69.04% 13 Missing ⚠️
daft/catalog/pyiceberg.py 0.00% 4 Missing ⚠️
daft/catalog/unity.py 0.00% 4 Missing ⚠️
daft/catalog/python_catalog.py 0.00% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3820      +/-   ##
==========================================
+ Coverage   77.89%   78.03%   +0.13%     
==========================================
  Files         751      754       +3     
  Lines       94879    95258     +379     
==========================================
+ Hits        73905    74333     +428     
+ Misses      20974    20925      -49     
Files with missing lines Coverage Δ
src/daft-catalog/src/python.rs 61.76% <100.00%> (+0.57%) ⬆️
daft/catalog/python_catalog.py 0.00% <0.00%> (ø)
daft/catalog/pyiceberg.py 0.00% <0.00%> (ø)
daft/catalog/unity.py 0.00% <0.00%> (ø)
daft/catalog/__iceberg.py 69.04% <69.04%> (ø)
daft/catalog/__init__.py 76.66% <74.54%> (+24.20%) ⬆️
daft/catalog/__unity.py 54.90% <54.90%> (ø)

... and 18 files with indirect coverage changes

Copy link

codspeed-hq bot commented Feb 18, 2025

CodSpeed Performance Report

Merging #3820 will degrade performances by 25.1%

Comparing rchowell/catalog-1-of-3 (1b0ab2d) with main (df6f134)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 25 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_count[1 Small File] 3.8 ms 3.4 ms +11.76%
test_iter_rows_first_row[100 Small Files] 221 ms 295 ms -25.1%

@rchowell rchowell changed the title feat(catalog): [1/3] prepare existing catalog APIs for integration feat(catalog): Prepare existing catalog APIs for integration [1/3] Feb 18, 2025
@rchowell
Copy link
Contributor Author

Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the """DEPRECATED: Please use read().""" docstrings, we can use the warnings stdlib as well, something like: This is deprecated, please prefer using ... instead; version=0.5.0


def select(self, *columns: ColumnInputType) -> DataFrame:
"""Returns a DataFrame from this table with the selected columns."""
return self.read().select(*columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I do like the UX and don't think read() should take any arguments.

try:
from daft.unity_catalog import UnityCatalog
@staticmethod
def _try_from_iceberg(obj: object) -> Catalog | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most accurately, from_pyiceberg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I went for consistency with daft.read_iceberg, but could go either way.

@rchowell rchowell merged commit bcd818b into main Feb 20, 2025
42 of 44 checks passed
@rchowell rchowell deleted the rchowell/catalog-1-of-3 branch February 20, 2025 22:11
rchowell added a commit that referenced this pull request Feb 21, 2025
…ng APIs [2/3] (#3825)

This is PR [2/3] for integrating #3805 to Daft/main. This PR adds
several implementations with basic end-to-end sanity tests. Most
notably, this PR includes the bi-directional implementations for
Catalogs and Tables. These enable us to use python-backed
implementations in rust (for daft-connect and daft-sql) while also being
able to return Catalog/Table python objects for use with the python
APIs.

**Changes**
1. Adds several internal factories for attach/detaching supported
objects to the session.
2. Adds `Catalog.from_pydict` to cover basic needs and some backwards
compatibility.
3. Adds `Identifier.from_str` for pyiceberg style identifiers (these are
entirely optional).
4. Flesh out the [minimal session
actions](https://github.com/Eventual-Inc/Daft/compare/rchowell/catalog-1-of-3...rchowell/catalog-2-of-3?expand=1#diff-fc2305e1560f8f7d5a974f58325f4b231bc638d6bacebbc795b6f8fb924114ccR1884-R1895)
for a release.
5. Adds Bindings for a single name resolution abstraction, will come in
handy for case-normalization in the near future.
6. Updates daft-connect and daft-sql to use `attach_table` 

Part [3/3] will cutover all deprecated APIs to use these new APIs and
remove the unused artifacts. These APIs are enumerated in #3820
kevinzwang pushed a commit that referenced this pull request Feb 21, 2025
…ng APIs [2/3] (#3825)

This is PR [2/3] for integrating #3805 to Daft/main. This PR adds
several implementations with basic end-to-end sanity tests. Most
notably, this PR includes the bi-directional implementations for
Catalogs and Tables. These enable us to use python-backed
implementations in rust (for daft-connect and daft-sql) while also being
able to return Catalog/Table python objects for use with the python
APIs.

**Changes**
1. Adds several internal factories for attach/detaching supported
objects to the session.
2. Adds `Catalog.from_pydict` to cover basic needs and some backwards
compatibility.
3. Adds `Identifier.from_str` for pyiceberg style identifiers (these are
entirely optional).
4. Flesh out the [minimal session
actions](https://github.com/Eventual-Inc/Daft/compare/rchowell/catalog-1-of-3...rchowell/catalog-2-of-3?expand=1#diff-fc2305e1560f8f7d5a974f58325f4b231bc638d6bacebbc795b6f8fb924114ccR1884-R1895)
for a release.
5. Adds Bindings for a single name resolution abstraction, will come in
handy for case-normalization in the near future.
6. Updates daft-connect and daft-sql to use `attach_table` 

Part [3/3] will cutover all deprecated APIs to use these new APIs and
remove the unused artifacts. These APIs are enumerated in #3820
rchowell added a commit that referenced this pull request Feb 25, 2025
… abstractions [3/3] (#3830)

This PR swaps existing functionality to be backed by the new session,
catalog, and table APIs.

**Changes**
* Adds set_namespace and current_namespace for qualified name resolution
control
* Adds support for catalog-qualified and schema-qualified identifiers
* Adds the new session APIs to daft.* top-level
  * `daft.register_python_catalog -> daft.attach_catalog`
  * `daft.unregister_catalog -> daft.detach_catalog`
  * `daft.read_table -> daft.read_table` (via session)
  * `daft.register_table -> daft.attach_table`
* Ports existing rust tests for DaftMetaCatalog to the Session.

_Context_
* #3820
* #3825
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants