deltalake-python: Missing support for adding new factories and logstores #2818

Open
MartinKolbAtWork opened this issue Aug 23, 2024 · 5 comments
Labels: binding/python (Issues for the Python package), enhancement (New feature or request)

Comments

@MartinKolbAtWork

Description

The Python binding registers a hard-coded list of deltalake handlers:

delta-rs/python/src/lib.rs, lines 2035 to 2039 in fcd62ab:

deltalake::aws::register_handlers(None);
deltalake::azure::register_handlers(None);
deltalake::gcp::register_handlers(None);
deltalake::hdfs::register_handlers(None);
deltalake_mount::register_handlers(None);

We have a Rust crate that adds support for another object store (SAP BTP), but we did not find a way to register its handlers with the existing Python binding.
The shared library that ships with deltalake-python does not expose an entry point for adding new object stores.
We ended up forking delta-rs and adding the registration call as another line in the list above. However, we don’t think maintaining a fork just to add an object store is an appropriate approach.

Have we missed something here? Shouldn’t there be a way to add stores beyond the five built-in ones?
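To make the request concrete, here is a std-only sketch of a handler registry that external crates could extend. It is a simplified model under stated assumptions, not deltalake's actual API: `StoreFactory`, `register_factory`, `resolve`, and `DemoFactory` are hypothetical names, and a real factory would build an object-store handle rather than return a `String`. The idea is that if an entry point like `register_factory` were reachable from outside the compiled Python extension, an add-on could register a new URL scheme without editing the hard-coded list.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical factory trait. A real factory would build an object-store
/// handle from the URL and options; here it only returns a description.
pub trait StoreFactory: Send + Sync {
    fn create(&self, url: &str) -> String;
}

type Registry = RwLock<HashMap<String, Arc<dyn StoreFactory>>>;

/// Process-global registry keyed by URL scheme (e.g. "s3", "gs").
fn registry() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Hypothetical public entry point that an external crate could call.
pub fn register_factory(scheme: &str, factory: Arc<dyn StoreFactory>) {
    registry()
        .write()
        .expect("registry lock poisoned")
        .insert(scheme.to_string(), factory);
}

/// Resolve a URL by looking up the factory registered for its scheme
/// (the part before "://"). Returns None for unknown or missing schemes.
pub fn resolve(url: &str) -> Option<String> {
    let (scheme, _) = url.split_once("://")?;
    let factory = registry()
        .read()
        .expect("registry lock poisoned")
        .get(scheme)?
        .clone();
    Some(factory.create(url))
}

/// Stand-in for a factory shipped by an external add-on crate.
pub struct DemoFactory;

impl StoreFactory for DemoFactory {
    fn create(&self, url: &str) -> String {
        format!("demo store for {url}")
    }
}
```

For example, after `register_factory("sapbtp", Arc::new(DemoFactory))`, calling `resolve("sapbtp://container/table")` returns `Some("demo store for sapbtp://container/table")`. This only works when the add-on and the registry are compiled into the same binary; exposing such an entry point across separately built shared libraries is the harder part of the problem.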

@MartinKolbAtWork MartinKolbAtWork added the enhancement New feature or request label Aug 23, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Aug 23, 2024

I'll take a look at what an API for exposing and registering an external handler through Python should look like.

@ion-elgreco
Collaborator

@MartinKolbAtWork it seems this might not be possible, or at least quite complex; I can't find any PyO3 docs on how to achieve this.

Do you have the SAP BTP Object store published somewhere?

@MartinKolbAtWork
Author

Hi @ion-elgreco,
Thanks for looking into this. The SAP BTP integration is currently used internally at SAP and might be published later; however, I cannot share the code at the moment.
It actually uses “SAP Data Lake Files” (https://help.sap.com/docs/hana-cloud-data-lake/user-guide-for-data-lake-files/understanding-data-lake-files) as object storage, which is accessible via SAP’s Business Technology Platform (BTP, https://www.sap.com/products/technology-platform.html).

I also investigated a possible solution, and it’s especially challenging because the shared library packaged in the deltalake-python wheel would need to be binary-compatible with the shared library packaged with the “add-on”. Ensuring binary compatibility between these libraries (e.g. matching the Rust compiler version and the deltalake crate version) would be hard to achieve. An approach that “tunnels” all calls between the two Rust libraries through Python could mitigate the binary compatibility issues, but would probably perform poorly.
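A different technique from the Python tunneling mentioned above is to restrict the boundary between the two shared libraries to the C ABI, so that no compiler-version-dependent Rust types (trait objects, `String`, deltalake structs) cross it. The sketch below is a hypothetical illustration, not part of deltalake or its Python binding: `StoreVTable`, `register_vtable`, `object_size`, `demo_object_size`, and `demo_vtable` are made-up names, and both sides are simulated in one crate. In practice the add-on would be built as a separate `cdylib`, and the host would obtain its vtable through a dynamic symbol lookup.

```rust
use std::ffi::{c_char, CStr};
use std::sync::Mutex;

/// Hypothetical C-ABI interface that an add-on library would hand to the host.
/// `#[repr(C)]` fields and `extern "C"` functions have a layout and calling
/// convention that do not depend on the rustc version used to build either side.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StoreVTable {
    /// NUL-terminated URL scheme handled by the add-on, e.g. "sapbtp".
    pub scheme: *const c_char,
    /// Returns the size in bytes of the object at `path`, or -1 on error.
    pub object_size: extern "C" fn(path: *const c_char) -> i64,
}

// SAFETY (assumption): the add-on guarantees `scheme` points to static,
// immutable data and that `object_size` may be called from any thread.
unsafe impl Send for StoreVTable {}

/// Host-side registry of add-on vtables.
static VTABLES: Mutex<Vec<StoreVTable>> = Mutex::new(Vec::new());

/// Host-side registration entry point. A real host would export it with
/// `#[no_mangle]` so a separately built add-on could locate and call it.
pub extern "C" fn register_vtable(vtable: StoreVTable) {
    VTABLES.lock().expect("registry lock poisoned").push(vtable);
}

/// Host side: find the add-on for `scheme` and call it through the C ABI.
pub fn object_size(scheme: &str, path: &CStr) -> Option<i64> {
    let tables = VTABLES.lock().expect("registry lock poisoned");
    let found = tables
        .iter()
        .find(|vt| {
            // SAFETY: registered schemes are assumed to be valid C strings.
            let s = unsafe { CStr::from_ptr(vt.scheme) };
            s.to_bytes() == scheme.as_bytes()
        })
        .copied();
    drop(tables); // release the lock before calling into foreign code
    let vt = found?;
    Some((vt.object_size)(path.as_ptr()))
}

/// Simulated add-on side; in practice this would live in a separate cdylib.
extern "C" fn demo_object_size(path: *const c_char) -> i64 {
    if path.is_null() {
        return -1;
    }
    // SAFETY: the host passes a valid NUL-terminated string.
    let path = unsafe { CStr::from_ptr(path) };
    // Stand-in logic: report the path length instead of querying real storage.
    path.to_bytes().len() as i64
}

/// Hypothetical add-on export that builds its vtable.
pub fn demo_vtable() -> StoreVTable {
    StoreVTable {
        scheme: b"sapbtp\0".as_ptr() as *const c_char,
        object_size: demo_object_size,
    }
}
```

The trade-off is that every operation has to be expressed in C-compatible types, so an interface like this tends to stay narrow and becomes laborious to maintain for a full object-store API.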

@Xuanwo
Contributor

Xuanwo commented Sep 13, 2024

Hello, I'm from the OpenDAL community, which aims to provide access to various storage services from multiple languages. Perhaps we can build something extensible that makes it easy to integrate more storage services.

Tools we have now:

@ion-elgreco
Collaborator

@Xuanwo Hey, I wasn't aware that OpenDAL has an ObjectStore implementation; that's useful!

Any help on this is much appreciated :)

@rtyler rtyler added the binding/python Issues for the Python package label Sep 22, 2024