question: datafusion dataframe insert into iceberg table #50

Open
itsTykho opened this issue Nov 5, 2024 · 7 comments
Comments

itsTykho commented Nov 5, 2024

hello. i see in this issue that there is support for writing a datafusion dataframe to an existing iceberg table, but i'm not able to find any examples or documentation on doing this. specifically, i'm trying to work with the rest catalog (it's a glue catalog i'm trying to use).

is writing a datafusion dataframe to an iceberg table in a glue catalog possible? if so, is there an example for using a rest catalog? it seems quite different from the sql catalog shown in the only example i can find.

really appreciate it.

Owner

JanKaul commented Nov 6, 2024

I've updated the dataframe example to include inserting a dataframe into an iceberg table.

Regarding the catalog, there currently is no glue catalog implementation. But I've created an issue #52 for it. The REST catalog should just work fine.

Author

itsTykho commented Nov 6, 2024

> Regarding the catalog, there currently is no glue catalog implementation. But I've created an issue #52 for it. The REST catalog should just work fine.

thank you very much for the quick response and help. i do have another question. the RestCatalog in iceberg-rest-catalog expects an ObjectStoreBuilder, not an object-store. is there a reason for that, as opposed to the other catalogs that just want an object-store?

Owner

JanKaul commented Nov 6, 2024

The motivation for the ObjectStoreBuilder is to enable multiple buckets with a single REST catalog. The other catalog implementations currently only support a single bucket per catalog. I'm not entirely sure whether it's an important use case that I should expand to the other catalog implementations.
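To illustrate the idea above, here is a minimal, self-contained sketch of why holding a builder rather than a finished store lets one catalog serve tables in several buckets. Every name in it (`StoreBuilder`, `BucketStore`, `Catalog`, `bucket_of`) is hypothetical and made up for illustration; it is not the API of iceberg-rest-catalog or of the object_store crate, and the real `ObjectStoreBuilder` may work differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a store that is bound to exactly one bucket.
#[derive(Debug, Clone, PartialEq)]
struct BucketStore {
    region: String,
    bucket: String,
}

/// Hypothetical builder: it holds the shared settings but no bucket yet.
#[derive(Debug, Clone)]
struct StoreBuilder {
    region: String,
}

impl StoreBuilder {
    fn new(region: &str) -> Self {
        Self { region: region.to_string() }
    }

    /// Create a store for whichever bucket a table happens to live in.
    fn build(&self, bucket: &str) -> BucketStore {
        BucketStore {
            region: self.region.clone(),
            bucket: bucket.to_string(),
        }
    }
}

/// Extract the bucket name from a location such as "s3://bucket/path".
fn bucket_of(location: &str) -> Option<&str> {
    location
        .strip_prefix("s3://")?
        .split('/')
        .next()
        .filter(|b| !b.is_empty())
}

/// Hypothetical catalog that maps table names to their storage locations.
struct Catalog {
    builder: StoreBuilder,
    tables: HashMap<String, String>,
}

impl Catalog {
    /// Resolve the store lazily, per table, from the table's own location.
    fn store_for(&self, table: &str) -> Option<BucketStore> {
        let location = self.tables.get(table)?;
        Some(self.builder.build(bucket_of(location)?))
    }
}

fn main() {
    let mut tables = HashMap::new();
    tables.insert("orders".to_string(), "s3://sales/warehouse/orders".to_string());
    tables.insert("events".to_string(), "s3://logs/warehouse/events".to_string());

    let catalog = Catalog { builder: StoreBuilder::new("us-east-1"), tables };

    // One catalog, two tables, two different buckets.
    println!("{:?}", catalog.store_for("orders"));
    println!("{:?}", catalog.store_for("events"));
}
```

A catalog that is handed a single finished store would be tied to that store's bucket, which matches the single-bucket limitation described for the other catalog implementations.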

Author

itsTykho commented Nov 7, 2024

thanks for the explanation. i guess i'm struggling to wrap my head around how it's actually used. does one just pass the enum (for example: ObjectStoreBuilder.S3) as the arg?

it seems like in a common use case the catalog would already be in use somewhere, for example, loading data from iceberg, doing something with it, and putting it back into iceberg.

Owner

JanKaul commented Nov 8, 2024

I'm wondering if it makes sense to use the ObjectStoreBuilder. I thought it would be a common use case, but most catalog implementations do it differently. If it makes sense, I'll try to make it easier to use.

For the time being you can find an example in the trino integration test.

Author

itsTykho commented Nov 14, 2024

i've been trying to get the REST catalog to work, but in vain. is there any sort of documentation on it apart from those examples? i'm just unsure of what it's actually looking for in some cases. is it necessary to run testcontainers and set the localstack variable?

apologies for using issues for support - i've also asked around in the datafusion discord, but it seems that's the wrong place.

Owner

JanKaul commented Nov 18, 2024

Sorry about that. The REST catalog currently has some issues. Maybe you could try the Glue catalog directly. I'll look into the REST catalog issues and get back to you once they have been resolved.

If you're using AWS directly you don't need to use testcontainers or set the localstack variable.
