Tracking issues of aligning storage support with iceberg-java #408

Open
1 of 5 tasks
Xuanwo opened this issue Jun 19, 2024 · 11 comments

@Xuanwo
Member

Xuanwo commented Jun 19, 2024

iceberg-java now supports

  • aws s3
  • aliyun oss
  • azure adlsv2
  • gcp gcs
  • hadoop hdfs

Although OpenDAL supports more storage services than this, it still makes sense to at least support all existing storage services. This issue will track the progress.
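
A minimal sketch of how a table location's URL scheme could select one of the backends listed above. The `StorageKind` enum, the `storage_kind_for` function, and the scheme strings are assumptions for illustration; they are not the iceberg-rust or OpenDAL API.

```rust
/// Storage backends listed in this issue.
/// Hypothetical type: the scheme strings matched below are assumptions
/// based on commonly used URL schemes, not a confirmed API.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StorageKind {
    S3,
    Oss,
    AdlsV2,
    Gcs,
    Hdfs,
}

/// Pick a backend from the scheme of a location such as `s3://bucket/path`.
/// Returns `None` when the location has no scheme or the scheme is unknown.
fn storage_kind_for(location: &str) -> Option<StorageKind> {
    let (scheme, _rest) = location.split_once("://")?;
    match scheme.to_ascii_lowercase().as_str() {
        "s3" | "s3a" => Some(StorageKind::S3),
        "oss" => Some(StorageKind::Oss),
        "abfs" | "abfss" => Some(StorageKind::AdlsV2),
        "gs" => Some(StorageKind::Gcs),
        "hdfs" => Some(StorageKind::Hdfs),
        _ => None,
    }
}

fn main() {
    let locations = [
        "s3://warehouse/db/t",
        "abfss://fs@account.dfs.core.windows.net/t",
        "/tmp/t",
    ];
    for location in locations {
        println!("{location} -> {:?}", storage_kind_for(location));
    }
}
```

Returning `None` for unknown schemes leaves the caller free to report an unsupported-storage error or fall back to a local filesystem.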


After this has been implemented, iceberg-rust will have the same storage support level as iceberg-java. I'm willing to implement those features but am also open to helping review related changes. Please comment if you want to join the development and pick up one of them.

@Xuanwo Xuanwo self-assigned this Jun 19, 2024
@Xuanwo
Member Author

Xuanwo commented Jun 19, 2024

cc @liurenjie1024, I'm not sure how your 0.3 release plan is going. Maybe we can include this one in it?

Most changes should be easy and involve no API changes. It's also fine to include them in a following 0.3.x release.

@Xuanwo Xuanwo added good first issue Good for newcomers help wanted Extra attention is needed labels Jun 19, 2024
@liurenjie1024
Contributor

Hi @Xuanwo, there are two places to track 0.3 features:

  1. Tracking issues of iceberg-rust v0.3.0 #348
  2. https://github.com/apache/iceberg-rust/milestone/2

I'm OK with waiting to add this to the 0.3 release. I'm just curious how we would test against these. Or maybe we can start by declaring these features experimental.

@Xuanwo
Member Author

Xuanwo commented Jun 19, 2024

I'm just curious how we would test against these. Or maybe we can start by declaring these features experimental.

  • aws s3: currently tested with MinIO. We can add a real S3 bucket with a sponsor.
  • aliyun oss: needs an OSS bucket (preferably located near us-east-1)
  • azure adlsv2: can be tested with Azurite. I'm also willing to provide test infra as a Microsoft MVP.
  • gcp gcs: needs a GCS bucket (preferably located near us-east-1)
  • hadoop hdfs: can be set up in CI directly (thanks, open source!)

I agree that we can label these features as experimental. Setting up the CI infrastructure requires time, more so than implementing those features.

@sdd
Contributor

sdd commented Jun 19, 2024

I have a question on the current FileIO - @Xuanwo is probably the right person to ask here.

It would be useful to be able to customize the OpenDAL Operator by attaching layers. Could we expose this capability somewhere? I'm more than happy to work on this.

@liurenjie1024
Contributor

I'm just curious how we would test against these. Or maybe we can start by declaring these features experimental.

  • aws s3: currently tested with MinIO. We can add a real S3 bucket with a sponsor.
  • aliyun oss: needs an OSS bucket (preferably located near us-east-1)
  • azure adlsv2: can be tested with Azurite. I'm also willing to provide test infra as a Microsoft MVP.
  • gcp gcs: needs a GCS bucket (preferably located near us-east-1)
  • hadoop hdfs: can be set up in CI directly (thanks, open source!)

I agree that we can label these features as experimental. Setting up the CI infrastructure requires time, more so than implementing those features.

Cool, let's move!

@Xuanwo
Member Author

Xuanwo commented Jun 19, 2024

It would be useful to be able to customize the OpenDAL Operator by attaching layers. Could we expose this capability somewhere? I'm more than happy to work on this.

Any detailed ideas? Are you talking about enabling some existing OpenDAL layers, or allowing users to implement something new on top of FileIO?

I can imagine that enabling the logging and retry layers, either by default or via configuration, might be useful.
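
To make the layering idea concrete, here is a minimal standard-library sketch of wrapping a storage backend with retry and logging behaviour. The `Storage` trait, `RetryLayer`, `LoggingLayer`, and `Flaky` are hypothetical names invented for this illustration; they do not reproduce OpenDAL's actual layer API or iceberg-rust's FileIO.

```rust
use std::cell::Cell;

/// Hypothetical minimal storage interface; not the OpenDAL or iceberg-rust API.
trait Storage {
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// Layer that retries a failed read, making at most `max_attempts` calls.
struct RetryLayer<S> {
    inner: S,
    max_attempts: u32,
}

impl<S: Storage> Storage for RetryLayer<S> {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        let mut last_err = String::from("max_attempts was zero");
        for _ in 0..self.max_attempts {
            match self.inner.read(path) {
                Ok(bytes) => return Ok(bytes),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}

/// Layer that logs every read to stderr before delegating.
struct LoggingLayer<S> {
    inner: S,
}

impl<S: Storage> Storage for LoggingLayer<S> {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        eprintln!("read {path}");
        self.inner.read(path)
    }
}

/// Stand-in backend that fails a fixed number of times, then succeeds.
struct Flaky {
    failures_left: Cell<u32>,
}

impl Storage for Flaky {
    fn read(&self, _path: &str) -> Result<Vec<u8>, String> {
        let left = self.failures_left.get();
        if left > 0 {
            self.failures_left.set(left - 1);
            return Err("temporary failure".to_string());
        }
        Ok(b"data".to_vec())
    }
}

fn main() {
    // Layers compose by nesting: logging wraps retry, which wraps the backend.
    let backend = Flaky { failures_left: Cell::new(2) };
    let storage = LoggingLayer {
        inner: RetryLayer { inner: backend, max_attempts: 3 },
    };
    println!("{:?}", storage.read("metadata/v1.json"));
}
```

Because every layer implements the same trait as the backend it wraps, callers do not need to know which layers are enabled, which is what would make turning them on by default or through configuration practical.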

@Xuanwo
Member Author

Xuanwo commented Jun 19, 2024

I agree that we can label these features as experimental. Setting up the CI infrastructure requires time, more so than implementing those features.

Split into a new issue: #410.

I plan to track them after the 0.3 release.

@jsimbadev

@Xuanwo I can take the Azure Data Lake FileIO implementation plus the corresponding infrastructure setup. Sound OK?

@Xuanwo
Member Author

Xuanwo commented Jul 5, 2024

@Xuanwo I can take the Azure Data Lake FileIO implementation plus the corresponding infrastructure setup. Sound OK?

Welcome, have fun!

@liurenjie1024
Contributor

cc @Xuanwo Do you still plan to finish this before 0.3.0? Or can we postpone it to the next release?

@Xuanwo
Member Author

Xuanwo commented Aug 6, 2024

cc @Xuanwo Do you still plan to finish this before 0.3.0? Or can we postpone it to the next release?

There is some more work to do on the OpenDAL side. I believe we can let 0.3.0 go first.

@liurenjie1024 liurenjie1024 removed this from the 0.3.0 Release milestone Aug 6, 2024