[FEATURE REQUEST] Add basic HDFS storage option for catalogs #85
Comments
We can add HDFS support, but handling credentials might be difficult. @sfc-gh-schen, can you provide more insight? I think credential vending may not be possible for HDFS.
Would permissive HDFS be a simpler first ask, with room to evolve toward credential support later? That is, start off assuming that the HDFS cluster is open (no Kerberos, etc.), running on-premises and protected by networking, and then consider a proper strategy in the future? Sorry if I'm off the mark on what you meant by your last comment, but I assumed it relates to authn/authz against HDFS (and not to mapping onto an internal strategy in Polaris).
Yup, it's reasonable to start with no authentication and no authorization for HDFS.
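To make the "permissive HDFS" idea above concrete, here is a minimal sketch of what such a storage option could check. It is not Polaris code: `HdfsStorageConfig`, `allowed_locations`, `permits`, and `vended_credentials` are hypothetical names, and the sketch only assumes an open cluster (no Kerberos) where the catalog restricts table locations to configured `hdfs://` prefixes and vends no credentials.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class HdfsStorageConfig:
    """Hypothetical storage config for an open (no Kerberos) HDFS cluster."""

    # Assumed shape, e.g. ("hdfs://namenode:8020/warehouse",)
    allowed_locations: tuple

    def permits(self, location: str) -> bool:
        """Return True if `location` is an hdfs:// URI under an allowed prefix."""
        target = urlparse(location)
        if target.scheme != "hdfs":
            return False
        # Compare on path-segment boundaries so /warehouse2 does not match /warehouse.
        target_path = target.path.rstrip("/") + "/"
        for allowed in self.allowed_locations:
            base = urlparse(allowed)
            base_path = base.path.rstrip("/") + "/"
            if target.netloc == base.netloc and target_path.startswith(base_path):
                return True
        return False

    def vended_credentials(self) -> dict:
        # Permissive mode: nothing to vend, access is protected by networking only.
        return {}


config = HdfsStorageConfig(("hdfs://namenode:8020/warehouse",))
print(config.permits("hdfs://namenode:8020/warehouse/db/table"))
print(config.permits("s3://bucket/warehouse/db/table"))
```

The empty credential map is the point of the sketch: a later evolution could replace it with real authn/authz without changing the location check. Path normalization (for example `..` segments) is deliberately left out here.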
I don't mind trying my hand at this for the simple case, to provide a baseline that people can extend in the future. For this to work, because the DFS is created inside Iceberg core with the Hadoop configuration object we initialize, I think we'll need to rely on the [...]. I can see if I can find time one of these days to throw something together in a PR (as an untested PoC) just to get feedback on whether it's something we want to move forward with.
@flyrain Quick question about the repo, which came up as I started writing the changes required for this MR:
cc @dennishuo @collado-mike @eric-maynard for the first question. For 2, we cannot really do that without a sponsor for cloud environments. We have discussed using MinIO to simulate it. For HDFS, though, it should be OK to add integration tests.
Is your feature request related to a problem? Please describe.
Currently, the storage options appear to be geared towards cloud providers. To support companies running on-premises, I would like to request HDFS support.
Describe the solution you'd like
Catalog read/write support for HDFS as a storage option.
For the first implementation we would like something basic: