S3: provide java.nio.FileSystem implementation #1388
Comments
We've talked about this recently, actually. It would be cool to have, but our friends over in .NET (who have done something similar) have stated that it's actually surprisingly tricky to do correctly. Marking as feature request. |
Big +1 for this feature. We maintain a fork of Upplication project (see here), but it would be very useful to have an official implementation. |
Have there been any changes, and are there any plans for official support? :) |
We still think it would be a cool idea. Right now the team is focused on getting customers' favorite V1 features into V2. Once we've gotten further along in that process, we can start more seriously considering new, cool features like this one. |
Great! Thanks for the update! Please keep us in the loop :) |
@millems This would be very useful for us. The proliferation of forks of the Upplication provider ( which is itself a fork of an older provider) causes a lot of confusion. Google has a very robust open source Path provider for |
Are there any updates on this? |
Sorry, this still has not been prioritized. |
Google's NIO storage provider works really well and is pretty straight-forward to integrate in any project. What would be required in order to get the ball rolling for an S3 provider as well? If somebody were to, say, walk over the different forks of Upplication/Amazon-S3-FileSystem-NIO2 and merge the useful changes that people have done in their forks, would the Amazon team be interested in adopting such a fork and continuing the work? |
Unfortunately we aren't able to take on the project ourselves right now, even in just a maintenance capacity. That might change in the future, as demand for this feature rises (both here on Github and via any other official AWS channels of communication) or demand for our time elsewhere falls. Until such time, we would be surprised and delighted if the open source community were to take up the mantle and develop such a feature. We'd be willing to provide any kind of AWS expertise you might require in the design or development of such a project. |
@millems , would you mind elaborating? What were the issues? What were they trying to do and what exactly didn't work? |
Having been involved in the development of the google implementation as well as currently developing a generic https filesystem provider, I can say that it's a reasonable amount of work but definitely not insurmountable. One person working part time for a year should be able to come up with a very good solution. It probably has to be iterated though as new error modes are discovered / appear due to changes in the underlying infrastructure. I would say the hardest part is making it robust against intermittent failure. A file system can't fail at the same rate the internet does so every operation has to be able to continue and retry in the face of failures. This is the sort of project that definitely benefits from a set of dedicated maintainers rather than a hodgepodge of forks with their own solutions. |
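The point about retrying through intermittent failures can be illustrated with a minimal sketch. `RetryingIo` and `withRetries` are hypothetical names, not part of the AWS SDK or of any provider mentioned in this thread, and treating every `IOException` as transient is an assumption made only for the example; a real provider would need to distinguish retryable errors from permanent ones.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not from any SDK): retries an I/O operation with exponential backoff. */
final class RetryingIo {
    private RetryingIo() {}

    /**
     * Runs the operation up to maxAttempts times. Every IOException is treated as
     * transient (an assumption for this sketch); other exceptions propagate immediately.
     */
    static <T> T withRetries(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;          // double the wait each time
                }
            }
        }
        throw last; // all attempts failed; surface the most recent failure
    }
}
```

A caller would wrap each remote read or write, for example `RetryingIo.withRetries(() -> fetchBytes(key), 5, 100L)`, where `fetchBytes` stands in for whatever remote call the provider makes.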
@lbergelson's summary is great. @normj can weigh in on the struggles encountered doing it for .NET. |
The "year" of time might sound more intense than I meant. It took initial work but then needed continual adjustment over time as we discovered new rare edge cases through use. Not a solid year of someone writing code. |
For the .NET SDK we have a similar feature where we make S3 look like a file system matching the .NET File IO API. Although it does make S3 easier to traverse, it causes pitfalls that are not obvious to the user, because S3 really isn't a filesystem. For example, the .NET File IO API has operations to append to an existing file. That looks like a simple and very tempting API for users to call, but under the covers we have to download the object, concatenate the new data, and re-upload the result. Similarly, S3 doesn't really have directories, so a simple file system operation like moving or renaming a directory ends up listing all of the objects, copying them over, and then deleting the originals. If there are a lot of objects under that S3 virtual directory, this can be very costly. So although we have a similar approach in .NET, it has caused a lot of confusion for users, especially those new to S3, to the point that I actually regret us having the feature. I would rather users of S3 know what manipulations they are doing to S3 than do what looks like a simple operation and get a big surprise when it turns out to be a very slow and costly one. |
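To make the two pitfalls concrete, here is a rough sketch using an in-memory stand-in for a flat key/value object store. `FlatObjectStore` and all of its methods are hypothetical and are not the S3 API or the .NET implementation; the request and byte counters only illustrate why "append" and "rename directory" cost more than they appear to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a flat object store; this is not the S3 API. */
final class FlatObjectStore {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();
    int requests = 0;          // simulated requests sent to the store
    long bytesTransferred = 0; // bytes moved between client and store

    void put(String key, byte[] data) {
        requests++;
        bytesTransferred += data.length;
        objects.put(key, data.clone());
    }

    byte[] get(String key) {
        requests++;
        byte[] data = objects.get(key);
        if (data == null) throw new IllegalArgumentException("no such key: " + key);
        bytesTransferred += data.length;
        return data.clone();
    }

    void copy(String from, String to) { requests++; objects.put(to, objects.get(from)); }

    void delete(String key) { requests++; objects.remove(key); }

    /** Lists keys under a prefix; counted as one request here for simplicity. */
    List<String> list(String prefix) {
        requests++;
        List<String> keys = new ArrayList<>();
        for (String key : objects.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break;
            keys.add(key);
        }
        return keys;
    }

    /** "Append" has no native equivalent: download, concatenate, re-upload. */
    void append(String key, byte[] extra) {
        byte[] existing = get(key);
        byte[] combined = new byte[existing.length + extra.length];
        System.arraycopy(existing, 0, combined, 0, existing.length);
        System.arraycopy(extra, 0, combined, existing.length, extra.length);
        put(key, combined);
    }

    /** "Rename directory" is list + copy + delete for every object under the prefix. */
    int renameDirectory(String fromPrefix, String toPrefix) {
        List<String> keys = list(fromPrefix);
        for (String key : keys) {
            copy(key, toPrefix + key.substring(fromPrefix.length()));
            delete(key);
        }
        return keys.size();
    }
}
```

In this model, appending a few bytes to an object transfers the whole object twice, and renaming a prefix with N objects issues 2N + 1 requests, which is the kind of hidden cost described above.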
Thanks for sharing your experience with GCS, as well as the S3 .Net implementation! |
Hi guys, We are pleased to let you know that we've created a spin-off project (rebranded fork called s3fs-nio) of Upplication/Amazon-S3-FileSystem-NIO2. This is a spin-off (of the latest Upplication/Amazon-S3-FileSystem-NIO2 As there is a need for such a library, we have decided to take on this task and rebuild an active project with a knowledge base, chat channel and helpful community around it. Ultimately, we would really appreciate it if we could have some of the Amazon folks helping out with advice and reviewing pull requests, as this would be a massive help for us! We've done a big cleanup of the code, upgraded its dependencies and migrated to AWS SDK v2 (special thanks to @ptirador for all the hard work, as well as to @elerch + @markjschreiber for their advice and reviews!). The project is actively tested against JDK 8 and 11 via GitHub Actions. Our work is not done and we would like to keep working on this project! We intend to invite all the contributors with open pull requests to join our efforts and forward-port their fixes to our project. If anyone is interested in lending a hand and joining our project, please reach out, as we have plenty to do! :) @ashleymercer : Could you please add us to your list above? Thanks! :) cc: @ptirador , @steve-todorov , @sbespalov |
It seems that s3fs-nio never got to the point where a first version was finished. |
I hope so. I wrote that package as I needed something that used the AWS SDK v2 and offered the option of standard S3Clients and Async S3 clients while also making a clean break from some of the approaches of the Upplication library. Currently |
@markjschreiber thank you! Yes, we certainly need write access and would obviously prefer to have it within the library you wrote. We also don't need random writes, so a complete put of the file would be enough for our use case at least. |
Makes sense (we should probably also move this discussion to that project). I'm certainly happy to review any pull request, even one that is a work in progress (WIP). Let's follow up with more detailed requirements in a discussion at https://github.com/awslabs/aws-java-nio-spi-for-s3 |
Hi everyone, thank you for your interest in seeing S3 FileSystem integration supported in the Java SDK v2. Given that the Closing. |
Hey guys, After a long wait, we'd like to let you know that we've cut a release for s3fs-nio as We are also working on improving our documentation and contributions would be highly appreciated. We would like to welcome you to test and report back any findings! For those of you interested in contributing, there is plenty yet to be done and we'd be more than happy to have you aboard! Looking forward to your feedback! Happy coding! :) |
Expected Behavior
The java.nio.FileSystem API provides an abstraction for dealing with different types of file systems, and for accessing files and folders within that file system. AWS should provide an implementation of this interface backed by an s3 client.
Current Behavior
Application code either has to explicitly know about s3 (by passing around S3Client everywhere) or else use some custom abstraction, which inevitably ends up being a half-baked implementation of parts of the FileSystem and Path APIs anyway.
Possible Solution
There are two existing attempts to solve this problem that I can find:
Upplication/Amazon-S3-FileSystem-NIO2
elerch/Amazon-S3-FileSystem-NIO2
aws-sdk-java-v2
carlspring/s3fs-nio
Fortunately, this code is MIT-licensed so perhaps could form the basis of an official library?
Context
Developing application code against s3 can be problematic because it's not always possible to have a live s3 instance available: access policies might be very strict (no access to s3 from outside corp network, or only from specific (non-dev) machines), or developers might not even have internet access at all when e.g. working on the move.
Attempts to solve this problem in a different way exist (e.g. libraries which provide a local webserver with an s3-like interface), but this adds another set of dependencies and another tool to learn, configure and debug.
In my view, a cleaner solution would be to provide an implementation of java.nio.FileSystem, which is the standard Java abstraction for dealing with different file systems. Application code would only need to talk to java.nio.file.Path and friends, and developers could be confident of their code working reliably regardless of whether it's running against local disk storage or s3.
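As a rough illustration of that idea, the sketch below is written only against `java.nio.file.Path` and `java.nio.file.Files`. `ReportStore` is a hypothetical class invented for this example. The code never mentions S3, so it runs unchanged against a local directory; whether it would also behave well against a given S3-backed provider depends on that provider and is an assumption here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical application class that only depends on the standard NIO abstractions. */
final class ReportStore {
    private ReportStore() {}

    /** Writes the lines to dir/name, creating the directory first if needed. */
    static void save(Path dir, String name, List<String> lines) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve(name), lines, StandardCharsets.UTF_8);
    }

    /** Reads dir/name back as a list of lines. */
    static List<String> load(Path dir, String name) throws IOException {
        return Files.readAllLines(dir.resolve(name), StandardCharsets.UTF_8);
    }
}
```

In development the `Path` could come from `Files.createTempDirectory(...)`. In production it could in principle come from `Paths.get(URI)` with a URI whose scheme is handled by an installed S3 provider; the scheme and configuration would be defined by that provider, not by this sketch.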