riofs - read after write failure (with any sort of newly created files) #124

Open · StefanCG opened this issue Aug 16, 2016 · 13 comments

StefanCG commented Aug 16, 2016

Hi,

I just went through the other issues here and it seems that some people are dealing with the same issue I'm facing right now.

It seems that newly created files do not yet exist "on S3" once the write operation completes, which causes subsequent read operations to fail. This delay seems to affect all file operations: PHP file uploads, getimagesize() after an upload, etc., as well as shell operations (dd, openssl and so on).

Is there a way to disable this asynchronous write behaviour of riofs? I have already tried disabling the cache and tested various FUSE options, but nothing seems to work.

The only workaround that seems to help is waiting one or two seconds before running the next command that reads the file, which is a really dirty hack if you ask me.

Any suggestions or advice? The issue doesn't seem to affect s3fs, though I'd prefer not to use that for various reasons, so I'd be happy if someone could help me fix this.

wizzard (Member) commented Aug 16, 2016

Hello,
the initial idea of RioFS was to upload local files to S3 efficiently. We implemented the caching mechanism to minimize the number of HTTP requests required.

I remember adding a few configuration parameters that control caching, but I will have to look at the code before I can tell you whether it's possible to disable caching entirely.

bitwombat commented:

I believe I'm seeing the same problem when trying to rsync to a riofs-mounted directory (via encfs):

.PhpStorm2016.1/config/plugins/IdeaVim/classes/com/maddyhome/idea/vim/extension/surround/VimSurroundExtension.class
rsync: write failed on "/mnt/local/home/gbell2/.bzr.log": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]

This does not happen with sshfs. Happy to run other tests and/or send more debugging info.

kahing commented Aug 16, 2016

I believe this has nothing to do with caching. The reason is that riofs flushes the file on the FUSE release call instead of flush (https://github.com/skoobe/riofs/blob/master/src/rfuse.c#L678 and https://github.com/skoobe/riofs/blob/master/src/rfuse.c#L1030). flush() is synchronous (it blocks the application's close()), whereas release() is not. As a result, when an application calls close() and then opens the file again, riofs might not have actually uploaded the file yet (it might not have received the release call yet, or it may still be uploading).
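
To make the sequence concrete, here is a minimal sketch of the open/write/close/open pattern that hits this race, seen from the application's side (not riofs code; the mount path is hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/riofs/test.txt";  /* hypothetical riofs mount */

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open for write"); return 1; }
    if (write(fd, "hello\n", 6) != 6) { perror("write"); return 1; }
    close(fd);  /* returns immediately; release() and the upload may still be pending */

    fd = open(path, O_RDONLY);  /* races with the pending upload */
    if (fd < 0) { perror("open for read"); return 1; }  /* e.g. ENOENT here */
    char buf[16];
    printf("read %zd bytes\n", read(fd, buf, sizeof(buf)));
    close(fd);
    return 0;
}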

The proper fix would be to flush the file on flush() instead of release(). That would also let applications find out whether the flush failed (close() can return an error code, but release() is invisible to the application), and it would make the benchmark I wrote for https://github.com/kahing/goofys more realistic ;-)
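
For illustration, here is a rough sketch of what flushing in flush() could look like in a high-level FUSE filesystem; upload_to_s3() and free_handle() are hypothetical placeholders, not riofs functions:

#define FUSE_USE_VERSION 26

#include <errno.h>
#include <fuse.h>

/* Hypothetical helpers standing in for riofs' upload and bookkeeping logic. */
extern int upload_to_s3(const char *path, struct fuse_file_info *fi);
extern void free_handle(struct fuse_file_info *fi);

/* flush() runs inline with the application's close(), so a failed
 * upload here is reported back as close() returning -1. */
static int xmp_flush(const char *path, struct fuse_file_info *fi)
{
    if (upload_to_s3(path, fi) != 0)
        return -EIO;
    return 0;
}

/* release() is asynchronous; the data is already on S3 by the time it
 * runs, so only local resources are freed here. */
static int xmp_release(const char *path, struct fuse_file_info *fi)
{
    free_handle(fi);
    return 0;
}

static const struct fuse_operations xmp_oper = {
    .flush   = xmp_flush,
    .release = xmp_release,
    /* ... other operations ... */
};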

StefanCG (Author) commented:

Cache or not, it would be really great if there were a config switch to tune the upload behaviour. Asynchronous uploads may be a great thing in general, but they are bad for our use case, where we need the (small) files to be stored on S3 immediately.

I don't think I'm the only one who would be really happy to have an option to make uploads more synchronous.

PS: As mentioned above, I already tried some FUSE flags, and I believe I also tried disabling the cache.

wizzard (Member) commented Aug 17, 2016

Hello,
could you please help me by preparing a small script that reproduces the described issue? It would be greatly appreciated!

StefanCG (Author) commented:

Sure. Would a PHP- or bash-based PoC be OK for you?

wizzard (Member) commented Aug 17, 2016

Bash would be better, thanks!

StefanCG (Author) commented Aug 22, 2016

Here's a simple test case that uses OpenSSL to generate a dummy certificate. It reproduces the issue, even though the files created here are really small; the files with which we first noticed the issue were a few KB larger.

openssl req -new -newkey rsa:4096 -nodes -out domain.csr -keyout domain.key -subj "/C=DE/ST=BW/L=Stuttgart/O=Test-Company/OU=Test/CN=mydomain.test"
openssl x509 -in domain.csr -req -signkey domain.key -out domain.pem -days 99

On a local filesystem:

Generating a 4096 bit RSA private key
...........................................................................................++
.......................................................................................................................................................................................................................................................................................................++
writing new private key to 'domain.key'
Signature ok
subject=/C=DE/ST=BW/L=Stuttgart/O=Test-Company/OU=Test/CN=mydomain.test
Getting Private key

On a riofs-mounted volume (same region):

Generating a 4096 bit RSA private key
...................................................................................++
...................++
writing new private key to 'domain.key'
139704966727328:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:703:Expecting: CERTIFICATE REQUEST

The error occurs because OpenSSL cannot read the CSR file created by the first command. Note that if we add a sleep (1 second, for example) between the two commands, it does work.

So could you please help me fix this somehow? I don't want to use sleep, as it's a really dirty hack. A config switch that enables synchronous upload behaviour would be fine as well.

Update: fixed typo in example code (4096 vs 8192 bit)

StefanCG (Author) commented:

Hi, just wanted to ask whether you've found some time to look into this.
I'd really like to disable the cache, as it breaks everything in my use case, but disabling it doesn't seem to work.

StefanCG (Author) commented Sep 8, 2016

Hi,

just wanted to let you know that I'm now using the S3 API for some uploads, and even then there is a delay on the riofs mount (file not found), while the file is available via the API immediately.

So this doesn't seem to be just an upload-cache issue. Any suggestions on how to fix this?

gaul commented Sep 3, 2017

Flushing on release confuses me -- I would expect a flush on close, as in NFSv3:

Version 3 clients use COMMIT operations when flushing safe asynchronous writes to the server during a close(2) or fsync(2) system call, or when encountering memory pressure.

From http://nfs.sourceforge.net/#faq_a1 .

wizzard (Member) commented Sep 4, 2017

@andrewgaul If I'm not mistaken (I haven't worked with FUSE for some time), FUSE calls "release()" when a file handle is closed.

There are many other places where RioFS could be improved; unfortunately, at this time I do not have any free time to work on this project.

kahing commented Sep 4, 2017

FUSE does call release() when a file is closed, but that call is asynchronous, which means close() can return before release() is done. The catch with flushing on flush() is that flush() can be called multiple times for the same open file, once per file descriptor (e.g. after dup(2)).
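
One common way to cope with that, sketched below with a hypothetical per-handle structure (file_handle_t and upload_to_s3() are not riofs names), is to track a dirty flag and only upload when something actually changed since the last flush:

#define FUSE_USE_VERSION 26

#include <errno.h>
#include <fuse.h>
#include <stdint.h>

/* Hypothetical per-open-file state; riofs' real structures differ. */
typedef struct {
    int dirty;  /* set by write(); cleared after a successful upload */
} file_handle_t;

extern int upload_to_s3(const char *path, file_handle_t *h);

static int xmp_flush(const char *path, struct fuse_file_info *fi)
{
    file_handle_t *h = (file_handle_t *)(uintptr_t)fi->fh;

    /* flush() fires once per file descriptor (close(), dup(), dup2()),
     * so skip the upload when nothing has changed since the last flush. */
    if (!h->dirty)
        return 0;
    if (upload_to_s3(path, h) != 0)
        return -EIO;  /* surfaces as an error from the application's close() */
    h->dirty = 0;
    return 0;
}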

lourot added a commit to ghuser-io/ghuser.io that referenced this issue Sep 26, 2018