Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LibOS] Add O_APPEND emulation for single process #1935

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented Jul 4, 2024

Description of the changes

This PR adds O_APPEND emulation, but only within a single process.

Note that Gramine doesn't support multi-process operations on the same file anyhow, in particular, the file position is not synchronized among multiple processes. This PR doesn't change this limitation of Gramine, thus O_APPEND writes don't work correctly (may overwrite file contents).

Note that Linux has a bug in pwrite(). Gramine emulates this bug in the same way.

Fixes #1921.

See also:

How to test this PR?

A new FS test is added. One LTP test is uncommented.


This change is Reviewable

Note that Gramine doesn't support multi-process operations on the same
file anyhow, in particular, the file position is not synchronized among
multiple processes. This commit doesn't change this limitation of
Gramine, thus `O_APPEND` writes don't work correctly (may overwrite file
contents).

Note that Linux has a bug in `pwrite()`. Gramine emulates this bug
in the same way.

A new FS test is added. One LTP test is uncommented.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
@dimakuv dimakuv force-pushed the dimakuv/add-oappend-only-singleprocess branch from a2a9b6b to 4950661 Compare July 9, 2024 10:01
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 18 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
FYI: implementation of O_APPEND in the Linux kernel is very simple and self-contained. I searched for APPEND keyword and these are the main sources:

I hope I covered all Linux logic on O_APPEND in this PR. (Note that some logic was already implemented in Gramine.)



libos/src/fs/chroot/encrypted.c line 480 at r2 (raw file):

    lock(&hdl->inode->lock);

    file_off_t actual_pos = (hdl->flags & O_APPEND) ? hdl->inode->size : *pos;

I admit that I don't understand the locking in these handlers... Technically, O_APPEND flag may be changed at runtime via fcntl(SET_FL), so there must be smth like lock(&hdl->lock). At the same time, I don't observe such locking in other (similar) places, and it's also confusing what needs to be locked where.

So for now I do not perform additional locking. I even wonder if hdl->flags can be considered thread-safe for well-behaved apps, because if some app changes the flags in one thread and e.g. writes in another thread, that's a data race and a bug in the app.


libos/src/sys/libos_file.c line 445 at r2 (raw file):

    }

    if (!(in_hdl->acc_mode & MAY_READ) || !(out_hdl->acc_mode & MAY_WRITE)

FYI: this change is orthogonal to this PR. Surprisingly, there was already handling of O_APPEND in this sendfile emulation, though it returned an incorrect error code. So I fixed this and also added two other checks. This could be moved out to a separate PR if needed.


libos/test/fs/test_fs.py line 136 at r2 (raw file):

        self.assertIn('TEST 1 (' + file_path + ')', stdout)
        self.assertIn('TEST 2 (' + file_path + ')', stdout)
        self.assertIn('TEST 3 (' + file_path + ')', stdout)

FYI: Other FS tests also do things like self.assertIn(close(...)), but there are so many lines in the log, that I found it stupid to check each line. So just checking that sub-tests are at least started and printed out. And of course the whole test must run to successful completion.

@dimakuv dimakuv marked this pull request as ready for review July 9, 2024 10:09
@dimakuv
Copy link
Contributor Author

dimakuv commented Jul 30, 2024

Jenkins, test this please (Jenkins was dead when this PR was created)

Copy link
Contributor

@efu39 efu39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 18 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


libos/src/sys/libos_file.c line 445 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

FYI: this change is orthogonal to this PR. Surprisingly, there was already handling of O_APPEND in this sendfile emulation, though it returned an incorrect error code. So I fixed this and also added two other checks. This could be moved out to a separate PR if needed.

According to the sendfile man page, it appears that the original error code -EINVAL is intended to be returned when out_hdl has the O_APPEND flag set.
"EINVAL out_fd has the O_APPEND flag set. This is not currently
supported by sendfile()."

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r1, 15 of 17 files at r2, all commit messages.
Reviewable status: 16 of 18 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @efu39)


libos/src/sys/libos_file.c line 445 at r2 (raw file):

Previously, efu39 (Erica Fu) wrote…

According to the sendfile man page, it appears that the original error code -EINVAL is intended to be returned when out_hdl has the O_APPEND flag set.
"EINVAL out_fd has the O_APPEND flag set. This is not currently
supported by sendfile()."

+1

Why change the previous check on outfd w/ O_APPEND to return -EBADF (which should return -EINVAL)?


libos/src/sys/libos_open.c line 335 at r2 (raw file):

    /*
     * Note that Linux has the following bug: POSIX requires that opening a file with the O_APPEND
     * flag should have no affect on the location at which pwrite() writes data. However, on Linux,

-> effect

Code quote:

affect

libos/test/fs/read_append.c line 36 at r2 (raw file):

    printf("close(%s) OK\n", file_path);
    if (file_size(file_path) != size)
        fatal_error("File size is wrong (expected sizeof(buf1))\n");

This looks a bit weird -- shouldn't we print the expected and got bytes?

Code quote:

expected sizeof(buf1)

libos/test/fs/read_append.c line 73 at r2 (raw file):

    printf("close(%s) OK\n", file_path);
    if (file_size(file_path) != size * 2)
        fatal_error("File size is wrong (expected sizeof(buf1) * 2)\n");

ditto

Code quote:

expected sizeof(buf1) * 2

libos/test/fs/read_append.c line 105 at r2 (raw file):

    printf("close(%s) OK\n", file_path);
    if (file_size(file_path) != size * 3)
        fatal_error("File size is wrong (expected sizeof(buf1) * 3)\n");

ditto

Code quote:

expected sizeof(buf1) * 3

@dimakuv dimakuv added the P: 1 label Oct 29, 2024
Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't change this limitation of Gramine, thus O_APPEND writes don't work correctly (may overwrite file contents).

I think we want to log some warning when e.g., child process tries to write an O_APPEND fd created by parent process?

Reviewable status: 16 of 18 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @efu39)

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 18 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

a discussion (no related file):
@kailun-qin wrote:

I think we want to log some warning when e.g., child process tries to write an O_APPEND fd created by parent process?

Yes, for some reason I thought I implemented that... But now I see that I didn't add this logic. Unfortunately, this is not that trivial to add here, because I'm not sure to which file types it should be applied, and whether we should keep track of the offset (like, the file was opened in APPEND mode but was never written to, and then fork() happens, in this case the child would be the only process that uses this file and it's actually safe).

So I decided not to implement it in this PR, as we won't have time to finalize this before I'm gone.

(For info, we mean the trick like used in secure eventfd: 51e99f9)



libos/src/sys/libos_file.c line 445 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

+1

Why change the previous check on outfd w/ O_APPEND to return -EBADF (which should return -EINVAL)?

Interesting, you're right after some quick testing on Linux.

I was confused by this Linux code: https://elixir.bootlin.com/linux/v6.11.5/source/fs/read_write.c#L1735. But apparently this code path is not used in sendfile().

Done.


libos/src/sys/libos_open.c line 335 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

-> effect

Done.


libos/test/fs/read_append.c line 36 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

This looks a bit weird -- shouldn't we print the expected and got bytes?

Done.


libos/test/fs/read_append.c line 73 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

ditto

Done.


libos/test/fs/read_append.c line 105 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

ditto

Done.

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: 16 of 18 files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@kailun-qin wrote:

I think we want to log some warning when e.g., child process tries to write an O_APPEND fd created by parent process?

Yes, for some reason I thought I implemented that... But now I see that I didn't add this logic. Unfortunately, this is not that trivial to add here, because I'm not sure to which file types it should be applied, and whether we should keep track of the offset (like, the file was opened in APPEND mode but was never written to, and then fork() happens, in this case the child would be the only process that uses this file and it's actually safe).

So I decided not to implement it in this PR, as we won't have time to finalize this before I'm gone.

(For info, we mean the trick like used in secure eventfd: 51e99f9)

Yeah, fair enough. I'm not blocking as we mentioned this single-process limitation in the doc and Gramine doesn't support multi-process operations on the same file anyhow (w/o warning logs).


Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 18 files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/src/sys/libos_file.c line 445 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Interesting, you're right after some quick testing on Linux.

I was confused by this Linux code: https://elixir.bootlin.com/linux/v6.11.5/source/fs/read_write.c#L1735. But apparently this code path is not used in sendfile().

Done.

Yeah, it looks to be checked here: https://elixir.bootlin.com/linux/v6.11.5/source/fs/splice.c#L1204.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MariaDB Replication Is Broken
3 participants