[LibOS,PAL] Emulate file-backed mmap via PAL read/write APIs
Previously, the `chroot` FS (plain host-backed files) used the
`PalStreamMap()` PAL API for file-backed mmap. This had three problems:
1) each PAL had to implement a non-trivial `map` callback;
2) `map` implementations differed between PALs;
3) file-mmap bugs were hard to debug because they reproduced only on the
   SGX (`gramine-sgx`) PAL and not on the Linux (`gramine-direct`) PAL.

Note that other FSes already used emulated file-backed mmap: `tmpfs` and
`encrypted` FSes emulate such mmaps via `PalStreamRead()` and
`PalStreamWrite()`.

This commit switches `chroot` FS to use emulated file-backed mmap. This
way, `chroot` becomes similar in implementation to `tmpfs` and
`encrypted` FSes. Only `shm` FS still uses `PalDeviceMap()` (previously
called `PalStreamMap()`) because devices with shared memory have
non-standard mmap semantics. The corresponding `file_map()` functions
in PAL are removed.

In this commit, we also introduce the model of "logical" split within a
single VMA: some prefix of the VMA is accessible (has valid pages),
while the rest is unmapped (returns SIGBUS). Only file-backed VMAs are
split in this way (anonymous-memory VMAs can't be in "unmapped" state).
This logical split is achieved via a new `vma->valid_length` field.
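
The logical split can be illustrated with a minimal sketch. It is not the
code from this commit: the helper names, the `SKETCH_` prefix and the
4096-byte page size are assumptions made for illustration. Only the
boundary check `addr >= vma_addr + valid_length` mirrors the updated
`memfault_upcall()` hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE ((uint64_t)4096)

/* Round `x` up to the next multiple of the page size. */
static uint64_t sketch_page_align_up(uint64_t x) {
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: length of the accessible prefix of a file-backed
 * mapping. Pages that contain at least one byte of the file are valid;
 * pages entirely past the end of the file are not.
 */
static size_t sketch_valid_length(uint64_t file_size, uint64_t offset, size_t map_size) {
    assert(offset % SKETCH_PAGE_SIZE == 0);
    assert(map_size % SKETCH_PAGE_SIZE == 0);

    if (file_size <= offset)
        return 0; /* the mapping starts at or after the end of the file */

    uint64_t backed = sketch_page_align_up(file_size - offset);
    return backed < map_size ? (size_t)backed : map_size;
}

enum sketch_fault {
    SKETCH_FAULT_IN_VALID_PREFIX, /* handled by the regular fault path */
    SKETCH_FAULT_SIGBUS,          /* access past the file-backed prefix */
};

/* Classify a faulting address inside a file-backed VMA. */
static enum sketch_fault sketch_classify_fault(uintptr_t vma_addr, size_t valid_length,
                                               uintptr_t fault_addr) {
    if (fault_addr >= vma_addr + valid_length)
        return SKETCH_FAULT_SIGBUS;
    return SKETCH_FAULT_IN_VALID_PREFIX;
}
```

For example, mapping three pages of a 5000-byte file at offset 0 gives a
valid length of two pages in this sketch, so any access to the third page
is classified as SIGBUS.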

The switch to emulated mmap uncovered several bugs:
- The underlying file may be shorter than the requested mmap size. In
  this case, access beyond the last file-backed page must cause SIGBUS.
  Previously these semantics worked only on `gramine-direct` and weren't
  implemented on `gramine-sgx` (even with EDMM).
- As a consequence of the semantics above, a file-growing `write()` or
  `ftruncate()` on an already-mmapped file must make the newly extended
  file contents accessible. Previously this didn't work on `gramine-sgx`
  (with EDMM); it is now resolved via a
  `prot_refresh_mmaped_from_file_handle()` call.
- `msync()` must update file contents with the mmapped-in-process
  contents, but only those parts that do not exceed the file size.
  Previously there was a bug that msync'ed even the exceeding parts.
- Applications expect `msync(MS_ASYNC)` to update file contents before
  the next app access to the file. Gramine instead ignored such
  requests, which led to accesses to stale contents. We fix this bug by
  treating `MS_ASYNC` the same way as `MS_SYNC`. This bug was detected
  by the LTP test `msync01`.
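
The two `msync()` rules above can be sketched as follows. This is a
hedged illustration, not the code from this commit: both helper names are
hypothetical, and the real write-back path lives in the LibOS `msync`
callbacks.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h> /* MS_SYNC, MS_ASYNC, MS_INVALIDATE */

/*
 * Hypothetical helper: how many bytes of a mapping should be written
 * back to the file. Bytes of the mapping that lie past the current end
 * of the file are never written.
 */
static size_t sketch_msync_write_size(uint64_t file_size, uint64_t map_offset,
                                      size_t map_size) {
    if (file_size <= map_offset)
        return 0; /* the whole mapping lies past the end of the file */

    uint64_t in_file = file_size - map_offset;
    return in_file < map_size ? (size_t)in_file : map_size;
}

/*
 * Hypothetical helper: whether an msync() request triggers a write-back.
 * MS_ASYNC is treated exactly like MS_SYNC.
 */
static int sketch_msync_must_write_back(int flags) {
    return (flags & (MS_SYNC | MS_ASYNC)) != 0;
}
```

With a 5000-byte file and a one-page mapping at offset 4096, only the
first 904 bytes of the mapping are written back in this sketch.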

A few more FS tests are enabled on SGX now. Generally, `gramine-sgx` now
supports shared file-backed mappings, i.e. `mmap(MAP_SHARED)`. New LibOS
test `mmap_file_sigbus` is added; old bad `mmap_file` test is removed.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
Dmitrii Kuvaiskii committed Jul 29, 2024
1 parent a173a4f commit 4a5c4a1
Showing 35 changed files with 682 additions and 584 deletions.
6 changes: 3 additions & 3 deletions Documentation/pal/host-abi.rst
@@ -175,9 +175,6 @@ applications.
.. doxygenfunction:: PalStreamDelete
:project: pal

.. doxygenfunction:: PalStreamMap
:project: pal

.. doxygenfunction:: PalStreamSetLength
:project: pal

@@ -366,3 +363,6 @@ random bits, to obtain an attestation report and quote, etc.

.. doxygenfunction:: PalGetSpecialKey
:project: pal

.. doxygenfunction:: PalDeviceMap
:project: pal
27 changes: 17 additions & 10 deletions libos/include/libos_fs.h
@@ -108,21 +108,28 @@ struct libos_fs_ops {
/*
* \brief Map file at an address.
*
* \param hdl File handle.
* \param addr Address of the memory region. Cannot be NULL.
* \param size Size of the memory region.
* \param prot Permissions for the memory region (`PROT_*`).
* \param flags `mmap` flags (`MAP_*`).
* \param offset Offset in file.
*
* Maps the file at given address. This might involve mapping directly (`PalStreamMap`), or
* \param hdl File handle.
* \param addr Address of the memory region. Cannot be NULL.
* \param size Size of the memory region.
* \param prot Permissions for the memory region (`PROT_*`).
* \param flags `mmap` flags (`MAP_*`).
* \param offset Offset in file.
* \param[out] out_valid_size Valid size (i.e. backed by file).
*
* Maps the file at given address. This might involve mapping directly (`PalDeviceMap`), or
* mapping anonymous memory (`PalVirtualMemoryAlloc`) and writing data.
*
* The contents of the mapping are initialized using `size` bytes starting at `offset` offset in
* the file. For a file size that is not a multiple of the page size, the remaining bytes on the
* last page are zeroed. Pages that are not backed by file contents are inaccessible
* (effectively they have PROT_NONE permissions). This function returns the valid size (i.e. the
* pages backed by file contents) in `out_valid_size`.
*
* `addr`, `offset` and `size` must be alloc-aligned (see `IS_ALLOC_ALIGNED*` macros in
* `libos_internal.h`).
*/
int (*mmap)(struct libos_handle* hdl, void* addr, size_t size, int prot, int flags,
uint64_t offset);
uint64_t offset, size_t* out_valid_size);

/*
* \brief Write back mapped memory to file.
@@ -968,7 +975,7 @@ file_off_t generic_inode_seek(struct libos_handle* hdl, file_off_t offset, int o
int generic_inode_poll(struct libos_handle* hdl, int in_events, int* out_events);

int generic_emulated_mmap(struct libos_handle* hdl, void* addr, size_t size, int prot, int flags,
uint64_t offset);
uint64_t offset, size_t* valid_size);
int generic_emulated_msync(struct libos_handle* hdl, void* addr, size_t size, int prot, int flags,
uint64_t offset);
int generic_truncate(struct libos_handle* hdl, file_off_t size);
2 changes: 0 additions & 2 deletions libos/include/libos_handle.h
@@ -303,8 +303,6 @@ int init_exec_handle(const char* const* argv, char*** out_new_argv);

int open_executable(struct libos_handle* hdl, const char* path);

int get_file_size(struct libos_handle* file, uint64_t* size);

ssize_t do_handle_read(struct libos_handle* hdl, void* buf, size_t count);
ssize_t do_handle_write(struct libos_handle* hdl, const void* buf, size_t count);

7 changes: 7 additions & 0 deletions libos/include/libos_vma.h
@@ -26,6 +26,7 @@
struct libos_vma_info {
void* addr;
size_t length;
size_t valid_length; // memory accesses beyond valid_length result in SIGBUS/EFAULT
int prot; // memory protection flags: PROT_*
int flags; // MAP_* and VMA_*
struct libos_handle* file;
@@ -99,6 +100,9 @@ int bkeep_mmap_any(size_t length, int prot, int flags, struct libos_handle* file
int bkeep_mmap_any_aslr(size_t length, int prot, int flags, struct libos_handle* file,
uint64_t offset, const char* comment, void** ret_val_ptr);

/* Looks up VMA that starts at `begin_addr` and if found, updates `vma->valid_length`. */
int bkeep_vma_update_valid_length(void* begin_addr, size_t valid_length);

/* Looking up VMA that contains `addr`. If one is found, returns its description in `vma_info`.
* This function increases ref-count of `vma_info->file` by one (if it is not NULL). */
int lookup_vma(void* addr, struct libos_vma_info* vma_info);
@@ -133,6 +137,9 @@ int msync_handle(struct libos_handle* hdl);
/* Reload file mappings of `hdl` */
int reload_mmaped_from_file_handle(struct libos_handle* hdl);

/* Refresh page protections of file mappings of `hdl` when the file size has changed */
int prot_refresh_mmaped_from_file_handle(struct libos_handle* hdl, size_t file_size);

void debug_print_all_vmas(void);

/* Returns the peak amount of memory usage */
21 changes: 0 additions & 21 deletions libos/src/bookkeep/libos_handle.c
@@ -548,27 +548,6 @@ void put_handle(struct libos_handle* hdl) {
}
}

int get_file_size(struct libos_handle* hdl, uint64_t* size) {
if (!hdl->fs || !hdl->fs->fs_ops)
return -EINVAL;

if (hdl->fs->fs_ops->hstat) {
struct stat stat;
int ret = hdl->fs->fs_ops->hstat(hdl, &stat);
if (ret < 0) {
return ret;
}
if (stat.st_size < 0) {
return -EINVAL;
}
*size = (uint64_t)stat.st_size;
return 0;
}

*size = 0;
return 0;
}

static struct libos_handle_map* get_new_handle_map(uint32_t size) {
struct libos_handle_map* handle_map = calloc(1, sizeof(struct libos_handle_map));

8 changes: 2 additions & 6 deletions libos/src/bookkeep/libos_signal.c
@@ -361,12 +361,8 @@ static void memfault_upcall(bool is_in_pal, uintptr_t addr, PAL_CONTEXT* context
struct libos_handle* file = vma_info.file;
if (file && file->type == TYPE_CHROOT) {
/* If the mapping exceeds end of a file then return a SIGBUS. */
lock(&file->inode->lock);
file_off_t size = file->inode->size;
unlock(&file->inode->lock);

uintptr_t eof_in_vma = (uintptr_t)vma_info.addr + (size - vma_info.file_offset);
if (addr > eof_in_vma) {
uintptr_t eof_in_vma = (uintptr_t)vma_info.addr + vma_info.valid_length;
if (addr >= eof_in_vma) {
info.si_signo = SIGBUS;
info.si_code = BUS_ADRERR;
} else {