Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for seekable read/writes #307

Merged
merged 3 commits into from
Sep 17, 2022
Merged

Add API for seekable read/writes #307

merged 3 commits into from
Sep 17, 2022

Conversation

nojb
Copy link
Contributor

@nojb nojb commented Sep 4, 2022

This partly addresses #305. Currently just a draft, meant for discussing the API.

The PR adds a method #read_at_offset to the object returned by open_in and with_open_in allowing to do a read at a given offset. The new method is exposed as Eio.Fs.read_at_offset and a helper function is added as well Eio.Fs.read_exact_at_offset. Not sure if this is the right level at which to expose this functionality.

I'm just getting started with this library and am not sure I have yet grokked the full picture, so all feedback is welcome!

Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added some suggestions inline (I'm not quite sure what the best API would be either).

lib_eio/fs.ml Outdated
@@ -12,6 +12,10 @@ exception Already_exists of path * exn
exception Not_found of path * exn
exception Permission_denied of path * exn

class virtual read_at_offset = object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well make a ro type for this and related operations. We already have rw type.

Suggested change
class virtual read_at_offset = object
class virtual ro = object (_ : <Generic.t; Flow.source; ..>)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

lib_eio/fs.ml Outdated
@@ -29,7 +33,7 @@ type create = [
(** Note: use the functions in {!Path} to access directories. *)
class virtual dir = object (_ : #Generic.t)
method probe _ = None
method virtual open_in : sw:Switch.t -> path -> <Flow.source; Flow.close>
method virtual open_in : sw:Switch.t -> path -> <Flow.source; Flow.close; read_at_offset>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
method virtual open_in : sw:Switch.t -> path -> <Flow.source; Flow.close; read_at_offset>
method virtual open_in : sw:Switch.t -> path -> <ro; Flow.close>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

lib_eio/fs.ml Outdated
@@ -12,6 +12,10 @@ exception Already_exists of path * exn
exception Not_found of path * exn
exception Permission_denied of path * exn

class virtual read_at_offset = object
method virtual read_at_offset : int64 -> Cstruct.t -> int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly like the name, but POSIX and Lwt_unix both call this pread.

Elsewhere, we're using Int63.t as the type for offsets, to avoid an allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed to pread, and switched to Int63.t.

lib_eio/fs.ml Outdated
class virtual read_at_offset = object
method virtual read_at_offset : int64 -> Cstruct.t -> int
end

class virtual rw = object (_ : <Generic.t; Flow.source; Flow.sink; ..>)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class virtual rw = object (_ : <Generic.t; Flow.source; Flow.sink; ..>)
class virtual rw = object (_ : <ro; Flow.sink; ..>)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

lib_eio/fs.ml Outdated
@@ -48,3 +52,15 @@ and virtual dir_with_close = object
inherit dir
method virtual close : unit
end

let read_at_offset (t : #read_at_offset) ofs buf =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just call this read and take an optional argument. eio_linux.mli uses file_offset, e.g.

Suggested change
let read_at_offset (t : #read_at_offset) ofs buf =
let read (t : #ro) ?file_offset buf =

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I was in two minds because there already is a Flow.read which corresponds to the case of file_offset = None. So for the moment I renamed this to pread, with a mandatory ~file_offset argument (to avoid duplicating the functionality of Flow.read which may be a bit confusing).

@nojb
Copy link
Contributor Author

nojb commented Sep 10, 2022

Thanks for the review @talex5. I amended the PR roughly as suggested, and added an analogous "pwrite" API. Let me know what you think. Thanks!

Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Can be merged once squashed and no longer marked as draft.

Two things I wonder about:

  • I see the linux and luv backends both accept a list of buffers. Maybe that should be exposed to users in the cross-platform API too?
  • Both backend APIs use write and write_single, whereas the new cross-platform API uses write_exact and write. (OCaml itself has Unix.single_write, but I dislike that word order because it's hard to find when using tab-completion).

@nojb
Copy link
Contributor Author

nojb commented Sep 13, 2022

  • I see the linux and luv backends both accept a list of buffers. Maybe that should be exposed to users in the cross-platform API too?

Makes sense, will amend.

  • Both backend APIs use write and write_single, whereas the new cross-platform API uses write_exact and write. (OCaml itself has Unix.single_write, but I dislike that word order because it's hard to find when using tab-completion).

Since read is already used by flow I kept the p prefix: pread and pwrite. Does pread_single/pread and pwrite_single/pwrite sound good for the new functions?

@talex5
Copy link
Collaborator

talex5 commented Sep 13, 2022

Since read is already used by flow I kept the p prefix: pread and pwrite. Does pread_single/pread and pwrite_single/pwrite sound good for the new functions?

I'm a bit nervous about pread. Short reads are useful at the end of a file and it might be unexpected if that's an error. Perhaps it's best to be explicit and have p{read,write}_{all,single}.

@nojb
Copy link
Contributor Author

nojb commented Sep 13, 2022

Since read is already used by flow I kept the p prefix: pread and pwrite. Does pread_single/pread and pwrite_single/pwrite sound good for the new functions?

I'm a bit nervous about pread. Short reads are useful at the end of a file and it might be unexpected if that's an error. Perhaps it's best to be explicit and have p{read,write}_{all,single}.

Sorry, FWIW I just remembered why I had used pread and pread_exact originally: it matches the terminology in flow's read and read_exact.

Anyway, I don't have a strong opinion either way: pread/pread_exact or pread_single/pread_all. Let me know which one strikes you as preferable...

@talex5
Copy link
Collaborator

talex5 commented Sep 14, 2022

Sorry, FWIW I just remembered why I had used pread and pread_exact originally: it matches the terminology in flow's read and read_exact.

That seems reasonable. Let's use pread/pread_exact for now then, and we can rename to p?read_single in both places in a future PR, if we decide that's better.

Anyway, I don't have a strong opinion either way: pread/pread_exact or pread_single/pread_all. Let me know which one strikes you as preferable...

pread_all sounds like it might mean to read all of the file.

@nojb nojb marked this pull request as ready for review September 14, 2022 17:42
@nojb
Copy link
Contributor Author

nojb commented Sep 14, 2022

That seems reasonable.

Perfect, I squashed the PR, and exposed the possibility of passing a list of buffers, as suggested. This is ready for merging on my side. Thanks!

Also, use `pwrite_single` and `pwrite_all` for clarity, and don't treat
a 0-length write as `End_of_file`.
Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit with the suggested changes.

I also moved the file operations out to a new Eio.File module, added some doc-strings, and renamed the write operations slightly.

If you're happy with that, it can be merged.

Comment on lines 565 to 567
match File.write_single ~file_offset fd bufs |> or_raise |> Unsigned.Size_t.to_int with
| 0 -> raise End_of_file
| got -> got
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right. 0 doesn't have a special meaning for pwrite.

Suggested change
match File.write_single ~file_offset fd bufs |> or_raise |> Unsigned.Size_t.to_int with
| 0 -> raise End_of_file
| got -> got
File.write_single ~file_offset fd bufs |> or_raise |> Unsigned.Size_t.to_int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

lib_eio/fs.ml Outdated
Comment on lines 22 to 23
method probe _ = None
method read_methods = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would make more sense to move these to ro, since they are about reading.

@nojb
Copy link
Contributor Author

nojb commented Sep 16, 2022

If you're happy with that, it can be merged.

LGTM, thanks.

@haesbaert
Copy link
Contributor

For consistency, File.write is binding the result with or_raise (t -> Luv.Buffer.t list -> unit), shouldn't we make File.write_single the same ?

For consistency with the rest of the API.

Spotted by Christiano Haesbaert.
@talex5
Copy link
Collaborator

talex5 commented Sep 17, 2022

For consistency, File.write is binding the result with or_raise (t -> Luv.Buffer.t list -> unit), shouldn't we make File.write_single the same ?

Good point. I pushed a commit making write return a result, to match the rest of the API, and also exposed write_single.

@talex5 talex5 merged commit 5a77787 into ocaml-multicore:main Sep 17, 2022
@nojb nojb deleted the seek branch September 17, 2022 16:32
talex5 added a commit to talex5/opam-repository that referenced this pull request Oct 12, 2022
CHANGES:

Changes:

- Update to OCaml 5.0.0~beta1 (@anmonteiro @talex5 ocaml-multicore/eio#346).

- Add API for seekable read/writes (@nojb ocaml-multicore/eio#307).

- Add `Flow.write` (@haesbaert ocaml-multicore/eio#318).
  This provides an optimised alternative to `copy` in the case where you are writing from a buffer.

- Add `Net.with_tcp_connect` (@bikallem ocaml-multicore/eio#302).
  Convenience function for opening a TCP connection.

- Add `Eio.Time.Timeout` (@talex5 ocaml-multicore/eio#320).
  Makes it easier to pass timeouts around.

- Add `Eio_mock.Clock` (@talex5 ocaml-multicore/eio#328).
  Control time in tests.

- Add `Buf_read.take_while1` and `skip_while1` (@bikallem ocaml-multicore/eio#309).
  These fail if no characters match.

- Make the type parameter for `Promise.t` covariant (@anmonteiro ocaml-multicore/eio#300).

- Move list functions into a dedicated submodule (@raphael-proust ocaml-multicore/eio#315).

- Direct implementation of `Flow.source_string` (@c-cube ocaml-multicore/eio#317).
  Slightly faster.

Bug fixes:

- `Condition.broadcast` must interlock as well (@haesbaert ocaml-multicore/eio#324).

- Split the reads into no more than 2^32-1 for luv (@haesbaert @talex5 @EduardoRFS ocaml-multicore/eio#343).
  Luv uses a 32 bit int for buffer sizes and wraps if the value passed is too big.

- eio_luv: allow `Net.connect` to be cancelled (@talex5 @nojb ocaml-multicore/eio#311).

- eio_main: Use dedicated log source (@anmonteiro ocaml-multicore/eio#326).

- linux_eio: fix kernel version number in log message (@talex5 @nojb ocaml-multicore/eio#314).

- Account for stack differences in the socketpair test (issue ocaml-multicore/eio#312) (@haesbaert ocaml-multicore/eio#313).

Documentation:

- Add Best Practices section to README (@talex5 ocaml-multicore/eio#299).

- Documentation improvements (@talex5 ocaml-multicore/eio#295 ocaml-multicore/eio#337).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants