Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set appropriate binary mode writing stdout #465

Merged
merged 4 commits into from
Feb 1, 2024

Conversation

jonahbeckford
Copy link
Contributor

@jonahbeckford jonahbeckford commented Jan 16, 2024

TLDR: Windows needs the binary mode set.

Possible fix for #466

Context

I'm using Dune in an unusual way that is exposing this ppxlib bug. Basically, since I want to use PPX-es with Dune from a bytecode-only environment, and because Dune requires a C compiler to build a (staged_pps) or (pps) driver, I have to resort to:

 (preprocess
  (action
   (run %{lib:TheNameOfTheLibrary:ppx.exe} -as-pp %{input-file})))

That in turn runs -as-pp which writes to stdout.

So:

# Works
Y:\source\dksdk-type\tests\_std\lib\DkSDKCoder_PPX\ppx.exe -as-pp .\tests\_std\expression.ml -o a.pp.ml

# Fails
Y:\source\dksdk-type\tests\_std\lib\DkSDKCoder_PPX\ppx.exe -as-pp .\tests\_std\expression.ml
Caml1999M031Fatal error: exception Failure("output_value: not a binary channel")
Raised by primitive operation at Stdlib.output_value in file "stdlib.ml", line 386, characters 26-54
Called from Ppxlib__Utils.Ast_io.write.(fun) in file "src/utils.ml", line 187, characters 8-34
Called from Ppxlib__Driver.process_file in file "src/driver.ml", line 1169, characters 6-159
Called from Ppxlib__Driver.standalone in file "src/driver.ml", line 1541, characters 9-27
Re-raised at Location.report_exception.loop in file "parsing/location.ml", line 938, characters 14-25
Called from Ppxlib__Driver.standalone in file "src/driver.ml", line 1544, characters 4-59
Called from Dune__exe___ppx in file ".ppx/6241d9fed22cd957d799b2636a4b50f0/_ppx.ml-gen", line 1, characters 9-36

@jonahbeckford jonahbeckford force-pushed the fix-stdout-binary-mode branch from 05f19c7 to cd8d6f6 Compare January 25, 2024 13:44
Copy link
Member

@pitag-ha pitag-ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix! It's very good for us to have Windows users who notice these kinds of compatibility bugs.

I'm not very familiar with the topic as you can see from my first comment below. The second comment is a lot more relevant.

src/utils.ml Outdated
Comment on lines 3 to 8
let with_output fn ~binary ~f =
match fn with
| None | Some "-" -> f stdout
| None | Some "-" ->
set_binary_mode_out stdout binary;
f stdout
| Some fn -> Out_channel.with_file fn ~binary ~f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a possibly obvious question here: From what I understand here, binary mode should not be set when writing binary to a file. Why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary mode should be set when writing binary to a file. I thought the ~binary in Out_channel.with_file fn ~binary ~f already handled that.

I could be very wrong. Please remember this is the first time I have seen this codebase and I barely understand it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. I am assuming that if this routine is used, nothing else is sent to stdout. If that is not the case, one may want to do flush stdout before set_binary_out stdout binary in case there's something left in the buffer (text mode translation occurs when bytes are actually read or written by a system call, so anything in the write buffer that has not yet been flushed when the mode is changed will be translated).

Also the stdout is left in binary mode after this function returns, which is a side-effect. But assuming nothing else is sent to stdout, this doesn't matter (but maybe a comment could be in order).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, both of you, for the clarifications, they're very useful for me!

I agree with both things @nojb has said: Fully switching stdout to binary mode should not be a problem as the driver either outputs a binary AST or text (e.g. errors, lint results, a pprinted AST or similar). Mixing both would be a bug. Even if we implemented a server mode for the driver some day, the output kind should remain the same between different server calls. However, a comment about the impact on stdout's buffer and about leaving stdout in binary mode is useful, thanks for already writing it, @jonahbeckford.

src/utils.ml Outdated
Comment on lines 157 to 161
| Stdin ->
(match input_kind with
| Necessarily_binary -> set_binary_mode_in stdin true
| _ -> ());
from_channel stdin ~input_kind
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only set binary mode when the driver is called with -as-ppx option. With -as-ppx option, the only possible input mode is binary. However, there are other options, which allow either source code or binary as input mode and detect which one it is by reading the first few characters (see logic around read_magic.

So we probably can't find a reliable way to set binary mode from the beginning on when reading binary, as ppxlib detects whether it's binary or source after having read the potential magic number. I'm not sure what the best strategy is to go about this.

What kind of things typically go wrong when reading binary without having set binary mode? Can just reading the magic number already be problematic? Magic numbers don't contain characters that differ under different character encodings (as far as I know). One example of a magic number is "Caml1999M033". If only reading the magic number can't be problematic, I suggest setting binary mode after having detected that it's binary mode (i.e. in from_channel).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default text mode translates \r\n to \n, so a magic number like Caml1999M033 should have no problem being read in text mode. In general, as long as there is no byte 0x0d (CR) in the magic number it should be safe to read in text mode, and then switch to binary mode.

The only thing I am worried about is buffering. Would the entire buffer be translated (big problem), rather than just the magic number field (no problem). Perhaps @nojb can comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I am worried about is buffering. Would the entire buffer be translated (big problem), rather than just the magic number field (no problem). Perhaps @nojb can comment?

As mentioned above, by the time the bytes get into the input buffer, translation has already ocurred, so it is too late to switch modes after checking the magic number. Switching stdin to binary mode before reading the magic seems the only way. And anyway, binary mode works just fine for reading source code.

Copy link
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping; I left some comments.

src/utils.ml Outdated
Comment on lines 3 to 8
let with_output fn ~binary ~f =
match fn with
| None | Some "-" -> f stdout
| None | Some "-" ->
set_binary_mode_out stdout binary;
f stdout
| Some fn -> Out_channel.with_file fn ~binary ~f
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. I am assuming that if this routine is used, nothing else is sent to stdout. If that is not the case, one may want to do flush stdout before set_binary_out stdout binary in case there's something left in the buffer (text mode translation occurs when bytes are actually read or written by a system call, so anything in the write buffer that has not yet been flushed when the mode is changed will be translated).

Also the stdout is left in binary mode after this function returns, which is a side-effect. But assuming nothing else is sent to stdout, this doesn't matter (but maybe a comment could be in order).

src/utils.ml Outdated
Comment on lines 157 to 161
| Stdin ->
(match input_kind with
| Necessarily_binary -> set_binary_mode_in stdin true
| _ -> ());
from_channel stdin ~input_kind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I am worried about is buffering. Would the entire buffer be translated (big problem), rather than just the magic number field (no problem). Perhaps @nojb can comment?

As mentioned above, by the time the bytes get into the input buffer, translation has already ocurred, so it is too late to switch modes after checking the magic number. Switching stdin to binary mode before reading the magic seems the only way. And anyway, binary mode works just fine for reading source code.

let len =
set_binary_mode_in stdin true;
input stdin buf 0 magic_length

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking this is probably not needed because the magic number does not contain CRs. Also this test is not run on Windows :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it is now, this test doesn't fail for you on Windows, @jonahbeckford , does it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything now passes on Windows using the latest ppxlib PR and two merlin PRs (one closed, one open).

+ Document assumptions in [with_output] using binary mode.

+ Remove unneeded changes in print_magic_number.ml.

Signed-off-by: Jonah Beckford <[email protected]>
@jonahbeckford jonahbeckford marked this pull request as ready for review January 30, 2024 09:09
@jonahbeckford
Copy link
Contributor Author

Addressed feedback. Changed from Draft to normal PR. The first problem mentioned in #466 is fixed, but the "Second Attempt" is still not fixed (and I have no good way of troubleshooting it).

src/utils.ml Outdated
Comment on lines 171 to 173
(match input_kind with
| Necessarily_binary -> set_binary_mode_in stdin true
| _ -> ());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_kind can be either Necessarily_binary or Possibly_source. I think binary mode should be set on both cases, because both cases may end up reading binary data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed and tested. (The "Second" problem still remains open)

@nojb
Copy link
Contributor

nojb commented Jan 30, 2024

@jonahbeckford do you confirm that you still have the "Second Attempt" issue even after the latest changes to this PR?

@nojb
Copy link
Contributor

nojb commented Jan 30, 2024

(Yes, you do; I had missed your last message).

@nojb
Copy link
Contributor

nojb commented Jan 30, 2024

@jonahbeckford for the "second" problem, there is a suspicious open_in in https://github.com/ocaml/merlin/blob/05e90d823dd5b5acaaa70d2b2095b5bde2e395c4/src/ocaml/driver/pparse.ml#L180

Did you try changing it to open_in_bin?

@jonahbeckford
Copy link
Contributor Author

@jonahbeckford for the "second" problem, there is a suspicious open_in in https://github.com/ocaml/merlin/blob/05e90d823dd5b5acaaa70d2b2095b5bde2e395c4/src/ocaml/driver/pparse.ml#L180

Did you try changing it to open_in_bin?

Wow, you deserve an Olympics medal. The "Second Attempt" works! I'll submit a PR for merlin for that bit.

Copy link
Member

@pitag-ha pitag-ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me. Thanks again, @jonahbeckford!

If there are no objections (@NathanReb , @panglesd , @ceastlund), I'll go ahead and merge it.

@pitag-ha pitag-ha merged commit 3473e52 into ocaml-ppx:main Feb 1, 2024
5 checks passed
@NathanReb
Copy link
Collaborator

Looks good to me too! Let's merge!

@jonahbeckford jonahbeckford deleted the fix-stdout-binary-mode branch August 3, 2024 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants