-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set appropriate binary mode writing stdout #465
Conversation
Signed-off-by: Jonah Beckford <[email protected]>
Signed-off-by: Jonah Beckford <[email protected]>
05f19c7
to
cd8d6f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix! It's very good for us to have Windows users who notice these kinds of compatibility bugs.
I'm not very familiar with the topic as you can see from my first comment below. The second comment is a lot more relevant.
src/utils.ml
Outdated
let with_output fn ~binary ~f = | ||
match fn with | ||
| None | Some "-" -> f stdout | ||
| None | Some "-" -> | ||
set_binary_mode_out stdout binary; | ||
f stdout | ||
| Some fn -> Out_channel.with_file fn ~binary ~f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a possibly obvious question here: From what I understand here, binary mode should not be set when writing binary to a file. Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binary mode should be set when writing binary to a file. I thought the ~binary
in Out_channel.with_file fn ~binary ~f
already handled that.
I could be very wrong. Please remember this is the first time I have seen this codebase and I barely understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine. I am assuming that if this routine is used, nothing else is sent to stdout
. If that is not the case, one may want to do flush stdout
before set_binary_out stdout binary
in case there's something left in the buffer (text mode translation occurs when bytes are actually read or written by a system call, so anything in the write buffer that has not yet been flushed when the mode is changed will be translated).
Also the stdout
is left in binary mode after this function returns, which is a side-effect. But assuming nothing else is sent to stdout
, this doesn't matter (but maybe a comment could be in order).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, both of you, for the clarifications, they're very useful for me!
I agree with both things @nojb has said: Fully switching stdout
to binary mode should not be a problem as the driver either outputs a binary AST or text (e.g. errors, lint results, a pprinted AST or similar). Mixing both would be a bug. Even if we implemented a server mode for the driver some day, the output kind should remain the same between different server calls. However, a comment about the impact on stdout
's buffer and about leaving stdout
in binary mode is useful, thanks for already writing it, @jonahbeckford.
src/utils.ml
Outdated
| Stdin -> | ||
(match input_kind with | ||
| Necessarily_binary -> set_binary_mode_in stdin true | ||
| _ -> ()); | ||
from_channel stdin ~input_kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only set binary mode when the driver is called with -as-ppx
option. With -as-ppx
option, the only possible input mode is binary. However, there are other options, which allow either source code or binary as input mode and detect which one it is by reading the first few characters (see logic around read_magic
.
So we probably can't find a reliable way to set binary mode from the beginning on when reading binary, as ppxlib detects whether it's binary or source after having read the potential magic number. I'm not sure what the best strategy is to go about this.
What kind of things typically go wrong when reading binary without having set binary mode? Can just reading the magic number already be problematic? Magic numbers don't contain characters that differ under different character encodings (as far as I know). One example of a magic number is "Caml1999M033". If only reading the magic number can't be problematic, I suggest setting binary mode after having detected that it's binary mode (i.e. in from_channel
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default text mode translates \r\n
to \n
, so a magic number like Caml1999M033
should have no problem being read in text mode. In general, as long as there is no byte 0x0d
(CR) in the magic number it should be safe to read in text mode, and then switch to binary mode.
The only thing I am worried about is buffering. Would the entire buffer be translated (big problem), rather than just the magic number field (no problem). Perhaps @nojb can comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I am worried about is buffering. Would the entire buffer be translated (big problem), rather than just the magic number field (no problem). Perhaps @nojb can comment?
As mentioned above, by the time the bytes get into the input buffer, translation has already ocurred, so it is too late to switch modes after checking the magic number. Switching stdin
to binary mode before reading the magic seems the only way. And anyway, binary mode works just fine for reading source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping; I left some comments.
src/utils.ml
Outdated
let with_output fn ~binary ~f = | ||
match fn with | ||
| None | Some "-" -> f stdout | ||
| None | Some "-" -> | ||
set_binary_mode_out stdout binary; | ||
f stdout | ||
| Some fn -> Out_channel.with_file fn ~binary ~f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine. I am assuming that if this routine is used, nothing else is sent to stdout
. If that is not the case, one may want to do flush stdout
before set_binary_out stdout binary
in case there's something left in the buffer (text mode translation occurs when bytes are actually read or written by a system call, so anything in the write buffer that has not yet been flushed when the mode is changed will be translated).
Also the stdout
is left in binary mode after this function returns, which is a side-effect. But assuming nothing else is sent to stdout
, this doesn't matter (but maybe a comment could be in order).
src/utils.ml
Outdated
| Stdin -> | ||
(match input_kind with | ||
| Necessarily_binary -> set_binary_mode_in stdin true | ||
| _ -> ()); | ||
from_channel stdin ~input_kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I am worried about is buffering. Would the entire buffer be translated (big problem), rather than just the magic number field (no problem). Perhaps @nojb can comment?
As mentioned above, by the time the bytes get into the input buffer, translation has already ocurred, so it is too late to switch modes after checking the magic number. Switching stdin
to binary mode before reading the magic seems the only way. And anyway, binary mode works just fine for reading source code.
let len = | ||
set_binary_mode_in stdin true; | ||
input stdin buf 0 magic_length | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking this is probably not needed because the magic number does not contain CRs. Also this test is not run on Windows :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way it is now, this test doesn't fail for you on Windows, @jonahbeckford , does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything now passes on Windows using the latest ppxlib PR and two merlin PRs (one closed, one open).
+ Document assumptions in [with_output] using binary mode. + Remove unneeded changes in print_magic_number.ml. Signed-off-by: Jonah Beckford <[email protected]>
Addressed feedback. Changed from Draft to normal PR. The first problem mentioned in #466 is fixed, but the "Second Attempt" is still not fixed (and I have no good way of troubleshooting it). |
src/utils.ml
Outdated
(match input_kind with | ||
| Necessarily_binary -> set_binary_mode_in stdin true | ||
| _ -> ()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input_kind
can be either Necessarily_binary
or Possibly_source
. I think binary mode should be set on both cases, because both cases may end up reading binary data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed and tested. (The "Second" problem still remains open)
Signed-off-by: Jonah Beckford <[email protected]>
@jonahbeckford do you confirm that you still have the "Second Attempt" issue even after the latest changes to this PR? |
(Yes, you do; I had missed your last message). |
@jonahbeckford for the "second" problem, there is a suspicious Did you try changing it to |
Wow, you deserve an Olympics medal. The "Second Attempt" works! I'll submit a PR for merlin for that bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me. Thanks again, @jonahbeckford!
If there are no objections (@NathanReb , @panglesd , @ceastlund), I'll go ahead and merge it.
Looks good to me too! Let's merge! |
TLDR: Windows needs the binary mode set.
Possible fix for #466
Context
I'm using Dune in an unusual way that is exposing this ppxlib bug. Basically, since I want to use PPX-es with Dune from a bytecode-only environment, and because Dune requires a C compiler to build a
(staged_pps)
or(pps)
driver, I have to resort to:That in turn runs
-as-pp
which writes tostdout
.So: