-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-90533: Implement BytesIO.peek() #30808
base: main
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
3111318
to
675718d
Compare
This comment was marked as outdated.
This comment was marked as outdated.
675718d
to
d4e849f
Compare
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
ef5eb69
to
b82fa85
Compare
6998be1
to
be39ff2
Compare
@AlexWaygood You’ve been the only human to interact with this PR so far, do you possibly have any advice on how to move this forward? |
Hi @marcelm — sorry for the delay in anybody looking at this. I haven't studied your PR in detail (or thought about whether the proposal is a good idea), but it looks well put together at first glance. I'll try to take a look soon. I won't be able to review the C code, but I can comment on whether the proposal seems like a good idea, and I can review the Python implementation and the tests. Note that if this proposal is accepted, it will also need:
You can also add yourself to |
Thank you! I pushed a documentation update and will add an entry to the What’s new document in case the PR is reviewed favorably. |
I'll defer to @benjaminp's judgement on this one -- I'm really not the right reviewer for this, unfortunately :( Please ping me again if you still haven't had a review in a few weeks. |
Thank you! I appreciate that you took the time. |
@AlexWaygood Hi! Been 8 months, no review so far :) How can we take this forward? I'm not the original author of the PR but I also have a use-case for this API and would like to see this change land, happy to help progress this. |
Thanks for your interest! I have updated the PR to fix the merge conflict and to reflect that it now needs to target 3.13. |
@marcelm That was quick, thank you so much! |
Why not, but it would also deviate from the
You wouldn't need to guarantee anything IMHO. |
I prefer that when I read data form the filesystem, the data doesn't change, even if I use it 10 seconds later, or 1 day later. Or for my needs, apparently, bytes is the good type. It's less efficient, but it works as expected. Maybe a new API is needed for memoryview which has a different semantics. |
That was just a random suggestion, no need to act on it here. |
I’ve now taken the liberty of changing the BufferedReader.peek() documentation as I suggested above. I think I’ve addressed all comments. |
Hi, just a gentle reminder that this is ready from my side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that peek(0) returns an empty bytes string.
At least one byte of data is returned if not at EOF. | ||
Return an empty :class:`bytes` object at EOF. | ||
If the size argument is less than one or larger than the number of available bytes, | ||
a copy of the buffer from the current position until the end is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand that size=-1 or size=None return the whole content. But I'm surprised that size=0 returns something different than an empty string or raise an exception.
I suggest to return an empty string when peek(0) is called, it would be similar to read(0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, changed now.
This was originally for (perceived) consistency with BufferedReader.peek()
, which does not return empty bytes objects for size=0. But then BufferedReader.peek()
ignores the size anyway.
Lib/_pyio.py
Outdated
# even if the size is greater than the buffer length or | ||
# the position is beyond the end of the buffer | ||
if size < 1: | ||
size = len(self._buffer) - self._pos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks kind of complicated, whereas you can just do:
if size < 1:
return self._buffer[self._pos:]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
with self.ioclass(buf) as memio: | ||
self.assertEqual(memio.tell(), 0) | ||
self.assertEqual(memio.peek(1), buf[:1]) | ||
self.assertEqual(memio.peek(1), buf[:1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add tests reading 3 and 5 bytes? It seems like most tests read 1 byte or everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, added now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for multiple updates!
@erlend-aasland @pitrou: Are you ok that peek(0) returns an empty string? IMO it makes the io module more consistent between peek(0) and read(0). In a previous version, peek(1) returned 1 byte, whereas peek(0) returned all remaining bytes (full content).
What does peek(0) do on other buffered input objects? |
The following table lists all
I’d still say it is ok for |
Well, people calling |
If you want (at least) 128 bytes, why not calling peek(128)? |
Perhaps returning just a single byte would be best. This would avoid giving the wrong impression that always 128 bytes are available when someone starts out with I am starting to think that it would be good to let |
|
Really, it helps to think about actual uses of Lines 523 to 531 in 937872e
Reading ahead by 1 byte is completely pointless, which is why I suggest a small but non-trivial size. |
It’s not completely pointless: My use case is actually reading exactly one byte ahead. But yes, I got lucky because that single byte is sufficient to distinguish the two file formats. There is also basic_stream::peek in C++, which peeks one character ahead. And the C function ungetc guarantees at most one pushback, which is equivalent. I am guessing these functions inspired BufferedReader.peek, which is possibly why the guarantees are the way they are. That said, I’m totally fine with returning 128 bytes for |
128 sounds arbitrary. The io module has one constant: io.DEFAULT_BUFFER_SIZE. It's used by BufferedReader.peek(). |
128 is arbitrary, and so is |
Closes gh-90533