gh-90533: Implement BytesIO.peek() #30808

Status: Open
Wants to merge 32 commits into base: main
Changes from 25 commits

Commits (32)
1e5e3ab
Implement BytesIO.peek()
marcelm Jan 22, 2022
acdfe2e
📜🤖 Added by blurb_it.
blurb-it[bot] Apr 10, 2022
b3f3c3d
Document BytesIO.peek()
marcelm Nov 9, 2022
2506967
Implement with the help of read_bytes()
marcelm Nov 9, 2022
a5ac601
Add to What’s New
marcelm Jul 8, 2023
9da7f9f
versionadded: 3.12 -> 3.13
marcelm Jul 8, 2023
79d5032
Remove unused variable
marcelm Jul 8, 2023
2963dab
Test tell() after peek()
marcelm Sep 22, 2023
7d8793a
Update docs, factor out peek_bytes, semantics
marcelm Sep 22, 2023
9bf291a
Merge branch 'main' into fix-issue-46375
erlend-aasland Sep 28, 2023
d4d5a55
Update Misc/NEWS.d/next/Library/2022-04-10-20-10-59.bpo-46375.8j1ogZ.rst
marcelm Sep 28, 2023
9cdd231
Update Modules/_io/bytesio.c
marcelm Sep 28, 2023
d9948c8
Update Modules/_io/bytesio.c
marcelm Sep 28, 2023
69ddb4f
Use SemBr
marcelm Sep 28, 2023
58a5d58
Update Doc/whatsnew/3.13.rst
marcelm Sep 28, 2023
3d21011
Apply suggestions from code review
marcelm Sep 28, 2023
4f4999f
Use a context manager around memio in test_peek
marcelm Sep 28, 2023
56fbee3
Add more tests for tell() after peek()
marcelm Sep 28, 2023
4c3c908
Document why size < 0 can happen
marcelm Sep 28, 2023
1f2b5c5
Update Modules/_io/bytesio.c
marcelm Sep 29, 2023
a1504a7
Do not update pos if peek_bytes failed
marcelm Sep 29, 2023
827a785
Size can be negative after truncate or seek
marcelm Sep 29, 2023
eebd289
Test with size<0 and size>len(buf)
marcelm Sep 29, 2023
4855189
Test peek() after write()
marcelm Sep 29, 2023
fd85b46
Document BufferedReader.peek and BytesIO.peek similarly
marcelm Sep 29, 2023
5733d5a
Comment
marcelm Sep 29, 2023
4d832e0
Make it more explicit that size is ignored
marcelm Sep 29, 2023
7a449e1
Merge remote-tracking branch 'upstream/main' into fix-issue-46375
marcelm Oct 20, 2023
1d793e3
Return an empty bytes object for size=0
marcelm Oct 23, 2023
9b0b04f
Simplify
marcelm Oct 23, 2023
bb6447d
Test peek(3) and peek(5)
marcelm Oct 23, 2023
611bdaa
Merge remote-tracking branch 'upstream/main' into fix-issue-46375
marcelm Oct 25, 2023
18 changes: 15 additions & 3 deletions Doc/library/io.rst
@@ -728,6 +728,15 @@ than raw I/O does.

       Return :class:`bytes` containing the entire contents of the buffer.

+   .. method:: peek(size=1, /)
+
+      Return bytes from the current position onwards without advancing the position.
+      At least one byte of data is returned if not at EOF.
+      Return an empty :class:`bytes` object at EOF.
+      If the size argument is less than one or larger than the number of available bytes,
+      a copy of the buffer from the current position until the end is returned.
Member:

I can understand that size=-1 or size=None returns the whole content. But I'm surprised that size=0 neither returns an empty string nor raises an exception.

I suggest returning an empty string when peek(0) is called; that would be consistent with read(0).

Author (marcelm, Oct 25, 2023):

Agreed, changed now.

This was originally for (perceived) consistency with BufferedReader.peek(), which does not return empty bytes objects for size=0. But then BufferedReader.peek() ignores the size anyway.
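
To make the behaviour agreed on in this thread concrete, here is a minimal sketch, assuming that `peek(0)` returns an empty bytes object like `read(0)` and that negative or oversized values return everything up to EOF. `PeekableBytesIO` is a hypothetical stand-in built on the public `io.BytesIO` API, not the code from this PR.

```python
import io


class PeekableBytesIO(io.BytesIO):
    """Hypothetical stand-in for the proposed BytesIO.peek()."""

    def peek(self, size=1):
        if self.closed:
            raise ValueError("peek on closed file")
        pos = self.tell()
        if size == 0:
            # Assumed post-review behaviour: mirror read(0).
            return b""
        if size < 0:
            # Negative sizes return everything up to EOF.
            return self.getvalue()[pos:]
        # Slicing clamps sizes larger than the remaining data.
        return self.getvalue()[pos:pos + size]


memio = PeekableBytesIO(b"1234567890")
first = memio.peek()        # default size is 1
nothing = memio.peek(0)     # empty, like read(0)
rest = memio.peek(-1)       # everything from the current position
position = memio.tell()     # unchanged by peek()
```

The sketch copies the whole buffer through `getvalue()` on every call, which is fine for illustration but is not how an efficient implementation would work.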


+      .. versionadded:: 3.13

    .. method:: read1(size=-1, /)

@@ -761,9 +770,12 @@ than raw I/O does.

    .. method:: peek(size=0, /)

-      Return bytes from the stream without advancing the position. At most one
-      single read on the raw stream is done to satisfy the call. The number of
-      bytes returned may be less or more than requested.
+      Return bytes from the current position onwards without advancing the position.
+      At least one byte of data is returned if not at EOF.
+      Return an empty :class:`bytes` object at EOF.
+      At most one single read on the underlying raw stream is done to satisfy the call.
+      The exact number of bytes returned is unspecified
+      (*size* is ignored).
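
The existing `BufferedReader.peek()` behaviour described above can be observed with the standard `io` module. This is only an illustration: how many bytes come back is implementation-dependent, so the sketch relies only on the guarantees stated in the text.

```python
import io

raw = io.BytesIO(b"1234567890")
reader = io.BufferedReader(raw)

peeked = reader.peek(1)      # size is only a hint; more than 1 byte may come back
position = reader.tell()     # peek() does not advance the position
head = reader.read(3)        # a later read() still starts at the beginning
```

Callers that need exactly `n` bytes therefore usually slice the result, for example `reader.peek(n)[:n]`, and still have to handle a shorter result near EOF.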

    .. method:: read(size=-1, /)

2 changes: 2 additions & 0 deletions Doc/whatsnew/3.13.rst
@@ -151,6 +151,8 @@ and only logged in :ref:`Python Development Mode <devmode>` or on :ref:`Python
   built on debug mode <debug-build>`.
   (Contributed by Victor Stinner in :gh:`62948`.)

+* Add :meth:`io.BytesIO.peek`. (Contributed by Marcel Martin in :gh:`90533`.)
+
 opcode
 ------

7 changes: 7 additions & 0 deletions Lib/_pyio.py
@@ -978,6 +978,13 @@ def tell(self):
             raise ValueError("tell on closed file")
         return self._pos

+    def peek(self, size=1):
+        if self.closed:
+            raise ValueError("peek on closed file")
+        if size < 1:
+            size = len(self._buffer) - self._pos
Contributor:

The C implementation is structured differently, which surprised me at first.

Author:

Do you mean that you would have expected something like `if size < 1 or size > len(self._buffer)`? The Python version relies on how slicing semantics work to deal with this. Shall I change it to be more explicit?

Contributor:

I don't think that is needed. Perhaps a comment?

Author:

I added a comment. Let me know if it’s too verbose now.

Member:

This code looks kind of complicated, whereas you can just do:

if size < 1:
    return self._buffer[self._pos:]

Author:

Changed

+        return self._buffer[self._pos : self._pos + size]
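
The slicing behaviour the author refers to can be checked in isolation. The snippet below is a small illustration, assuming a `bytearray` buffer; the variable names are made up for the example.

```python
buffer = bytearray(b"1234567890")
pos = 8

exact = buffer[pos:pos + 2]       # exactly the two remaining bytes
clamped = buffer[pos:pos + 100]   # an end index past the buffer is clamped
past_end = buffer[20:20 + 1]      # a start past the end yields an empty result
```

Because out-of-range slice bounds never raise, the pure-Python version needs no explicit check for a size larger than the remaining data.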

     def truncate(self, pos=None):
         if self.closed:
             raise ValueError("truncate on closed file")
35 changes: 35 additions & 0 deletions Lib/test/test_memoryio.py
@@ -517,6 +517,41 @@ def test_relative_seek(self):
         memio.seek(1, 1)
         self.assertEqual(memio.read(), buf[1:])

+    def test_peek(self):
+        buf = self.buftype("1234567890")
+        with self.ioclass(buf) as memio:
+            self.assertEqual(memio.tell(), 0)
+            self.assertEqual(memio.peek(1), buf[:1])
+            self.assertEqual(memio.peek(1), buf[:1])
Member:

Can you add tests reading 3 and 5 bytes? It seems like most tests read 1 byte or everything.

Author:

Absolutely, added now.

+            self.assertEqual(memio.peek(), buf[:1])
+            self.assertEqual(memio.peek(0), buf)
+            self.assertEqual(memio.peek(len(buf) + 100), buf)
+            self.assertEqual(memio.peek(-1), buf)
+            self.assertEqual(memio.tell(), 0)
+            memio.read(1)
+            self.assertEqual(memio.tell(), 1)
+            self.assertEqual(memio.peek(1), buf[1:2])
+            self.assertEqual(memio.peek(1), buf[1:2])
+            self.assertEqual(memio.peek(), buf[1:2])
+            self.assertEqual(memio.peek(0), buf[1:])
+            self.assertEqual(memio.peek(len(buf) + 100), buf[1:])
+            self.assertEqual(memio.peek(-1), buf[1:])
+            self.assertEqual(memio.tell(), 1)
+            memio.read()
+            self.assertEqual(memio.tell(), len(buf))
+            self.assertEqual(memio.peek(1), self.EOF)
+            self.assertEqual(memio.tell(), len(buf))
+            # Peeking works after writing
+            abc = self.buftype("abc")
+            memio.write(abc)
+            self.assertEqual(memio.peek(), self.EOF)
+            memio.seek(len(buf))
+            self.assertEqual(memio.peek(), abc[:1])
+            self.assertEqual(memio.peek(-1), abc)
+            self.assertEqual(memio.peek(len(abc) + 100), abc)
+            self.assertEqual(memio.tell(), len(buf))
+        self.assertRaises(ValueError, memio.peek)

     def test_unicode(self):
         memio = self.ioclass()

@@ -0,0 +1 @@
+Add :meth:`io.BytesIO.peek`.
48 changes: 45 additions & 3 deletions Modules/_io/bytesio.c
@@ -394,8 +394,9 @@ _io_BytesIO_tell_impl(bytesio *self)
     return PyLong_FromSsize_t(self->pos);
 }

+// Read without advancing position
 static PyObject *
-read_bytes(bytesio *self, Py_ssize_t size)
+peek_bytes(bytesio *self, Py_ssize_t size)
 {
     const char *output;

@@ -404,15 +405,23 @@ read_bytes(bytesio *self, Py_ssize_t size)
     if (size > 1 &&
         self->pos == 0 && size == PyBytes_GET_SIZE(self->buf) &&
         self->exports == 0) {
-        self->pos += size;
         return Py_NewRef(self->buf);
     }

     output = PyBytes_AS_STRING(self->buf) + self->pos;
-    self->pos += size;
     return PyBytes_FromStringAndSize(output, size);
 }

+static PyObject *
+read_bytes(bytesio *self, Py_ssize_t size)
+{
+    PyObject *bytes = peek_bytes(self, size);
+    if (bytes != NULL) {
+        self->pos += size;
+    }
+    return bytes;
+}
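
The refactoring in this hunk expresses reading as peeking followed by advancing the position, and only advances when the peek succeeded. A rough Python rendering of that idea is sketched below. `MiniBuffer` is a hypothetical toy class, not the CPython data structure, and it assumes callers have already clamped `size` to the available bytes, as the C callers do.

```python
class MiniBuffer:
    """Hypothetical toy buffer illustrating read-as-peek-plus-advance."""

    def __init__(self, data):
        self.buf = bytes(data)
        self.pos = 0

    def peek_bytes(self, size):
        # Copy `size` bytes starting at the position; leave pos untouched.
        return self.buf[self.pos:self.pos + size]

    def read_bytes(self, size):
        # If peek_bytes raised, the line below would never run,
        # so the position only moves after a successful peek.
        data = self.peek_bytes(size)
        self.pos += size
        return data


mini = MiniBuffer(b"abcdef")
peeked = mini.peek_bytes(2)
pos_after_peek = mini.pos
consumed = mini.read_bytes(2)
pos_after_read = mini.pos
```

In C the failure case is a `NULL` return rather than an exception, which is why the hunk checks `bytes != NULL` before updating `self->pos`.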

 /*[clinic input]
 _io.BytesIO.read
     size: Py_ssize_t(accept={int, NoneType}) = -1
@@ -462,6 +471,38 @@ _io_BytesIO_read1_impl(bytesio *self, Py_ssize_t size)
     return _io_BytesIO_read_impl(self, size);
 }


+/*[clinic input]
+_io.BytesIO.peek
+    size: Py_ssize_t = 1
+    /
+
+Return bytes from the stream without advancing the position.
+
+If the size argument is zero or negative, read until EOF is reached.
+Return an empty bytes object at EOF.
+[clinic start generated code]*/
+
+static PyObject *
+_io_BytesIO_peek_impl(bytesio *self, Py_ssize_t size)
+/*[clinic end generated code: output=fa4d8ce28b35db9b input=cb06614a3ed0496e]*/
+{
+    CHECK_CLOSED(self);
+
+    /* adjust invalid sizes */
+    Py_ssize_t n = self->string_size - self->pos;
+    if (size < 1 || size > n) {
+        size = n;
+        /* size can be negative after truncate() or seek() */
+        if (size < 0) {
+            size = 0;
+        }
+    }
+    return peek_bytes(self, size);
+}
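
The size adjustment in this revision of `_io_BytesIO_peek_impl` can be followed more easily as a pure function. The sketch below mirrors the branch structure shown in the hunk, including the case where the position lies beyond the end of the data. `clamp_peek_size` is a hypothetical name chosen for illustration.

```python
def clamp_peek_size(size, string_size, pos):
    """Hypothetical helper mirroring the size adjustment in the hunk."""
    available = string_size - pos
    if size < 1 or size > available:
        size = available
        # pos can exceed string_size after truncate() or seek(),
        # which would make the available count negative.
        if size < 0:
            size = 0
    return size


in_range = clamp_peek_size(1, 10, 0)      # valid request is kept
until_eof = clamp_peek_size(0, 10, 0)     # size < 1 means "until EOF" here
too_large = clamp_peek_size(100, 10, 3)   # clamped to the 7 remaining bytes
beyond_end = clamp_peek_size(1, 10, 15)   # position past the end gives 0
```

Note that in this revision a size of zero is treated like a negative size; the review discussion above proposes returning an empty bytes object for `peek(0)` instead.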



 /*[clinic input]
 _io.BytesIO.readline
     size: Py_ssize_t(accept={int, NoneType}) = -1
@@ -1019,6 +1060,7 @@ static struct PyMethodDef bytesio_methods[] = {
     _IO_BYTESIO_READLINE_METHODDEF
     _IO_BYTESIO_READLINES_METHODDEF
     _IO_BYTESIO_READ_METHODDEF
+    _IO_BYTESIO_PEEK_METHODDEF
     _IO_BYTESIO_GETBUFFER_METHODDEF
     _IO_BYTESIO_GETVALUE_METHODDEF
     _IO_BYTESIO_SEEK_METHODDEF
48 changes: 47 additions & 1 deletion Modules/_io/clinic/bytesio.c.h

Some generated files are not rendered by default.