Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seek offset semantics don't match those of Python for text-mode file-like objects #8

Open
smheidrich opened this issue Dec 6, 2022 · 1 comment

Comments

@smheidrich
Copy link
Contributor

pyo3-file's implementation of seek is currently a simple wrapper around Python file-like objects' seek() method. That makes perfect sense for binary files.

For text-mode file-like objects, however, even though the method signatures match up, the semantics are quite different between what the Seek trait prescribes and what happens in Python:

The docs of seek from the Seek trait explicitly state that the offsets involved should be in bytes.

Meanwhile, for Python file-like objects opened in text mode, TextIOBase.seek() states that

offset must either be a number returned by TextIOBase.tell(), or zero.

and that the returned offset is an "opaque number". What that means exactly is made more explicit in the tutorial:

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour.

So for these kinds of files we can't in general interpret the values given to and returned by seek() as being byte offsets, and hence I'm not sure if it's correct to allow these in the trait implementation or if that calls for a new trait describing opaque seeking. IMO this issue should at least be mentioned in the docs somewhere, even if the implementation is left as-is.

@smheidrich
Copy link
Contributor Author

Just FYI, I put an OpaqueSeek trait in smheidrich/py-json-stream-rs-tokenizer#50 which shows what I think an interface that takes the opaqueness into account could look like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant