added buffered reading to tokenizer #46
base: master
As you already mentioned in #45 (comment), this partly intersects with the changes that would be needed to fix #30. To be more precise, not doing any buffering is the simplest (but also least performant, at least in Rust) way of fixing #30, as I mentioned here. My suggested fix for #30 (smheidrich/py-json-stream-rs-tokenizer#50) uses exactly that approach when cursor positions in sync with the tokenization progress are requested but the underlying Python stream isn't seekable. So for the Rust tokenizer I'm currently thinking about merging smheidrich/py-json-stream-rs-tokenizer#50 first (except with the new constructor parameter
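The tradeoff described above (buffer reads for speed, but fall back to unbuffered reads when the cursor must stay in sync and the stream can't be seeked back afterwards) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR or from smheidrich/py-json-stream-rs-tokenizer#50: the class `SyncingReader`, its `keep_cursor_in_sync` parameter, and the `read_one`/`sync_cursor` methods are hypothetical names. It also assumes that only binary seekable streams can be resynced, because text-mode `tell()` values are opaque cookies that don't support offset arithmetic.

```python
import io


class SyncingReader:
    """Hypothetical buffered reader that can keep the stream cursor in sync."""

    def __init__(self, stream, buffer_size=8192, keep_cursor_in_sync=True):
        self._stream = stream
        # Resyncing means seeking back to "chunk start + consumed items",
        # which is only valid for seekable binary streams (text-mode
        # tell() returns an opaque cookie, not a character count).
        self._can_resync = stream.seekable() and not isinstance(stream, io.TextIOBase)
        if keep_cursor_in_sync and not self._can_resync:
            # Unbuffered fallback: never read ahead, so the underlying
            # position always matches what has been consumed.
            buffer_size = 1
        self._buffer_size = buffer_size
        self._buf = stream.read(0)  # empty str or bytes, matching the stream type
        self._idx = 0
        self._chunk_start = stream.tell() if self._can_resync else None

    def read_one(self):
        """Return the next character/byte, or an empty str/bytes at EOF."""
        if self._idx >= len(self._buf):
            if self._can_resync:
                self._chunk_start = self._stream.tell()
            self._buf = self._stream.read(self._buffer_size)
            self._idx = 0
            if not self._buf:
                return self._buf
        item = self._buf[self._idx:self._idx + 1]
        self._idx += 1
        return item

    def sync_cursor(self):
        """Seek the underlying stream back to just after the last consumed item."""
        if self._can_resync:
            self._stream.seek(self._chunk_start + self._idx)
```

With `buffer_size=5`, consuming the 8 bytes of `{"a": 1}` from `io.BytesIO(b'{"a": 1} trailing')` pulls 10 bytes from the stream; `sync_cursor()` then seeks back to offset 8 so the caller can still read ` trailing`. For an `io.StringIO`, the sketch silently switches to 1-character reads instead, which is the "least performant" path mentioned above.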
# Conflicts:
#	src/json_stream/tests/test_buffering.py
@smheidrich This was fixed in this commit. If you've already ported this code, you will also have ported this bug!
I have also, in response to your comment in the Rust repo, committed a proposed new interface for
All right, so I've finally gotten around to writing the parallel PR to this for the Rust tokenizer: smheidrich/py-json-stream-rs-tokenizer#87. I tested it locally with the test case you modified here and there are no errors, so I guess it basically works? There are a lot of different cases, though, depending on e.g. whether the underlying Python stream returns strings or bytes, whether it's seekable, etc., so I might write another test on my end for those.
UPDATE: Tests on my side are done now as well.
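The string/bytes and seekable/non-seekable combinations mentioned above form a small matrix that a test can iterate over. A minimal, hypothetical fixture for that might look like the following; it is not the actual test code from either PR, and `NonSeekableWrapper` and `stream_variants` are illustrative names. The wrapper only exposes `read()` and `seekable()`, which is an assumption about what a tokenizer needs; a real one may call other stream methods.

```python
import io
from itertools import product


class NonSeekableWrapper:
    """Delegates read() to an inner stream but reports itself as non-seekable."""

    def __init__(self, inner):
        self._inner = inner

    def read(self, size=-1):
        return self._inner.read(size)

    def seekable(self):
        return False


def stream_variants(text):
    """Yield (label, stream) for every str/bytes x seekable/non-seekable combination."""
    for as_bytes, seekable in product((False, True), repeat=2):
        inner = io.BytesIO(text.encode("utf-8")) if as_bytes else io.StringIO(text)
        stream = inner if seekable else NonSeekableWrapper(inner)
        kind = "bytes" if as_bytes else "str"
        mode = "seekable" if seekable else "nonseekable"
        yield f"{kind}-{mode}", stream
```

A test can then loop over `stream_variants('{"a": 1}')` (or feed it to a parametrized test) and run the same tokenizer assertions against each of the four streams.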
In response to #45