Fix stream overconsumption by undoing readahead via seek when possible #50

Merged
smheidrich merged 27 commits into main from fix-stream-overconsumption-seek-if-possible
Jul 5, 2023

Conversation

smheidrich
Owner

@smheidrich smheidrich commented Dec 14, 2022

Fixes #47 when done

@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch from 986aef5 to f6f036a Compare December 14, 2022 00:22
@smheidrich smheidrich changed the title from "Fix stream overconsumption by undoing readahead via seek when possible" to "WIP: Fix stream overconsumption by undoing readahead via seek when possible" Dec 14, 2022
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch 2 times, most recently from dfd74cc to 7396e2e Compare December 16, 2022 20:09
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch 2 times, most recently from 699faf8 to 3ce611a Compare December 25, 2022 16:57
smheidrich added a commit to smheidrich/json-stream that referenced this pull request Dec 25, 2022
Proof of concept only. Maybe there is a more elegant solution that
doesn't require putting `level` everywhere to find out when the
top-level document ends.

Requires
smheidrich/py-json-stream-rs-tokenizer#50
to be merged to actually work.
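
The `level` bookkeeping mentioned in this commit message can be illustrated with a minimal sketch. It is not the json-stream implementation: the function name `end_of_top_level` is hypothetical, and it assumes the top-level value is an object or array. It tracks nesting depth while skipping string contents, and reports where the first top-level document ends.

```python
def end_of_top_level(text):
    """Return the index just past the first top-level object/array, or None.

    Hypothetical helper for illustration only; scalars at top level are
    not handled.
    """
    level = 0          # current nesting depth
    in_string = False  # inside a JSON string literal?
    escaped = False    # previous char was a backslash inside a string?
    for i, c in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c in "[{":
            level += 1
        elif c in "]}":
            level -= 1
            if level == 0:
                # Depth is back to zero: the top-level document ends here.
                return i + 1
    return None
```

Brackets inside strings do not change `level`, which is why the string state has to be tracked alongside the depth counter.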
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch 2 times, most recently from 16a61fd to 3172869 Compare April 25, 2023 22:03
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch 3 times, most recently from 173789e to c22bff9 Compare June 24, 2023 10:17
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch from c22bff9 to a5225c0 Compare June 26, 2023 16:42
smheidrich added 12 commits July 4, 2023 22:44
Pretty horrible fix that slows things down tremendously (speedup lowered
from ~10x to ~2-3x).
These issues remain:

- Seeking from the chunk start position instead of the very beginning of
  the stream doesn't work yet, as nothing resets the char counter when a
  new read() is performed.
- There must be a bug in re-initializing the BufReader after a
  readahead undo (park_cursor() in the code) has been performed, because
  subsequent uses of this BufReader yield wrong chars. This is visible
  e.g. in tests on random data like those in the benchmarks, but there
  should be an explicit test inside Rust instead.
- Only Python text streams are supported; none of this garbage will work
  for Python byte streams.
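
The idea of undoing readahead by seeking from the chunk start can be sketched in Python. This is not the Rust code from this PR: the class `ReadaheadReader` and its attributes are hypothetical, and only the name `park_cursor` is borrowed from the commit message. The sketch remembers `tell()` before each `read()`, resets the char counter for every new chunk, and on parking seeks back to the chunk start and re-reads only the consumed chars. Re-reading is used instead of seeking to an absolute char offset because `tell()` on Python text streams returns an opaque value.

```python
import io


class ReadaheadReader:
    """Hypothetical chunked reader that can undo its own readahead."""

    def __init__(self, stream, chunk_size=64):
        self.stream = stream
        self.chunk_size = chunk_size
        self.chunk = ""
        self.chunk_start = stream.tell()  # opaque cookie for text streams
        self.consumed = 0                 # chars of this chunk handed out

    def next_char(self):
        if self.consumed == len(self.chunk):
            # New read(): remember where it starts and reset the counter.
            self.chunk_start = self.stream.tell()
            self.chunk = self.stream.read(self.chunk_size)
            self.consumed = 0
            if not self.chunk:
                return None  # end of stream
        c = self.chunk[self.consumed]
        self.consumed += 1
        return c

    def park_cursor(self):
        """Leave the stream just past the last char handed out."""
        self.stream.seek(self.chunk_start)
        self.stream.read(self.consumed)  # skip only what was consumed
        # Drop the buffer so later reads start from the parked position.
        self.chunk = ""
        self.consumed = 0


stream = io.StringIO('{"a": 1} trailing')
reader = ReadaheadReader(stream)
document = "".join(reader.next_char() for _ in range(8))
reader.park_cursor()
rest = stream.read()
```

After `park_cursor()`, `rest` holds the part of the stream that the reader had read ahead but not consumed.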
Remaining issues:

- get it to work for byte streams
- clean up Rust warnings
- remove unused code
- look into degraded performance (speedup is now only 8x), e.g. try to
  zero-copy strings from Python by putting them in RefCells or something
  similar
- handle the case of a non-seekable stream
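
One possible way to handle the non-seekable case from the list above is sketched below. It is an assumption, not necessarily what this PR ended up doing: the helper `pick_chunk_size` and the test double `UnseekableText` are hypothetical names. The sketch asks the stream whether it is seekable and, if not, falls back to reading a single character at a time so that readahead never needs to be undone.

```python
import io


def pick_chunk_size(stream, preferred=4096):
    """Return a read size suited to the stream's seek support (sketch)."""
    seekable = getattr(stream, "seekable", None)
    if callable(seekable) and seekable():
        # Readahead is acceptable: it can be undone with seek() afterwards.
        return preferred
    # Characters cannot be given back, so read only one at a time.
    return 1


class UnseekableText(io.StringIO):
    """Stand-in for a pipe-like stream: reports itself as unseekable."""

    def seekable(self):
        return False
```

The trade-off is speed: reading one character per call avoids overconsumption but gives up the benefit of buffering.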
Fixes performance degradation issue (lol...)
No switch to choose between text and bytes yet; that is up next.
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch from a5225c0 to ea44f7e Compare July 4, 2023 20:44
Avoids collisions with parameters introduced by json-stream's Python
tokenizer.
@smheidrich smheidrich changed the title from "WIP: Fix stream overconsumption by undoing readahead via seek when possible" to "Fix stream overconsumption by undoing readahead via seek when possible" Jul 5, 2023
@smheidrich smheidrich force-pushed the fix-stream-overconsumption-seek-if-possible branch from b0f2123 to 890fffa Compare July 5, 2023 20:49
@smheidrich smheidrich merged commit 252d5a7 into main Jul 5, 2023
Development

Successfully merging this pull request may close these issues.

Input stream unusable after reading a JSON document