Streaming interface #215
I’m generally in favour of streaming/incrementality, but I think there are a couple of challenges here we’ll have to overcome for this to actually result in any performance improvements:
I think I must be misunderstanding you; wouldn’t that mean we’ve got the entire file in memory twice?
Nope, copying a streaming bytestring is O(1). It doesn’t involve copying the concrete chunks themselves, just the constructors associated with them.
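To make the chunks-vs-constructors point concrete, here is a minimal sketch using `bytestring`'s lazy interface (the names below are standard `Data.ByteString.Lazy` functions; internally a lazy `ByteString` is a cons-list of strict chunks, so duplicating a value only shares references to those chunks):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Lazy       as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

main :: IO ()
main = do
  -- A lazy ByteString is a spine of Chunk constructors over strict chunks;
  -- "copying" the value duplicates the spine's references, never the bytes.
  let bs = BL.fromChunks ["hello, ", "world"]
  -- toChunks exposes the underlying strict chunks, unchanged:
  print (BL.toChunks bs)
  -- Slicing walks the chunk list to the requested offset, so it is
  -- O(chunks traversed) rather than O(1) as with a strict slice:
  print (BLC.unpack (BL.take 5 bs))
```

Note that the trade-off the thread goes on to discuss falls out of this representation: sharing is free, but random-access slicing now costs a walk over the chunk spine.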
Ah, ok; but we still end up having the whole thing paged into memory, right? Which kind of obviates the value of streaming, no?
Answering my own question: we don’t keep the whole thing paged in at once unless we slice the whole thing. We (presumably? I don’t actually know this for sure) lose O(1) slicing, but we limit space consumption for the common case of slicing only parts of the document out. Losing O(1) slicing could still be a challenge, but it’s one we can probably work with, since we don’t tend to traverse the tree random-access. Plus we might also be able to take advantage of tree-sitter’s incrementality at some point, albeit at the cost of keeping the

Ok, I think I have my head wrapped around this now; sounds like a plan!
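The bounded-space behaviour described above can be sketched with `Data.ByteString.Lazy` alone: chunks are forced only on demand, so a slice touches only the chunks it overlaps (`BL.cycle` here is the real library function; using a conceptually infinite stream makes the on-demand behaviour observable):

```haskell
module Main where

import qualified Data.ByteString.Lazy       as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

main :: IO ()
main = do
  -- BL.cycle produces a conceptually infinite stream of chunks, yet taking
  -- a small prefix terminates: only the chunks the slice touches are ever
  -- materialised, and chunks before a retained slice become garbage once
  -- nothing references the head of the stream.
  let infinite = BL.cycle (BLC.pack "abc")
  print (BLC.unpack (BL.take 7 infinite))
```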
In production, don't we already have a maximum file size that we're willing to process? I think that's true both when we receive blob content via a network RPC call and when we load the blob from disk. (/cc @tclem @joshvera to confirm) If so, the limit should be small enough that paging an entire copy of the file into memory isn't that big of an issue. We're not dealing with multi-GB data files here.
Right now, all parsing assumes you have the entire source file paged into memory as a strict ByteString. I would love to see an option that incorporated streaming-bytestring, which is IME the best way to express lazily-read bytestreams.
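As a sketch of the kind of API streaming-bytestring offers (this assumes its `Streaming.ByteString` module and the `resourcet` package; `copyFileStreaming` is a hypothetical helper name, not anything from this repo):

```haskell
module Main where

import           Control.Monad.Trans.Resource (runResourceT)
import qualified Streaming.ByteString as Q

-- Hypothetical helper: copy a file in constant space. Chunks are read,
-- written, and released one at a time under ResourceT, so the whole
-- file is never paged into memory at once.
copyFileStreaming :: FilePath -> FilePath -> IO ()
copyFileStreaming src dst =
  runResourceT (Q.writeFile dst (Q.readFile src))

main :: IO ()
main = copyFileStreaming "input.txt" "output.txt"
```

The contrast with the strict-`ByteString` status quo is that the source file's residency is bounded by the chunk size rather than the file size, which is exactly the property the earlier comments are weighing against the loss of O(1) slicing.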