Excessive memory consumption on sorting #10511
Comments
@samuelcolvin can you share the query plan for this query? Specifically, what is the output of the following?
explain select span_name from records order by bit_length(attributes) desc limit 20
I would also not expect it to consume 20GB of memory |
Sorry for the delay, here we go:
|
🤔 that certainly seems like it is doing a Top(K) with 14 cores -- so I would expect this would hold at most 20 * 14 batches
🤔 |
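To spell out where the 20 * 14 figure comes from: a rough sketch, assuming each of the 14 partitions keeps its own top-20 state and that, in the worst case, every retained row keeps a different input batch alive. The function name and the per-batch size are hypothetical and only there for illustration; they are not measurements from this dataset.

```rust
/// Worst-case number of input batches kept alive by per-partition Top-K
/// state, assuming every retained row references a different batch.
fn max_pinned_batches(k: usize, partitions: usize) -> usize {
    k * partitions
}

fn main() {
    // LIMIT 20 across 14 partitions, as in the comment above.
    let batches = max_pinned_batches(20, 14);

    // Hypothetical average batch size, NOT measured from the issue's data.
    let assumed_batch_bytes: usize = 8 * 1024 * 1024;

    println!(
        "{} batches, {} MiB under the assumed batch size",
        batches,
        batches * assumed_batch_bytes / (1024 * 1024)
    );
}
```

Even with a generous assumed batch size, this bound stays far below the 20GB limit, which is why the reported usage looks surprising.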
Interestingly, the same data in CSV works fine, but Parquet causes a crash |
take |
After #13377 was merged, this is now much more memory-efficient and, in fact, works on just 1 GB for me 🎉 (Although I had to change […]) I think we can close this issue |
Is it possible this memory issue is related to |
I think this was recently merged |
Nice! So shall we close this ticket? Or is there still more work to do? |
IMO we can close it if @samuelcolvin doesn't mind 🙂 |
Closing, can always create a new issue if someone reproduces. |
Thank you so much for working on this. |
Describe the bug
I'm running the following query:
And it's running out of memory with a 20GB memory limit (RuntimeConfig::new().with_memory_limit(20 * 1024 * 1024 * 1024, 0.8)), and passing with 30GB allowed.
Error message is:
The point is that in theory this query only needs to hold the span_names of the 20 records with the longest attributes in memory.
But even if it chose to hold all span_name in memory, it shouldn't need this much memory: sum(bit_length(span_name)) = 1_038_805_400, i.e. roughly 1 Gbit, or about 130 MB since bit_length counts bits, for all rows.
To Reproduce
The dataset and code aren't public, but it shouldn't be too hard to reproduce with a table containing 2 text columns.
Expected behavior
Ideally a query like this would have a far more modest memory footprint.
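To illustrate the kind of footprint meant here, below is a minimal sketch of a streaming top-K using a min-heap from the Rust standard library. It is not how DataFusion implements its sort; the function name `top_k_by_attr_len` and the `(span_name, attributes)` tuple layout are assumptions made for the example. It orders by byte length, which gives the same ranking as bit_length.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `span_name`s of the `k` rows with the longest `attributes`,
/// longest first, while holding at most `k + 1` rows in memory at a time.
fn top_k_by_attr_len<I>(rows: I, k: usize) -> Vec<String>
where
    I: IntoIterator<Item = (String, String)>, // (span_name, attributes)
{
    // `Reverse` turns the max-heap into a min-heap on attribute length,
    // so the shortest retained row is always on top.
    let mut heap: BinaryHeap<Reverse<(usize, String)>> =
        BinaryHeap::with_capacity(k + 1);

    for (span_name, attributes) in rows {
        // Only the length and the span_name are kept; `attributes` is dropped.
        heap.push(Reverse((attributes.len(), span_name)));
        if heap.len() > k {
            heap.pop(); // evict the row with the shortest attributes
        }
    }

    // Ascending order of `Reverse<..>` is descending order of length.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, name))| name)
        .collect()
}

fn main() {
    let rows = vec![
        ("a".to_string(), "x".repeat(3)),
        ("b".to_string(), "x".repeat(10)),
        ("c".to_string(), "x".repeat(7)),
    ];
    // Keep only the 2 rows with the longest attributes.
    println!("{:?}", top_k_by_attr_len(rows, 2));
}
```

With `k = 20`, the state is 20 short strings plus their lengths regardless of how many rows are scanned, which is the behaviour the query would ideally approach.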
Additional context
Using datafusion v38.0.0, same error with mimalloc and without.
For comparison, duckdb runs this query fine with a 1GB memory limit, but fails with 500MB.