
Excessive memory consumption on sorting #10511

Closed
samuelcolvin opened this issue May 14, 2024 · 13 comments
Labels
bug Something isn't working
samuelcolvin commented May 14, 2024

Describe the bug

I'm running the following query:

select span_name from records order by bit_length(attributes) desc limit 20

It runs out of memory with a 20 GB memory limit (RuntimeConfig::new().with_memory_limit(20 * 1024 * 1024 * 1024, 0.8)) but passes with 30 GB allowed.

Error message is:

Failed to allocate additional 25887088 bytes for ExternalSorterMerge[1] with 585120448 bytes already allocated - maximum available is 23605759

The point is that in theory this query only needs to hold the span_names of the 20 records with the longest attributes in memory.
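A Top-K sort is, in principle, a bounded-memory operation. A minimal sketch in plain Rust (a min-heap over hypothetical rows, not DataFusion's actual TopK operator) of why only k rows ever need to stay resident:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Keep only the k rows with the largest sort keys: a min-heap of size k
// evicts the smallest key whenever it overflows, so memory stays O(k)
// regardless of how many rows stream through.
fn top_k(rows: impl Iterator<Item = (u64, String)>, k: usize) -> Vec<(u64, String)> {
    let mut heap: BinaryHeap<Reverse<(u64, String)>> = BinaryHeap::with_capacity(k + 1);
    for row in rows {
        heap.push(Reverse(row));
        if heap.len() > k {
            heap.pop(); // drop the current smallest sort key
        }
    }
    let mut out: Vec<_> = heap.into_iter().map(|Reverse(r)| r).collect();
    out.sort_by(|a, b| b.0.cmp(&a.0)); // descending, like ORDER BY ... DESC
    out
}

fn main() {
    // Hypothetical rows: (bit_length(attributes), span_name).
    let rows = (0..1_000_000u64).map(|i| (i % 9973, format!("span_{i}")));
    let top = top_k(rows, 20);
    assert_eq!(top.len(), 20);
    assert_eq!(top[0].0, 9972); // largest key wins
    println!("top key: {}", top[0].0);
}
```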

But even if it chose to hold every span_name in memory, it shouldn't need this much memory:

  • there's "only" 12_980_628 rows
  • with sum(bit_length(span_name)) = 1_038_805_400 aka ~1GB, for all rows
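As a back-of-envelope check of those figures (taking the reported numbers at face value; note that bit_length counts bits, so the byte total is one eighth of the quoted sum):

```rust
fn main() {
    // Figures quoted above, taken at face value.
    let rows: u64 = 12_980_628;
    let total_bits: u64 = 1_038_805_400; // sum(bit_length(span_name))
    let total_bytes = total_bits / 8;    // bit_length counts bits, not bytes
    assert_eq!(total_bytes, 129_850_675); // roughly 130 MB of span_name payload
    println!(
        "{rows} rows, {total_bytes} bytes (~{} MB) of span_name data",
        total_bytes / 1_000_000
    );
    // Either way, materializing every span_name is orders of magnitude
    // below the 20 GB budget that the query nevertheless exhausted.
}
```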

To Reproduce

The dataset and code aren't public, but it shouldn't be too hard to reproduce with a table containing two text columns.

Expected behavior

Ideally a query like this would have a far more modest memory footprint.

Additional context

Using DataFusion v38.0.0; the same error occurs with and without mimalloc.

For comparison, DuckDB runs this query fine with a 1 GB memory limit but fails with 500 MB.

alamb commented May 16, 2024

@samuelcolvin can you share the query plan for this query? Specifically what is the output of this query?

explain select span_name from records order by bit_length(attributes) desc limit 20

I would also not expect it to consume 20GB of memory

samuelcolvin (Author) commented:

Sorry for the delay, here we go:

logical_plan

 Projection: records_store.span_name
  Limit: skip=0, fetch=20
    Sort: bit_length(records_store.attributes) DESC NULLS FIRST, fetch=20
      TableScan: records_store projection=[span_name, attributes]

physical_plan

 ProjectionExec: expr=[span_name@0 as span_name]
  GlobalLimitExec: skip=0, fetch=20
    SortPreservingMergeExec: [bit_length(attributes@1) DESC], fetch=20
      SortExec: TopK(fetch=20), expr=[bit_length(attributes@1) DESC], preserve_partitioning=[true]
        ParquetExec: file_groups={14 groups: [[Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_400_0001_2139d404-1e6c-4903-90f1-775727315226.parquet:0..4619991, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_400_0001_41744ee3-e741-43cf-ab17-6b986d785a58.parquet:0..4631041, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_401_0001_91993b2d-9555-4d85-8dd9-71ad352a7066.parquet:0..16945773, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_401_0001_f611b930-13d2-4311-acf2-489585ca7e2a.parquet:0..16920185, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_402_0001_3f208d74-4b67-44ec-aa4a-1ed30d881b7f.parquet:0..18130069, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_403_0001_780473ad-7867-4bcf-b63d-a52460f466cf.parquet:16513148..17675405, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_404_0001_d11bf279-599e-4e2c-9a60-14d13045c8dd.parquet:0..16597007, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_405_0001_456e9ee4-35d7-4483-813e-73c29d8023d2.parquet:0..15207780, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_406_0001_5becfa29-66d8-44ff-b61c-b82987f3d060.parquet:0..16885514, 
Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_407_0001_1a81e83c-a226-4f29-b00b-54f849eb46cd.parquet:0..15583994, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_408_0001_ec8d95f7-1464-4adf-a975-6857937592e1.parquet:12323655..15398417, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_409_0001_f27ecf09-e64b-4fc3-9cb9-cdd53b2200a6.parquet:0..15601539, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_410_0001_bd93a8e8-dd6f-48ae-8622-fd439f9d27f9.parquet:0..15515518, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_411_0001_5ec01d68-8a3a-4d21-b0df-3d507bea2d1c.parquet:0..15347993, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_412_0001_3611c025-d788-4d26-b77d-424e421fcd98.parquet:0..16151170, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_413_0001_17decbd1-13d0-4a01-9213-8fee2ba46b7b.parquet:12069225..16928575, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_414_0001_d7924295-f735-4b34-af7f-3975ad3f91b6.parquet:0..16952022, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_415_0001_7a7e0d3e-de3e-462b-9fb5-264277a2c74b.parquet:0..16826232, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_416_0001_da93e867-a912-454a-a758-c9e27dcf74df.parquet:0..17230023, 
Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_417_0001_577d3fa8-ea7d-471f-a33a-0d65faa54902.parquet:0..17533201, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_418_0001_b5f013f2-d3ed-4f98-bb9c-cb58b059b5fc.parquet:4359379..17560075, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_419_0001_22e4dd59-1639-4c86-980b-12745c86aef6.parquet:0..17509164, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_420_0001_24661f8e-bef4-4efd-84fb-55f531e8a495.parquet:0..18239445, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_421_0001_75401751-5740-4086-a2ec-0e9c23bdc9a7.parquet:0..18045516, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_422_0001_2b55263e-f63b-402c-8024-9e85fc8c3d03.parquet:0..10765386], ...]}, projection=[span_name, attributes]

alamb commented May 22, 2024

🤔 that certainly seems like it is doing a Top(K) with 14 cores -- so I would expect it to hold at most 20 * 14 batches

  • 20 is the k
  • 14 is the number of file groups (aka partitions)
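A sketch of that bound (the worst case assumes every retained row pins a distinct input batch; the batch sizing below is a purely hypothetical figure):

```rust
fn main() {
    let k: u64 = 20;          // fetch=20
    let partitions: u64 = 14; // file groups, i.e. partitions / cores
    // Worst case: each of the k retained rows in each partition lives in a
    // different input batch, so at most k * partitions batches stay pinned.
    let max_pinned_batches = k * partitions;
    assert_eq!(max_pinned_batches, 280);
    // With, say, 8192-row batches of two modest string columns at ~1 KiB
    // per row (a hypothetical figure), that is a couple of GiB at most --
    // nowhere near enough to justify exhausting a 20 GB budget.
    let assumed_batch_bytes: u64 = 8192 * 1024;
    println!(
        "~{} MiB pinned in the worst case",
        max_pinned_batches * assumed_batch_bytes / (1024 * 1024)
    );
}
```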

🤔

blaginin commented Nov 9, 2024

Interestingly, the same data in CSV works fine, but Parquet causes a crash

blaginin commented Nov 9, 2024

take

blaginin commented Nov 16, 2024

After #13377 was merged, this is now much more memory-efficient; in fact, it works with just 1 GB for me 🎉. (Although I had to swap bit_length for another function: we now read strings as the string view type, which bit_length does not yet support.)

I think we can close this issue

[screenshot attached]

jayzhan211 commented:

> Although I had to change bit_length to another function because we switched to string view when reading, but this type is not yet supported by bit_length

Is it possible this memory issue was specific to bit_length? Maybe we can add string view support to bit_length and then test it again.
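For background, a minimal sketch of the string-view idea (simplified types, not arrow-rs's actual StringViewArray API): each value is a fixed-size view, short strings are stored inline, longer ones point into a shared buffer, and a function like bit_length only needs the view's length field, never the data buffer:

```rust
// Simplified "German-style" string view. Strings of <= 12 bytes are stored
// inline; longer ones record a 4-byte prefix plus an offset into a shared
// data buffer. (A sketch only; the real layout packs this into 16 bytes.)
#[allow(dead_code)]
enum View {
    Inline { len: u32, bytes: [u8; 12] },
    Buffered { len: u32, prefix: [u8; 4], offset: u32 },
}

fn make_view(s: &str, buffer: &mut Vec<u8>) -> View {
    let b = s.as_bytes();
    if b.len() <= 12 {
        let mut bytes = [0u8; 12];
        bytes[..b.len()].copy_from_slice(b);
        View::Inline { len: b.len() as u32, bytes }
    } else {
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(b);
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&b[..4]);
        View::Buffered { len: b.len() as u32, prefix, offset }
    }
}

// bit_length needs only the length field: 8 * bytes, no buffer access.
fn bit_length(v: &View) -> u64 {
    let len = match v {
        View::Inline { len, .. } | View::Buffered { len, .. } => *len,
    };
    8 * len as u64
}

fn main() {
    let mut buf = Vec::new();
    let short = make_view("GET /", &mut buf);
    let long = make_view("0123456789abcdef", &mut buf);
    assert_eq!(bit_length(&short), 40);
    assert_eq!(bit_length(&long), 128);
    println!("short: {} bits, long: {} bits", bit_length(&short), bit_length(&long));
}
```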

alamb commented Nov 19, 2024

> Is it possible this memory issue is related to bit_length only? Maybe we can support string view for bit_length then test it again

I think support for this was merged recently.

blaginin commented:

Just tested, everything works well with bit_length too

[screenshot attached]

alamb commented Nov 26, 2024

Nice!

So shall we close this ticket? Or is there still more work to do?

blaginin commented:

IMO we can close it if @samuelcolvin doesn't mind 🙂

samuelcolvin (Author) commented:

Closing; we can always open a new issue if someone reproduces it.

samuelcolvin (Author) commented:

Thank you so much for working on this.
