Compare memory needs for dataframe vs read/write dataframe in lmdb and aws s3 #2135
Reference Issues/PRs
During experiments it was determined that the memory needed to create a dataframe can be related to the memory needed by the read/write processes against LMDB and AWS S3 stores.
If constructing a dataframe takes X amount of memory, the read/write processes might take from 1.x to 2.5 times X for the same dataframe. The bigger the number of rows, the bigger the multiplier.
The test showcases an approach for creating tests that could address such concerns in the future, and the example below shows an actual failing case.
NOTE: my WiFi connection (used for the S3 tests), as reported by Windows, is approx. 1 MByte/s.
NOTE 2: The approach uses memory_profiler, which I added to the required libraries.
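The measurement idea can be sketched as follows. This is a simplified, stdlib-only variant (the PR itself uses memory_profiler); `measure_step` and the stand-in workload are hypothetical names used only for illustration: run each step, record its peak memory, then compare the peaks across steps.

```python
import time
import tracemalloc

def measure_step(func):
    """Run func and return (elapsed seconds, peak Python heap in MB)."""
    tracemalloc.start()
    start = time.time()
    func()
    elapsed = time.time() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return elapsed, peak / (1024 * 1024)

def create_dataframe():
    # Stand-in workload; the real test builds a 2,000,000-row DataFrame.
    return [list(range(1000)) for _ in range(100)]

elapsed, peak_mb = measure_step(create_dataframe)
print(f"create_dataframe took {elapsed:.3f}s, peak {peak_mb:.2f} MB")
```

The real test applies the same pattern to the write/read functions for each store and keeps the per-step peaks in a dict keyed by function, which is what the log below prints.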
LINK TO GITHUB EXECUTION: https://github.com/man-group/ArcticDB/actions/runs/12946758440/job/36112967427
Log from a failed execution, which perhaps explains more than words:
========================================================================================================= test session starts =========================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/grusev/source/dependencies_fix
configfile: pyproject.toml
plugins: memray-1.7.0, cpp-2.6.0, xdist-3.6.1, hypothesis-6.72.4, timeout-2.3.1
collected 1 item
python/tests/stress/arcticdb/version_store/test_mem_comparison.py INFO:MemCompare:Creating dataframe with 2000000 rows
INFO:MemCompare:START: create_dataframe
INFO:MemCompare:Time took: 1.589540958404541
INFO:MemCompare:create_dataframe 1820.5390625MB
INFO:MemCompare:START: mem_write_dataframe_arctic_lmdb
INFO:MemCompare:Time took: 2.4767887592315674
INFO:MemCompare:mem_write_dataframe_arctic_lmdb 2754.63671875MB
INFO:MemCompare:START: mem_read_dataframe_arctic_lmdb
INFO:MemCompare:Time took: 2.5483663082122803
INFO:MemCompare:mem_read_dataframe_arctic_lmdb 3388.60546875MB
INFO:MemCompare:START: mem_write_dataframe_arctic_aws_s3
20250123 14:26:45.704482 89818 W arcticdb | Failed to find segment for key 'r:test_dataframe' : No response body.
20250123 14:26:45.772885 89818 W arcticdb | Failed to find segment for key 'V:test_dataframe' : No response body.
20250123 14:28:59.478910 89818 W arcticdb | Failed to find segment for key 'r:test_dataframe' : No response body.
20250123 14:28:59.544956 89818 W arcticdb | Failed to find segment for key 'V:test_dataframe' : No response body.
INFO:MemCompare:Time took: 134.48248195648193
INFO:MemCompare:mem_write_dataframe_arctic_aws_s3 3769.94140625MB
INFO:MemCompare:START: mem_read_dataframe_arctic_aws_s3
INFO:MemCompare:Time took: 136.6903235912323
INFO:MemCompare:mem_read_dataframe_arctic_aws_s3 4069.9921875MB
INFO:MemCompare:REPORTED MEM USAGE: 1042000128 .. NOTE: This is not reliable
INFO:MemCompare:{<function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120>: 1820.5390625, <function test_read_write_memory_compare..mem_write_dataframe_arctic_lmdb at 0x7f8e334351b0>: 2754.63671875, <function test_read_write_memory_compare..mem_read_dataframe_arctic_lmdb at 0x7f8e33435240>: 3388.60546875, <function test_read_write_memory_compare..mem_write_dataframe_arctic_aws_s3 at 0x7f8e334352d0>: 3769.94140625, <function test_read_write_memory_compare..mem_read_dataframe_arctic_aws_s3 at 0x7f8e33435360>: 4069.9921875}
INFO:MemCompare:We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_write_dataframe_arctic_lmdb at 0x7f8e334351b0>
INFO:MemCompare:ACTUAL Efficiency factor is : 1.5130884997146279
INFO:MemCompare:Check OK for: mem_write_dataframe_arctic_lmdb
INFO:MemCompare:We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_read_dataframe_arctic_lmdb at 0x7f8e33435240>
INFO:MemCompare:ACTUAL Efficiency factor is : 1.8613198357286003
ERROR:MemCompare:Too big memory for mem_read_dataframe_arctic_lmdb [3388.60546875] MB compared to calculated threshold 3276.9703125 MB
[base was 1820.5390625],
File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:139
INFO:MemCompare:We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_write_dataframe_arctic_aws_s3 at 0x7f8e334352d0>
INFO:MemCompare:ACTUAL Efficiency factor is : 2.070783035587845
ERROR:MemCompare:Too big memory for mem_write_dataframe_arctic_aws_s3 [3769.94140625] MB compared to calculated threshold 3276.9703125 MB
[base was 1820.5390625],
File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:140
INFO:MemCompare:We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_read_dataframe_arctic_aws_s3 at 0x7f8e33435360>
INFO:MemCompare:ACTUAL Efficiency factor is : 2.235597286174682
ERROR:MemCompare:Too big memory for mem_read_dataframe_arctic_aws_s3 [4069.9921875] MB compared to calculated threshold 3276.9703125 MB
[base was 1820.5390625],
File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:141
INFO:MemCompare:We assume <function test_read_write_memory_compare..mem_write_dataframe_arctic_lmdb at 0x7f8e334351b0> is 1.4 times more efficient than <function test_read_write_memory_compare..mem_write_dataframe_arctic_aws_s3 at 0x7f8e334352d0>
INFO:MemCompare:ACTUAL Efficiency factor is : 1.3685802489268803
INFO:MemCompare:Check OK for: mem_write_dataframe_arctic_aws_s3
INFO:MemCompare:We assume <function test_read_write_memory_compare..mem_read_dataframe_arctic_lmdb at 0x7f8e33435240> is 1.4 times more efficient than <function test_read_write_memory_compare..mem_read_dataframe_arctic_aws_s3 at 0x7f8e33435360>
INFO:MemCompare:ACTUAL Efficiency factor is : 1.2010817503051934
INFO:MemCompare:Check OK for: mem_read_dataframe_arctic_aws_s3
FDelete library : test_read_write_memory_compare.413_2025-01-23T12_25_43_362884
============================================================================================================== FAILURES ===============================================================================================================
___________________________________________________________________________________________________ test_read_write_memory_compare ____________________________________________________________________________________________________
lmdb_library = Library(Arctic(config=LMDB(path=/tmp/pytest-of-grusev/pytest-25/test_read_write_memory_compare0)), path=test_read_write_memory_compare.413_2025-01-23T12_25_43_362884, storage=lmdb_storage)
real_s3_library = Library(Arctic(config=S3(endpoint=s3.eu-west-1.amazonaws.com, bucket=arcticdb-ci-test-bucket-02)), path=test_read_write_memory_compare.413_2025-01-23T12_25_43_362884, storage=s3_storage)
E AssertionError: Errors ['Too big memory for mem_read_dataframe_arctic_lmdb [3388.60546875] MB compared to calculated threshold 3276.9703125 MB \n [base was 1820.5390625],\n File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:139\n\n', 'Too big memory for mem_write_dataframe_arctic_aws_s3 [3769.94140625] MB compared to calculated threshold 3276.9703125 MB \n [base was 1820.5390625],\n File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:140\n\n', 'Too big memory for mem_read_dataframe_arctic_aws_s3 [4069.9921875] MB compared to calculated threshold 3276.9703125 MB \n [base was 1820.5390625],\n File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:141\n\n']
E assert False
python/tests/stress/arcticdb/version_store/test_mem_comparison.py:146: AssertionError
---------------------------------------------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------------------------------------------
INFO MemCompare:test_mem_comparison.py:60 Creating dataframe with 2000000 rows
INFO MemCompare:test_mem_comparison.py:123 START: create_dataframe
INFO MemCompare:test_mem_comparison.py:126 Time took: 1.589540958404541
INFO MemCompare:test_mem_comparison.py:129 create_dataframe 1820.5390625MB
INFO MemCompare:test_mem_comparison.py:123 START: mem_write_dataframe_arctic_lmdb
INFO MemCompare:test_mem_comparison.py:126 Time took: 2.4767887592315674
INFO MemCompare:test_mem_comparison.py:129 mem_write_dataframe_arctic_lmdb 2754.63671875MB
INFO MemCompare:test_mem_comparison.py:123 START: mem_read_dataframe_arctic_lmdb
INFO MemCompare:test_mem_comparison.py:126 Time took: 2.5483663082122803
INFO MemCompare:test_mem_comparison.py:129 mem_read_dataframe_arctic_lmdb 3388.60546875MB
INFO MemCompare:test_mem_comparison.py:123 START: mem_write_dataframe_arctic_aws_s3
INFO MemCompare:test_mem_comparison.py:126 Time took: 134.48248195648193
INFO MemCompare:test_mem_comparison.py:129 mem_write_dataframe_arctic_aws_s3 3769.94140625MB
INFO MemCompare:test_mem_comparison.py:123 START: mem_read_dataframe_arctic_aws_s3
INFO MemCompare:test_mem_comparison.py:126 Time took: 136.6903235912323
INFO MemCompare:test_mem_comparison.py:129 mem_read_dataframe_arctic_aws_s3 4069.9921875MB
INFO MemCompare:test_mem_comparison.py:134 REPORTED MEM USAGE: 1042000128 .. NOTE: This is not reliable
INFO MemCompare:test_mem_comparison.py:136 {<function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120>: 1820.5390625, <function test_read_write_memory_compare..mem_write_dataframe_arctic_lmdb at 0x7f8e334351b0>: 2754.63671875, <function test_read_write_memory_compare..mem_read_dataframe_arctic_lmdb at 0x7f8e33435240>: 3388.60546875, <function test_read_write_memory_compare..mem_write_dataframe_arctic_aws_s3 at 0x7f8e334352d0>: 3769.94140625, <function test_read_write_memory_compare..mem_read_dataframe_arctic_aws_s3 at 0x7f8e33435360>: 4069.9921875}
INFO MemCompare:test_mem_comparison.py:89 We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_write_dataframe_arctic_lmdb at 0x7f8e334351b0>
INFO MemCompare:test_mem_comparison.py:90 ACTUAL Efficiency factor is : 1.5130884997146279
INFO MemCompare:test_mem_comparison.py:98 Check OK for: mem_write_dataframe_arctic_lmdb
INFO MemCompare:test_mem_comparison.py:89 We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_read_dataframe_arctic_lmdb at 0x7f8e33435240>
INFO MemCompare:test_mem_comparison.py:90 ACTUAL Efficiency factor is : 1.8613198357286003
ERROR MemCompare:test_mem_comparison.py:96 Too big memory for mem_read_dataframe_arctic_lmdb [3388.60546875] MB compared to calculated threshold 3276.9703125 MB
[base was 1820.5390625],
File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:139
INFO MemCompare:test_mem_comparison.py:89 We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_write_dataframe_arctic_aws_s3 at 0x7f8e334352d0>
INFO MemCompare:test_mem_comparison.py:90 ACTUAL Efficiency factor is : 2.070783035587845
ERROR MemCompare:test_mem_comparison.py:96 Too big memory for mem_write_dataframe_arctic_aws_s3 [3769.94140625] MB compared to calculated threshold 3276.9703125 MB
[base was 1820.5390625],
File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:140
INFO MemCompare:test_mem_comparison.py:89 We assume <function test_read_write_memory_compare..create_dataframe at 0x7f8e33435120> is 1.8 times more efficient than <function test_read_write_memory_compare..mem_read_dataframe_arctic_aws_s3 at 0x7f8e33435360>
INFO MemCompare:test_mem_comparison.py:90 ACTUAL Efficiency factor is : 2.235597286174682
ERROR MemCompare:test_mem_comparison.py:96 Too big memory for mem_read_dataframe_arctic_aws_s3 [4069.9921875] MB compared to calculated threshold 3276.9703125 MB
[base was 1820.5390625],
File: /home/grusev/source/dependencies_fix/python/tests/stress/arcticdb/version_store/test_mem_comparison.py:141
INFO MemCompare:test_mem_comparison.py:89 We assume <function test_read_write_memory_compare..mem_write_dataframe_arctic_lmdb at 0x7f8e334351b0> is 1.4 times more efficient than <function test_read_write_memory_compare..mem_write_dataframe_arctic_aws_s3 at 0x7f8e334352d0>
INFO MemCompare:test_mem_comparison.py:90 ACTUAL Efficiency factor is : 1.3685802489268803
INFO MemCompare:test_mem_comparison.py:98 Check OK for: mem_write_dataframe_arctic_aws_s3
INFO MemCompare:test_mem_comparison.py:89 We assume <function test_read_write_memory_compare..mem_read_dataframe_arctic_lmdb at 0x7f8e33435240> is 1.4 times more efficient than <function test_read_write_memory_compare..mem_read_dataframe_arctic_aws_s3 at 0x7f8e33435360>
INFO MemCompare:test_mem_comparison.py:90 ACTUAL Efficiency factor is : 1.2010817503051934
INFO MemCompare:test_mem_comparison.py:98 Check OK for: mem_read_dataframe_arctic_aws_s3
========================================================================================================== warnings summary ===========================================================================================================
python/tests/stress/arcticdb/version_store/test_mem_comparison.py::test_read_write_memory_compare
python/tests/stress/arcticdb/version_store/test_mem_comparison.py::test_read_write_memory_compare
/home/grusev/venvs/310/lib/python3.10/site-packages/pandas/core/frame.py:717: DeprecationWarning: Passing a BlockManagerUnconsolidated to DataFrame is deprecated and will raise in a future version. Use public APIs instead.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================================================= short test summary info =======================================================================================================
FAILED python/tests/stress/arcticdb/version_store/test_mem_comparison.py::test_read_write_memory_compare - AssertionError: Errors ['Too big memory for mem_read_dataframe_arctic_lmdb [3388.60546875] MB compared to calculated threshold 3276.9703125 MB \n [base was 1820.5390625],\n File: /home/grusev/source/dependencies_fix/python/t...
============================================================================================== 1 failed, 2 warnings in 338.61s (0:05:38) ==============================================================================================
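The threshold check that produced the "Too big memory" errors above can be sketched as follows (function and parameter names are hypothetical; the factor and the values come straight from the log: the base step is assumed to use at most `factor` times less memory than the compared step).

```python
def check_efficiency(base_mb, measured_mb, factor, name):
    """Return an error string if measured_mb exceeds base_mb * factor, else None."""
    threshold = base_mb * factor
    actual = measured_mb / base_mb
    print(f"ACTUAL Efficiency factor is : {actual}")
    if measured_mb > threshold:
        return (f"Too big memory for {name} [{measured_mb}] MB "
                f"compared to calculated threshold {threshold} MB")
    return None

# Values from the failed run above (base = create_dataframe peak):
ok = check_efficiency(1820.5390625, 2754.63671875, 1.8,
                      "mem_write_dataframe_arctic_lmdb")   # within threshold
err = check_efficiency(1820.5390625, 3388.60546875, 1.8,
                       "mem_read_dataframe_arctic_lmdb")   # exceeds threshold
```

The test collects all such error strings and asserts the list is empty at the end, which is why a single run reports every failing comparison at once rather than stopping at the first.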
What does this implement or fix?
Any other comments?