[Python] Too much RAM consumption when using `take` on a memory-mapped table #37766

Describe the bug, including details regarding any error messages, version, and platform.

I created a random array and wrote it repeatedly to an Arrow IPC file so that the whole array was too large to fit in RAM. Then I read it back with memory mapping. I could `slice` it without any problem, but when I tried to access rows by an arbitrary list of indices using `take`, RAM usage grew until the computer hung. The code is as follows (the array length and the number of writes may be adjusted according to your disk space and RAM size); see the sketch below.
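(The original snippet did not survive this export; the following is a minimal reconstruction assembled from the pass/fail snippets in the comments below. The iteration count `N` and the random index list are placeholder assumptions, not the reporter's exact values.)

```python
import numpy as np
import pyarrow as pa
from pyarrow import feather

rng = np.random.default_rng(1337)
data = rng.normal(size=(1_000_000,))
table = pa.table({'data': data})

# Write the same 1M-row table N times so the file exceeds available RAM
# (adjust N for your disk space and RAM size).
N = 2000
sink = pa.output_stream('data.feather')
with pa.ipc.new_file(sink, table.schema) as writer:
    for _ in range(N):
        writer.write_table(table)

table = feather.read_table('data.feather', memory_map=True)

print(table.slice(0, 10))       # fine: slicing stays memory-mapped

indices = rng.integers(0, N * 1_000_000, size=10)
print(table.take(indices))      # RAM usage grows until the process is killed
```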
Component(s): Python

Comments
Has anyone looked into this issue?
Mem: 16 GB. Got it to pass at 1850 write iterations (14.8 GB), and fail (IPython, `Killed`) at 1900 (15.2 GB). I can look a bit more into this tomorrow.

Pass:

```python
In [1]: import numpy as np
   ...: import pyarrow as pa
   ...: from pyarrow import feather
   ...:
   ...: rng = np.random.default_rng(1337)
   ...: data = rng.normal(size=(1000000,))
   ...: table = pa.table({'data': data})
   ...: sink = pa.output_stream('data.feather')
   ...: schema = pa.schema([('data', pa.float64())])
   ...: with pa.ipc.new_file(sink, schema) as writer:
   ...:     for i in range(1850):
   ...:         writer.write_table(table)
   ...:
   ...: table = feather.read_table('data.feather', memory_map=True)
   ...: print(table.take([0]))
pyarrow.Table
data: double
----
data: [[0.03826822283041585]]
```

Fail:

```python
In [5]: import numpy as np
   ...: import pyarrow as pa
   ...: from pyarrow import feather
   ...:
   ...: rng = np.random.default_rng(1337)
   ...: data = rng.normal(size=(1000000,))
   ...: table = pa.table({'data': data})
   ...: sink = pa.output_stream('data.feather')
   ...: schema = pa.schema([('data', pa.float64())])
   ...: with pa.ipc.new_file(sink, schema) as writer:
   ...:     for i in range(1900):
   ...:         writer.write_table(table)
   ...:
   ...: table = feather.read_table('data.feather', memory_map=True)
   ...: print(table.take([0]))
Killed
```
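A rough way to see where the memory goes while reproducing this, without waiting for the OOM killer, is to check the process's peak RSS around the `slice` and `take` calls. This is only a sketch using the stdlib `resource` module (Unix-only; `ru_maxrss` is KiB on Linux, bytes on macOS), with a deliberately smaller file than above so the `take` call survives:

```python
import resource

import numpy as np
import pyarrow as pa
from pyarrow import feather

def max_rss_mib():
    # Peak resident set size so far; KiB on Linux (divide by 1024**2 on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

rng = np.random.default_rng(1337)
table = pa.table({'data': rng.normal(size=(1_000_000,))})
sink = pa.output_stream('data.feather')
with pa.ipc.new_file(sink, table.schema) as writer:
    for _ in range(100):  # ~800 MB file: large enough to see a jump, small enough to survive
        writer.write_table(table)

mapped = feather.read_table('data.feather', memory_map=True)
print(f"after mmap read: {max_rss_mib():.0f} MiB")

mapped.slice(0, 10)   # stays memory-mapped; peak RSS should barely move
print(f"after slice:     {max_rss_mib():.0f} MiB")

mapped.take([0])      # per the behavior reported here, this materializes far more than one row
print(f"after take:      {max_rss_mib():.0f} MiB")
```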
This might be caused by …