In python/lib.rs, the first thing write_to_deltalake does is collect the batches into a Vec. This loads all RecordBatches into RAM, no? That doesn't seem like a good thing to me. I think the main reason is that write.rs tries to get the schema from the batches, but the schema is already known on the Python side anyway, so why not pass it directly?
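To make the concern concrete, here is a minimal, stdlib-only sketch of the two approaches. It is not the delta-rs code: `make_batches`, `infer_schema`, `write_collected` and `write_streaming` are hypothetical names, and a batch is modelled as a plain list of dicts rather than a real RecordBatch. It only illustrates that if the schema is handed over up front, the writer can consume batches one at a time instead of holding all of them.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-ins: a "batch" is a list of row dicts,
# a "schema" is a tuple of (column name, type name) pairs.
Batch = List[dict]
Schema = Tuple[Tuple[str, str], ...]


def make_batches(n_batches: int, rows_per_batch: int) -> Iterator[Batch]:
    """Lazily yield batches; nothing is kept unless the consumer keeps it."""
    for b in range(n_batches):
        yield [{"id": b * rows_per_batch + r} for r in range(rows_per_batch)]


def infer_schema(batch: Batch) -> Schema:
    """Derive the schema from the first row of a batch (assumes a non-empty batch)."""
    return tuple((name, type(value).__name__) for name, value in batch[0].items())


def write_collected(batches: Iterable[Batch]) -> Tuple[Schema, int, int]:
    """Collect everything first so the schema can be read from the data."""
    held = list(batches)  # every batch is in memory at the same time
    schema = infer_schema(held[0])  # assumes at least one batch
    rows = sum(len(batch) for batch in held)
    return schema, rows, len(held)


def write_streaming(schema: Schema, batches: Iterable[Batch]) -> Tuple[Schema, int, int]:
    """Take the schema as an argument and process one batch at a time."""
    rows = 0
    peak_held = 0
    for batch in batches:
        peak_held = max(peak_held, 1)  # only the current batch is referenced
        rows += len(batch)  # a real writer would flush the batch here
    return schema, rows, peak_held


if __name__ == "__main__":
    known_schema: Schema = (("id", "int"),)
    print(write_collected(make_batches(4, 3)))
    print(write_streaming(known_schema, make_batches(4, 3)))
```

Both functions see the same rows and end up with the same schema; the difference is only how many batches are alive at once, which is what matters for large writes.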
Use Case
I don't want to waste resources ;)
Related Issue(s)
@aersam correct, it's not an efficient way to do that :) Will already mentioned an improvement over that, which I've logged in #1984. No one is working on it yet, so if you want to pick it up, feel free :D