Currently, the space complexity of transferring data from one node to another scales linearly with the amount of data in the shard being transferred:

peak memory usage (space complexity) = O(bytesInShard)

We can push peak memory usage down to constant space complexity by using something like Node.js's `pipe` function. Instead of `fs.readFile`, we can pipe the data from a read stream into a TCP stream.
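As a rough sketch of what the streaming version could look like, a file read stream piped straight into a TCP socket keeps only one small chunk in memory at a time (the host, port, and shard path below are placeholders, not values from this codebase):

```js
const fs = require('fs');
const net = require('net');

// Connect to the receiving node (hypothetical host/port).
const socket = net.connect({ host: 'other-node.local', port: 4000 }, () => {
  // pipe() moves the shard in small chunks and handles backpressure,
  // so peak memory stays roughly constant regardless of shard size.
  fs.createReadStream('/path/to/shard').pipe(socket);
});
```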
The JSONStream library we are using solves the problem of parsing larger JSON objects that get split across multiple chunks when transferred over streams. It seems to do this by holding the partial JSON in memory and delaying the `data` event on the JSONStream until it has received a full JSON object. That means JSONStream's peak memory usage also scales linearly with the size of the JSON object being sent to it. I need to verify this suspicion against their source code, though.
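A minimal sketch of that behaviour: with `JSONStream.parse`, the `data` handler only fires once a complete value has been parsed, so anything still being assembled sits in memory (the pattern and port here are illustrative, not taken from our code):

```js
const net = require('net');
const JSONStream = require('JSONStream');

const server = net.createServer((socket) => {
  socket
    .pipe(JSONStream.parse('*')) // emits only complete parsed values
    .on('data', (obj) => {
      // obj is a fully reassembled JSON value, even if it arrived
      // split across many TCP packets.
      console.log('received object:', JSON.stringify(obj).length, 'bytes');
    });
});

server.listen(4000);
```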
The problem with piping smaller JSON objects that each contain a portion of the total shard data is that the shard data needs to be written in the order it was received, which is hard to manage when the writes happen inside event handlers.

Essentially, what we need is a way to write multiple chunks that are not received in order, without storing all of the chunks in memory.
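One possible approach, sketched below, is to have each incoming message carry an explicit byte offset (an assumption about the protocol, not something we do today) and use positional writes so that only the chunk currently being written is held in memory:

```js
const fs = require('fs');

// Hypothetical shard file on the receiving node.
const fd = fs.openSync('/path/to/shard', 'w');

// Called from the event handler for each incoming chunk.
// `offset` is the chunk's byte position within the shard.
function writeChunk({ offset, data }) {
  const buf = Buffer.from(data);
  // fs.write with an explicit position writes at that offset regardless
  // of arrival order, so out-of-order chunks need no extra buffering.
  fs.write(fd, buf, 0, buf.length, offset, (err) => {
    if (err) throw err;
  });
}
```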