Currently, the space complexity of transferring data from one node to another scales linearly with the amount of data in the shard being transferred:

peak memory usage (space complexity) = O(bytesInShard)

We can push peak memory usage down to constant space complexity by using something like Node.js's `pipe` function. Instead of `fs.readFile`, we can pipe the data from a read stream into a TCP stream.
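As a rough sketch of what the streaming version could look like, a file read stream piped straight into a TCP socket keeps only one small chunk in memory at a time (the host, port, and shard path below are placeholders, not values from this codebase):

```js
const fs = require('fs');
const net = require('net');

// Connect to the receiving node (hypothetical host/port).
const socket = net.connect({ host: 'other-node.local', port: 4000 }, () => {
  // pipe() moves the shard in small chunks and handles backpressure,
  // so peak memory stays roughly constant regardless of shard size.
  fs.createReadStream('/path/to/shard').pipe(socket);
});
```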
The JSONStream library we are using solves the problem of parsing larger JSON objects that get split across multiple chunks when transferred over streams. It seems to do this by holding the partial JSON in memory and delaying the `data` event on the JSONStream until it has received a full JSON object. That means JSONStream's peak memory usage also scales linearly with the size of the JSON object being sent to it. I need to verify this suspicion against their source code, though.
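A minimal sketch of that behaviour: with `JSONStream.parse`, the `data` handler only fires once a complete value has been parsed, so anything still being assembled sits in memory (the pattern and port here are illustrative, not taken from our code):

```js
const net = require('net');
const JSONStream = require('JSONStream');

const server = net.createServer((socket) => {
  socket
    .pipe(JSONStream.parse('*')) // emits only complete parsed values
    .on('data', (obj) => {
      // obj is a fully reassembled JSON value, even if it arrived
      // split across many TCP packets.
      console.log('received object:', JSON.stringify(obj).length, 'bytes');
    });
});

server.listen(4000);
```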
The problem with piping smaller JSON objects that each contain a portion of the total shard data is that the shard data needs to be written in the order it was received, which is hard to manage when the writes happen inside event handlers.

Essentially, what we need is a way to write multiple chunks that are not received in order, without storing all of the chunks in memory.
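One possible approach, sketched below, is to have each incoming message carry an explicit byte offset (an assumption about the protocol, not something we do today) and use positional writes so that only the chunk currently being written is held in memory:

```js
const fs = require('fs');

// Hypothetical shard file on the receiving node.
const fd = fs.openSync('/path/to/shard', 'w');

// Called from the event handler for each incoming chunk.
// `offset` is the chunk's byte position within the shard.
function writeChunk({ offset, data }) {
  const buf = Buffer.from(data);
  // fs.write with an explicit position writes at that offset regardless
  // of arrival order, so out-of-order chunks need no extra buffering.
  fs.write(fd, buf, 0, buf.length, offset, (err) => {
    if (err) throw err;
  });
}
```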