DecryptStream causes memory bloat when working with in-memory pipelines #116

Open
aboone-fusion opened this issue Feb 9, 2021 · 3 comments

Comments

@aboone-fusion

aboone-fusion commented Feb 9, 2021

All versions of DecryptStream end up calling Stream.PipeAll, which reads the entire input and writes it to the output stream immediately. That works fine when the output stream is meant to accept the whole payload at once (e.g. a FileStream opened for writing), but for in-memory (pipeline) use cases it either creates memory pressure or requires intermediate I/O to flush the result and then reopen it as a new stream. (In other words, DecryptStream is more akin to CopyTo than to the traditional stream-constructor pattern.)

My use case was files (filename.csv.gz.pgp) that are decrypted, decompressed, and then parsed for content.

To work around this, I copied a DecryptStream implementation but, instead of calling PipeAll, returned a BufferedStream wrapping Ld.GetInputStream(), with a managed Dispose().

The output stream can then be passed to another stream constructor (e.g. GZipStream) and never loads the entire file into memory. This approach consumes around 50 KB of memory for large files during the input FileStream's life (my use case was 700 MB files), and it fits well with existing platforms and streams.
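
For anyone following along, the shape of that workaround can be sketched with nothing but the base class library. This is only an illustration under assumptions: `OwnedReadStream` is a hypothetical name, not part of PgpCore, and `inner` stands in for whatever lazily decrypting stream the PGP library hands back (such as the result of Ld.GetInputStream() mentioned above).

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of PgpCore): a read-only stream that exposes
// an inner stream lazily and disposes the resources that stream depends on.
public sealed class OwnedReadStream : Stream
{
    private readonly Stream _inner;
    private readonly IDisposable[] _owned;

    // 'inner' is assumed to produce plaintext on demand; 'owned' holds any
    // upstream objects that must stay alive until this stream is disposed.
    public OwnedReadStream(Stream inner, params IDisposable[] owned)
    {
        _inner = new BufferedStream(inner);
        _owned = owned;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Data is pulled only when the consumer asks for it; nothing is piped eagerly.
    public override int Read(byte[] buffer, int offset, int count) =>
        _inner.Read(buffer, offset, count);

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
            foreach (var d in _owned) d.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Because the wrapper only forwards Read, memory use is bounded by the BufferedStream buffer plus whatever the inner stream buffers, rather than by the file size.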

Workflow example for a function returning an IEnumerable of data records, without all the dispose/using context:

    Stream pipe = new System.IO.FileStream(path, FileMode.Open);
    pipe = new PgpCoreWrapper.DecryptionStream(pipe, keys, password);
    pipe = new System.IO.Compression.GZipStream(pipe, CompressionMode.Decompress);
    // StreamReader is a TextReader, not a Stream, so it needs its own variable
    var reader = new System.IO.StreamReader(pipe);
    var csv = new CsvHelper.CsvReader(reader, config);
    // ...parse csv data...
    yield return new record(data);
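
With the dispose/using context put back in, the same workflow might look roughly like the sketch below. The assumptions: `LazyPipeline` and `ReadLines` are hypothetical names, the `decrypt` delegate stands in for the lazily decrypting wrapper, and plain lines are yielded instead of CSV records so the example depends only on the base class library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class LazyPipeline
{
    // 'decrypt' is a placeholder (an assumption) for whatever wraps the
    // encrypted stream in a lazily decrypting one.
    public static IEnumerable<string> ReadLines(Stream source, Func<Stream, Stream> decrypt)
    {
        // Each stage wraps the previous one; disposal runs in reverse order
        // once the caller finishes (or abandons) the enumeration.
        using (source)
        using (var plain = decrypt(source))
        using (var gzip = new GZipStream(plain, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Only the current line plus each stage's buffer is in memory.
                yield return line;
            }
        }
    }
}
```

Since this is an iterator, nothing is opened or read until the caller starts enumerating, and the using blocks are released when the enumerator is disposed (for example at the end of a foreach).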

@mattosaurus
Owner

Nice. I tried to reduce memory usage previously by tweaking the stream usage, but obviously wasn't completely successful.

Is this something you'd be able to do and submit a PR for?

Or, if not, could you provide a full worked example that I can base my own PR on?

@aboone-fusion
Author

I put in a PR as an example with some structure. It's only the one decrypt workflow. More work is needed for it to be fully featured, but the framework is there to implement the other workflows. The only tricky part, I think, is handling the IsIntegrityProtected logic after the stream has been fully read. I've left behind a question about whether Stream.CopyTo internally calls Stream.Read or bypasses it. I've assumed Read is bypassed, but if not, then the integrity check is being called twice.
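
One way to settle the CopyTo question empirically, and to make the check safe either way, is sketched below. As far as I know, the base Stream.CopyTo is a loop over Read unless a subclass overrides CopyTo, but treat that as something to verify on your target runtime. `CheckedReadStream` and the `onEnd` callback are hypothetical stand-ins; the callback represents the integrity verification and is guarded so it runs at most once.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: counts Read calls and runs an end-of-stream
// callback (standing in for the integrity check) at most once.
public sealed class CheckedReadStream : Stream
{
    private readonly Stream _inner;
    private readonly Action _onEnd;
    private bool _checked;

    public int ReadCalls { get; private set; }

    public CheckedReadStream(Stream inner, Action onEnd)
    {
        _inner = inner;
        _onEnd = onEnd;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;
        int n = _inner.Read(buffer, offset, count);
        // A zero-byte result for a non-empty request signals end of stream.
        if (n == 0 && count > 0 && !_checked)
        {
            _checked = true; // guard: later reads at EOF do not re-run the check
            _onEnd();
        }
        return n;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Calling CopyTo on an instance and then inspecting ReadCalls shows whether Read was used on a given runtime. With the guard in place, the check fires once regardless of how the stream is consumed.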

@mattosaurus
Owner

Thanks for that. I'll take a look at this when I get a chance and hopefully update PgpCore to be a bit more efficient.
