-
A few questions to kick off more discussion...
Have to admit: I'm initially skeptical that this would be useful enough to be worth it. That's partly from ignorance about sendfile, but from a short nosey around https://man7.org/linux/man-pages/man2/sendfile.2.html I don't think the source can be a socket. So you have to have it as a "real" file on the filesystem. That sort of defeats a lot of the purpose of stream-zip: not having to have anything on disk (unless it's a ramdisk-y thing, but the files would have to be copied to get there...)
-
Ah yes - so not just streaming the output, but the fact that this happens while streaming the input. Right now, stream-zip doesn't depend on having all the source data of even the first file locally when you start to output. There is no random access to any source data. It's iterables of bytes in -> an iterable of bytes out.
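To make the "iterables of bytes in -> an iterable of bytes out" shape concrete, here is a minimal sketch. It is a toy stand-in and not stream-zip's implementation: `stream_members`, `slow_source` and the `--name` header are all made up for illustration, and the output is not a valid ZIP. It only shows that output can start before the source has been fully read.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_members(members: Iterable[Tuple[str, Iterable[bytes]]]) -> Iterator[bytes]:
    """Toy streaming archiver (hypothetical, NOT a real ZIP encoder)."""
    for name, chunks in members:
        # Made-up per-member header, emitted before any member data is read
        yield b"--" + name.encode("utf-8") + b"\n"
        for chunk in chunks:
            # Each input chunk is forwarded as soon as it arrives
            yield chunk


# Records which input chunks have actually been pulled from the source
consumed: List[bytes] = []


def slow_source() -> Iterator[bytes]:
    """Stand-in for a remote source such as an HTTP response body."""
    for piece in (b"first ", b"second ", b"third"):
        consumed.append(piece)
        yield piece


output = stream_members([("a.txt", slow_source())])
header = next(output)  # produced without touching the source at all
first = next(output)   # produced after pulling only one input chunk
```

At this point only the first input chunk has been consumed, which is the property that would be lost if each member had to be on disk in full before any of it could be sent.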
Ah understood
From https://man7.org/linux/man-pages/man2/sendfile.2.html it looks like the answer is no:
So I think my main thoughts are around:
But... you would have to get the files onto the disk in the first place. If this is all in a web server of some application, by and large the source data would not already be on that web server(?) It'll be elsewhere: S3, some other remote HTTP service, maybe a database etc. You could save them to the local disk deliberately, but wouldn't that negate the benefits? Each file would also have to be on the disk in its entirety before starting to output its corresponding part of the ZIP, adding a potential timeout issue for big files. And I have just realised as well: you need the CRC32 of each member, which adds another layer of complication and stuff to happen before it hits that disk... So I think for this to have a performance benefit, all of the below would have to be true:
This feels like such a specific case. Maybe (as in your example if I'm understanding) a few "static" files might satisfy these, but I guess I am thinking the benefits would be extremely minor in all but the most extreme cases. Plus... We would need some way to be able to sendfile multiple files, and with some other bytes in between them. I'm sort of assuming this is all for HTTP responses (or requests?). Do any existing HTTP servers (or clients?) support this? I am still a bit curious as to what the code would look like that uses this feature, were it to exist.
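On the CRC32 point: the checksum covers all of a member's bytes, so something has to read every byte of the member before the value is known. A minimal sketch of computing it incrementally as chunks pass through, using `zlib.crc32` from the Python standard library (`crc32_of_chunks` is a hypothetical helper name, not part of stream-zip):

```python
import zlib
from typing import Iterable


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Compute the CRC32 of a byte stream without holding it all in memory."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts the running checksum as its second argument
        crc = zlib.crc32(chunk, crc)
    return crc


# The chunked result matches the checksum of the same bytes in one piece
whole = zlib.crc32(b"hello world")
streamed = crc32_of_chunks([b"hello ", b"world"])
```

So even if the member bodies were later handed to the kernel, this pass over the data would still have to happen in user space at some point beforehand.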
-
So I might only now be understanding some of this... So yes: you could stream and save to disk somewhere at the same time, ready to make it more efficient next time by using sendfile. But: it's not exactly friendly to horizontal scaling? The user would need to want some of the same member files, and they would have to hit the same server as they did before for there to be a benefit. And, you've hit the problem of cache invalidation too. And, you need to make really sure you never end up sending someone else's files accidentally... And you're then also limited by the disk space on each server. And, it would all have to be shared between all users. Or put another way - it's not very 12-factor app? I'm skeptical that doing all this just for sendfile is worth it for the vast majority of cases... Or that depending more on local disk rather than less is the right direction of travel. But also, even if there is a case out there where it's the best thing for some reason, unless the change in stream-zip to support this is almost trivial somehow (where I would probably say "why not?") I think I'm leaning towards it being beyond the scope of the project. stream-zip is (streaming) memory to (streaming) memory. This is wonderfully both performant and flexible enough for quite a lot of cases, with a reasonably simple API. I think I fear anything that'll complicate this API or the internals, and ultimately result in things being worse for the memory -> memory case.
-
The sendfile API uses the kernel to send the bytes of a file over the network, with an optional offset to start at.
If the entries of the zip file that represent each file were actual files, sendfile could be used to send them using very few resources. We could probably still send other stuff like metadata manually, as there's not much of it.
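A rough sketch of what that interleaving could look like in Python, assuming the member bodies already exist as files on disk. `send_parts` and the bracketed placeholder bytes are hypothetical and this does not produce a real ZIP; it only shows small byte strings sent manually with file contents sent in between. `socket.sendfile` is the standard library method that uses `os.sendfile` where available and falls back to plain `send` otherwise.

```python
import os
import socket
import tempfile
from typing import Iterable, Union


def send_parts(sock: socket.socket, parts: Iterable[Union[bytes, str]]) -> None:
    """Send parts in order: bytes via sendall, file paths via sendfile."""
    for part in parts:
        if isinstance(part, bytes):
            # Small pieces such as headers and metadata are sent manually
            sock.sendall(part)
        else:
            # File contents are handed to socket.sendfile, which lets the
            # kernel do the copy on platforms that support os.sendfile
            with open(part, "rb") as f:
                sock.sendfile(f)


# Demo over a local socket pair, with a temporary file as the "member"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"file body")
    path = tmp.name

left, right = socket.socketpair()
try:
    send_parts(left, [b"[header]", path, b"[footer]"])
    left.close()  # signal end of stream to the reader
    received = b""
    while True:
        data = right.recv(4096)
        if not data:
            break
        received += data
finally:
    left.close()
    right.close()
    os.unlink(path)
```

Note this works on a raw socket; whether a given HTTP server exposes anything like it for response bodies is a separate question.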