It is costly to pull down files and write them to disk unnecessarily. For sufficiently large files, this will break the ingest/derivative pipeline. This is made worse by attempts at job parallelization, where each job (potentially serviced on a different worker box) incurs this cost. But it is possible to avoid this problem.
Even though we shell out for many of the non-Ruby derivative processors, we should avoid forcing the input (and ideally the output) to be literal filesystem files when there is no legitimate need for that.
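As a rough illustration of the idea, here is a minimal sketch that streams any readable IO into a shelled-out command's stdin and collects its stdout, with no temp file. The helper name `pipe_through` is hypothetical and is not part of the derivatives code. It assumes the external tool can read its input from stdin, which is not true of every processor.

```ruby
require "open3"
require "stringio"

# Hypothetical helper (not an existing API): pipe a readable IO through an
# external command via stdin/stdout instead of writing it to disk first.
# `command` is an argv-style array, e.g. ["cat"].
def pipe_through(command, input_io)
  Open3.popen2(*command) do |stdin, stdout, wait_thr|
    stdin.binmode
    stdout.binmode

    # Write from a separate thread so that a full pipe buffer on either
    # side cannot deadlock the parent process.
    writer = Thread.new do
      begin
        IO.copy_stream(input_io, stdin)
      rescue Errno::EPIPE
        # The child stopped reading early; that is fine for our purposes.
      ensure
        stdin.close
      end
    end

    output = stdout.read
    writer.join

    status = wait_thr.value
    raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?

    output
  end
end

# Usage: the source could be any IO-like object (a StringIO here for brevity).
result = pipe_through(["cat"], StringIO.new("derivative input"))
```

For large outputs, the same pattern could copy `stdout` into another IO with `IO.copy_stream` rather than reading it all into memory.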
This also allows optimizations for processors that don't use the bulk of a large file (e.g., only the metadata and first 2 minutes of, say, a 6 hour video). They can read until satisfied and then reset/close the IO. Most of the GBs are never pulled down, never put in memory, and never written to disk.
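The "read until satisfied" behavior could look something like the sketch below. The helper `read_until_satisfied` and its parameters are hypothetical. A byte limit stands in for "the metadata and first 2 minutes", since how many bytes that takes depends on the file format.

```ruby
require "stringio"

# Hypothetical helper (not an existing API): read at most `limit` bytes from
# a readable IO in chunks, stop early if the optional block returns true,
# and always close the IO afterwards so the rest is never fetched.
def read_until_satisfied(io, limit:, chunk_size: 64 * 1024)
  buffer = String.new(encoding: Encoding::BINARY)

  while buffer.bytesize < limit
    chunk = io.read([chunk_size, limit - buffer.bytesize].min)
    break if chunk.nil? # end of stream

    buffer << chunk
    break if block_given? && yield(buffer)
  end

  buffer
ensure
  io.close unless io.closed?
end

# Usage: take only the first 100 bytes of a much larger stream.
source = StringIO.new("a" * 1_000_000)
head = read_until_satisfied(source, limit: 100, chunk_size: 30)
```

With a remote source that supports streaming or ranged reads, closing the IO early is what prevents the remaining bytes from being transferred at all.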
With a cloud-based platform like Hyku, this derivatives code may well be the tightest bottleneck in supporting large files.