In highly parallel workloads, I almost always recommend `storage = "worker"` and `retrieval = "worker"` so that the parallel workers manage the data instead of putting that burden on the main process. Both options default to `"main"` only because I originally envisioned cloud-based workloads where data may need to travel over the network. In practice, cloud pipelines almost always have some kind of object storage (or database storage), such as an S3 bucket. So I think I should make `"worker"` the default for both. But first, I would like to check with those of you who follow the discussions to see whether this would have any unintended consequences.
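For anyone who wants to try the proposed behavior before any default changes, here is a minimal sketch of opting in explicitly. It assumes the options above are the `storage` and `retrieval` arguments of `tar_option_set()` and `tar_target()` in the targets R package. The target names and commands are hypothetical and only there for illustration.

```r
# _targets.R (sketch; the pipeline below is hypothetical)
library(targets)

# Opt in globally: parallel workers manage target data themselves
# instead of routing it through the main process.
tar_option_set(storage = "worker", retrieval = "worker")

list(
  # Hypothetical target that inherits the global "worker" settings.
  tar_target(raw_data, rnorm(1e6)),
  # Per-target override back to "main", in case a specific target needs it.
  tar_target(
    summary_stat,
    mean(raw_data),
    storage = "main",
    retrieval = "main"
  )
)
```

Setting the options explicitly like this should also keep a pipeline's behavior stable regardless of which default ends up being chosen.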