Load balancing debugging info #728

Open
angelhof opened this issue Nov 11, 2024 · 0 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@angelhof
Member

PaSh currently does not rebalance outputs between pipeline stages that have the same width. This can lead to pathological scenarios. For example, suppose the input of the program cat IN | cmd1 | cmd2 is a single line, cmd1 and cmd2 are both stateless, and cmd1 produces many lines that cmd2 could then process in parallel. Because the output of cmd1 is never redistributed, PaSh gets no parallelism in this case.
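A toy model may make the scenario concrete. The sketch below is not PaSh code: split_round_robin is a hypothetical splitter that deals input lines to branches in turn (an assumption, PaSh's real splitting strategy may differ), and cmd1 is a stand-in for a stateless stage that turns one line into many, similar in spirit to tr ' ' '\n'.

```python
def split_round_robin(lines, width):
    """Deal input lines to `width` branches (assumed splitting strategy)."""
    branches = [[] for _ in range(width)]
    for i, line in enumerate(lines):
        branches[i % width].append(line)
    return branches


def cmd1(line):
    """Stand-in for a stateless stage that expands one line into many."""
    return line.split()


def loads_after_cmd1(lines, width):
    """Number of lines each branch hands to cmd2 when nothing is rebalanced."""
    return [
        sum(len(cmd1(line)) for line in branch)
        for branch in split_round_robin(lines, width)
    ]


one_line_input = ["w1 w2 w3 w4 w5 w6 w7 w8"]
print(loads_after_cmd1(one_line_input, width=4))  # [8, 0, 0, 0]
```

With a single input line, one branch carries all eight lines produced by cmd1 and the other three copies of cmd2 receive nothing, which is the imbalance the proposed logging is meant to expose.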

To help identify such pathological scenarios, it would be great to add a flag that inserts logging nodes into parts of the dataflow graph, each of which prints how many lines and bytes pass through it.

The steps to get this done would be to:

  1. Implement a command that simply forwards its input to its output (without buffering, unlike dgsh-tee), but also counts the bytes and lines it forwards and prints the totals at the end.
  2. Add this command after each stage of the dataflow graph to get the load on each parallel branch of the graph.
  3. (optional) Create a simple post-processing tool that presents the output in a nice way (relative loads, or even a plot; see, for example, the --graphviz option in current PaSh).
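Step 1 could be prototyped roughly as follows. This is only a sketch under stated assumptions, not an existing PaSh component: the names forward_and_count and main, the tag argument, and the report format are all hypothetical. Lines are counted as newline bytes, the same convention wc -l uses, and the report goes to stderr so it does not mix with the forwarded data.

```python
import io
import sys


def forward_and_count(src, dst, chunk_size=65536):
    """Copy src to dst chunk by chunk; return (bytes, lines) forwarded."""
    # read1 returns as soon as some data is available, so each chunk is
    # passed on promptly instead of being held until a buffer fills up.
    read = getattr(src, "read1", src.read)
    total_bytes = 0
    total_lines = 0
    while True:
        chunk = read(chunk_size)
        if not chunk:  # EOF
            break
        dst.write(chunk)
        dst.flush()
        total_bytes += len(chunk)
        total_lines += chunk.count(b"\n")
    return total_bytes, total_lines


def main(tag="node"):
    """Hypothetical CLI entry point: stdin -> stdout, report on stderr."""
    nbytes, nlines = forward_and_count(sys.stdin.buffer, sys.stdout.buffer)
    print(f"{tag}: lines={nlines} bytes={nbytes}", file=sys.stderr)


# In-memory demonstration; a real logging node would call main() instead.
demo_out = io.BytesIO()
demo_counts = forward_and_count(io.BytesIO(b"a\nbb\nccc\n"), demo_out)
print(demo_counts)  # (9, 3)
```

For step 2, one such node per parallel branch, each with a distinct tag, would yield one report line per branch. A post-processing tool for step 3 could then divide each branch's line count by the stage total to get relative loads.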
@angelhof added the enhancement and help wanted labels on Nov 11, 2024