UI improvements for one-shot mode (--disconnect-after-job) #3053

Open
maleadt opened this issue Oct 24, 2024 · 2 comments

Comments

maleadt commented Oct 24, 2024

I'm using buildkite-agent with --disconnect-after-job for containerization reasons, running all of buildkite-agent under sandbox-exec or Docker. In this scenario, the web UI reports at most one job per agent, apparently because reconnecting the agent (which I do after every job) results in a new agent ID.

For example, every idle agent reports:

image

... while an active agent will only report a single job, i.e., the one it's currently running:

image

This is not great, as it complicates debugging worker issues, with no easy way to inspect all runs by this agent.
I'd be happy to have to specify my own UID for this, so that the WebUI can correctly group runs.

DrJosh9000 (Contributor) commented Oct 30, 2024

Hi @maleadt, thanks for raising the issue.

I'm not sure what we're able to do about this, however: if an agent is started in an ephemeral environment, it's not clear to me how we'd be able to link it to agents that ran in previous ephemeral environments.

  • Unless set, the hostname that the OS reports to the agent probably varies between Docker containers. (If the hostname is set to some fixed name, that's a whiff of non-ephemerality).
  • The underlying node running a particular string of executions might change in an environment like Kubernetes, and might not be exposed inside the container.
  • Each agent registration token is designed to be used for multiple agents, whether they are in parallel or in serial one-shot runs.

(Edit after thinking more about the suggestion for a user-supplied ID for this:) I do agree there could be more ways to display and group agents in the web UI, and maybe tags would work as a way to define groupings?

maleadt (Author) commented Oct 31, 2024

> maybe tags would work as a way to define groupings?

Not easily; in our case we have many workers with identical tags, and I wanted to use the CI to debug a specific worker's inability to complete jobs (after an OS upgrade broke it). So grouping according to tags, while useful in some cases, wouldn't have helped me.

I'd be fine with fixing the container's hostname, or specifying anything else to indicate to the agent that different ephemeral environments should map onto a single one. In my case, we have a single systemd service for each agent we launch on a host, e.g., buildkite-agent@$HOST.$AGENT. That service basically launches buildkite-agent --disconnect-after-job in a while loop, under Docker for containerization, so that'd be a straightforward place to add something like --agent-id=$HOST.$AGENT.

That could of course be implemented using tags by adding an "artificial" agent=$HOST.$AGENT tag, but that feels a little off.
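For illustration, the tag-based workaround might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual service: the function names `agent_tag` and `run_loop` are hypothetical, the `agent` tag key is just the "artificial" tag mentioned above, and the Docker wrapper is omitted for brevity. It uses only the existing `--disconnect-after-job` and `--tags` options of `buildkite-agent start`, not the proposed `--agent-id`.

```shell
#!/bin/sh
# Sketch only: the helper names and the "agent" tag key are illustrative assumptions.

# Build the artificial tag that identifies one worker slot across reconnects.
agent_tag() {
    printf 'agent=%s.%s' "$1" "$2"
}

# Run one-shot agents back to back. Each iteration registers a fresh agent
# (and so gets a new agent ID), but every registration carries the same tag.
run_loop() {
    host="$1"
    agent="$2"
    while true; do
        buildkite-agent start \
            --disconnect-after-job \
            --tags "$(agent_tag "$host" "$agent")"
    done
}

# Example invocation, left commented out so sourcing this file has no side effects:
# run_loop "$(hostname)" 1
```

With this, every one-shot registration from the same slot shares one tag value, though the web UI would still list each registration as a separate agent.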
