UI improvements for one-shot mode (--disconnect-after-job) #3053

maleadt · 2024-10-24T09:24:33Z

I'm using buildkite-agent with --disconnect-after-job for containerization reasons, running all of buildkite-agent under sandbox-exec or docker. In this scenario, the WebUI always reports at most one job per agent, as apparently reconnecting the agent (which I do after every job) results in a new agent ID.

For example, every idle agent reports: [screenshot not preserved]

... while an active agent will only report a single job, i.e., the one it's currently running: [screenshot not preserved]

This is not great, as it complicates debugging worker issues, with no easy way to inspect all runs by this agent.
I'd be happy to have to specify my own UID for this, so that the WebUI can correctly group runs.


DrJosh9000 · 2024-10-30T03:23:27Z

Hi @maleadt, thanks for raising the issue.

I'm not sure what we're able to do about this, however: if an agent is started in an ephemeral environment, it's not clear to me how we'd be able to link it to agents that ran in previous ephemeral environments.

Unless set, the hostname that the OS reports to the agent probably varies between Docker containers. (If the hostname is set to some fixed name, that's a whiff of non-ephemerality).
The underlying node running a particular string of executions might change in an environment like Kubernetes, and might not be exposed inside the container.
Each agent registration token is designed to be used for multiple agents, whether they are in parallel or in serial one-shot runs.

(Edit after thinking more about the suggestion for a user-supplied ID for this:) I do agree there could be more ways to display and group agents in the web UI, and maybe tags would work as a way to define groupings?

maleadt · 2024-10-31T18:18:48Z

> maybe tags would work as a way to define groupings?

Not easily; in our case we have many workers with identical tags, and I wanted to use the CI to debug a specific worker's inability to complete jobs (after an OS upgrade broke it). So grouping according to tags, while useful in some cases, wouldn't have helped me.

I'd be fine with fixing the container's hostname, or specifying anything else to indicate to the agent that different ephemeral environments should map onto a single one. In my case, we have a single systemd service for each agent we launch on a host, e.g., buildkite-agent@$HOST.$AGENT. That service basically launches buildkite-agent --disconnect-after-job in a while loop, under Docker for containerization, so that'd be a straightforward place to add --agent-id=$HOST.$AGENT.

That could of course be implemented using tags by adding an "artificial" agent=$HOST.$AGENT tag, but that feels a little off.
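
For reference, the tag-based workaround might look roughly like the sketch below. It relies only on the existing --disconnect-after-job and --tags options of buildkite-agent start; the --agent-id flag mentioned above is a proposal and does not exist. The agent_cmd helper and the HOST/AGENT values are hypothetical names for illustration, and the Docker wrapper is left out.

```shell
#!/bin/sh
# Hypothetical helper: print the one-shot agent command line for one worker slot.
# $1 and $2 mirror the HOST and AGENT parts of the buildkite-agent@$HOST.$AGENT unit name.
agent_cmd() {
    host="$1"
    agent="$2"
    # The "agent=" tag is an artificial, per-slot tag used only to group runs.
    printf '%s\n' "buildkite-agent start --disconnect-after-job --tags agent=${host}.${agent}"
}

# Sketch of the service loop (assumes HOST and AGENT are set by the unit;
# the containerization wrapper is omitted):
# while true; do
#     $(agent_cmd "$HOST" "$AGENT")
# done
```

Every registration from the same slot would then carry the same agent=... tag, although each reconnect would presumably still get a new agent ID.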
