We use Ansible installed on a remote machine, accessed through the Ansible task in Azure Pipelines with the “ansibleInterface: ‘remoteMachine’” setting. In this scenario, node.exe starts, establishes an SSH connection to the remote machine, and executes commands over that connection.
However, we’ve noticed an issue where, if a transient network disruption occurs between the Agent machine and the Ansible machine while node.exe is awaiting a response indicating command completion from Ansible, node.exe continues to wait indefinitely for the response. This persistent waiting ultimately leads to the job being cancelled due to reaching the job’s timeout limit.
This issue needs to be addressed to prevent unnecessary job cancellations and to improve the robustness of the system against transient network issues.
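One common way to stop a client from waiting forever on a dead SSH connection is a keepalive check, similar in spirit to OpenSSH's `ServerAliveInterval` and `ServerAliveCountMax` options. The following is a minimal sketch of that idea, not the Ansible task's actual code: `KeepaliveTracker`, `watchConnection`, `sendProbe`, and `onDead` are hypothetical names, and how a probe is sent or a reply detected depends on the SSH library in use.

```typescript
// Hypothetical helpers for illustration; none of these names exist in the Ansible task.

/** Counts keepalive probes that have gone unanswered. */
class KeepaliveTracker {
  private unanswered = 0;
  private readonly maxUnanswered: number;

  constructor(maxUnanswered: number) {
    this.maxUnanswered = maxUnanswered;
  }

  /**
   * Call once per keepalive interval.
   * Returns true if another probe should be sent, or false if the peer has
   * already missed `maxUnanswered` probes in a row and should be treated as dead.
   */
  tick(): boolean {
    if (this.unanswered >= this.maxUnanswered) {
      return false;
    }
    this.unanswered += 1;
    return true;
  }

  /** Call whenever any reply arrives from the peer. */
  onReply(): void {
    this.unanswered = 0;
  }
}

/**
 * Sends a probe every `intervalMs` and reports the connection as dead once
 * the tracker gives up. Returns a function that stops the watcher.
 * `sendProbe` and `onDead` are assumed callbacks supplied by the caller.
 */
function watchConnection(
  tracker: KeepaliveTracker,
  intervalMs: number,
  sendProbe: () => void,
  onDead: (err: Error) => void
): () => void {
  const timer = setInterval(() => {
    if (tracker.tick()) {
      sendProbe();
      return;
    }
    clearInterval(timer);
    onDead(new Error("SSH peer stopped answering keepalive probes"));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

With a check like this, the task could fail with a clear connection error shortly after the disruption instead of being cancelled at the job timeout. Whether the task should fail or try to reconnect at that point is a separate design decision.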
Environment type (Please select at least one environment where you face this issue)
Self-Hosted
Microsoft Hosted
VMSS Pool
Container
Azure DevOps Server type
dev.azure.com (formerly visualstudio.com)
Azure DevOps Server Version (if applicable)
N/A
Operating system
Windows 10 for the agent / Linux for Ansible
Relevant log output
// Ansible's log
In the remote machine, the commands completed after 35 minutes. (Roughly translated from Japanese to English)
----------
[2024-06-11 15:44:38] [2024-06-11 15:44:38] Job "xxxxx"."xxxxx" completed successfully, Tuesday June 11th 15:44:36 2024, elapsed 0 00:39:14.
----------
// Pipeline log
We can see that the job was just canceled without any errors.
----------
2024-06-11T06:06:28.4122622Z ##[section]Starting: Run ******
2024-06-11T06:06:28.4289370Z ==============================================================================
2024-06-11T06:06:28.4290181Z Task : Ansible
2024-06-11T06:06:28.4290734Z Description : This task executes an Ansible playbook using a specified inventory via command line interface
2024-06-11T06:06:28.4291266Z Version : 0.230.2
2024-06-11T06:06:28.4291712Z Author : Microsoft Corporation
2024-06-11T06:06:28.4292277Z Help : [More Information](https://go.microsoft.com/fwlink/?linkid=853835)
2024-06-11T06:06:28.4292835Z ==============================================================================
2024-06-11T06:06:29.2645926Z Trying to setup SSH connection to ***@10.***.***.4:22.....
2024-06-11T06:06:30.6515686Z
2024-06-11T07:06:22.7701158Z ##[error]The operation was canceled.
----------
// Agent's Worker.log
We can see that node.exe was started and then killed including child processes due to the job cancellation request.
----------
[2024-06-11 06:06:28Z INFO ProcessInvokerWrapper] Starting process:
[2024-06-11 06:06:28Z INFO ProcessInvokerWrapper] File name: 'D:\agent\selfagent01\externals\node\bin\node.exe'
[2024-06-11 06:06:28Z INFO ProcessInvokerWrapper] Arguments: '"D:\agent\selfagent01\_work\_tasks\Ansible_6f650d20-9c5d-4cce-ad66-e68742ceadf5\0.230.2\main.js"'
.....
[2024-06-11 06:06:28Z INFO ProcessInvokerWrapper] Process started with process id 17228, waiting for process exit.
.....
[2024-06-11 06:06:44Z INFO JobServerQueue] Stop aggressive process web console line queue.
[2024-06-11 07:06:22Z INFO Worker] Cancellation/Shutdown message received.
[2024-06-11 07:06:22Z INFO ExpressionManager] Evaluating: SucceededNode()
[2024-06-11 07:06:22Z INFO ExpressionManager] Result: False
[2024-06-11 07:06:22Z INFO StepsRunner] Cancel current running step.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Sending CTRL_C to process 17228.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Successfully sent CTRL_C to process 17228.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Waiting for process exit or 7.5 seconds after CTRL_C signal fired.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Ignore Ctrl+C to current process.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Kill entire process tree since both cancel and terminate signal has been ignored by the target process.
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Exited process 17228 with exit code -1073741510
[2024-06-11 07:06:22Z INFO ProcessInvokerWrapper] Finished process 17228 with exit code -1073741510, and elapsed time 00:59:53.7050483.
----------
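The exit code and elapsed time in the Worker.log excerpt are easier to read after a small conversion. The sketch below (helper names are illustrative) reinterprets the signed exit code as an unsigned 32-bit value and computes the gap between the process start and the cancellation message. The resulting value, 0xC000013A, is the Windows status code STATUS_CONTROL_C_EXIT, which matches the CTRL_C cancellation shown in the log.

```typescript
/** Reinterprets a signed 32-bit exit code as unsigned and formats it as hex. */
function toUnsignedHex(exitCode: number): string {
  // `>>> 0` converts the value to an unsigned 32-bit integer.
  return "0x" + (exitCode >>> 0).toString(16).toUpperCase();
}

/** Returns the whole seconds between two ISO 8601 timestamps. */
function elapsedSeconds(startIso: string, endIso: string): number {
  return Math.floor((Date.parse(endIso) - Date.parse(startIso)) / 1000);
}

// Exit code reported for process 17228.
console.log(toUnsignedHex(-1073741510)); // "0xC000013A"

// Process start (06:06:28Z) to cancellation message (07:06:22Z).
console.log(elapsedSeconds("2024-06-11T06:06:28Z", "2024-06-11T07:06:22Z")); // 3594 (59 min 54 s)
```

The 3594-second gap agrees with the logged elapsed time of 00:59:53.7, so node.exe ran until the cancellation request rather than exiting on its own.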
Full task logs with system.debug enabled
N/A
Repro steps
1) Configure an agent on your machine.
2) Install Ansible on a different machine from the agent.
3) Create a pipeline that uses the self-hosted agent and includes an Ansible task like the one below. Ideally, the command(s) should take a long time to complete.
- task: Ansible@0
  inputs:
    ansibleInterface: 'remoteMachine'
    connectionOverSsh: 'connectionToAnsible'
    playbookSourceRemoteMachine: 'ansibleMachine'
    playbookPathAnsibleMachineOnRemoteMachine: *****.yml
    inventoriesRemoteMachine: 'file'
    inventoryFileSourceRemoteMachine: 'ansibleMachine'
    inventoryFileAnsibleMachineOnRemoteMachine: *****.txt
    args: --extra-vars "*****"
    failOnStdErr: false
  displayName: *****
4) Run the pipeline.
5) Disconnect the network between the machines.
New issue checklist
Extension name
Ansible
Extension version
0.230.2