NOTE: To get the best support experience for bug fixes, please go to https://cloud.google.com/support-hub and follow the instructions. In comparison, Bug reports filed in this repo only have best effort support, and do not have guaranteed response / resolution SLOs
Describe the bug
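In a MIG with autoscaling enabled, the Ops Agent on some newly created Windows VMs fails to restart from the startup script: Restart-Service reports that it cannot stop the google-cloud-ops-agent-opentelemetry-collector service.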
To Reproduce
Steps to reproduce the behavior:
Create a custom image with the Ops Agent (2.45.0@1) baked in alongside a third-party application.
Create an instance template with a startup script that copies custom config files for the Ops Agent and then restarts the Ops Agent.
Create a MIG with autoscaling enabled. When new VMs are created by autoscaling, the Ops Agent on some of the VMs fails to restart.
Expected behavior
The Ops Agent should restart successfully on every new VM, because our autoscaling is based on custom Prometheus metrics collected by the Ops Agent.
Environment (please complete the following information):
VM distro / OS: windows-server-2022-dc
Ops Agent version: 2.45.0@1
Ops Agent configuration: Prometheus receiver
Ops Agent log:
INFO 2024-01-16T14:04:35.492451600Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: Restarting Ops Agent to Load the Config
INFO 2024-01-16T14:04:35.692183900Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: Restart-Service : Service 'Google Cloud Ops Agent (google-cloud-ops-agent)' cannot be stopped due to the following
INFO 2024-01-16T14:04:35.727365100Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: error: Cannot stop google-cloud-ops-agent-opentelemetry-collector service on computer '.'.
INFO 2024-01-16T14:04:35.743086300Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: At C:\UNICOM\Scripts\Specialize-IntelligenceGoogleOpsAgent.ps1:54 char:5
INFO 2024-01-16T14:04:35.774301700Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + Restart-Service google-cloud-ops-agent -Force
INFO 2024-01-16T14:04:35.806250700Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We would need to see your startup scripts in order to troubleshoot this effectively.
That being said, Restart-Service is notoriously fragile if it races with a service that's already in the process of starting up, which may be the case here. You can try inserting the following line before any Restart-Service to let it complete startup before attempting to restart:
The '00:03:00' is a timeout so that it doesn't block forever (e.g. in case there's a bad config or some other problem preventing normal startup); adjust this timeout as you need.
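The line referred to above is not reproduced in this copy of the thread. As a rough sketch only, a wait of that kind could look like the following. It assumes that all Ops Agent services share the `google-cloud-ops-agent` name prefix seen in the log (only `google-cloud-ops-agent` and `google-cloud-ops-agent-opentelemetry-collector` are confirmed there), and `Wait-OpsAgentServices` is a hypothetical helper name, not part of the Ops Agent.

```powershell
# Hypothetical helper: block until the Ops Agent services report Running.
# Assumption: every related service name starts with 'google-cloud-ops-agent'.
function Wait-OpsAgentServices {
    param(
        [string]$NamePattern = 'google-cloud-ops-agent*',
        [TimeSpan]$Timeout = '00:03:00'
    )
    foreach ($service in Get-Service -Name $NamePattern) {
        # ServiceController.WaitForStatus returns once the service is Running
        # and throws a timeout exception if $Timeout elapses first.
        # The timeout applies per service, so the worst-case total wait is
        # the timeout multiplied by the number of matching services.
        $service.WaitForStatus('Running', $Timeout)
    }
}
```

In a startup script this could be called as `Wait-OpsAgentServices` on the line before `Restart-Service google-cloud-ops-agent -Force`, wrapped in `try`/`catch` if a timeout should not abort the rest of the script.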
Since you're already creating a custom image anyway, another thing you can try instead is to change the service startup type for all Ops Agent services to Manual. This should prevent the startup race and give your script a chance to copy in the configs before the agent starts up. Then you would call Start-Service instead of Restart-Service after copying the configs.
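A minimal sketch of the Manual startup approach, under stated assumptions: the function names and both path parameters are hypothetical, and the `google-cloud-ops-agent*` wildcard assumes all Ops Agent services share that name prefix. The first function would run once while building the custom image, the second from the instance startup script.

```powershell
# Image-build step (hypothetical helper): stop the Ops Agent services from
# starting automatically at boot, so they cannot race the startup script.
function Set-OpsAgentManualStart {
    param([string]$NamePattern = 'google-cloud-ops-agent*')
    Get-Service -Name $NamePattern | Set-Service -StartupType Manual
}

# Startup-script step (hypothetical helper): copy the custom config into
# place, then start the agent once instead of restarting it.
function Start-OpsAgentWithConfig {
    param(
        [Parameter(Mandatory)][string]$SourceConfig,  # where your script keeps the custom config
        [Parameter(Mandatory)][string]$TargetConfig   # the Ops Agent config path on the VM
    )
    Copy-Item -Path $SourceConfig -Destination $TargetConfig -Force
    Start-Service -Name 'google-cloud-ops-agent'
}
```

With this split, the startup script never has to stop a service that is still starting, which is the situation the `CouldNotStopService` error in the log points to.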
@jefferbrecht Thanks for getting back to us. We will try your second suggestion (changing the startup type for all Ops Agent services to Manual and calling Start-Service instead of Restart-Service after copying the configs) and will update you.
INFO 2024-01-16T14:04:35.837901700Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + CategoryInfo : CloseError: (System.ServiceProcess.ServiceController:ServiceController) [Restart-Service
INFO 2024-01-16T14:04:35.870150600Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: ], ServiceCommandException
INFO 2024-01-16T14:04:35.896522Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + FullyQualifiedErrorId : CouldNotStopService,Microsoft.PowerShell.Commands.RestartServiceCommand
INFO 2024-01-16T14:04:35.931813400Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1:
Additional context
Google Case 49102632