Ansible awx web, operator and task pods doenst work correctly #15301

Open
5 of 11 tasks
Arjunasvr opened this issue Jun 26, 2024 · 10 comments

@Arjunasvr

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

I tried upgrading to version 2.19.0, but the task and web pods no longer exist and I can no longer reach the web UI. In minikube I cannot see the pods running; they just vanished. When I try to downgrade to 2.12.0, the task container does not work either. Can someone please help me get AWX up and running again?

AWX version

operator 2.19.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

minikube

Modifications

no

Ansible version

No response

Operating system

ubuntu 22.04 lts

Web browser

Firefox, Chrome, Safari, Edge

Steps to reproduce

Upgrade to AWX operator 2.19.0 and wait.

Expected results

The AWX UI is shown and the task and web containers are running.

Actual results

The task and web containers are not running and do not show up among the pods in the namespace.

Additional information

No response

@cnfrancis

To confirm, you have the operator running within the same namespace, right?

@Arjunasvr
Author

> to confirm, you have the operator running within the same namespace right?

Yes it is.

@mandeepmails

@Arjunasvr
Could you share events for the pod related to awx-task-XXXXXXXX

kubectl -n awx describe pod awx-task-XXXXXXXX

I suspect your PVC is pointing to an unshareable volume and is getting deleted.

@gleupold

gleupold commented Jul 1, 2024

Hey @Arjunasvr,
we encountered an issue whose fix might help you. In our scenario the CRDs weren't updated, so 'web_manage_replicas' was undefined. The operator logs this error during the upgrade:
TASK [Apply deployment resources] ********************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined\n\nThe error appears to be in '/opt/ansible/roles/installer/tasks/resources_configuration.yml': line 248, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Apply deployment resources\n ^ here\n"}
After we executed:
kubectl apply --server-side -k "github.com/ansible/awx-operator/config/crd?ref=2.19.0"
the migration started right away.
See also:
ansible/awx-operator@8ead140#diff-8230d07440a5d33c9608211b63791ef41f935652ca8b8ec3d9f3c68b5ed8cc98
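
One way to tell whether a cluster is in the stale-CRD state described above is to look for the field in the installed CRD. This is only a sketch: the CRD name `awxs.awx.ansible.com` is assumed to be the operator's default, the helper `crd_has_field` is a hypothetical name, and it assumes the updated schema mentions `web_manage_replicas`, as the linked commit suggests.

```shell
# Assumed default CRD name for the AWX kind; override CRD if yours differs.
CRD="${CRD:-awxs.awx.ansible.com}"

# Hypothetical helper: exits 0 if the installed CRD schema mentions the field.
crd_has_field() {
  kubectl get crd "$CRD" -o yaml | grep -q -- "$1"
}

# Example call (needs access to the cluster):
#   crd_has_field web_manage_replicas \
#     && echo "CRD already has the field" \
#     || echo "CRD looks stale; re-apply the CRDs for the target version"
```

If the field is missing, re-applying the CRDs with the `kubectl apply --server-side -k ...` command from the comment above is the step that worked in that scenario.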

@Arjunasvr
Author

> @Arjunasvr Could you share events for the pod related to awx-task-XXXXXXXX
>
> kubectl -n awx describe pod awx-task-XXXXXXXX
>
> I suspect your pvc is pointing to the un-shareable volume and getting deleted.

I am sorry, I can't do this because there is no awx-task pod.
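
When the pod does not exist, `describe pod` has nothing to act on, but the namespace can still be inspected. A minimal sketch, assuming the `awx` namespace used elsewhere in this thread; the function names are hypothetical wrappers around standard `kubectl get` calls.

```shell
# Namespace taken from the commands in this thread; override NS if yours differs.
NS="${NS:-awx}"

# List what actually exists, including the PVCs the pods would mount.
list_awx_resources() {
  kubectl -n "$NS" get deployments,statefulsets,pods,pvc
}

# Show namespace events sorted by time; failed mounts, evictions and
# deletions would be expected to show up here.
recent_awx_events() {
  kubectl -n "$NS" get events --sort-by=.lastTimestamp
}

# Example calls (need access to the cluster):
#   list_awx_resources
#   recent_awx_events
```

If no task or web Deployment is listed at all, the operator most likely never got as far as creating them, which points at the operator log rather than at the pods.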

@Arjunasvr
Author

Arjunasvr commented Jul 2, 2024

> Hey @Arjunasvr , we encountered an issue that could help you. In our scenario the configs (crds) werent updated and the 'web_manage_replicas' was undefined. There are logs within the operator while upgrading where you can find this error.
> TASK [Apply deployment resources] ********************************
> fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined\n\nThe error appears to be in '/opt/ansible/roles/installer/tasks/resources_configuration.yml': line 248, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Apply deployment resources\n ^ here\n"}
> After we executed:
> kubectl apply --server-side -k "github.com/ansible/awx-operator/config/crd?ref=2.19.0"
> The migration started right away.
> See also: ansible/awx-operator@8ead140#diff-8230d07440a5d33c9608211b63791ef41f935652ca8b8ec3d9f3c68b5ed8cc98

Hi, I tried this and it didn't work. I checked the operator pod logs and saw this error:

5788921982606687203","name":"awx-server","namespace":"awx","error":"exit status 2","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/runner.(*runner).Run.func1\n\tansible-operator-plugins/internal/ansible/runner/runner.go:269"}

And also I saw this:

TASK [installer : Stream backup from pg_dump to the new postgresql container] ***
task path: /opt/ansible/roles/installer/tasks/upgrade_postgres.yml:99


{"level":"info","ts":"2024-07-02T06:55:23Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"awx-server","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"231178893729865755","EventData.Name":"installer : Stream backup from pg_dump to the new postgresql container"}
{"level":"info","ts":"2024-07-02T06:55:23Z","logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/api/v1/namespaces/awx/pods/awx-server-postgres-15-0","Verb":"get","APIPrefix":"api","APIGroup":"","APIVersion":"v1","Namespace":"awx","Resource":"pods","Subresource":"","Name":"awx-server-postgres-15-0","Parts":["pods","awx-server-postgres-15-0"]}}

--------------------------- Ansible Task StdOut -------------------------------

TASK [Stream backup from pg_dump to the new postgresql container] ********************************
fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}

Does anyone have another idea?
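
The operator log mixes JSON lines with Ansible output, so the failing task is easy to miss. A small filter along these lines might help pull out each `fatal:` line together with the task it belongs to. It reads the log on stdin; `failed_tasks` is a hypothetical helper name, and the deployment and container names in the usage comment are assumed defaults, not something confirmed in this thread.

```shell
# Print every "fatal:" line preceded by the most recent "TASK [...]" line.
failed_tasks() {
  awk '
    /^TASK \[/   { task = $0 }            # remember the current task header
    /^fatal: \[/ { print task; print $0 } # report the failure with its task
  '
}

# Example call (assumed default operator deployment/container names):
#   kubectl -n awx logs deployment/awx-operator-controller-manager -c awx-manager | failed_tasks
```

While `no_log` is in effect the `fatal:` line only shows the censored placeholder, so this narrows down which task failed but not why.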

@mandeepmails

> > @Arjunasvr Could you share events for the pod related to awx-task-XXXXXXXX
> > kubectl -n awx describe pod awx-task-XXXXXXXX
> > I suspect your pvc is pointing to the un-shareable volume and getting deleted.
>
> I am sorry I cant do this because there is no awx-task pod

Was it on Minikube, or on a limited hardware setup?

As for the usual behavior: even on normal (not minimal) hardware with k8s, it usually takes 40-60 minutes for the awx-task-XXXXXXXXX pods to appear. Feel free to try on other hardware. Good luck.

@Arjunasvr
Author

> > > @Arjunasvr Could you share events for the pod related to awx-task-XXXXXXXX
> > > kubectl -n awx describe pod awx-task-XXXXXXXX
> > > I suspect your pvc is pointing to the un-shareable volume and getting deleted.
> >
> > I am sorry I cant do this because there is no awx-task pod
>
> was it on Minikube ? or limited hardware setup ?
>
> I can tell usual behavior, even if it's normal (not minimal) hardware with k8s, it usually takes between 40-60 minutes for the aws-task-XXXXXXXXX pods to appear. feel free to try on another hardware. good luck

It was on minikube indeed. Normally the awx-task-xxx pod spins up in 5 to 10 minutes. I even left the upgrade running for more than 2 days, and even then the task and web pods wouldn't show up when I ran kubectl get pods -n awx.

@fosterseth
Member

@Arjunasvr can you set no_log: False in your awx spec? That way the operator shows more details about what is failing.
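
For reference, one possible way to apply and then verify that setting from the command line is sketched below. It assumes the AWX resource is named `awx-server` in the `awx` namespace (the name and namespace that appear in the operator log earlier in this thread) and that `no_log` is accepted as a boolean under `spec`; older operator versions may expect a string instead. The function names are hypothetical.

```shell
NS="${NS:-awx}"
AWX_NAME="${AWX_NAME:-awx-server}"  # name seen in the operator log in this thread

# Merge-patch the AWX custom resource so the operator stops censoring task output.
set_no_log_false() {
  kubectl -n "$NS" patch awx "$AWX_NAME" --type merge -p '{"spec":{"no_log":false}}'
}

# Print the value currently stored in the spec, to confirm the patch landed.
show_no_log() {
  kubectl -n "$NS" get awx "$AWX_NAME" -o jsonpath='{.spec.no_log}'
}

# Example calls (need access to the cluster):
#   set_no_log_false
#   show_no_log
```

If the stored value is already `false` and the log still shows the censored message, the operator may not have re-run the failing task since the change.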

@Arjunasvr
Author

> @Arjunasvr can you set no_log: False in your awx spec? that way the operator shows more details of what is failing.

Hi @fosterseth, I did, but there is no change in the pod log; I am still getting the same errors.
