Problems with spawning new builders caused by subscription-manager #3380

Closed
FrostyX opened this issue Aug 21, 2024 · 17 comments

@FrostyX
Member

FrostyX commented Aug 21, 2024

There is something wrong with our subscription-manager. Spawning new builders fails with:

TASK [Check that we have successfully finished the subscription] ***************
Wednesday 21 August 2024  12:24:58 +0000 (0:00:03.538)       0:03:33.162 ******
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (100 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (99 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (98 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (97 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (96 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (95 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (94 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (93 retries left).
...
@xsuchy
Member

xsuchy commented Aug 21, 2024

The problem is that subscription-manager fails because

When I open subscription.rhsm.redhat.com, I get the error SEC_ERROR_UNKNOWN_ISSUER.

I reported it to the chainsaw team (the subscription-manager team) and opened a Red Hat IT ticket. Unfortunately, it has very low priority (because there is only one report). If you want to get this resolved, please open a ticket at https://help.redhat.com. The more tickets are opened, the higher priority it will have.

@ekohl

ekohl commented Aug 21, 2024

Normally the certificate is shipped in subscription-manager-rhsm-certificates. On CentOS Stream 9:

# dnf install subscription-manager-rhsm-certificates -yq
# rpm -qv subscription-manager-rhsm-certificates
subscription-manager-rhsm-certificates-20220623-1.el9.noarch
# curl --cacert /etc/rhsm/ca/redhat-uep.pem https://subscription.rhsm.redhat.com
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>openresty</center>
</body>
</html>

Perhaps there's an old version of subscription-manager-rhsm-certificates installed?
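
For anyone who wants to repeat the check above from a script, here is a minimal sketch in Python. It assumes the CA bundle lives at the path used in the curl command; the function name `check_tls` and its return shape are made up for illustration and are not part of subscription-manager or Copr.

```python
import socket
import ssl


def check_tls(host, cafile, port=443, timeout=10.0):
    """Return (ok, detail) for a TLS handshake verified only against cafile.

    Hypothetical helper for illustration; not part of any RHSM tooling.
    """
    try:
        # Passing cafile means the system trust store is not loaded,
        # similar in spirit to `curl --cacert`.
        context = ssl.create_default_context(cafile=cafile)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # Handshake and certificate verification succeeded.
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # The server certificate does not chain to anything in cafile.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Missing CA file, DNS failure, timeout, refused connection, ...
        return False, f"connection or file error: {exc}"
```

Calling `check_tls("subscription.rhsm.redhat.com", "/etc/rhsm/ca/redhat-uep.pem")` on a builder should return `True` as the first element when the installed bundle can verify the server certificate. A 404 body like the one above would still count as success here, since only the TLS handshake is being checked.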

@sbrivio-rh

ppc64le builders started now? (https://copr.fedorainfracloud.org/coprs/sbrivio/passt/build/7930789/)

@praiskup
Member

I think RHSM got fixed; we already have ~160 builders now.

@pgnd

pgnd commented Aug 21, 2024

> I think RHSM got fixed; we already have ~160 builders now.

Do we need to cancel & resubmit builds that are still stuck @ 'pending'? Or just wait until they're automatically picked up?

@FrostyX
Member Author

FrostyX commented Aug 21, 2024

> Or just wait until they're automatically picked up?

They should get picked up.

@sbrivio-rh

My pending ones were picked up, or are being picked up (https://copr.fedorainfracloud.org/coprs/sbrivio/passt/build/7930789/).

@pgnd

pgnd commented Aug 21, 2024

noted, thx. looks like a busy backlog -- yours submitted ~ 7hrs ago?
mine's just

Status:
    pending - Build is waiting in queue for a backend worker. 
Submitted:
    2024-08-21 14:20 EDT (32 minutes ago) 

as usual, bad-timing on my part ;-)

@sbrivio-rh

I guess the notice on Copr could be removed now...?

@praiskup
Member

No, actually the problem is back .... :-( the queue is growing now again.

@xsuchy
Member

xsuchy commented Aug 22, 2024

We are seeing two issues now: 1) subscription-manager takes a long time to proceed (4 minutes). 2) It sometimes fails with an internal server error response from the server.
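
One common way to ride out intermittent server errors like these is to retry with a growing delay. The sketch below is generic Python, not what the Copr playbooks actually do; `run_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import time


def run_with_retries(action, attempts=5, delay=10.0, backoff=2.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    action is any callable that raises an exception on failure.
    Returns action()'s result; re-raises the last error if every attempt fails.
    Hypothetical helper for illustration only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
```

Here `action` could wrap whatever registration step is failing. The `sleep` parameter is injectable so the waiting can be replaced in tests. Backing off also avoids adding load to a server that is already struggling.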

@praiskup
Member

These problems no longer occur, but we have no feedback from RH IT. 🤷

@praiskup
Member

Reopening. The problem is back.

@praiskup praiskup reopened this Aug 29, 2024
@praiskup
Member

And RHSM works again.

@praiskup
Member

praiskup commented Sep 2, 2024

And the problem is back again :-(

@FrostyX FrostyX removed their assignment Sep 9, 2024
@praiskup
Member

This has been fixed on the RHSM side (for now at least). From the info we got off-list, these problems were caused by high/peak RHSM use (or a DDoS). We were told to monitor status.redhat.com next time, and to subscribe there to be informed about possible future outages.

@praiskup
Member

And back again - reopening to let users know mostly.

@praiskup praiskup reopened this Sep 24, 2024