Bug: builder restarted while constructing the clusters #1512

Closed
JKBGIT1 opened this issue Sep 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@JKBGIT1
Contributor

JKBGIT1 commented Sep 19, 2024

Current Behaviour

The builder restarted while the clusters were being constructed.

NAME                                READY   STATUS    RESTARTS        AGE
ansibler-599cb5b7b7-25hzm           1/1     Running   0               6h14m
builder-657499cc75-zsqw2            1/1     Running   1 (5h49m ago)   6h14m
claudie-operator-7b88589ff9-lhwlf   1/1     Running   0               6h14m

As the logs below show, the builder finished building the GCP cluster in test-set-no1 at 10:10:17. It was supposed to start building the OCI cluster in test-set-no1 right after that. Instead, it did not pick up the OCI cluster until 11:54:20, roughly 1 hour and 44 minutes later.

...
2024-09-19T10:10:17Z INF ../go/services/builder/domain/usecases/config_processor_v2.go:94 > Finished processing task "1b77fa23-c7cd-48b3-ac0a-802de7ce57ff" for cluster "ts1-gcp-cluster-test-set-no1" config "claudie-9cb8ac3-2971-test-set1" module=builder
2024-09-19T10:45:07Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:ANSIBLER status:IN_PROGRESS module=builder
2024-09-19T10:45:07Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_TERRAFORMER status:IN_PROGRESS description:"destroying infrastructure" module=builder
2024-09-19T10:45:07Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:62 > Calling DestroyInfrastructure on Terraformer cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:66 > DestroyInfrastructure on Terraformer finished successfully cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_TERRAFORMER status:IN_PROGRESS module=builder
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_KUBER status:IN_PROGRESS description:"deleting kubeconfig secret" module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/kuber_caller.go:137 > Calling DeleteKubeconfig on Kuber cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:DESTROY_KUBER status:IN_PROGRESS description:"deleting cluster metadata secret" module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/kuber_caller.go:144 > Calling DeleteClusterMetadata on kuber cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/kuber_caller.go:148 > DeleteKubeconfig on Kuber finished successfully cluster=hybrid-cluster-test-set-no-5-zkqpq84 module=builder project=claudie-9cb8ac3-2971-test-set5
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with state: stage:KUBER status:IN_PROGRESS module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/config_processor_v2.go:52 > successfully processed task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" module=builder
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/config_processor_v2.go:60 > updating current state for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" module=builder
2024-09-19T10:46:33Z DBG ../go/services/builder/domain/usecases/config_processor_v2.go:77 > updating task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" for config "claudie-9cb8ac3-2971-test-set5" with status: DONE module=builder
2024-09-19T10:46:33Z INF ../go/services/builder/domain/usecases/config_processor_v2.go:94 > Finished processing task "d7079d7b-83dd-4207-980c-71b33b2d2b7c" for cluster "hybrid-cluster-test-set-no-5" config "claudie-9cb8ac3-2971-test-set5" module=builder
2024-09-19T11:54:20Z DBG ../go/services/builder/domain/usecases/config_processor_v2.go:133 > [task "371c8509-6fb1-459f-b700-cc485da1a4a8"] Update operation "ts1-oci-cluster-test-set-no1" from config "claudie-9cb8ac3-2971-test-set1" module=builder
2024-09-19T11:54:20Z DBG ../go/services/builder/domain/usecases/workflow_helpers.go:164 > updating task "371c8509-6fb1-459f-b700-cc485da1a4a8" for cluster "ts1-oci-cluster-test-set-no1" for config "claudie-9cb8ac3-2971-test-set1" with state: stage:TERRAFORMER status:IN_PROGRESS description:"building infrastructure" module=builder
2024-09-19T11:54:20Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:27 > Calling BuildInfrastructure on Terraformer cluster=ts1-oci-cluster-test-set-no1-050cz7r module=builder project=claudie-9cb8ac3-2971-test-set1
2024-09-19T11:55:39Z INF ../go/services/builder/domain/usecases/terraformer_caller.go:32 > BuildInfrastructure on Terraformer finished successfully cluster=ts1-oci-cluster-test-set-no1-050cz7r module=builder project=claudie-9cb8ac3-2971-test-set1
...

As a result, the OCI cluster in test-set-no1 got stuck in the "building infrastructure" stage.

Status:
  Clusters:
    ts1-aws-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:    NONE
      State:    DONE
    ts1-azr-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:    NONE
      State:    DONE
    ts1-gcp-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:    NONE
      State:    DONE
    ts1-htz-cluster-test-set-no1:
      Message:  Finished successfully
      Phase:    NONE
      State:    DONE
    ts1-oci-cluster-test-set-no1:
      Message:  building infrastructure
      Phase:    TERRAFORMER
      State:    IN_PROGRESS
  State:  IN_PROGRESS
Events:  <none>

Besides that, the e2e pipeline failed because the test sets took too long to finish: several of them were aborted after 8000 seconds, and the test run panicked after its 3-hour timeout.

2024-09-19T09:06:27Z ERR claudie_test.go:125 > Error in test sets test-set3  error="error while monitoring manifest 1.yaml from test set test-set3 : test took too long... Aborting after 8000 seconds" module=testing-framework
2024-09-19T09:11:19Z ERR claudie_test.go:125 > Error in test sets test-set2  error="error while monitoring manifest 1.yaml from test set test-set2 : test took too long... Aborting after 8000 seconds" module=testing-framework
2024-09-19T09:13:10Z ERR claudie_test.go:147 > Error in test sets autoscaling-1 error="error while performing additional test for manifest 1.yaml from autoscaling-1 : test took too long... Aborting after 8000 seconds" module=testing-framework
panic: test timed out after 3h0m0s
	running tests:
    	TestClaudie (3h0m0s)

Expected Behaviour

The builder should first finish building the cluster. Only then should it restart.

Steps To Reproduce

I have no idea.

Anything else to note

Nothing.

@JKBGIT1 added the bug label Sep 19, 2024
@Despire
Contributor

Despire commented Dec 6, 2024

Couldn't reproduce; this happened only once.

@Despire closed this as not planned Dec 6, 2024