Add in logging and must gathers for failed jobs #37

Open
paigerube14 opened this issue Dec 23, 2021 · 5 comments

@paigerube14
Contributor

Similar to the Upgrade CI, we need to be able to run lots of jobs and log issues without the cluster still being around.

It would be helpful to print out the logs, and in certain cases maybe a must-gather, so that bugs can be opened properly.

Some thoughts:

  • Add a describe or the logs of the machineset/node in case of a cluster-workers-scaling failure
  • Add a new must-gather branch to be called when a scale-ci or upgrade job fails
  • Collect the logs of failing pods in scale-ci jobs (rough sketch below)
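
Roughly, I'm picturing a failure handler along these lines in the job pipelines (the commands, namespaces, and archive step are only illustrative, not final):

```groovy
// Illustrative sketch only: dump debug info on failure, before the cluster goes away.
pipeline {
    agent any
    stages {
        stage('scale-ci workload') {
            steps {
                echo 'existing job steps run here'
            }
        }
    }
    post {
        failure {
            sh '''
                # machineset/node state, for cluster-workers-scaling failures
                oc describe machinesets -n openshift-machine-api || true
                oc describe nodes || true

                # state of failing pods in scale-ci jobs
                oc get pods --all-namespaces --field-selector=status.phase!=Running -o wide || true

                # full must-gather so there is enough data to open a bug properly
                oc adm must-gather --dest-dir=must-gather-${BUILD_NUMBER} || true
            '''
            archiveArtifacts artifacts: 'must-gather-*/**', allowEmptyArchive: true
        }
    }
}
```
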
@paigerube14
Contributor Author

The issue Simon opened, and the open PR attached to it, cover the first bullet: #89

@paigerube14
Contributor Author

@skordas @rpattath @mffiedler @qiliRedHat

What are your thoughts about moving all of the job calls into the loaded-upgrade job itself? I.e., have loaded-upgrade do the calls to flexy (already set up), scale up (currently done in each scale-ci job specifically), the scale job itself (done), cluster check (which I want to add based on this issue), upgrade (done), and destroy if necessary.

I think this would mean a lot fewer PRs and make things easier to manage in each of the branches. For this issue specifically, I want to add a call to cerberus to run one iteration of certain checks on the cluster after the load is complete, to make sure we should continue. Currently I would have to add a PR to each of the scale-ci jobs, and if any parameters ever needed to be added to the cerberus job I would have to open PRs in each scale-ci job for that update too. I am suggesting adding the call from the base loaded-upgrade job so that the loaded-upgrade job is all-encompassing.
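
Roughly, I'm imagining the loaded-upgrade pipeline owning all of those calls, something like this (the job and parameter names are placeholders based on what we run today, not the final wiring):

```groovy
// Rough sketch only: loaded-upgrade driving every downstream job (names are placeholders).
pipeline {
    agent any
    parameters {
        string(name: 'SCALE_CI_JOB', description: 'which scale-ci job to load the cluster with')
    }
    stages {
        stage('flexy-install')     { steps { build job: 'flexy-install' } }            // already set up today
        stage('scale up workers')  { steps { build job: 'cluster-workers-scaling' } }  // moves out of each scale-ci job
        stage('scale-ci workload') { steps { build job: params.SCALE_CI_JOB } }        // the scale job itself
        stage('cluster check') {                                                       // what this issue wants to add
            steps {
                build job: 'cerberus', parameters: [string(name: 'ITERATIONS', value: '1')]
            }
        }
        stage('upgrade')           { steps { build job: 'upgrade' } }                  // already done today
    }
    post {
        cleanup {
            build job: 'flexy-destroy', propagate: false   // destroy at the end (could be made conditional)
        }
    }
}
```
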

For example, the env_vars variable wasn't being passed to the scale-worker job from some of the different scale-ci jobs, so I had to create a PR for each place in the scale-ci jobs that calls the scale-worker job to add the specific variable.

Cons I can think of: a lot more clicking around, into each job that ran off of the main loaded-upgrade run.

Overall, what are your thoughts?

@skordas

skordas commented Mar 9, 2022

I think we can have a long pipeline if everything is well documented - that is mostly for the sake of new team members. We are all growing with the changes :)

my few cents:

  • (about clicking around) Set default values in such a way that you only have to click Build and it will work (that, of course, would be tricky).
  • (about clicking around) Set the parameter inputs in the same order in which we run them.
  • must-gather: run it just before flexy-destroy, always, if some step failed (sketch below).
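
Something like this at the tail end of the loaded-upgrade pipeline is what I have in mind for the last point (must-gather here is the job proposed in this issue; the exact wiring is up for discussion):

```groovy
// Sketch only: must-gather runs only when a step failed, and always before flexy-destroy.
post {
    failure {
        build job: 'must-gather', propagate: false    // collect data while the cluster is still up
    }
    cleanup {
        build job: 'flexy-destroy', propagate: false  // cleanup always runs last in declarative post
    }
}
```
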

@qiliRedHat
Contributor

@paigerube14 I agree with your thinking about reducing PRs. I also think we could even use the loaded-upgrade job in our scale-ci test runs. This would merge two steps, 1) calling the Flexy-install job and 2) triggering the scale-ci job, into one and save some effort. But we have to be able to pass more configuration from the loaded-upgrade job to the Flexy-install job, like VARIABLES_LOCATION and LAUNCHER_VARS. At least vm_type should be configurable, since on some large-scale clusters we need a bigger master and worker type, e.g. for the cluster density test https://deploy-preview-41943--osdocs.netlify.app/openshift-enterprise/latest/scalability_and_performance/recommended-host-practices.html#mast[…]ing_
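
For example, when loaded-upgrade triggers Flexy-install it could forward something like this (the parameter names are the ones mentioned above; the values are only placeholders):

```groovy
// Sketch only: loaded-upgrade passing extra configuration through to Flexy-install.
build job: 'flexy-install',
      parameters: [
          string(name: 'VARIABLES_LOCATION', value: params.VARIABLES_LOCATION),
          // e.g. a bigger master/worker vm_type for large-scale / cluster-density tests
          text(name: 'LAUNCHER_VARS', value: 'vm_type: m5.4xlarge')
      ]
```
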

@qiliRedHat
Contributor

@paigerube14 I created an Epic for our scale-ci automation enhancement work in 4.11, https://issues.redhat.com/browse/OCPQE-9141, to better organize that work. I added your Jira task for this issue under it.
