Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-2305] E2E test: Upload stdout logs to Github Action artifact #758

Closed
wants to merge 4 commits into from

Conversation

chenyulin0719
Copy link
Contributor

@chenyulin0719 chenyulin0719 commented Jan 4, 2024

What is this PR for?

(2024-01-09 Updated Version 2)

In the current e2e test, the cluster log dumps are mixup with ginkgo.by and gingo.GinkgoWriter logs. This makes the logs difficult to read, and sometimes there are too many logs to display.

This PR now dumps cluster-info files to build/e2e/{suite} and uploads files to Github Action artifact.
For each failed test, 3 files will be generated under build/e2e/{suite}/ (Both for interactive and GitHub Action environment):

  1. {specName}_k8sClusterInfo.txt
  2. {specName}_ykContainerLog.txt
  3. {specName}_ykFullStateDump.json

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

Found some issues, could create other Jira to fix it:

  • - Found some e2e test that didn't dump cluster status when the test failed. (recovery_and_restart/persistent_volume)

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2305

How should this be tested?

All existing e2e test should pass.

Could also check Github Action result in my shim repo:
Suceess workflow: https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456435170
Failed workflow: https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456536846

Screenshots (if appropriate)

YUNIKORN-2305 Version 2

Questions:

NA

Copy link

codecov bot commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (dbfae6e) 71.36% compared to head (df159a1) 71.30%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #758      +/-   ##
==========================================
- Coverage   71.36%   71.30%   -0.07%     
==========================================
  Files          43       43              
  Lines        7600     7600              
==========================================
- Hits         5424     5419       -5     
- Misses       1975     1979       +4     
- Partials      201      202       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@craigcondit craigcondit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be made optional, and off by default as it interferes with running the e2e tests interactively. My suggestion would be to use the existing implementation unless the ENV var for the path is set. This will allow doing the uploads as part of the pre-commit workflow but still allow interactive use.

@pbacsko pbacsko requested review from manirajv06 and pbacsko January 4, 2024 21:11
Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenyulin0719 maybe there's a misunderstanding, but I think the default logging from the tests is perfectly fine. What we need to upload is the Yunikorn log and cluster info, which is printed to the the console in the current implementation. That is a LOT of text and it's very inconvenient to read all kinds of output mixed together.

So what we need to do is writing YK logs and cluster info to a separate file (or even better, files) then upload it when the test completes.

@craigcondit
Copy link
Contributor

I agree with @pbacsko, the current logging is fine until an error occurs, and then the dump becomes nearly unreadable, especially locally (but even on GH I've seen it overflow the allowed size). I'd be a big fan of capturing that output to files (within the source tree under output/) and then uploading those to GH if we're running in a workflow. I think separate files for state dump, yunikorn logs, and cluster diagnostics would be ideal.

@chenyulin0719
Copy link
Contributor Author

chenyulin0719 commented Jan 5, 2024

Hi @pbacsko, @craigcondit

@chenyulin0719 maybe there's a misunderstanding, but I think the default logging from the tests is perfectly fine. What we need to upload is the Yunikorn log and cluster info, which is printed to the the console in the current implementation. That is a LOT of text and it's very inconvenient to read all kinds of output mixed together.
So what we need to do is writing YK logs and cluster info to a separate file (or even better, files) then upload it when the test completes.

I agree with @pbacsko, the current logging is fine until an error occurs, and then the dump becomes nearly unreadable, especially locally (but even on GH I've seen it overflow the allowed size).

Noted, will keep the default logging.

The new thought in my mind is we will have 1 new logging method for file.

  1. ginkgo.By. (to console) (Existing)
  2. ginkgo.GinkgoWriter. (To console) (Existing)
  3. File Writer (To file) (New) (file path is from github workflow ENV variable, should have default path for local run. ex: /tmp/e2e-test-artifacts)

I should move "state dump, yunikorn logs, and cluster diagnostics" from #1,#2 to #3 File Writer.(Instead of changing ginkgo.GinkgoWriter to file). Please correct me if I didn't understand it well.

The further question is the artifact's file structure.

I'd be a big fan of capturing that output to files (within the source tree under output/) and then uploading those to GH if we're running in a workflow. I think separate files for state dump, yunikorn logs, and cluster diagnostics would be ideal.

@craigcondit In my opinion, the example 's structure in this PR is clear enough, I organized all the file logs into a single file per suite. (Printing the spec name first allows us to search by spec name.)
Always write to ARTIFACT_PATH and keeping one file per suite is my favorite. If this is acceptible I will keep this implementation.

image

@craigcondit
Copy link
Contributor

craigcondit commented Jan 5, 2024

I'd prefer to see a directory per test, with the individual artifacts (cluster dump, YK logs, state dump) separated out because some of these are json, and others are text. Splitting them out allows for tooling to process them more easily. Also, we should not be using static names in /tmp for the output. Instead, place them in build/e2e/{test_suite}/ instead.

@chenyulin0719
Copy link
Contributor Author

chenyulin0719 commented Jan 9, 2024

Hi @pbacsko, @craigcondit,

New version updated. Each failed test will generate 3 files under build/e2e/{suite}/ (Both for interactive and GitHub Action environments):

  1. {specName}_k8sClusterInfo.txt
  2. {specName}_ykContainerLog.txt
  3. {specName}_ykFullStateDump.json

You can check the artifact in this workflow run: (I make some tests in plugin mode fail.)
https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456536846

Please let me know if there have anything to be improved.

image

GroupUsagePath = "ws/v1/partition/%s/usage/group/%s"
HealthCheckPath = "ws/v1/scheduler/healthcheck"
ValidateConfPath = "ws/v1/validate-conf"
FullStateDumpPath = "ws/v1/fullstatedump"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous ykRest logs(queue/node/app) are replaced by fullstatedump and write to {specName}_ykFullStateDump.json".

@pbacsko
Copy link
Contributor

pbacsko commented Jan 9, 2024

@chenyulin0719 please rebase the PR, there are some minor conflicts.

@pbacsko
Copy link
Contributor

pbacsko commented Jan 9, 2024

Hi @pbacsko, @craigcondit,

New version updated. Each failed test will generate 3 files under build/e2e/{suite}/ (Both for interactive and GitHub Action environments):

  1. {specName}_k8sClusterInfo.txt
  2. {specName}_ykContainerLog.txt
  3. {specName}_ykFullStateDump.json

You can check the artifact in this workflow run: (I make some tests in plugin mode fail.) https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456536846

Please let me know if there have anything to be improved.

The approach LGTM. I just found some minor things.

@chenyulin0719
Copy link
Contributor Author

chenyulin0719 commented Jan 10, 2024

@chenyulin0719 please rebase the PR, there are some minor conflicts.

Hi @pbacsko.
Just rebase and pushed. e2e test looks good in my side.

https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7471481942

@chenyulin0719
Copy link
Contributor Author

chenyulin0719 commented Jan 10, 2024

Checked the failed test is related to preemption e2e test. I'm curretly troubleshooting it in another issue.
(https://issues.apache.org/jira/browse/YUNIKORN-2313)

https://github.com/apache/yunikorn-k8shim/actions/runs/7471482170/job/20349923016?pr=758#step:6:1480

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pbacsko pbacsko closed this in ce8ac51 Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants