[YUNIKORN-2305] E2E test: Upload stdout logs to Github Action artifact #758

chenyulin0719 · 2024-01-04T16:37:15Z

What is this PR for?

(2024-01-09 Updated Version 2)

In the current e2e tests, the cluster log dumps are mixed up with the ginkgo.By and ginkgo.GinkgoWriter logs. This makes the logs difficult to read, and sometimes there are too many logs to display.

This PR now dumps cluster-info files to build/e2e/{suite} and uploads them as a GitHub Actions artifact.
For each failed test, 3 files will be generated under build/e2e/{suite}/ (in both interactive and GitHub Actions environments):

{specName}_k8sClusterInfo.txt
{specName}_ykContainerLog.txt
{specName}_ykFullStateDump.json

What type of PR is it?

Todos

I found an issue that could be fixed in a separate Jira:

- Some e2e tests do not dump the cluster status when they fail. (recovery_and_restart/persistent_volume)

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2305

How should this be tested?

All existing e2e tests should pass.

You can also check the GitHub Actions results in my shim repo:
Successful workflow: https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456435170
Failed workflow: https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456536846

Screenshots (if appropriate)

Questions:

NA

test/e2e/bin_packing/bin_packing_test.go

test/e2e/framework/helpers/ginkgo_writer/ginkgo_writer_setup.go

codecov · 2024-01-04T16:56:00Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (dbfae6e) 71.36% compared to head (df159a1) 71.30%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #758      +/-   ##
==========================================
- Coverage   71.36%   71.30%   -0.07%     
==========================================
  Files          43       43              
  Lines        7600     7600              
==========================================
- Hits         5424     5419       -5     
- Misses       1975     1979       +4     
- Partials      201      202       +1

craigcondit

This needs to be made optional, and off by default as it interferes with running the e2e tests interactively. My suggestion would be to use the existing implementation unless the ENV var for the path is set. This will allow doing the uploads as part of the pre-commit workflow but still allow interactive use.

pbacsko

@chenyulin0719 maybe there's a misunderstanding, but I think the default logging from the tests is perfectly fine. What we need to upload is the Yunikorn log and cluster info, which is printed to the console in the current implementation. That is a LOT of text and it's very inconvenient to read all kinds of output mixed together.

So what we need to do is write the YK logs and cluster info to a separate file (or even better, files), then upload it when the test completes.

craigcondit · 2024-01-04T21:41:33Z

I agree with @pbacsko, the current logging is fine until an error occurs, and then the dump becomes nearly unreadable, especially locally (but even on GH I've seen it overflow the allowed size). I'd be a big fan of capturing that output to files (within the source tree under output/) and then uploading those to GH if we're running in a workflow. I think separate files for state dump, yunikorn logs, and cluster diagnostics would be ideal.

chenyulin0719 · 2024-01-05T03:54:43Z

Hi @pbacsko, @craigcondit

> @chenyulin0719 maybe there's a misunderstanding, but I think the default logging from the tests is perfectly fine. What we need to upload is the Yunikorn log and cluster info, which is printed to the console in the current implementation. That is a LOT of text and it's very inconvenient to read all kinds of output mixed together.
> So what we need to do is write the YK logs and cluster info to a separate file (or even better, files), then upload it when the test completes.

> I agree with @pbacsko, the current logging is fine until an error occurs, and then the dump becomes nearly unreadable, especially locally (but even on GH I've seen it overflow the allowed size).

Noted, I will keep the default logging.

My new idea is to add one new logging method that writes to files, giving three in total:

1. ginkgo.By (to console, existing)
2. ginkgo.GinkgoWriter (to console, existing)
3. File Writer (to file, new). The file path comes from a GitHub workflow ENV variable, with a default path for local runs, e.g. /tmp/e2e-test-artifacts.

I should move the "state dump, yunikorn logs, and cluster diagnostics" output from #1 and #2 to #3, the File Writer (instead of redirecting ginkgo.GinkgoWriter to a file). Please correct me if I have misunderstood.

The further question is the artifact's file structure.

> I'd be a big fan of capturing that output to files (within the source tree under output/) and then uploading those to GH if we're running in a workflow. I think separate files for state dump, yunikorn logs, and cluster diagnostics would be ideal.

@craigcondit In my opinion, the structure of the example in this PR is clear enough: I organized all the file logs into a single file per suite. (Printing the spec name first allows us to search by spec name.)
My preference is to always write to ARTIFACT_PATH and keep one file per suite. If this is acceptable, I will keep this implementation.

craigcondit · 2024-01-05T17:33:39Z

I'd prefer to see a directory per test, with the individual artifacts (cluster dump, YK logs, state dump) separated out because some of these are JSON and others are text. Splitting them out allows tooling to process them more easily. Also, we should not be using static names in /tmp for the output. Instead, place them in build/e2e/{test_suite}/.

chenyulin0719 · 2024-01-09T06:31:17Z

Hi @pbacsko, @craigcondit,

I've pushed a new version. Each failed test now generates 3 files under build/e2e/{suite}/ (in both interactive and GitHub Actions environments):

{specName}_k8sClusterInfo.txt
{specName}_ykContainerLog.txt
{specName}_ykFullStateDump.json

You can check the artifact in this workflow run (I made some tests in plugin mode fail):
https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456536846

Please let me know if there is anything to improve.

chenyulin0719 · 2024-01-09T06:52:12Z

test/e2e/framework/configmanager/constants.go

+	GroupUsagePath    = "ws/v1/partition/%s/usage/group/%s"
+	HealthCheckPath   = "ws/v1/scheduler/healthcheck"
+	ValidateConfPath  = "ws/v1/validate-conf"
+	FullStateDumpPath = "ws/v1/fullstatedump"


The previous ykRest logs (queue/node/app) are replaced by the full state dump, which is written to {specName}_ykFullStateDump.json.

pbacsko · 2024-01-09T22:23:19Z

@chenyulin0719 please rebase the PR, there are some minor conflicts.

test/e2e/admission_controller/admission_controller_suite_test.go

test/e2e/framework/helpers/common/utils.go

pbacsko · 2024-01-09T22:40:19Z

> Hi @pbacsko, @craigcondit,
>
> I've pushed a new version. Each failed test now generates 3 files under build/e2e/{suite}/ (in both interactive and GitHub Actions environments):
>
> {specName}_k8sClusterInfo.txt
> {specName}_ykContainerLog.txt
> {specName}_ykFullStateDump.json
>
> You can check the artifact in this workflow run (I made some tests in plugin mode fail): https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7456536846
>
> Please let me know if there is anything to improve.

The approach LGTM. I just found some minor things.

chenyulin0719 · 2024-01-10T09:11:02Z

> @chenyulin0719 please rebase the PR, there are some minor conflicts.

Hi @pbacsko.
I just rebased and pushed. The e2e tests look good on my side.

https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7471481942

chenyulin0719 · 2024-01-10T17:25:52Z

I checked the failed test; it is related to the preemption e2e test. I'm currently troubleshooting it in another issue.
(https://issues.apache.org/jira/browse/YUNIKORN-2313)

https://github.com/apache/yunikorn-k8shim/actions/runs/7471482170/job/20349923016?pr=758#step:6:1480

pbacsko

+1

pbacsko assigned chenyulin0719 Jan 4, 2024

craigcondit requested changes Jan 4, 2024

pbacsko requested review from manirajv06 and pbacsko January 4, 2024 21:11

pbacsko requested changes Jan 4, 2024

chenyulin0719 requested review from craigcondit and pbacsko January 9, 2024 06:48

pbacsko requested changes Jan 9, 2024

chenyulin0719 added 4 commits January 10, 2024 15:06

[YUNIKORN-2305] E2E test: Upload stdout logs to Github Action artifact

e2cd692

Revert "[YUNIKORN-2305] E2E test: Upload stdout logs to Github Action…

708d50b

… artifact" This reverts commit 80fe23b.

v2: dump k8sClusterInfo/ykFullStateDump/ykContainerLog to separate file

323fa0e

fix typo

df159a1

chenyulin0719 force-pushed the YUNIKORN-2305 branch from 23b6f3a to df159a1 Compare January 10, 2024 07:09

chenyulin0719 requested a review from pbacsko January 10, 2024 09:09

pbacsko approved these changes Jan 10, 2024

pbacsko closed this in ce8ac51 Jan 10, 2024
