Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/resource_control: enable trace log to help diagnose #8545

Merged
merged 2 commits into from
Aug 22, 2024

Conversation

HuSharp
Copy link
Member

@HuSharp HuSharp commented Aug 19, 2024

What problem does this PR solve?

Issue Number: Ref #8512

What is changed and how does it work?

Previously, when an error was encountered, we could only get a smaller logs and it was harder to see what went wrong

2024-08-09T10:31:54.0348800Z === RUN   TestResourceManagerClientTestSuite/TestResourceGroupController
2024-08-09T10:31:54.0353345Z [2024/08/09 10:31:15.283 +00:00] [INFO] [controller.go:199] ["load resource controller config"] [config="{\"degraded-mode-wait-duration\":\"1s\",\"ltb-max-wait-duration\":\"30s\",\"ltb-token-rpc-max-delay\":\"1s\",\"request-unit\":{\"read-base-cost\":1,\"read-per-batch-base-cost\":0,\"read-cost-per-byte\":1,\"write-base-cost\":1,\"write-per-batch-base-cost\":0,\"write-cost-per-byte\":1,\"read-cpu-ms-cost\":1},\"enable-controller-trace-log\":\"false\",\"token-rpc-params\":{\"wait-retry-interval\":\"50ms\",\"wait-retry-times\":20}}"] [ru-config="{\"ReadBaseCost\":1,\"ReadPerBatchBaseCost\":0,\"ReadBytesCost\":1,\"WriteBaseCost\":1,\"WritePerBatchBaseCost\":0,\"WriteBytesCost\":1,\"CPUMsCost\":1,\"LTBMaxWaitDuration\":30000000000,\"WaitRetryInterval\":50000000,\"WaitRetryTimes\":20,\"DegradedModeWaitDuration\":1000000000}"]
2024-08-09T10:31:54.0357525Z [2024/08/09 10:31:15.287 +00:00] [INFO] [grpc_service.go:99] ["watch request"] [key=resource_group/settings] [range-end=resource_group/settingt] [start-revision=248]
2024-08-09T10:31:54.0359119Z [2024/08/09 10:31:15.288 +00:00] [INFO] [grpc_service.go:99] ["watch request"] [key=resource_group/controller] [range-end=resource_group/controlles] [start-revision=248]
2024-08-09T10:31:54.0360560Z [2024/08/09 10:31:16.043 +00:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
2024-08-09T10:31:54.0362046Z [2024/08/09 10:31:16.044 +00:00] [ERROR] [client.go:162] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]receive response error: rpc error: code = Unavailable desc = server not started"]
2024-08-09T10:31:54.0363789Z [2024/08/09 10:31:16.285 +00:00] [INFO] [controller.go:500] ["[resource group controller] create resource group cost controller"] [name=controller_test]
2024-08-09T10:31:54.0365153Z [2024/08/09 10:31:17.045 +00:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
2024-08-09T10:31:54.0366706Z [2024/08/09 10:31:17.045 +00:00] [ERROR] [client.go:162] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]receive response error: rpc error: code = Unavailable desc = server not started"]
2024-08-09T10:31:54.0367812Z     resource_manager_test.go:457: 
2024-08-09T10:31:54.0368853Z         	Error Trace:	/home/runner/work/pd/pd/tests/integrations/mcs/resourcemanager/resource_manager_test.go:457
2024-08-09T10:31:54.0369937Z         	Error:      	"492.521605ms" is not less than or equal to "300ms"
2024-08-09T10:31:54.0370779Z         	Test:       	TestResourceManagerClientTestSuite/TestResourceGroupController
2024-08-09T10:31:54.0372094Z === RUN   TestResourceManagerClientTestSuite/TestResourceGroupControllerConfigChanged

After that, we can get more logs to help check the test

[2024/08/19 11:35:40.090 +08:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
[2024/08/19 11:35:40.091 +08:00] [ERROR] [client.go:162] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]receive response error: rpc error: code = Unavailable desc = server not started"]
[2024/08/19 11:35:40.225 +08:00] [INFO] [controller.go:499] ["[resource group controller] create resource group cost controller"] [name=controller_test]
[2024/08/19 11:35:40.259 +08:00] [INFO] [controller.go:63] ["[resource group controller] update run state"] [name=controller_test] [request-unit-consumption="r_r_u:1450 w_r_u:580 read_bytes:696 write_bytes:551 total_cpu_time_ms:725 kv_read_rpc_count:29 kv_write_rpc_count:29 "] [is-throttled=false]
[2024/08/19 11:35:40.259 +08:00] [INFO] [controller.go:63] ["[resource group controller] update avg ru per sec"] [name=controller_test] [type=RU] [avg-ru-per-sec=29962.399107893918] [is-throttled=false]
[2024/08/19 11:35:40.259 +08:00] [INFO] [controller.go:63] ["[resource group controller] send token bucket request"] [now=2024/08/19 11:35:40.259 +08:00] [req="[{\"resource_group_name\":\"controller_test\",\"Request\":{\"ru_items\":{\"request_r_u\":[{\"value\":156823.19509341655}]}},\"consumption_since_last_request\":{\"r_r_u\":1450,\"w_r_u\":580,\"read_bytes\":696,\"write_bytes\":551,\"total_cpu_time_ms\":725,\"kv_read_rpc_count\":29,\"kv_write_rpc_count\":29}}]"] [source=low_ru]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] token bucket response"] [now=2024/08/19 11:35:40.260 +08:00] [resp="[{\"resource_group_name\":\"controller_test\",\"granted_r_u_tokens\":[{\"granted_tokens\":{\"settings\":{},\"tokens\":156823.19509341655},\"trickle_time_ms\":5000}]}]"] [source=low_ru] [latency=513.25µs]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] update run state"] [name=controller_test] [request-unit-consumption="r_r_u:1450 w_r_u:580 read_bytes:696 write_bytes:551 total_cpu_time_ms:725 kv_read_rpc_count:29 kv_write_rpc_count:29 "] [is-throttled=false]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] before reconfigure"] [name=controller_test] [old-tokens=7970] [old-rate=0] [old-notify-threshold=0] [old-burst=0]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] after reconfigure"] [name=controller_test] [tokens=7970] [rate=31364.63901868331] [notify-threshold=0] [burst=0]
[2024/08/19 11:35:40.345 +08:00] [INFO] [resource_manager_test.go:457] ["finished test case"] [index=0] [sum=3.618337ms] [waitDuration=0s]
[2024/08/19 11:35:41.091 +08:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
 enable trace log to help diagnose

Check List

Tests

  • Integration test
  • Manual test (add detailed scripts or steps below)

Release note

None.

Signed-off-by: husharp <[email protected]>
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 19, 2024
Copy link
Contributor

ti-chi-bot bot commented Aug 19, 2024

@HuSharp: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-integration-realcluster-test f15b156 link false /test pull-integration-realcluster-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link

codecov bot commented Aug 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.64%. Comparing base (b132ea6) to head (5570562).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8545      +/-   ##
==========================================
+ Coverage   77.56%   77.64%   +0.08%     
==========================================
  Files         474      474              
  Lines       61876    61877       +1     
==========================================
+ Hits        47995    48047      +52     
+ Misses      10344    10290      -54     
- Partials     3537     3540       +3     
Flag Coverage Δ
unittests 77.64% <100.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Aug 20, 2024
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Aug 22, 2024
Copy link
Contributor

ti-chi-bot bot commented Aug 22, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-08-20 07:22:45.93805382 +0000 UTC m=+250161.072503941: ☑️ agreed by okJiang.
  • 2024-08-22 02:59:14.049097019 +0000 UTC m=+407149.183547140: ☑️ agreed by nolouch.

Copy link
Contributor

ti-chi-bot bot commented Aug 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: niubell, nolouch, okJiang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Aug 22, 2024
@ti-chi-bot ti-chi-bot bot merged commit c963e77 into tikv:master Aug 22, 2024
20 of 21 checks passed
@HuSharp HuSharp deleted the improve_rg_test_diagnose branch August 22, 2024 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants