Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Test] Microsoft.Diagnostics.Monitoring.EventPipe.UnitTests.EventCounterTriggerWithEventPipePipelineTest fails intermittently #4782

Open
mikem8361 opened this issue Jul 10, 2024 · 2 comments
Assignees
Labels
Milestone

Comments

@mikem8361
Copy link
Member

Re-enable this test after it is fixed.

Failure:

Microsoft.Diagnostics.Monitoring.EventPipe.UnitTests.EventCounterTriggerTests.EventCounterTriggerWithEventPipePipelineTest(config: projectk.sdk.prebuilt.6.0.30) 
      Duration: 2 min

      Message: 
System.Threading.Tasks.TaskCanceledException : A task was canceled.

      Stack Trace: 
<<ExecutePipelineWithTracee>b__0>d.MoveNext() line 86
--- End of stack trace from previous location ---
TestRunnerUtilities.ExecuteCollection(Task collectionTask, TestRunner testRunner, CancellationToken token, Func`2 waitForPipeline) line 47
PipelineTestUtilities.ExecutePipelineWithTracee[TPipeline](TPipeline pipeline, Func`3 startPipelineAsync, TestRunner testRunner, CancellationToken token, TaskCompletionSource`1 waitTaskSource) line 93
PipelineTestUtilities.ExecutePipelineWithTracee[T](EventSourcePipeline`1 pipeline, TestRunner testRunner, TaskCompletionSource`1 waitTaskSource) line 45
EventCounterTriggerTests.EventCounterTriggerWithEventPipePipelineTest(TestConfiguration config) line 359
EventCounterTriggerTests.EventCounterTriggerWithEventPipePipelineTest(TestConfiguration config) line 364
EventCounterTriggerTests.EventCounterTriggerWithEventPipePipelineTest(TestConfiguration config) line 365
--- End of stack trace from previous location ---

      Standard Output: 
[7/10/2024 4:25:49 PM] Test runner created
Running Process: C:\ssd\diagnostics\.dotnet-test\dotnet.exe --fx-version 6.0.30 C:\ssd\diagnostics\artifacts\bin\EventPipeTracee\Debug\net6.0\EventPipeTracee.dll 3d3f0f7b-dfab-4f64-91f4-62cc00ab3a2d TriggerRemoteTest SpinWait10
Working Directory: 
Additional Environment Variables: DOTNET_MULTILEVEL_LOOKUP=0, DOTNET_ROOT=C:\ssd\diagnostics\.dotnet-test
{
    [00:00.005] 5584 Successfully started process
    [00:00.005] 5584 WaitForTracee
    00:00.088: 5584 EventPipeTracee: DiagMetrics False
    00:00.088: 5584 EventPipeTracee: start process
    00:00.088: 5584 EventPipeTracee: connecting to pipe
    [00:00.093] 5584 WaitForTracee: DONE
    00:00.090: 5584 EventPipeTracee: connected to pipe
    [00:00.099] 5584 WakeupTracee
    00:00.138: 5584 EventPipeTracee: 7/10/2024 11:25:50 PM Awaiting start
    [00:00.144] 5584 WakeupTracee DONE
    [00:00.144] 5584 WaitForSignal
    00:00.139: 5584 7/10/2024 11:25:50 PM Starting test body '42'
    00:00.147: 5584 EventPipeTracee: signal end of test data
    [00:00.151] 5584 WaitForSignal DONE 31
    00:00.489: 5584 Spin waiting...
    00:00.824: 5584 Spin waiting...
    00:01.157: 5584 Spin waiting...
    00:01.495: 5584 Spin waiting...
    00:01.833: 5584 Spin waiting...
    00:02.173: 5584 Spin waiting...
    00:02.571: 5584 Spin waiting...
    00:03.103: 5584 Spin waiting...
    00:03.562: 5584 Spin waiting...
    00:03.948: 5584 Spin waiting...
    00:04.350: 5584 Spin waiting...
    00:04.759: 5584 Spin waiting...
    00:05.119: 5584 Spin waiting...
    00:05.479: 5584 Spin waiting...
    00:05.823: 5584 Spin waiting...
    00:06.165: 5584 Spin waiting...
    00:06.501: 5584 Spin waiting...
    00:06.836: 5584 Spin waiting...
    00:07.176: 5584 Spin waiting...
    00:07.510: 5584 Spin waiting...
    00:07.848: 5584 Spin waiting...
    00:08.182: 5584 Spin waiting...
    00:08.515: 5584 Spin waiting...
    00:08.848: 5584 Spin waiting...
    00:09.181: 5584 Spin waiting...
    00:09.515: 5584 Spin waiting...
    00:09.848: 5584 Spin waiting...
    00:10.147: 5584 7/10/2024 11:26:00 PM Awaiting end
}
Process 5584 exited -1 (02:00.034 elapsed)

Killing process 5584: 02:00.039 - Process timed out
    [02:00.107] 5584 Did not receive completion signal before cancellation.
    [02:00.107] 5584 WakeupTracee
    [02:00.108] 5584 WakeupTracee DONE
    [02:00.108] 5584 Disposing
    [02:00.108] 5584 WakeupTracee
    [02:00.108] 5584 WakeupTracee DONE
    [02:00.108] 5584 WaitForExitAsync
    [02:00.108] 5584 WaitForExitAsync DONE
@mikem8361 mikem8361 added this to the 9.0.0 milestone Jul 10, 2024
@mdh1418 mdh1418 self-assigned this Aug 13, 2024
@mdh1418
Copy link
Member

mdh1418 commented Aug 15, 2024

This test timeout reproduces fairly consistently (maybe only a flakey success once in a while).


Attempts to find the culprit change

Since this test was noted to have started failing while #4743 was bumping the runtime version, I first started to use a locally built runtime based on the commit hash 8fac5af2b11dc98fa0504f6fd06df790164ec958 that the PR bumped from following the instructions at https://github.com/dotnet/diagnostics/blob/main/documentation/privatebuildtesting.md. As the timeout still reproduced, I additionally built the libs subset and copied the entire runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\* as well, but the timeout still reproduced.

Moreover, I rebuilt the diagnostics repo after reverting the runtime bump pr #4743, and still the timeout reproduced.

As such, its unclear how far back the timeout began, or when the culprit change really occurred to start causing the test to timeout.


Timeout investigation

Debugging the test timeout itself, it seems to consistently hit an exception at


where the token is cancelled and gets caught at
catch (Exception ex) when (TryFailCompletionSourcesReturnFalse(ex))
. It's not clear to me exactly why the token is being cancelled, but seeing as how the MicrosoftDiagnosticsTracingTraceEventVersion version hasn't changed in a year, it doesn't seem related to that package.

@noahfalk @davmason Do y'all have any suspicions as to how the token might be getting cancelled?
I believe these were the async callstacks from DiagnosticsEventPipeProcessor.Process().
diagnostics_tests_cancelled

@davmason
Copy link
Member

I am not super familiar with this part of the code, I don't have an idea for what is causing the token to be cancelled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants