test: split up test-async-context-frame into individual tests #54818

jasnell · 2024-09-06T17:15:13Z

Using the test runner on these appears to be flaky on windows. This is an experiment to see if splitting these up shows the same flakiness in order to help determine if the issue is with the async-content-frame implementation or the test runner itself.

This PR is opened for testing purposes. It may or may not land. I'm opening it so that I can test the flakiness of these changes in CI.

I do not expect this PR to land as is, and there are several steps to this test meant to use a process of elimination to identify the specifc issue at play. There are three distinct components here that need to be evaluated:

Use of experimental-async-context-frame
Spawning multiple python test runner instances in parallel
Use of node:test to structure the test

Each of these individually or in combination could be leading to the access violation on windows and I've been unable to reproduce the problem locally, so this experiment will work through each using stress tests on the CI

Refs: #54808

Using the test runner on these appears to be flaky on windows. This is an experiment to see if splitting these up shows the same flakiness in order to help determine if the issue is with the async-content-frame implementation or the test runner itself.

jasnell · 2024-09-06T17:35:01Z

Stress Test 1: https://ci.nodejs.org/job/node-stress-single-test/528/ (hopefully this still works)

Qard · 2024-09-06T21:08:45Z

Seems like the stress test thing isn't working anymore. Seems to have a bunch of unrelated visual studio project file errors. I'll just kick off a regular CI and see how it goes.

cjihrig · 2024-09-06T21:12:49Z

I didn't scroll through all of the logs, so maybe I missed something, but it looks like it ran successfully:

17:06:10 TAP version 13
17:06:10 1..2
17:06:10 ok 1 async-hooks/test-async-local-storage-socket-frame
17:06:10   ---
17:06:10   duration_ms: 259.01700
17:06:10   ...
17:06:10 ok 2 async-hooks/test-async-local-storage-socket
17:06:10   ---
17:06:10   duration_ms: 310.98800
17:06:10   ...
17:06:10 
17:06:10 All tests passed.
17:06:10 1000   OK: 1000   NOT OK: 0   TOTAL: 1000

jasnell · 2024-09-06T21:13:46Z

I was able to get at least part of it working: https://ci.nodejs.org/job/node-stress-single-test/528/ ... with only a couple of modified tests (running the *-socket and *-socket-frame variants 1000 times each. Zero flakiness reported over 1000 runs, which is a good sign that the issue is not with the async-context-frame changes. I'm really thinking the issue lies more with the spawning of the parallel python test-runner processes, in which case tracking down the specific problem is likely going to be difficult. I propose that if the regular CI run on this PR is good, we go ahead and merge this PR with the individual tests split out into individual files. We can separate keep investigating as time allows what may be happening with the python runner. Any objections to that approach? @Qard @cjihrig @anonrig ?

anonrig · 2024-09-06T21:14:59Z

Sounds good to me 👍

Qard · 2024-09-06T21:16:41Z

I'm not a huge fan of the doubled tests. Can we first try eliminating the parallelism of the original test and see if that is sufficient? I suspect it's a matter of we're spinning up a ton of different tests at the same time and perhaps just running out of memory from so many things running at once. 🤔

nodejs-github-bot · 2024-09-06T21:19:44Z

CI: https://ci.nodejs.org/job/node-test-pull-request/62077/

Qard · 2024-09-06T21:21:16Z

I've opened #54823 to test my assumption that it might just be the concurrency that's problematic.

jasnell requested review from Qard and anonrig September 6, 2024 17:15

nodejs-github-bot added async_hooks Issues and PRs related to the async hooks subsystem. needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Sep 6, 2024

anonrig approved these changes Sep 6, 2024

View reviewed changes

jasnell mentioned this pull request Sep 6, 2024

test-async-context-frame is flaky #54808

Open

jasnell requested a review from cjihrig September 6, 2024 17:32

This comment was marked as outdated.

Sign in to view

Qard added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 6, 2024

github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 6, 2024

Qard mentioned this pull request Sep 6, 2024

test: fix Windows async-context-frame memory failure #54823

Merged

jasnell closed this Sep 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: split up test-async-context-frame into individual tests #54818

test: split up test-async-context-frame into individual tests #54818

jasnell commented Sep 6, 2024 •

edited

Loading

jasnell commented Sep 6, 2024

This comment was marked as outdated.

Qard commented Sep 6, 2024

cjihrig commented Sep 6, 2024

jasnell commented Sep 6, 2024

anonrig commented Sep 6, 2024

Qard commented Sep 6, 2024

nodejs-github-bot commented Sep 6, 2024

Qard commented Sep 6, 2024

test: split up test-async-context-frame into individual tests #54818

test: split up test-async-context-frame into individual tests #54818

Conversation

jasnell commented Sep 6, 2024 • edited Loading

jasnell commented Sep 6, 2024

This comment was marked as outdated.

Qard commented Sep 6, 2024

cjihrig commented Sep 6, 2024

jasnell commented Sep 6, 2024

anonrig commented Sep 6, 2024

Qard commented Sep 6, 2024

nodejs-github-bot commented Sep 6, 2024

Qard commented Sep 6, 2024

jasnell commented Sep 6, 2024 •

edited

Loading