
Command.Result blocks indefinitely in xunit #104

Closed
ForNeVeR opened this issue Jul 17, 2023 · 14 comments

ForNeVeR commented Jul 17, 2023

I have been debugging this for a while and finally found the cause of my tests blocking under xunit.

I've been writing a simple xunit test:

using Medallion.Shell;

namespace TestProject1;

public class UnitTest1
{
    [Fact]
    public void Test1()
    {
        const string consoleAppPath = @"T:\Temp\ConsoleApp99";
        const string consoleAppProjectFileName = @"ConsoleApp99.csproj";

        Console.WriteLine(
            Command.Run("dotnet", new[] { "build", "--configuration", "Release", Path.Combine(
                consoleAppPath, consoleAppProjectFileName) }, o=> o.WorkingDirectory(consoleAppPath)).Result.StandardOutput
        );
    }
}

This blocks indefinitely on Result (not on StandardOutput!), even if I kill dotnet build and all the child processes manually.

xunit is known for installing its own synchronization context, so there's probably a ConfigureAwait(false) missing somewhere.

(I wouldn't ask for this in general, but it's a good idea to use it if you provide a synchronous API.)
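
For context, here's a minimal sketch of the sync-over-async shape I suspect is involved (hypothetical illustration code, not MedallionShell's actual implementation):

using System.Threading.Tasks;

// Hypothetical illustration, not MedallionShell's code: a property that blocks
// on async work. Without ConfigureAwait(false), the await below tries to resume
// on xunit's synchronization context, and blocking on Result from that same
// context is the classic deadlock.
public class CommandLikeWrapper
{
    private readonly Task<string> outputTask;

    public CommandLikeWrapper(Task<string> outputTask) => this.outputTask = outputTask;

    public string Result => this.ReadOutputAsync().GetAwaiter().GetResult();

    private async Task<string> ReadOutputAsync() =>
        // ConfigureAwait(false) lets the continuation run on the thread pool
        // instead of the (possibly single-threaded) ambient context.
        await this.outputTask.ConfigureAwait(false);
}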

madelson (Owner) commented

@ForNeVeR thanks for reporting.

I ran a quick script and I don't see any missing ConfigureAwaits, so I don't think that's the issue.

Rather, under the hood Command.Result is sync-over-async. Ideally it would be sync all the way down, but I'm not sure that can be done given that ultimately it has to wait for async writing tasks. We could probably fix this by ensuring that all tasks are kicked off on the default scheduler (e.g. using Task.Run()) so they are immune to local sync contexts. Other ideas are welcome as well. For any fix, it should be possible to write a test that installs a single-threaded sync context and replicates the issue. Is this something you'd be interested in exploring/contributing?
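
For concreteness, the "kick off on the default scheduler" idea mentioned above could be as simple as a hypothetical helper like this (the class and method names are made up):

using System;
using System.Threading.Tasks;

internal static class SchedulerIsolation
{
    // Task.Run queues the delegate to TaskScheduler.Default (the thread pool),
    // so a single-threaded SynchronizationContext in the caller cannot capture
    // the start of the library's internal work.
    public static Task<T> StartOnThreadPool<T>(Func<Task<T>> work) => Task.Run(work);
}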

In the meantime, I assume you could make your test async and replace it with await Command.Run(...).Task to get the result instead. Hopefully that provides a good workaround?
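
Applied to your test above, that workaround might look roughly like this (a sketch on my part, reusing the paths from your repro):

using Medallion.Shell;

namespace TestProject1;

public class UnitTest1
{
    [Fact]
    public async Task Test1Async()
    {
        const string consoleAppPath = @"T:\Temp\ConsoleApp99";
        const string consoleAppProjectFileName = @"ConsoleApp99.csproj";

        // Awaiting Command.Task instead of blocking on Command.Result avoids
        // the sync-over-async wait entirely.
        var result = await Command.Run(
            "dotnet",
            new[] { "build", "--configuration", "Release", Path.Combine(consoleAppPath, consoleAppProjectFileName) },
            o => o.WorkingDirectory(consoleAppPath)).Task;

        Console.WriteLine(result.StandardOutput);
    }
}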


ForNeVeR commented Jul 18, 2023

Is this something you'd be interested in exploring/contributing?

Yup, I'll be happy to take a look at it in a week or so.

I assume you could make your test async and replace it with await Command.Run(...).Task to get the result instead. Hopefully that provides a good workaround?

That won't work for all cases, but for me it's easy enough to convert everything to async, so it's not a blocker.

madelson (Owner) commented

Yup, I'll be happy to take a look at it in a week or so.

I'd suggest starting with a test that reproduces the issue. This should be possible by authoring a simple sync context that forces a single thread.
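
A minimal single-threaded SynchronizationContext for such a test could look something like this (just a sketch of the idea, not an existing helper in this repo):

using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: runs every posted callback on one dedicated thread, mimicking the
// single-threaded behavior of test-framework contexts. Not an existing
// MedallionShell test helper.
public sealed class SingleThreadSynchronizationContext : SynchronizationContext, IDisposable
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State)> queue = new();

    public SingleThreadSynchronizationContext()
    {
        var thread = new Thread(() =>
        {
            SetSynchronizationContext(this);
            foreach (var item in this.queue.GetConsumingEnumerable())
            {
                item.Callback(item.State);
            }
        })
        { IsBackground = true };
        thread.Start();
    }

    public override void Post(SendOrPostCallback d, object? state) => this.queue.Add((d, state));

    public void Dispose() => this.queue.CompleteAdding();
}

The test would then call SynchronizationContext.SetSynchronizationContext(new SingleThreadSynchronizationContext()) before touching Command.Result.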


ForNeVeR commented Jul 24, 2023

OK, first of all, the reproduction is not as stable as I thought it would be. Here are the steps to reproduce locally (so I don't forget them, at least):

  1. cd T:\Temp\ConsoleApp99 && dotnet clean && dotnet build-server shutdown
  2. dotnet test --logger:"console;verbosity=detailed" in a directory with the test app.
  3. In this case, it reproduces almost always.

I haven't been able to reproduce it in Medallion Shell's own NUnit project so far, even with a custom synchronization context (perhaps because there are no tests on .NET 7? I don't know), but I can debug the test host that gets stuck after a command-line run.

There are no running threads doing any work; everything is truly deadlocked. Thanks to VS's Tasks view, I can see exactly what is blocked.

There are a couple of tasks started from ProcessCommand, and it looks like everything is basically waiting for InternalProcessStreamReader.BufferLoop() to finish, and that one is stuck. How that happens, I haven't figured out yet. The code looks correct all the way down to System.IO.Strategies.BufferedFileStreamStrategy.ReadFromNonSeekableAsync; there are correct ConfigureAwait(false) calls everywhere.

[Screenshot: VS Tasks view showing the tasks blocked on InternalProcessStreamReader.BufferLoop()]

Certainly… interesting.

madelson (Owner) commented

I really appreciate the investigation, @ForNeVeR.

I wonder if the issue is actually this one, which could be pretty susceptible to environment specifics (test runner, how many other threads are in use, thread pool policy, blocking, etc.)? If so, then I have a fix in the works for that.


madelson commented Jul 26, 2023

@ForNeVeR just a thought: in your repro, what happens if you first greatly increase the max threadpool threads?
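
Something like this at the start of the test should be enough to rule that out (standard ThreadPool APIs, nothing MedallionShell-specific):

// Raise the pool limits so that slow thread injection or a low maximum
// is unlikely to be a factor in the repro.
ThreadPool.GetMaxThreads(out var maxWorkers, out var maxIo);
ThreadPool.SetMaxThreads(Math.Max(maxWorkers, 4096), Math.Max(maxIo, 4096));
ThreadPool.SetMinThreads(256, 256);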


ForNeVeR commented Jul 26, 2023

Found another clue. Those last two dudes in the "Scheduled" state: to me it looks like we are waiting for them. They are the continuations (or whatever it's called) of the ReadFromNonSeekableAsync method.

And for whatever reason, they stay in this "Scheduled and waiting to run" state indefinitely.

[Screenshot: VS Tasks view showing two tasks stuck in the "Scheduled and waiting to run" state]

I wonder if the issue is actually this one

Hard to say, yet.

in your repro, what happens if you first greatly increase the max threadpool threads?

Yep, I thought about that too. It's kinda tricky to figure out what exactly controls the thread pool size in xunit (after all, its settings being confusing is the core reason I opened this issue 😅), but I'm working on it.

Not sure whether it's the xunit threading settings or the system thread pool, though, but I'll experiment a bit.

Ideally, xunit's threading should be unrelated, because all the relevant code (whether the system's or Medallion Shell's) seems to call ConfigureAwait(false) where necessary.

But perhaps it occupies the thread pool too much?

But then that's confusing. AFAIK, the general behavior of the thread pool is to inject new threads when a task hasn't been able to get scheduled for some time (around 500 ms by default?).
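
One cheap way to check the saturation theory from inside the test (these ThreadPool properties exist on .NET Core 3.0 and later):

// Quick thread-pool health check for the saturation theory.
Console.WriteLine($"Pool threads: {ThreadPool.ThreadCount}");
Console.WriteLine($"Pending work items: {ThreadPool.PendingWorkItemCount}");
Console.WriteLine($"Completed work items: {ThreadPool.CompletedWorkItemCount}");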

Also, my computer has 16 cores, and there are a total of 17 active tasks (and two in that weird "Scheduled" state).

A coincidence? 🤔

ForNeVeR (Author) commented

ThreadPool.GetMaxThreads(out var w, out var t); yields 32767 and 1000 respectively, the same in a console app and in an xunit test project.

I changed maxParallelThreads in xunit to 10 and can see that it picked up my settings, but the result is unchanged.

ForNeVeR (Author) commented

Alright, this whole task business seems to be just a red herring. What I found is that if I run dotnet build-server shutdown in my shell while the test is stuck, it immediately finishes.

So it may just be a case of pipes inherited by child processes being held open, causing the whole stdout stream to never terminate while those child processes are alive.

But I still cannot replicate this behavior in a simple console app; it only reproduces in xunit tests. WTF.
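
As an experiment for this theory, I could tell the build not to leave long-lived server processes behind at all; a hypothetical variant of the repro command, reusing the constants from the test above (the switches are standard MSBuild/Roslyn ones, not MedallionShell features):

// Hypothetical mitigation sketch for the "surviving children hold the pipe"
// theory: ask the build not to leave long-lived server processes behind,
// since any child that inherited our stdout pipe handle keeps it open.
var command = Command.Run(
    "dotnet",
    new[]
    {
        "build", "--configuration", "Release",
        Path.Combine(consoleAppPath, consoleAppProjectFileName),
        "-nodeReuse:false",              // do not keep MSBuild worker nodes alive after the build
        "-p:UseSharedCompilation=false", // do not hand compilation to the shared Roslyn server
    },
    o => o.WorkingDirectory(consoleAppPath));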


ForNeVeR commented Jul 26, 2023

I think I've found the difference.

This is a child process of msbuild started from dotnet test → testhost → dotnet build:

[Screenshot: properties of the msbuild child process started from dotnet test → testhost → dotnet build]

And this one is from a console app with exactly the same code:
[Screenshot: properties of the same child process when started from the console app]

Everything's the same, but the latter shows "Process in a job".

It looks like something weird is being done with job objects in one case but not in the other, and that causes the difference.

Perhaps when I do everything from a console app, the initial dotnet run, my console program, and the whole build host share the same job object, but when it's started from a unit test, there's no job object and thus something works differently?

UPD: I've tried running ConsoleApp98.exe manually, and if the build server is started from under it, there's no job object. It still works, but there's no job object, which kinda breaks my previous hypothesis.

I am confused, but it's very likely that this puzzle has something to do with job objects.
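
To check the job-object difference programmatically rather than in the UI, a small P/Invoke should do (kernel32's IsProcessInJob; the wrapper name is mine):

using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Checks the "Process in a job" flag via kernel32!IsProcessInJob.
// Passing IntPtr.Zero as the job handle asks "is the process in any job?".
internal static class JobObjectProbe
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool IsProcessInJob(IntPtr processHandle, IntPtr jobHandle, out bool result);

    public static bool IsInAnyJob(Process process)
    {
        if (!IsProcessInJob(process.Handle, IntPtr.Zero, out var inJob))
        {
            throw new Win32Exception();
        }

        return inJob;
    }
}

e.g. calling JobObjectProbe.IsInAnyJob(Process.GetCurrentProcess()) from both the test and the console app.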


ForNeVeR commented Jul 26, 2023

Reproduced in an NUnit project as well, so xunit is not involved. Most likely this is due to the way the test host is started and the way the SDK starts its processes.

I think dotnet test starts the test runner process with redirected stdout (i.e. no inheritance), which may change… something.

TODO

  • Verify different SDK versions
  • Check if it's reproducible in async
  • Try redirecting the console app's stdout somewhere, in case that alone triggers the issue (see the sketch below)
  • Figure out what exactly is happening with handle inheritance in different build start modes
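
For the redirection experiment, a hypothetical harness could launch the repro console app the way dotnet test presumably launches the test host, with stdout captured instead of inherited (the .exe path below is a placeholder):

using System;
using System.Diagnostics;

// Hypothetical harness: run the repro console app with its stdout redirected
// (captured by us) instead of inherited, to see whether that alone triggers
// the hang.
var psi = new ProcessStartInfo
{
    FileName = @"path\to\ConsoleApp98.exe", // placeholder: the repro console app
    RedirectStandardOutput = true,
    UseShellExecute = false,
};

using var process = Process.Start(psi)!;
string output = process.StandardOutput.ReadToEnd();
process.WaitForExit();
Console.WriteLine(output);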


ForNeVeR commented Jul 30, 2023

  1. Earlier, I was testing on .NET 8, but just found that .NET 7 is also affected.

    And this is unlikely to be a bug in Medallion.Shell. But I have no repro without it, and thus will investigate further.

  2. Reproduces with an async test as well 😱

    using Medallion.Shell;
    
    namespace TestProject2;
    
    public class Tests
    {
        [SetUp]
        public void Setup()
        {
        }
    
        [Test]
        public async Task Test1()
        {
            const string consoleAppPath = @"G:\Projects\Medallion.Shell.Bug104\ConsoleApp99";
            const string consoleAppProjectFileName = @"ConsoleApp99.csproj";
    
            ThreadPool.GetMaxThreads(out var w, out var t);
            
            Console.WriteLine("threads worker " + w);
            Console.WriteLine("threads completion pool " + t);
    
            Console.WriteLine(SynchronizationContext.Current?.ToString());
            Console.WriteLine(TaskScheduler.Current.ToString());
    
            var result = await Command.Run(
                "dotnet",
                new[] { "build", "--configuration", "Release", Path.Combine(consoleAppPath, consoleAppProjectFileName) },
                o => o.WorkingDirectory(consoleAppPath)).Task;
            Console.WriteLine(
                result.StandardOutput
            );
        }
    }

ForNeVeR (Author) commented

At least I've figured out that this issue is definitely unrelated to MedallionShell, so I've posted it to the .NET SDK issue tracker: dotnet/sdk#34653

Still no idea what exactly happens there, but let's at least free MedallionShell of this burden for now.

ForNeVeR closed this as not planned on Aug 15, 2023
madelson (Owner) commented

Awesome. Thanks for the in-depth investigation, @ForNeVeR.
