Description
When a thread is created while a directory handle is open, an `fchdir` is done (according to `strace` output under GNU/Linux), affecting the other threads. The issue can be reproduced on various platforms (GNU/Linux, Android, macOS) with the following test case, where `stat` randomly fails with a "No such file or directory" error (`ENOENT`):

```perl
#!/usr/bin/env perl
# Create a directory with files in it, for instance with
#   mkdir test && cd test && touch `seq 999` && cd ..
# then run this Perl script with the directory name as argument.
# But one can get failures even with an empty directory, because
# there are at least . and .. in a directory.

use strict;
use threads;

@ARGV == 1 || @ARGV == 2 or die "Usage: $0 <dir> [ <maxthreads> ]\n";
my ($dir,$maxthreads) = @ARGV;
-d $dir or die "$0: $dir is not a directory\n";

if (defined $maxthreads)
  {
    $maxthreads =~ /^\d+$/ && $maxthreads >= 1 && $maxthreads <= 32
      or die "$0: maxthreads must be an integer between 1 and 32\n";
  }
else
  {
    $maxthreads = 2;
  }

# Worker: do 100 stat calls on the given file (relative to $dir).
sub stat_test ($) {
  foreach my $i (1..100)
    {
      stat "$dir/$_[0]"
        or warn("$0: can't stat $_[0] ($!, i = $i)\n"), last;
    }
}

my $nthreads = 0;

# Busy-wait until at least one thread is joinable, then join all
# the joinable threads.
sub join_threads () {
  my @thr;
  0 until @thr = threads->list(threads::joinable);
  foreach my $thr (@thr)
    { $thr->join(); }
  $nthreads -= @thr;
}

# Note: the directory handle DIR is open while threads are created.
opendir DIR, $dir or die "$0: opendir failed ($!)\n";
while (my $file = readdir DIR)
  {
    $nthreads < $maxthreads or join_threads;
    $nthreads++ < $maxthreads or die "$0: internal error\n";
    threads->create(\&stat_test, $file);
  }
closedir DIR or die "$0: closedir failed ($!)\n";

join_threads while $nthreads;

__END__
"
Example of failure:
./dir-stat2: can't stat 2 (No such file or directory, i = 18)
./dir-stat2: can't stat 6 (No such file or directory, i = 17)
"
In short, for each file in the directory, a thread is created that performs 100 `stat` calls on the file. By default, this script uses no more than 2 worker threads at the same time: the main loop waits for a worker thread to terminate before creating a new one.
To reproduce the issue more easily, create a directory with many files:

```sh
mkdir test && cd test && touch `seq 999` && cd ..
```

Then run this script with the directory name (e.g. `test`) as argument. Failures are much more likely to occur under `strace -f` (as usual, also use the `-o` option to redirect the output to a file).
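For instance, assuming the script is saved as `dir-stat2` (the name that appears in the failure messages above) and made executable, a run might look like this; the output file name `dir-stat2.strace` is an arbitrary choice:

```sh
./dir-stat2 test                                # may print "can't stat" warnings
strace -f -o dir-stat2.strace ./dir-stat2 test  # -f follows the threads
grep ENOENT dir-stat2.strace                    # locate the failing calls
```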
In the `strace` output under GNU/Linux, one can see for instance:

```
[...]
692379 newfstatat(AT_FDCWD, "test/275", <unfinished ...>
692373 <... openat resumed>) = 5
692379 <... newfstatat resumed>{st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
692373 fstat(5, <unfinished ...>
692379 newfstatat(AT_FDCWD, "test/275", <unfinished ...>
692373 <... fstat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
692379 <... newfstatat resumed>{st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
692373 fchdir(4 <unfinished ...>
692379 newfstatat(AT_FDCWD, "test/275", <unfinished ...>
692373 <... fchdir resumed>) = 0
692379 <... newfstatat resumed>0x5557afc648b8, 0) = -1 ENOENT (No such file or directory)
692373 openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6
692379 write(2, "./dir-stat2: can't stat 275 (No "..., 64 <unfinished ...>
[...]
```

This excerpt shows 3 `newfstatat` calls on the same file `test/275`. The first two succeeded, but the third one failed with `ENOENT` (No such file or directory): it was executed concurrently with the `fchdir` done by thread 692373.
I think that this bug comes from the fix for #10387 (old rt.perl.org bug 75174), commit 11a11ec, which does an `fchdir` in `Perl_dirp_dup` from sv.c; Perl versions since 2010 should therefore be affected, and the issue is still present in the repository. As the current working directory is global to the process, this `fchdir` affects the other threads. Even though the current working directory is set back to the old value afterwards, this is a race condition, which can affect real scripts (this is how I identified this bug). Note that this is a vulnerability, as the directory may be an untrusted one, so really bad things could happen (and even when the directory is trusted, this remains harmful).
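The process-global nature of the working directory is easy to demonstrate in pure Perl. The sketch below is my own simplification, not the sv.c code: it emulates the temporary change-and-restore window that `Perl_dirp_dup` opens, by calling `chdir` on directory handles (which uses fchdir(2) on systems that support it) in the main thread, while a second thread repeatedly calls `stat` on a relative path. It assumes a directory `test` containing a file named `1`, as created by the command above, and that it is run from the parent of that directory; the iteration counts are arbitrary.

```perl
#!/usr/bin/env perl
# Sketch only: emulate, in pure Perl, the temporary fchdir/restore
# window that Perl_dirp_dup opens when cloning a directory handle.
# Assumes a directory "test" containing a file named "1".
use strict;
use warnings;
use threads;

# Worker thread: repeatedly stat a relative path.
my $worker = threads->create(sub {
    for my $i (1 .. 100_000) {
        stat "test/1"
            or return "stat failed at iteration $i: $!";
    }
    return "no failure observed";
});

opendir my $cwd,   '.' or die "$0: opendir . failed ($!)\n";
opendir my $other, '/' or die "$0: opendir / failed ($!)\n";

# Main thread: change the process-wide cwd and set it back, as
# Perl_dirp_dup does. chdir on a directory handle uses fchdir(2)
# where available.
for (1 .. 100_000) {
    chdir $other or die "$0: chdir failed ($!)\n";
    chdir $cwd   or die "$0: chdir failed ($!)\n";
}

print $worker->join(), "\n";
```

If one of the worker's `stat` calls lands inside the window where the main thread has changed the working directory, it fails with `ENOENT`; this is the same window that thread creation opens via `Perl_dirp_dup` when a directory handle is open.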
I could reproduce the issue under GNU/Linux with several file systems (ext4, tmpfs, NFS), and also on my Android phone (using Termux) and on a macOS machine (from the cfarm project).
Past bug reports:
- Debian bug 1098226
- linux-nfs mailing list (I had initially blamed NFS for this issue, due to the strangeness of the error message).