Description
In GitLab by @alt-romes on Sep 29, 2022, 24:12
Hi!
This issue is meant to discuss some performance tuning ideas.
I was recently trying to write a program using directory and filepath for filesystem manipulation, initially something in the spirit of du from coreutils.
The problem is the performance is not great.
In writing a recursive directory traversal, I must constantly check whether a path is a directory to know whether to recurse into it. The function doesDirectoryExist takes 80% of the runtime, which might be expected given that it has to read the file metadata.
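For context, a minimal sketch of the kind of traversal described above. It uses the FilePath API of directory rather than the OsPath one for brevity, and the name diskUsage is made up for this illustration. Symbolic links are skipped so that link cycles cannot cause endless recursion, and errors such as permission failures are not handled.

```haskell
import Control.Monad (foldM)
import System.Directory
  ( doesDirectoryExist
  , getFileSize
  , listDirectory
  , pathIsSymbolicLink
  )
import System.FilePath ((</>))

-- | Sum the sizes of all non-directory entries below a directory.
-- Hypothetical helper for illustration; symbolic links are not followed.
diskUsage :: FilePath -> IO Integer
diskUsage dir = do
  entries <- listDirectory dir
  foldM step 0 entries
  where
    step acc name = do
      let path = dir </> name
      isLink <- pathIsSymbolicLink path
      if isLink
        then pure acc
        else do
          -- This check runs once per entry and needs the file metadata,
          -- which is where the traversal spends most of its time.
          isDir <- doesDirectoryExist path
          if isDir
            then (acc +) <$> diskUsage path
            else (acc +) <$> getFileSize path

main :: IO ()
main = diskUsage "." >>= print
```

Every entry goes through doesDirectoryExist, so any per-call overhead there is multiplied by the number of files visited.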
I attempted to write a faster version, fastIsDir, guided by the benchmark results.
Here's an attempt:
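fastIsDir :: OsPath -> IO Bool
fastIsDir (getPosixString . getOsString -> SBS path) = do
  -- 144 bytes for the stat buffer (hard-coded, platform-specific)
  fp <- mallocForeignPtrBytes (144)
  -- useAsCString copies the path into a fresh null-terminated buffer;
  -- the result of c_stat (and therefore errno) is ignored
  withForeignPtr fp $ \p -> useAsCString (SBS path) $ \s -> c_stat s p
  return (fileTypeIsDirectory $ fileTypeFromMetadata $ FileStatus fp)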
This version ignores errno to go faster, but it clearly shows the use of useAsCString, which copies the path (an O(n) operation) into a null-terminated byte array. Since I want to run this on every file in the filesystem, I think the cost has some impact.
Now, I think that c_stat doesn't need its own copy of the string (it only reads it, right?), so ideally we would pass path directly to c_stat:
fastIsDir :: OsPath -> IO Bool
fastIsDir (getPosixString . getOsString -> SBS path) = do
  fp <- mallocForeignPtrBytes (144)
  -- Take the address of the byte array's payload directly, with no copy
  let ptr = Ptr (byteArrayContents# path)
  withForeignPtr fp $ \p -> c_stat ptr p
  return (fileTypeIsDirectory $ fileTypeFromMetadata $ FileStatus fp)
The issue is that path, I believe, is not null-terminated, which makes this code wrong. The only way I see to add a NUL at the end of the string is to copy it into an array of length + 1 and terminate it manually (which is what useAsCString does).
What I want to discuss in this issue is:
Is it possible, since PosixString is opaque up to Internal, to have PosixString be, under the hood, a null-terminated ByteArray#, so that libraries like directory can pass it directly to read-only functions without first copying it into a temporary buffer to add a null terminator?
I might be able to attempt an implementation, given some pointers.
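To make the idea concrete, here is a rough sketch of what such a representation could look like. It is not how filepath works today. NulTerminated, fromBytes, logicalLength and cLength are hypothetical names, and strlen stands in for stat so the example stays self-contained. It assumes GHC's UnliftedFFITypes extension, which allows a ByteArray# to be passed to an unsafe foreign call without pinning it, since the garbage collector cannot run during such a call.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnliftedFFITypes #-}

import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short.Internal (ShortByteString (..))
import Foreign.C.Types (CSize (..))
import GHC.Exts (ByteArray#)

-- Hypothetical wrapper. Invariant: the final byte is always 0.
newtype NulTerminated = NulTerminated ShortByteString

-- The terminator is added once, at construction time,
-- rather than on every foreign call.
fromBytes :: ShortByteString -> NulTerminated
fromBytes s = NulTerminated (SBS.pack (SBS.unpack s ++ [0]))

-- Length as seen by Haskell code, not counting the terminator.
logicalLength :: NulTerminated -> Int
logicalLength (NulTerminated s) = SBS.length s - 1

-- strlen only reads its argument, like stat does with its path.
-- The call must be marked unsafe for passing an unpinned ByteArray# to be valid.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: ByteArray# -> IO CSize

-- Hand the underlying array to C directly, with no per-call copy.
cLength :: NulTerminated -> IO CSize
cLength (NulTerminated (SBS ba)) = c_strlen ba

main :: IO ()
main = do
  let p = fromBytes (SBS.pack [47, 116, 109, 112]) -- the bytes of "/tmp"
  n <- cLength p
  print (fromIntegral n == logicalLength p)
```

The trade-off is one extra byte per path and the need for every operation on the type (append, slicing, length) to maintain or hide the terminator.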
Thanks,
Romes