Null-terminated PosixString ByteArray# for performance #168

Open
@hasufell

In GitLab by @alt-romes on Sep 29, 2022, 24:12

Hi!

This issue is meant to discuss some performance tuning ideas.

I was recently trying to write a program using directory and filepath for filesystem manipulation.
Something initially in the spirit of du from coreutils.

The problem is that the performance is not great.

In writing a recursive directory traversal, I must constantly check whether a path is a directory to know whether to recurse into it. The function doesDirectoryExist takes 80% of the runtime, which might be expected since it has to read the file metadata.

I attempted to write a faster fastIsDir, guided by benchmark results.

Here's an attempt:

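fastIsDir :: OsPath -> IO Bool
fastIsDir (getPosixString . getOsString -> SBS path) = do
  -- 144 bytes: presumably sizeof(struct stat) on x86-64 Linux
  fp <- mallocForeignPtrBytes 144
  -- useAsCString copies the path into a fresh null-terminated buffer;
  -- the return code of c_stat (and errno) is deliberately ignored
  _ <- withForeignPtr fp $ \p -> useAsCString (SBS path) $ \s -> c_stat s p
  return (fileTypeIsDirectory $ fileTypeFromMetadata $ FileStatus fp)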

This version ignores errno to go faster, but it clearly shows the use of useAsCString, which copies the path (an O(n) operation) into a null-terminated byte array. Since I want to run this on every file in the filesystem, I think the cost has some impact.

Now, I think that c_stat doesn't need its own copy of the string (it's read-only, right?), so ideally we would pass path directly to c_stat:

fastIsDir :: OsPath -> IO Bool
fastIsDir (getPosixString . getOsString -> SBS path) = do
  fp <- mallocForeignPtrBytes 144
  -- No copy: take the address of the ByteArray# payload directly
  let ptr = Ptr (byteArrayContents# path)
  _ <- withForeignPtr fp $ \p -> c_stat ptr p
  return (fileTypeIsDirectory $ fileTypeFromMetadata $ FileStatus fp)

The issue is that path, I believe, is not null-terminated, which makes this code wrong. The only way I see to add a NUL at the end of the string is to copy it into a length+1 array and terminate it manually (which is what useAsCString does).
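
To make the cost concrete, here is a minimal sketch of the copy-and-terminate step. The names withNulTerminated and pathStrlen are hypothetical, strlen stands in for any read-only C function such as stat, and the list-based pokeArray copy is only for illustration (the real useAsCString most likely uses a bulk copy instead):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (pokeArray)
import Foreign.Ptr (castPtr)
import Foreign.Storable (pokeByteOff)

-- strlen is used here only as a stand-in for a read-only C function.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- | Hypothetical helper: copy the bytes into a temporary buffer of
-- length n + 1, write a trailing NUL, and hand the buffer to the action.
withNulTerminated :: SBS.ShortByteString -> (CString -> IO a) -> IO a
withNulTerminated sbs action =
  allocaBytes (n + 1) $ \buf -> do
    pokeArray (castPtr buf) (SBS.unpack sbs) -- O(n) copy of the payload
    pokeByteOff buf n (0 :: Word8)           -- manual NUL terminator
    action buf
  where
    n = SBS.length sbs

-- | Usage: every call pays for the copy before the C function runs.
pathStrlen :: SBS.ShortByteString -> IO CSize
pathStrlen sbs = withNulTerminated sbs c_strlen
```

The point is that the O(n) copy happens on every call, even though the C side only reads the bytes.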

What I want to discuss in this issue is:

Is it possible, since PosixString is opaque up to Internal, for PosixString to be a null-terminated ByteArray# under the hood, so that libraries like directory can pass it directly to read-only functions without first copying it onto the stack to add a null terminator?
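
As a rough sketch of what this could look like, and not a claim about how filepath or directory should implement it: the NulTerminated wrapper, mkNulTerminated and nulLength below are hypothetical names, and strlen again stands in for a read-only C function. The sketch relies on GHC's UnliftedFFITypes extension, which allows a ByteArray# to be passed to a foreign call. For an unpinned array this is only sound with an unsafe import, because the garbage collector cannot move the array while an unsafe call runs. A safe import would presumably need a pinned array.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnliftedFFITypes #-}

import qualified Data.ByteString.Short as SBS
import qualified Data.ByteString.Short.Internal as SBSI
import Foreign.C.Types (CSize (..))
import GHC.Exts (ByteArray#)

-- With an unsafe import the C function receives a pointer to the
-- array payload; no intermediate CString is allocated.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: ByteArray# -> IO CSize

-- | Hypothetical wrapper. Invariant: the last byte is always 0.
newtype NulTerminated = NulTerminated SBS.ShortByteString

-- | Pay for the copy once, at construction time.
mkNulTerminated :: SBS.ShortByteString -> NulTerminated
mkNulTerminated sbs = NulTerminated (sbs <> SBS.pack [0])

-- | Pass the underlying ByteArray# straight to C, with no per-call copy.
nulLength :: NulTerminated -> IO CSize
nulLength (NulTerminated (SBSI.SBS ba)) = c_strlen ba
```

The trade-off is that the terminator has to be maintained by every operation that builds or slices such a string, and the reported length must exclude the trailing byte.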

I might be able to attempt an implementation, given some pointers.

Thanks,
Romes
