FList format #11

muhamadazmy · 2019-03-21T10:16:39Z

It's time we move away from the current flist structure.

Maybe we should use sqlite to its full potential to optimize file queries. An entry per directory is very inefficient for file access, and it makes it very hard to implement the fuse layer using the inode api. I did it in the rust implementation, but I had to implement a caching layer (and do lots of data copying) to make it work properly.

I suggest something like

inode | parent-inode | entry

Entry can still be a capnp object, but with a simpler schema:

struct Inode {
    name    @0: Text;
    size    @1: UInt64;           # in bytes

    attributes: union {
        dir     @2: Dir;
        file    @3: File;
        link    @4: Link;
        special @5: Special;
    }

    aclkey           @6: Text;    # is pointer to ACL # FIXME: need to be int
    modificationTime @7: UInt32;
    creationTime     @8: UInt32;
}
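
To make the proposed `inode | parent-inode | entry` layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, the index, the helper names, the use of parent 0 for the root, and the JSON encoding (a stand-in for the capnp `Inode` object) are all assumptions for illustration, not part of the proposal.

```python
import json
import sqlite3

# Hypothetical layout: one row per inode, as in "inode | parent-inode | entry".
SCHEMA = """
CREATE TABLE inodes (
    inode  INTEGER PRIMARY KEY,   -- lookup key for an inode-based FUSE layer
    parent INTEGER NOT NULL,      -- inode of the containing directory
    entry  BLOB NOT NULL          -- serialized Inode (JSON here, capnp in the proposal)
);
CREATE INDEX inodes_by_parent ON inodes(parent);
"""


def encode(name, size, kind):
    # Stand-in for serializing the Inode struct.
    return json.dumps({"name": name, "size": size, "kind": kind}).encode()


def open_flist(rows):
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO inodes VALUES (?, ?, ?)", rows)
    return db


def getattr_by_inode(db, inode):
    """Fetch one entry directly by inode, without loading its parent directory."""
    row = db.execute("SELECT entry FROM inodes WHERE inode = ?", (inode,)).fetchone()
    return json.loads(row[0]) if row else None


def readdir(db, parent):
    """List the names in a directory by selecting on the parent inode."""
    cur = db.execute(
        "SELECT entry FROM inodes WHERE parent = ? ORDER BY inode", (parent,)
    )
    return [json.loads(entry)["name"] for (entry,) in cur]


db = open_flist([
    (1, 0, encode("/", 0, "dir")),        # root; parent 0 means "none" (an assumption)
    (2, 1, encode("etc", 0, "dir")),
    (3, 2, encode("hosts", 120, "file")),
    (4, 2, encode("passwd", 900, "file")),
])
```

With this layout, `getattr_by_inode(db, 3)` is a single primary-key lookup, and `readdir(db, 2)` uses the index on `parent`. Finding a child by name would still mean decoding the entries of the parent, unless the name is also stored as its own column.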

I would also love to move away from capnp altogether. Although it's very size efficient, it's not that good to work with in memory: you still have to copy all your data out to be able to use it.


zaibon · 2019-03-21T12:17:06Z

@muhamadazmy can you elaborate on what exactly makes the current format hard to use?
I thought it had already been thought out to optimize the size of the flist as well as walking the filesystem.

maxux · 2019-03-21T12:33:47Z

As soon as you want to reach a file in a directory, you need to fetch the whole directory contents, iterate over the list of inodes to find the corresponding name, and then you have your inode.

You need to do this for any file lookup. Walking is quite easy and fast, but you hardly ever do that. Caching was needed to avoid fetching and parsing the capnp object each time you hit a file in a directory (understand: when you ls a directory, you basically stat all files in the directory).

maxux · 2019-03-21T12:51:20Z

I'm not against changing the format, since even for me, writing zflist was quite tricky and not easy at all. But I have to admit that, from a size point of view, the current format is quite efficient. It is just less efficient to use without consuming a lot of memory for caching.

On the other hand, using one entry per file in sqlite directly will be dramatically worse, for both size and performance. This is not a better idea. For an average directory size (let's say < 50 files), it will be a lot faster to iterate over a list than to do a SELECT query (this needs to be benchmarked, but I'm pretty sure).
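
A micro-benchmark along these lines could check that claim. It is only a sketch under stated assumptions: the `entries` table with its own `name` column, the helper names, and the 50-file directory are made up for the test, and the list side starts from an already decoded directory, so it leaves out the fetch and parse cost discussed earlier.

```python
import sqlite3
import timeit

N = 50  # "average" directory size from the comment above
names = [f"file{i:03d}" for i in range(N)]

# One row per file, keyed by (parent, name). This schema is an assumption.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE entries ("
    "parent INTEGER, name TEXT, size INTEGER, PRIMARY KEY (parent, name))"
)
db.executemany(
    "INSERT INTO entries VALUES (1, ?, ?)",
    [(name, i) for i, name in enumerate(names)],
)

# The same directory as an already decoded in-memory list.
directory = [(name, i) for i, name in enumerate(names)]


def scan(name):
    """Linear search through the decoded directory list."""
    for entry_name, size in directory:
        if entry_name == name:
            return size
    return None


def select(name):
    """Indexed lookup through sqlite."""
    row = db.execute(
        "SELECT size FROM entries WHERE parent = 1 AND name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def bench(fn, repeat=200):
    """Average seconds per lookup when stat-ing every file in the directory."""
    total = timeit.timeit(lambda: [fn(n) for n in names], number=repeat)
    return total / (repeat * N)


if __name__ == "__main__":
    print(f"list scan: {bench(scan) * 1e6:.2f} us per lookup")
    print(f"SELECT:    {bench(select) * 1e6:.2f} us per lookup")
```

The numbers depend on the machine, on directory size, and on whether the decode cost is included, so a run on realistic flists would be needed before drawing conclusions.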

We could investigate other existing serialization formats (like msgpack or anything binary), but right now I don't know how we can be lookup-efficient without exploding memory or storage size.

zaibon · 2019-03-21T14:10:47Z

So it seems that this is a trade-off problem and we need to find the right balance.
Personally, I think the size of the flist is less important, and I would prefer something that gives me more performance at runtime.

Now we also need to see what the impact on the size of the flist would be if we change that, and whether it stays reasonable.

muhamadazmy · 2019-05-20T06:29:10Z

very late comment :P

@maxux You might have a point regarding listing a directory, at least for the first time. After that, the kernel will do a pretty good job caching the directory entries (especially for a read-only filesystem).

Also note that the low-level (faster) fuse api uses inodes, so when accessing files you will probably need to retrieve the file info directly from the db (using the inode as key). That is going to be much faster and more efficient than first loading its parent directory and then traversing a list (which can be really long) to retrieve the file object.

I don't believe changing the format this way will actually affect the size that much, but it's definitely going to improve runtime performance dramatically, especially for traversing the file tree and opening files (maybe not reading).

zaibon added the type_question label Mar 21, 2019

maxux added this to the later milestone Dec 14, 2020
