Large file support #16

Open
isbadawi opened this issue Feb 3, 2020 · 2 comments

@isbadawi
Owner

isbadawi commented Feb 3, 2020

i.e. don't read the whole file into memory.

@isbadawi
Owner Author

Should do some experiments to figure out the best way to do a regex search on a large file without reading it all into memory.

One idea would be mmap + madvise(MADV_SEQUENTIAL)
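A minimal sketch of what that idea could look like, assuming a POSIX system. The helper name `map_file_sequential` is hypothetical and not part of this project; the hint passed to `madvise` is advisory only, so the kernel is free to ignore it.

```c
#define _DEFAULT_SOURCE /* exposes madvise() on glibc under strict -std modes */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and hint sequential access.
 * Returns the mapping and stores its length in *len, or returns NULL on
 * failure or for an empty file (a zero-length mmap is an error). */
static const char *map_file_sequential(const char *path, size_t *len) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return NULL;

  struct stat st;
  if (fstat(fd, &st) < 0 || st.st_size == 0) {
    close(fd);
    return NULL;
  }

  void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (p == MAP_FAILED) return NULL;

  /* Advisory only: pages still become resident as they are touched. */
  (void)madvise(p, (size_t)st.st_size, MADV_SEQUENTIAL);

  *len = (size_t)st.st_size;
  return p;
}
```

The caller would release the mapping with `munmap((void *)p, len)` once the search is finished.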

@isbadawi
Owner Author

With mmap, pages are not resident until they are accessed, so if we naively use regexec as we do now, matches that occur earlier in the file use less memory. But if you're at the beginning of the file and the match is near the end, the search could still load everything into memory. madvise(MADV_SEQUENTIAL) doesn't seem to have an effect on this (tested on macOS; the resident set size was the same).

libpcre supports partial matching and multi-segment matching: https://www.pcre.org/current/doc/html/pcre2partial.html#SEC4
This would allow us to search in chunks, and explicitly unmap chunks once we're done with them.
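A rough sketch of the chunk-and-unmap loop, assuming a POSIX system. To keep it self-contained it searches for a fixed literal instead of a regex: overlapping each window by `nlen - 1` bytes stands in for PCRE2's partial matching, which is what would handle a regex match that straddles a chunk boundary. The helper name `find_in_chunks` is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the byte offset of the first occurrence of
 * `needle` in the file at `path`, or -1 if absent or on error. Only one
 * window of roughly `chunk` bytes (plus overlap) is mapped at a time. */
static off_t find_in_chunks(const char *path, const char *needle, size_t chunk) {
  size_t nlen = strlen(needle);
  if (nlen == 0 || chunk == 0) return -1;

  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;
  struct stat st;
  if (fstat(fd, &st) < 0) {
    close(fd);
    return -1;
  }

  /* mmap offsets must be page-aligned, so round the chunk size up. */
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  chunk = (chunk + page - 1) / page * page;

  off_t size = st.st_size, result = -1;
  for (off_t off = 0; off < size && result < 0; off += (off_t)chunk) {
    /* Extend each window by nlen - 1 bytes so a match that starts in this
     * chunk but ends in the next one is still seen in full. */
    size_t len = chunk + nlen - 1;
    if ((off_t)len > size - off) len = (size_t)(size - off);

    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
    if (p == MAP_FAILED) break;

    for (size_t i = 0; i + nlen <= len; i++) {
      if (memcmp(p + i, needle, nlen) == 0) {
        result = off + (off_t)i;
        break;
      }
    }
    munmap(p, len); /* done with this window: release it explicitly */
  }
  close(fd);
  return result;
}
```

A fixed overlap only works because the literal has a known length; a regex match can be arbitrarily long, which is why partial or multi-segment matching would be needed for the real thing.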

Side note: while experimenting with this, it looks like vim does read the entire file into memory (tested with a 1GB file).
