How to index files larger than RAM? #42
-
Hi,
-
Do you know how many lines the file has? I looked into large-file support before; it is possible, but the line counters are 32-bit. In a use case I came across earlier, the file had something like 7B lines, which overflows a 32-bit counter (2^32 is roughly 4.3B); handling that would have required changing file formats and more code than I was comfortable with.

My general advice is to split files like this into more manageable chunks, for example along line boundaries as in the sketch below.

Compressed input files aren't planned: supporting them would require integrating more external libraries, and for anything beyond single-file .gz compression it is structurally cumbersome to implement, since a single file becomes a folder hierarchy.
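For reference, a minimal sketch of such a split in Python, streaming line by line so memory use stays constant regardless of file size. The chunk size of 100 million lines and the `.partNNNN` output naming are arbitrary assumptions, chosen only to keep each chunk well under the 32-bit line-count limit:

```python
import sys

# Arbitrary assumption: 100M lines per chunk, well under 2**32.
LINES_PER_CHUNK = 100_000_000

def split_file(path: str) -> None:
    """Split a line-oriented file into fixed-size chunks without
    loading it into RAM; output files are named <path>.partNNNN."""
    chunk_index = 0
    out = None
    with open(path, "rb") as src:
        for line_number, line in enumerate(src):
            # Start a new chunk every LINES_PER_CHUNK lines.
            if line_number % LINES_PER_CHUNK == 0:
                if out is not None:
                    out.close()
                out = open(f"{path}.part{chunk_index:04d}", "wb")
                chunk_index += 1
            out.write(line)
    if out is not None:
        out.close()

if __name__ == "__main__":
    split_file(sys.argv[1])
```

Reading and writing in binary mode keeps the split byte-exact, so concatenating the chunks reproduces the original file.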
-
1293634516 lines of JSON (one JSON object per line)