How to index files larger than RAM? #42
-
Hi,
-
Do you know how many lines the file has? I looked into large-file support before; it is possible, but the line counters are 32-bit. In a use case I came across earlier, the file had something like 7B lines, which overflows a 32-bit counter (2^32 is roughly 4.3B); handling that would have required changing file formats and more code than I was comfortable with.

My general advice is to split files like this into more manageable chunks, for example along line boundaries as in the sketch below.

Compressed input files aren't planned: supporting them would require integrating more external libraries, and for anything beyond single-file .gz compression it is structurally cumbersome to implement, since a single file becomes a folder hierarchy.
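For reference, a minimal sketch of such a split in Python, streaming line by line so memory use stays constant regardless of file size. The chunk size of 100 million lines and the `.partNNNN` output naming are arbitrary assumptions, chosen only to keep each chunk well under the 32-bit line-count limit:

```python
import sys

# Arbitrary assumption: 100M lines per chunk, well under 2**32.
LINES_PER_CHUNK = 100_000_000

def split_file(path: str) -> None:
    """Split a line-oriented file into fixed-size chunks without
    loading it into RAM; output files are named <path>.partNNNN."""
    chunk_index = 0
    out = None
    with open(path, "rb") as src:
        for line_number, line in enumerate(src):
            # Start a new chunk every LINES_PER_CHUNK lines.
            if line_number % LINES_PER_CHUNK == 0:
                if out is not None:
                    out.close()
                out = open(f"{path}.part{chunk_index:04d}", "wb")
                chunk_index += 1
            out.write(line)
    if out is not None:
        out.close()

if __name__ == "__main__":
    split_file(sys.argv[1])
```

Reading and writing in binary mode keeps the split byte-exact, so concatenating the chunks reproduces the original file.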
-
1293634516 lines of JSON (one JSON object per line)