# BufferedWriter

Trouble writing truckloads of data to thousands of different files?

Operating systems limit the number of files that can be open simultaneously. At the same time, opening and closing a file for each data element individually is very slow, since the overhead of the file-handling operations accumulates, and it can even lead to crashes if files are accessed too frequently.

The BufferedWriter class offers a simple but efficient approach to this problem: it keeps a dictionary that maps each file to a list of pending lines. New lines are first appended to the respective list; only when the length of a file's list exceeds a certain limit are its contents written to the file in one batch.
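
The core mechanism could look roughly like the following minimal sketch. The parameter name `lines_per_file` is taken from the usage example below; the exact buffering and flush logic here is an assumption about the implementation, not the actual source:

```python
import os
from collections import defaultdict


class BufferedWriter:
    """Sketch: buffers lines per file and flushes them in batches."""

    def __init__(self, out_dir, lines_per_file=100):
        self.out_dir = out_dir
        self.lines_per_file = lines_per_file  # flush threshold per file
        self.buffers = defaultdict(list)      # file name -> pending lines
        os.makedirs(out_dir, exist_ok=True)

    def write_line(self, file_name, line):
        buffer = self.buffers[file_name]
        buffer.append(line)
        # Only touch the file once enough lines have accumulated.
        if len(buffer) >= self.lines_per_file:
            self._flush(file_name)

    def _flush(self, file_name):
        path = os.path.join(self.out_dir, file_name)
        with open(path, "a") as f:
            f.write("\n".join(self.buffers[file_name]) + "\n")
        self.buffers[file_name].clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Flush whatever is still buffered when the context exits.
        for file_name in list(self.buffers):
            if self.buffers[file_name]:
                self._flush(file_name)
```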

## Usage

As a context manager:

```python
with BufferedWriter(out_dir=out_dir, lines_per_file=100) as writer:
    for line in lines:
        writer.write_line(f"file_{line % files_to_write}", str(line))
```

## Speed test

I generated a list of 100,000 random integers. Each number is written to one of 1,000 files according to its residue class (the number modulo 1,000). Regular writing, where the file is opened and closed for each integer, turns out to be much slower.
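
A comparison along these lines could be reproduced with a sketch like the one below. It assumes the `BufferedWriter` class (or the sketch above) is in scope; the directory names and the buffer size of 100 are illustrative choices, not values from the original test:

```python
import os
import random
import time

numbers = [random.randrange(1_000_000) for _ in range(100_000)]
files_to_write = 1_000
os.makedirs("naive", exist_ok=True)

# Naive variant: open and close the target file for every single number.
start = time.perf_counter()
for n in numbers:
    with open(f"naive/file_{n % files_to_write}", "a") as f:
        f.write(f"{n}\n")
print("naive:   ", time.perf_counter() - start)

# Buffered variant: lines are collected per file and written in batches.
start = time.perf_counter()
with BufferedWriter(out_dir="buffered", lines_per_file=100) as writer:
    for n in numbers:
        writer.write_line(f"file_{n % files_to_write}", str(n))
print("buffered:", time.perf_counter() - start)
```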