
Performance: readData utterly slow for files with many lines of data #57

Open
FObersteiner opened this issue Feb 24, 2022 · 2 comments

FObersteiner (Contributor) commented Feb 24, 2022

Description

Loading data from small files completes in a reasonable amount of time, but with many lines of data (10k+), readData becomes a real bottleneck.

What I Did

Reading 4.3k lines of data, FFI 1001:

```
%timeit myfile.readData()
67.9 ms ± 7.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Reading 86.6k lines of data, FFI 1001:

```
%timeit myfile.readData()
51.5 s ± 2.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

That's nearly a minute per file! If I wanted to load many such files, I'd have to drink a lot of coffee in the meantime ☕👾
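For reference, a minimal way to reproduce this outside of IPython (the file path is a placeholder; nappy.openNAFile is the usual entry point for NASA Ames files):

```python
import time

import nappy  # https://github.com/cedadev/nappy

# Placeholder path: substitute any FFI 1001 file with many data lines.
myfile = nappy.openNAFile("large_ffi1001_file.na")

t0 = time.perf_counter()
myfile.readData()  # parses the whole data section
print(f"readData took {time.perf_counter() - t0:.2f} s")
```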


Tracing the execution of the call to readData, I find …
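For anyone who wants to trace this themselves, here is a quick way to locate the hot spot with the standard-library profiler (a sketch only; the path is a placeholder, and the context-manager form of cProfile.Profile needs Python 3.8+):

```python
import cProfile
import pstats

import nappy

# Placeholder path: any large FFI 1001 file.
myfile = nappy.openNAFile("large_ffi1001_file.na")

with cProfile.Profile() as profiler:
    myfile.readData()

# Show the ten most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```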

agstephens (Member) commented
@FObersteiner, I agree that we should look at this. Do you have publicly downloadable large example files that we could use in unit/integration testing?

FObersteiner (Contributor, Author) commented Mar 10, 2022

@agstephens yup, I was about to create some public sample data from our ozone instruments anyway ;-) You can find it here: https://git.scc.kit.edu/FObersteiner/pyFairoproc/-/tree/master/samples.

The one that's problematic in this context (nappy reading data) is the cl_photometer file (~86k lines of data, just one variable).
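Once those samples are wired into the test suite, a simple regression check could look like this (the local file name and the 5 s time budget are assumptions on my part):

```python
import time

import nappy

# Placeholder file name: the cl_photometer sample from
# https://git.scc.kit.edu/FObersteiner/pyFairoproc/-/tree/master/samples
SAMPLE = "samples/cl_photometer.ames"

def test_readdata_performance():
    f = nappy.openNAFile(SAMPLE)
    t0 = time.perf_counter()
    f.readData()
    elapsed = time.perf_counter() - t0
    # ~86k data lines; fail if parsing regresses badly (budget is arbitrary).
    assert elapsed < 5.0, f"readData took {elapsed:.1f} s"
```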
