Speeding up looper CLI use #476

Open
nsheff opened this issue Mar 12, 2024 · 4 comments

@nsheff
Contributor

nsheff commented Mar 12, 2024

I'm unsatisfied with how long it takes the looper CLI to run. I guess it's because looper imports a bunch of heavy stuff, like pandas, peppy, sqlalchemy (via pephubclient), etc.

A lot of these aren't necessary.

I suggest we see if it's possible to import some of the heaviest things only as needed, instead of at the top of the file as is typically done.
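A minimal sketch of what "import only as needed" could look like, assuming a CLI module where only some commands need pandas. The names `load_sample_table`, `main`, `is_loaded`, and the `table` command are hypothetical and are not looper's actual code:

```python
import sys


def load_sample_table(path):
    """Hypothetical helper that needs pandas.

    The import sits inside the function body, so pandas is loaded the first
    time this function runs instead of when the CLI module is imported.
    Later calls are cheap because Python caches the module in sys.modules.
    """
    import pandas as pd

    return pd.read_csv(path)


def main(argv):
    """Hypothetical CLI dispatcher: only the 'table' command touches pandas."""
    if argv and argv[0] == "table":
        return load_sample_table(argv[1])
    command = argv[0] if argv else "help"
    return f"ran {command} without importing pandas"


def is_loaded(module_name):
    """True if module_name has already been imported in this interpreter."""
    return module_name in sys.modules
```

The trade-off is that the first call to a function like `load_sample_table` pays the full pandas import cost, and a missing dependency only surfaces at call time rather than at startup.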

You can profile import time like this:

python -X importtime -c 'import looper'
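`-X importtime` writes one line per imported module to stderr. A small sketch for ranking those lines without a GUI, assuming the standard CPython format (`self [us] | cumulative | imported package`, with two extra spaces of indentation per nesting level); `run_importtime`, `parse_importtime`, and `top_imports` are illustrative names:

```python
import subprocess
import sys


def run_importtime(module):
    """Import `module` in a fresh interpreter with -X importtime; return the stderr log."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stderr


def parse_importtime(log):
    """Parse -X importtime lines into (depth, name, self_us, cumulative_us) tuples."""
    rows = []
    for line in log.splitlines():
        if not line.startswith("import time:"):
            continue  # skip warnings or other stderr output mixed into the log
        parts = line[len("import time:"):].split("|", 2)
        if len(parts) != 3:
            continue
        self_col, cum_col, name_col = parts
        if not self_col.strip().isdigit():
            continue  # header line: "self [us] | cumulative | imported package"
        name_col = name_col[1:]  # drop the single space that follows "|"
        depth = (len(name_col) - len(name_col.lstrip(" "))) // 2
        rows.append((depth, name_col.strip(), int(self_col), int(cum_col)))
    return rows


def top_imports(rows, n=10):
    """Return the n imports with the largest cumulative time (microseconds)."""
    return sorted(rows, key=lambda row: row[3], reverse=True)[:n]
```

For example, `top_imports(parse_importtime(run_importtime("looper")), 15)` would list the 15 most expensive imports, assuming looper is installed in the current environment.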
@nsheff nsheff added this to the v2.0.0 milestone Mar 12, 2024
@nleroy917
Member

You can view the output with a cool tool called tuna. I ended up running the following to profile the import time:

python -X importtime -c "from looper.__main__ import main; main()" 2> looper.log

I just did this over at geniml and remembered this issue, so I figured I'd keep going while I was on a roll. I was also struggling with this when running looper recently. Here is the tuna output:

[Screenshot: tuna visualization of the looper import-time profile]

It seems like pandas (imported via peppy) is a big issue.

Here is the log output, looper.log, if anyone wants to download it and run tuna themselves.
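Tuna shows graphically which package pulls in pandas; the same answer can be read from the text log. In `-X importtime` output, a module's own imports are printed before it with two more spaces of indentation, so the importer of a line is the next line after it that is one level shallower. A sketch under that assumption (`import_chain` is a hypothetical helper):

```python
import re

# One -X importtime data line: self time, cumulative time, indentation, module name.
LINE_RE = re.compile(r"^import time:\s+(\d+) \|\s+(\d+) \| ( *)(\S+)$")


def import_chain(log_text, target):
    """Return [target, importer, importer's importer, ...] for the logged import of target.

    Children are printed before their parent, so the parent of an entry at
    depth d is the next entry below it at depth d - 1.
    """
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((len(match.group(3)) // 2, match.group(4)))
    for i, (depth, name) in enumerate(entries):
        if name == target:
            chain = [name]
            for later_depth, later_name in entries[i + 1:]:
                if later_depth == depth - 1:
                    chain.append(later_name)
                    depth = later_depth
            return chain
    return []  # target was never imported
```

Usage: `print(" <- ".join(import_chain(open("looper.log").read(), "pandas")))`. Only the first importer is visible, because later imports of the same module are served from `sys.modules` and are not logged; removing pandas from that one package may simply expose the next importer.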

@donaldcampbelljr
Contributor

I cannot reproduce those slow import times. I get ~0.4-0.56 seconds for the import. I also tested in a fresh venv.

@donaldcampbelljr
Contributor

I've begun some work toward replacing pandas with polars and doing performance testing.

peppy_branch: https://github.com/pepkit/peppy/tree/dev_replace_pandas_with_polars

Importing Peppy, Pandas, and Looper 50 times each gives the following mean and standard deviation of import time, in milliseconds:

Using Pandas (n=50):

Module   mean (ms)    std (ms)
Pandas   188.684421   3.665686
Peppy    244.675341   20.345653
Looper   470.185256   27.921771

Replacing pandas with polars in Peppy (n=50):

Module   mean (ms)    std (ms)
Polars   51.336722    11.519378
Peppy    185.085058   42.192459

Note: I have not tested Looper with the polars replacement yet, because Looper also pulls in pandas through Peppy, Ubiquerg, and Pipestat, which makes removing pandas completely difficult.

@donaldcampbelljr
Contributor

I attempted to shuffle the imports (f91bdad) and then measured the import time again.

However, the import time actually increased.

Module   mean (ms)    std (ms)
Looper   544.00382    5.214798

Based on last week's discussion, most gains may come from refactoring Peppy. Therefore, I will move this to a later milestone while we await the Peppy changes.

@donaldcampbelljr donaldcampbelljr modified the milestones: v2.0.0, v2.1.0 Jul 8, 2024