Reduce the cost of making a dataset instance #322

Open
xbito opened this issue Nov 14, 2018 · 3 comments
@xbito
Contributor

xbito commented Nov 14, 2018

We were exploring using scrunch to show some count information on an internal website, but loading the counts was pretty slow: it took 5-10 seconds to display the results.

Then we looked at the number of requests going to Crunch and found that several calls made at the moment you create an instance slow the process significantly for relatively large datasets (tens of thousands of variables):

INFO:__main__:Running: ds = get_mutable_dataset('185264f6f5924235afbcfba1d717f0f7')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): app.crunch.io:443
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/ HTTP/1.1" 401 168
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "POST /api/public/login/ HTTP/1.1" 204 0
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/ HTTP/1.1" 200 401
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/feature_flag/?feature_name=old_projects_order HTTP/1.1" 200 160
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/ HTTP/1.1" 200 919
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/variables/ HTTP/1.1" 200 588962
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/variables/hier/ HTTP/1.1" 200 24574
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/settings/ HTTP/1.1" 200 222
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/folders/ HTTP/1.1" 200 617
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/folders/ HTTP/1.1" 200 617
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/folders/hidden/ HTTP/1.1" 200 168
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/folders/ HTTP/1.1" 200 617
DEBUG:urllib3.connectionpool:https://app.crunch.io:443 "GET /api/datasets/185264f6f5924235afbcfba1d717f0f7/folders/trash/ HTTP/1.1" 200 166

I believe those are related to loading self.folders, self._vars and self.order at init time. Can we make those lazily loaded?
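To illustrate what lazy loading could look like, here is a minimal sketch, not scrunch's actual code. The class `LazyDataset`, the `fetch` callable, and the mapping of `order` to `variables/hier/` are all assumptions made for the example; it also assumes Python 3.8+ for `functools.cached_property`.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical stand-in for a dataset wrapper (not scrunch's real class)."""

    def __init__(self, fetch):
        # `fetch` is an assumed callable that performs one GET for a
        # path relative to the dataset URL and returns the parsed body.
        self._fetch = fetch

    @cached_property
    def folders(self):
        # Requested on first access only, then cached on the instance.
        return self._fetch("folders/")

    @cached_property
    def variables(self):
        return self._fetch("variables/")

    @cached_property
    def order(self):
        # Assumption: the order corresponds to the variables/hier/ call.
        return self._fetch("variables/hier/")


# Record every simulated request so the effect is visible.
calls = []


def fake_fetch(path):
    calls.append(path)
    return {"path": path}


ds = LazyDataset(fake_fetch)  # no requests are made at construction
_ = ds.folders                # first access triggers one request
_ = ds.folders                # second access is served from the cache
```

With this shape, code that only needs a count would never pay for the large variables/ response unless it actually touches `ds.variables`.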

@jjdelc
Contributor

jjdelc commented Nov 16, 2018

What counts are you trying to obtain here?

Why not just use straight pycrunch and avoid all the Scrunch magic that is not necessary here?

@jjdelc
Contributor

jjdelc commented Nov 16, 2018

Still, those requests look extremely redundant. It's definitely the chained accesses, such as self.folders.hidden and then self.folders.trash, that each make the same GET to /folders/ to resolve the .folders part.
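One way to avoid the repeated GET would be to cache the root folders response once and let the chained accessors reuse it. This is only a sketch under assumptions: the `Folders` class, the `fetch` callable, and the shape of the root response are hypothetical and do not reflect scrunch's real implementation.

```python
class Folders:
    """Hypothetical folders accessor; names are illustrative only."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable: relative path -> parsed body
        self._root = None    # cached response of the root folders/ GET

    def _get_root(self):
        # Fetch the root catalog once and reuse it afterwards.
        if self._root is None:
            self._root = self._fetch("folders/")
        return self._root

    @property
    def hidden(self):
        # Resolve the sub-folder path from the cached root catalog.
        return self._fetch(self._get_root()["hidden"])

    @property
    def trash(self):
        return self._fetch(self._get_root()["trash"])


# Record every simulated request so the effect is visible.
requests_made = []


def fake_fetch(path):
    requests_made.append(path)
    if path == "folders/":
        # Assumed shape: the root lists the paths of its special folders.
        return {"hidden": "folders/hidden/", "trash": "folders/trash/"}
    return {"path": path}


folders = Folders(fake_fetch)
_ = folders.hidden
_ = folders.trash
```

In this sketch the root is requested once instead of once per chained access, provided the dataset keeps a single `Folders` instance rather than building a new one each time `.folders` is read.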

@xbito
Contributor Author

xbito commented Nov 17, 2018

> What counts are you trying to obtain here?
>
> Why not just use straight pycrunch and avoid all the Scrunch magic that is not necessary here?

We actually took that approach. Still, I feel like we should have the option to make scrunch a bit leaner.
