support big data use cases #75

Open
yair4Data opened this issue Feb 15, 2021 · 3 comments


@yair4Data

Great library!
It would be great to have support for big data use cases (integration with dask/vaex/spark).
My use case has a dataset that does not fit in memory and a strong class imbalance, so if I want to keep the original target ratio, I need support for the original data size rather than downsampling the data.
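
For readers with a similar constraint, here is a minimal sketch of one possible workaround that nobody in this thread has confirmed: thinning each target class at the same rate in a single streaming pass, so the class ratio stays roughly intact while the row count shrinks. The function name `stratified_thin`, its parameters, and the synthetic rows are all hypothetical and are not part of the library or of dask/vaex/spark. It assumes rows can be read one at a time and that a reduced sample is acceptable, which may not hold for the use case described above.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Tuple

Row = Tuple[Hashable, ...]


def stratified_thin(rows: Iterable[Row], target_index: int, keep_every: int) -> Iterator[Row]:
    """Yield every `keep_every`-th row of each target class (hypothetical helper).

    Only one counter per class is held in memory, so the input can be a
    generator over a file or database cursor that never fits in RAM.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    seen = Counter()  # rows seen so far, per target label
    for row in rows:
        label = row[target_index]
        if seen[label] % keep_every == 0:
            yield row
        seen[label] += 1


# Synthetic, made-up data: 100,000 rows with a 1% positive class.
rows = ((i, 1 if i % 100 == 0 else 0) for i in range(100_000))

sample = list(stratified_thin(rows, target_index=1, keep_every=10))
counts = Counter(label for _, label in sample)
positive_ratio = counts[1] / len(sample)
```

Because every class is thinned by the same factor, the positive share of the sample matches the original up to rounding. The thinned rows could then be loaded into a pandas data frame of manageable size.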

@fbdesignpro
Owner

Hello @yair4Data, thank you for the kind words! I hope the library can be useful to you!

Are you saying you are running out of memory when converting to a pandas data frame (e.g. df = df.compute() in dask)?

Or are you getting an error message when running the report, or generating HTML?

@haiyuni

haiyuni commented Mar 12, 2021

I have the same problem too. My data has about a billion rows, but it does not work! Can the modin package be used?

@fbdesignpro
Owner

@haiyuni I haven't looked at modin, I will do so and get back here.

Regarding the billion row issue, I am assuming you are referring to the scale issue (#73)? Or is there a specific error I should be looking at?

Thanks again!


3 participants