Notes on large.Rmd #523

Open
maelle opened this issue Jan 28, 2025 · 0 comments

maelle commented Jan 28, 2025

  • Maybe add a few sentences explaining that with large data you want the following (actually, I think these could be the three sections of the vignette):

    • input data kept out of your RAM (so you're probably using a file format that's good for large data, and you're in luck: there are functions for reading it into duckplyr)
    • efficient computation (yay DuckDB, and here you don't even need to learn any syntax beyond dplyr)
    • output that also stays out of your RAM unless it's small (funnel, compute to file)
  • A drawback of large data + duckplyr is that fallbacks can't make up for the limits of duckplyr, since falling back to dplyr requires loading the data into RAM.

  • In that case, if too many fallbacks are needed, do we recommend using dbplyr?

  • The paragraph on dbplyr should be in the README.

  • Some functions described in large.Rmd, in particular duckdb_tibble(), don't seem relevant for large data?
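
To make the three-part structure from the first bullet concrete, here is a minimal sketch of what the vignette could walk through. It assumes the duckplyr functions `read_parquet_duckdb()` and `compute_parquet()` behave as I understand them; the file paths, column names, and the tiny stand-in dataset are made up for illustration (a real example would start from an existing large file).

```r
library(dplyr)
library(duckplyr)

# Hypothetical file locations; in real use the input would already exist on disk.
input_path <- tempfile(fileext = ".parquet")
output_path <- tempfile(fileext = ".parquet")

# Stand-in for a large file: a tiny made-up table written to Parquet.
toy <- duckdb_tibble(
  id = 1:6,
  group = rep(c("a", "b"), 3),
  value = c(1, 2, 3, 4, 5, 6)
)
written <- compute_parquet(toy, input_path)

# 1. Input: point duckplyr at the file instead of loading it into RAM.
big <- read_parquet_duckdb(input_path)

# 2. Computation: plain dplyr verbs, run by DuckDB where supported.
summary_tbl <- big |>
  filter(value > 1) |>
  summarise(total = sum(value), .by = group)

# 3. Output: write the result to a file rather than materializing it in R.
result <- compute_parquet(summary_tbl, output_path)

# Only a small result like this one is worth collecting into memory.
small <- collect(result)
```

If one of the verbs in step 2 is not supported by duckplyr, the fallback to dplyr would need the data in RAM, which is the drawback raised in the second bullet.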
