consider making funnel.Rmd a part of large.Rmd #515

Open
maelle opened this issue Jan 28, 2025 · 11 comments

@maelle
Collaborator

maelle commented Jan 28, 2025

No description provided.

@maelle
Collaborator Author

maelle commented Jan 28, 2025

And re-organize content.

  • Big picture: big data + RAM = risky. duckplyr's design principles.
  • How to input data for use with duckplyr.
  • A subsection about duckdb request
  • How to compute, including funneling and computation to files.

@maelle
Collaborator Author

maelle commented Jan 28, 2025

I'll have a go at this.

@maelle maelle self-assigned this Jan 28, 2025
@maelle maelle mentioned this issue Jan 28, 2025
@krlmlr
Member

krlmlr commented Jan 28, 2025

Thanks. I welcome proposals to further split up these vignettes. Right now, both "large" and "funnel" are almost too large on their own, and it would be too scary for me to attempt to read both in one sitting.

What's the motivation to merge these?

@maelle
Collaborator Author

maelle commented Jan 28, 2025

The motivation is that both are meant for large data.

I think that with better subsection titles, it'll be easier to skim.

@maelle
Collaborator Author

maelle commented Jan 28, 2025

Actually I might split them but differently!

  • how to input data to duckplyr
  • how to handle large data (including highlighting Parquet etc., and funneling/lazy stuff)

@maelle
Collaborator Author

maelle commented Jan 28, 2025

  • how to input data
  • how to compute
  • funnel
  • special tips for large data

@maelle
Collaborator Author

maelle commented Jan 28, 2025

@krlmlr I'd consider making funnel a part of "how to compute" so there'd be three vignettes instead of two.

  • How to input data
  • How to compute
  • Special tips for large data: use certain input functions, compute to files, make sure you have funneling in place. Including the "big picture" section that is currently at the end of large.Rmd.

I think the paragraph comparing duckplyr to dbplyr that is currently in the large.Rmd vignette should be moved to the README (I can make a PR), because comparisons to similar tools are common in READMEs.

I'd remove the "funnel" section from the vignette for developers.

How does this sound?

@krlmlr
Member

krlmlr commented Jan 28, 2025

I'd need to think about it. Let's focus on the small wins for now.

@krlmlr
Member

krlmlr commented Jan 29, 2025

In my view, data in and data out should be in one vignette.

We can add a third vignette that presents an entirely different view, perhaps a "Getting started" vignette that links to the two others. If there's then material that is no longer a good fit for "large" or "funnel" (soon to be "valve"?), we can rearrange gradually.

@krlmlr
Member

krlmlr commented Jan 29, 2025

Would you like to contribute a "Getting started" vignette?

@maelle
Collaborator Author

maelle commented Jan 30, 2025

Yes, I'll try! 😸 I'll share the diagram source then, as it could go in there.

2 participants