Add type = "parquet" #729

Merged: juliasilge merged 6 commits into main from add-parquet on Mar 6, 2023

juliasilge
Member

Closes #713

library(pins)
b <- board_temp()
b %>% pin_write(palmerpenguins::penguins, "penguins-have-factors", type = "parquet")
#> Creating new version '20230306T164332Z-cdfce'
#> Writing to pin 'penguins-have-factors'
b %>% pin_read("penguins-have-factors")
#> # A tibble: 344 × 8
#>    species island    bill_length_mm bill_depth_mm flipper_…¹ body_…² sex    year
#>    <fct>   <fct>              <dbl>         <dbl>      <int>   <int> <fct> <int>
#>  1 Adelie  Torgersen           39.1          18.7        181    3750 male   2007
#>  2 Adelie  Torgersen           39.5          17.4        186    3800 fema…  2007
#>  3 Adelie  Torgersen           40.3          18          195    3250 fema…  2007
#>  4 Adelie  Torgersen           NA            NA           NA      NA <NA>   2007
#>  5 Adelie  Torgersen           36.7          19.3        193    3450 fema…  2007
#>  6 Adelie  Torgersen           39.3          20.6        190    3650 male   2007
#>  7 Adelie  Torgersen           38.9          17.8        181    3625 fema…  2007
#>  8 Adelie  Torgersen           39.2          19.6        195    4675 male   2007
#>  9 Adelie  Torgersen           34.1          18.1        193    3475 <NA>   2007
#> 10 Adelie  Torgersen           42            20.2        190    4250 <NA>   2007
#> # … with 334 more rows, and abbreviated variable names ¹​flipper_length_mm,
#> #   ²​body_mass_g

Created on 2023-03-06 with reprex v2.0.2

Notice we get all the original types back.
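
A minimal sketch of one way to check this, assuming the pins and arrow packages are installed. The small data frame `df` and the pin names are made up for illustration; the CSV pin is only there for contrast, since CSV stores plain text and is not expected to carry factor information.

```r
library(pins)

b <- board_temp()

# Hypothetical toy data with a factor and an integer column
df <- data.frame(
  species = factor(c("Adelie", "Gentoo", "Adelie")),
  year = c(2007L, 2008L, 2009L)
)

b %>% pin_write(df, "types-parquet", type = "parquet")
b %>% pin_write(df, "types-csv", type = "csv")

from_parquet <- b %>% pin_read("types-parquet")
from_csv <- b %>% pin_read("types-csv")

# Compare column classes before and after each round trip
sapply(df, class)
sapply(from_parquet, class)
sapply(from_csv, class)
```

Comparing the three `sapply()` results side by side shows which column types each pin type keeps.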

Comment on lines 66 to 67 of vignettes/pins.Rmd
- `type = "parquet"` uses `arrow::write_parquet()` to create a Parquet file. [Parquet](https://parquet.apache.org/) is a modern, language-independent, column-oriented file format for efficient data storage and retrieval. Parquet is a storage format used with [Arrow](https://arrow.apache.org), an in-memory columnar format.
- `type = "arrow"` uses `arrow::write_feather()` to create an Arrow/Feather file. Read the [FAQs from the Arrow project](https://arrow.apache.org/faq/) for more on the differences between Arrow and Parquet as file formats.
Member Author

I updated the language around Parquet and Arrow in the main vignette. Any suggestions for this?

Member

I'd soften the language for arrow further (or just remove the second sentence); given the current position on that website, I don't think you'd ever want to use the arrow on-disk format.
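
For context on the two bullets under review, here is a rough sketch of calling the two arrow writers they name directly, outside of pins. It assumes the arrow package is installed; the data frame and temporary file paths are hypothetical, and the sketch only illustrates the two on-disk formats, not how pins calls them internally.

```r
library(arrow)

# Hypothetical toy data
df <- data.frame(
  species = factor(c("Adelie", "Gentoo", "Adelie")),
  year = c(2007L, 2008L, 2009L)
)

parquet_path <- tempfile(fileext = ".parquet")
feather_path <- tempfile(fileext = ".arrow")

# The writer named in the `type = "parquet"` bullet
write_parquet(df, parquet_path)
# The writer named in the `type = "arrow"` bullet
write_feather(df, feather_path)

from_parquet <- read_parquet(parquet_path)
from_feather <- read_feather(feather_path)

# Inspect column classes and on-disk sizes for each format
sapply(from_parquet, class)
sapply(from_feather, class)
file.size(parquet_path)
file.size(feather_path)
```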

juliasilge marked this pull request as ready for review March 6, 2023 16:55
juliasilge requested a review from hadley March 6, 2023 16:55
juliasilge merged commit 3406105 into main Mar 6, 2023
juliasilge deleted the add-parquet branch March 6, 2023 17:45
github-actions bot

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 21, 2023
Linked issue that merging this pull request may close: Should pins support parquet?
2 participants