Update docs, especially about concurrent writes #817

Merged: 2 commits, Jan 8, 2024
4 changes: 0 additions & 4 deletions README.Rmd
@@ -29,10 +29,6 @@ The pins package publishes data, models, and other R objects, making it easy to
You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), Posit Connect, Amazon S3, Google Cloud Storage, Azure storage, and Microsoft 365 (OneDrive and SharePoint).
Pins can be automatically versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.

pins 1.0.0 includes a new more explicit API and greater support for versioning.
Review comment from the PR author (member):

pins 1.0.0 was in Oct 2021 so I think it's time to remove this, to avoid confusion for folks coming to this page afresh. The vignette about moving to pins 1.0.0 is still included.

The legacy API (`pin()`, `pin_get()`, and `board_register()`) will continue to work, but new features will only be implemented with the new API, so we encourage you to switch to the modern API as quickly as possible.
Learn more in `vignette("pins-update")`.

You can use pins from Python as well as R. For example, you can use one language to read a pin created with the other. Learn more about [pins for Python](https://rstudio.github.io/pins-python/).

## Installation
25 changes: 16 additions & 9 deletions vignettes/pins.Rmd
@@ -58,7 +58,17 @@ The first argument is the object to save (usually a data frame, but it can be an
The name is basically equivalent to a file name: you'll use it when you later want to read the data from the pin.
The only rule for a pin name is that it can't contain slashes.

As you can see from the output, pins has chosen to save this data to an `.rds` file.
After you've pinned an object, you can read it back with `pin_read()`:

```{r}
board %>% pin_read("mtcars")
```

You don't need to supply the file type when reading data from a pin because pins automatically stores the file type in the [metadata](#Metadata).

## How and what to store as a pin

As you can see from the output in the previous section, pins has chosen to save this example data to an `.rds` file.
But you can choose another option depending on your goals:

- `type = "rds"` uses `saveRDS()` to create a binary R data file. It can save any R object (including trained models) but it's only readable from R, not other languages.
@@ -68,19 +78,16 @@ But you can choose another option depending on your goals:
- `type = "json"` uses `jsonlite::write_json()` to create a JSON file. Pretty much every programming language can read JSON files, but they only work well for nested lists.
- `type = "qs"` uses `qs::qsave()` to create a binary R data file, like `saveRDS()`. This format achieves faster read/write speeds than RDS, and compresses data more efficiently, making it a good choice for larger objects. Read more on the [qs package](https://github.com/traversc/qs).
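
As a minimal sketch of choosing a type explicitly, the following writes the same data frame as CSV and as RDS to a temporary board, then checks the type recorded by `pin_meta()`. The pin names `"mtcars-csv"` and `"mtcars-rds"` are illustrative and do not come from the vignette.

```r
library(pins)

# A throwaway board for illustration; it is removed when the R session ends
board <- board_temp()

# Same data, two storage formats; the pin names are arbitrary
pin_write(board, mtcars, "mtcars-csv", type = "csv")
pin_write(board, mtcars, "mtcars-rds", type = "rds")

# The type is recorded in the metadata, so pin_read() needs no type argument
csv_type <- pin_meta(board, "mtcars-csv")$type
rds_type <- pin_meta(board, "mtcars-rds")$type

from_csv <- pin_read(board, "mtcars-csv")
```

A CSV pin can be read from other languages, while the RDS pin preserves R-specific attributes; which one fits depends on who needs to read the pin.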

After you've pinned an object, you can read it back with `pin_read()`:

```{r}
board %>% pin_read("mtcars")
```

You don't need to supply the file type when reading data from a pin because pins automatically stores the file type in the metadata, the topic of the next section.

Note that when the data lives elsewhere, pins takes care of downloading and caching so that it's only re-downloaded when needed.
That said, most boards transmit pins over HTTP, and this is going to be slow and possibly unreliable for very large pins.
As a general rule of thumb, we don't recommend using pins with files over 500 MB.
If you find yourself routinely pinning data larger than this, you might need to reconsider your data engineering pipeline.
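
One way to keep an eye on this rule of thumb is to inspect the size that pins records in the metadata after a write. The sketch below assumes the `file_size` field returned by `pin_meta()` and treats 1 MB as 1e6 bytes; the variable names are hypothetical.

```r
library(pins)

# Temporary board used only for this illustration
board <- board_temp()
pin_write(board, mtcars, "mtcars", type = "rds")

# Size of the stored file(s) in bytes, as recorded in the pin metadata
size_bytes <- as.numeric(pin_meta(board, "mtcars")$file_size)

# Compare against the 500 MB rule of thumb (assuming 1 MB = 1e6 bytes)
too_large <- size_bytes > 500 * 1e6
```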

Storing your data/object as a pin works well when you write from a single source or process. It is _not_ appropriate when multiple sources or processes need to write to the same pin; since the pins package reads and writes files, it cannot manage concurrent writes.

- **Good** use for pins: an ETL pipeline that stores a model or summarized dataset once a day
- **Bad** use for pins: a Shiny app that collects data from users, who may be using the app at the same time
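
The single-writer pattern can be sketched as follows. This is an illustration under assumptions, not a prescribed workflow: `write_daily_summary()` and the pin name `"mpg-by-cyl"` are hypothetical, and a temporary versioned board stands in for a shared one.

```r
library(pins)

# Temporary versioned board, standing in for a shared board (assumption)
board <- board_temp(versioned = TRUE)

# Hypothetical scheduled job: one process summarizes the data and writes the pin
write_daily_summary <- function(board, data) {
  summary_df <- aggregate(mpg ~ cyl, data = data, FUN = mean)
  pin_write(board, summary_df, "mpg-by-cyl", type = "rds")
}

# Two runs of the same job, one after the other (never at the same time)
write_daily_summary(board, mtcars)
write_daily_summary(board, mtcars[mtcars$am == 1, ])

# Each sequential run is stored as its own version of the pin
versions <- pin_versions(board, "mpg-by-cyl")
```

Because only one process writes at a time, each run produces a clean new version. If several processes called `pin_write()` on the same pin simultaneously, pins would have no mechanism to coordinate them.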

## Metadata

Every pin is accompanied by some metadata that you can access with `pin_meta()`: