Skip to content

cnolanminich/pins

 
 

Repository files navigation

pins: Pin, Discover and Share Resources

Build Status AppVeyor Build Status CRAN Status Code Coverage Downloads Lifecycle: experimental Chat GitHub Stars

Overview

You can use the pins package to:

  • Pin remote resources locally with pin(), work offline and cache results.
  • Discover new resources across different boards using pin_find().
  • Share resources in local folders, GitHub, Kaggle, and RStudio Connect by registering new boards with board_register().

Installation

# Install the released version from CRAN:
install.packages("pins")

To get a bug fix, or use a feature from the development version, you can install pins from GitHub.

# install.packages("remotes")
remotes::install_github("rstudio/pins")

Usage

library(pins)

Pin

There are two main ways to pin a resource:

  • Pin a remote file with pin(url). This will download the file and make it available in a local cache:

    url <- "https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv"
    retail_sales <- read.csv(pin(url))

    This makes subsequent uses much faster and allows you to work offline. If the resource changes, pin() will automatically re-download it; if goes away, pin() will keep the local cache.

  • Pin an expensive local computation with pin(object, name):

    library(dplyr)
    retail_sales %>%
      group_by(month = lubridate::month(ds, T)) %>%
      summarise(total = sum(y)) %>%
      pin("sales_by_month")

    Then later retrieve it with pin_get(name).

    pin_get("sales_by_month")
    #> # A tibble: 12 x 2
    #>    month   total
    #>    <ord>   <int>
    #>  1 Jan   6896303
    #>  2 Feb   6890866
    #>  3 Mar   7800074
    #>  4 Apr   7680417
    #>  5 May   8109219
    #>  6 Jun   7451431
    #>  7 Jul   7470947
    #>  8 Aug   7639700
    #>  9 Sep   7130241
    #> 10 Oct   7363820
    #> 11 Nov   7438702
    #> 12 Dec   8656874

Discover

You can also discover remote resources using pin_find(). It can search for resources in CRAN packages, Kaggle, and RStudio Connect. For instance, we can search datasets mentioning “seattle” in CRAN packages with:

pin_find("seattle", board = "packages")
#> # A tibble: 6 x 4
#>   name               description                               type  board 
#>   <chr>              <chr>                                     <chr> <chr> 
#> 1 hpiR/ex_sales      Subset of Seattle Home Sales from hpiR p… table packa…
#> 2 hpiR/seattle_sales Seattle Home Sales from hpiR package.     table packa…
#> 3 latticeExtra/Seat… Daily Rainfall and Temperature at the Se… table packa…
#> 4 microsynth/seattl… Data for a crime intervention in Seattle… table packa…
#> 5 vegawidget/data_s… Example dataset: Seattle daily weather f… table packa…
#> 6 vegawidget/data_s… Example dataset: Seattle hourly temperat… table packa…

Notice that the full name of a pin is <owner>/<name>. This namespacing allows multiple people (or packages) to create pins with the same name.

You can then retrieve a pin through pin_get():

seattle_sales <- pin_get("hpiR/seattle_sales") %>% print()
#> # A tibble: 43,313 x 16
#>    pinx  sale_id sale_price sale_date  use_type  area lot_sf  wfnt
#>    <chr> <chr>        <int> <date>     <chr>    <int>  <int> <dbl>
#>  1 ..00… 2013..…     289000 2013-02-06 sfr         79   9295     0
#>  2 ..00… 2013..…     356000 2013-07-11 sfr         18   6000     0
#>  3 ..00… 2010..…     333500 2010-12-29 sfr         79   7200     0
#>  4 ..00… 2016..…     577200 2016-03-17 sfr         79   7200     0
#>  5 ..00… 2012..…     237000 2012-05-02 sfr         79   5662     0
#>  6 ..00… 2014..…     347500 2014-03-11 sfr         79   5830     0
#>  7 ..00… 2012..…     429000 2012-09-20 sfr         18  12700     0
#>  8 ..00… 2015..…     653295 2015-07-21 sfr         79   7000     0
#>  9 ..00… 2014..…     427650 2014-02-19 townhou…    79   3072     0
#> 10 ..00… 2015..…     488737 2015-03-19 townhou…    79   3072     0
#> # … with 43,303 more rows, and 8 more variables: bldg_grade <int>,
#> #   tot_sf <int>, beds <int>, baths <dbl>, age <int>, eff_age <int>,
#> #   longitude <dbl>, latitude <dbl>

Or explore additional properties in this pin with pin_info():

pin_info("hpiR/seattle_sales")
#> # Source: packages<hpiR/seattle_sales> [table]
#> # Description: Seattle Home Sales from hpiR package.
#> # Properties:
#> #   - rows: 43313
#> #   - cols: 16

Share

Finally, you can share resources with other users by publishing to Kaggle, GitHub, RStudio Connect, Azure, Google Cloud, S3, DigitalOcean or integrate them to your website as well.

To publish to Kaggle, you would first need to register the Kaggle board by creating a Kaggle API Token:

board_register_kaggle(token = "<path-to-kaggle.json>")

You can then easily publish to Kaggle:

pin(seattle_sales, board = "kaggle")

Learn more in vignette("boards-understanding")

RStudio

Experimental support for pins was introduced in RStudio Connect 1.7.8 so that you can use RStudio and RStudio Connect to discover and share resources within your organization with ease. To enable new boards, use RStudio’s Data Connections to start a new ‘pins’ connection and then select which board to connect to:

Once connected, you can use the connections pane to track the pins you own and preview them with ease. Notice that one connection is created for each board.

To discover remote resources, simply expand the “Addins” menu and select “Find Pin” from the dropdown. This addin allows you to search for pins across all boards, or scope your search to particular ones as well:

You can then share local resources using the RStudio Connect board. Lets use dplyr and the hpiR_seattle_sales pin to analyze this further and then pin our results in RStudio Connect.

board_register_rsconnect(name = "myrsc")
seattle_sales %>%
  group_by(baths = ceiling(baths)) %>%
  summarise(sale = floor(mean(sale_price))) %>%
  pin("sales-by-baths", board = "myrsc")

After a pin is published, you can then browse to the pin’s content from the RStudio Connect web interface.

You can now set the appropriate permissions in RStudio Connect, and voila! From now on, those with access can make use of this remote file locally!

For instance, a colleague can reuse the sales-by-baths pin by retrieving it from RStudio Connect and visualize its contents using ggplot2:

library(ggplot2)
board_register_rsconnect(name = "myrsc")

pin_get("sales-by-baths", board = "myrsc") %>%
  ggplot(aes(x = baths, y = sale)) +
  geom_point() + 
  geom_smooth(method = 'lm', formula = y ~ exp(x))

Pins can also be automated using scheduled R Markdown. This makes it much easier to create Shiny applications that rely on scheduled data updates or to share prepared resources across multiple pieces of content. You no longer have to fuss with file paths on RStudio Connect, mysterious resource URLs, or redeploying application code just to update a dataset!

Python

Experimental support for pins is also available in Python. However, since the Python interface currently makes use of the R package, the R runtime needs to be installed when using pins from Python. To get started, first install the pins module:

pip install git+https://github.com/rstudio/[email protected]#subdirectory=python

Followed by using pins from Python:

import pins
pins.pin_get("hpiR/seattle_sales")

Please make sure to pin visit, rstudio.github.io/pins, where you will find detailed documentation and additional resources.

About

Pin, Discover and Share Resources

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • R 50.4%
  • HTML 24.3%
  • JavaScript 18.2%
  • Python 4.3%
  • CSS 2.8%