I have a `targets` pipeline in which one target queries a database, saves a file to disk, and returns a file path. A downstream target reads in that path and does some further analysis. However, I sometimes need to edit the downstream function interactively. Is this the best way to write a function that both works in a `targets` pipeline and also allows for interactive development outside of the pipeline?

```r
explore_data <- function() {
  # load the analysis data
  if (!exists("tar_runtime")) {
    # running interactively, outside the pipeline
    library(tidyverse)
    library(arrow)
    source("R/util.R")
    analysis_data <- read_parquet("path/to/analysis_data.parquet")
  } else {
    # running inside the pipeline: load the analysis data
    # using the target name from _targets
    parquet_file_path <- tar_read(analysis_data)
    # read the parquet file into a data frame
    analysis_data <- read_parquet(parquet_file_path)
  }
  # rest of the script to explore the data
  analysis_data %>% ...
}
```
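As a side note (my addition, not part of the original question): the `exists("tar_runtime")` check could also be written with `targets::tar_active()`, which reports whether code is currently running inside a pipeline. A minimal sketch under that assumption, keeping the same file layout as above:

```r
# Hypothetical variant of the guard above, using targets::tar_active()
explore_data <- function() {
  if (targets::tar_active()) {
    # inside the pipeline: read the upstream target's stored output
    parquet_file_path <- targets::tar_read(analysis_data)
    analysis_data <- arrow::read_parquet(parquet_file_path)
  } else {
    # interactive session: load dependencies and read the file directly
    library(tidyverse)
    source("R/util.R")  # assumed helper file from the question
    analysis_data <- arrow::read_parquet("path/to/analysis_data.parquet")
  }
  # rest of the script to explore the data
  analysis_data
}
```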
What I would typically do is have the explore-data function take a file path as an argument. So your function would look like:

```r
explore_data <- function(parquet_file_path) {
  analysis_data <- read_parquet(parquet_file_path)
  # rest of the script to explore the data
  analysis_data %>% ...
}
```

And your list of targets in `_targets.R` would look like:

```r
list(
  tar_target(parquet_file_path, "path/to/analysis_data.parquet", format = "file"),
  tar_target(exploration, explore_data(parquet_file_path))
)
```
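To make the suggestion above concrete, here is one way a complete `_targets.R` could look (a sketch under my own assumptions, not from the thread; the helper file name and packages are taken from the question). Note that `format = "file"` tells `targets` to track the file's contents, so downstream targets rerun when the parquet file changes:

```r
# _targets.R -- hypothetical sketch
library(targets)
tar_option_set(packages = c("tidyverse", "arrow"))
source("R/util.R")  # assumed to define explore_data()

list(
  # Track the parquet file itself; targets invalidates the
  # exploration target whenever the file's contents change.
  tar_target(parquet_file_path, "path/to/analysis_data.parquet",
             format = "file"),
  tar_target(exploration, explore_data(parquet_file_path))
)
```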
In the above simple example, if I wanted to work on editing the `explore_data()` function, I'd probably run, in the console:

```r
tar_load(parquet_file_path)
```

After running that, there's a `parquet_file_path` object in my environment that prints/returns `"path/to/analysis_data.parquet"`. Then I'd open up the file with the `explore_data()` function, and because I named my file path target the same as the file argument to that function, I can just run the code inside the function one line at a time until I get to the part I want to edit. This pattern should work for anything. For example, if you had a function that …
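The workflow described above can be sketched as a console session (my reconstruction; it assumes the targets from the earlier reply have already been built with `tar_make()`):

```r
# Interactive console session -- sketch, assumes the pipeline has run
library(targets)

# Load the upstream target's value into the global environment
tar_load(parquet_file_path)
# `parquet_file_path` now holds "path/to/analysis_data.parquet"

# Because the object name matches the argument name of explore_data(),
# you can open the file defining explore_data() and run its body
# line by line, e.g.:
analysis_data <- arrow::read_parquet(parquet_file_path)
```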