Data #127

jonthegeek · 2024-07-23T15:26:29Z

jonthegeek
Jul 23, 2024
Maintainer

(moved from Teams discussion)

From my current understanding of usage, I think the apps created/deployed with {gsmApp} should expect specific files (possibly including a directory of 1 or more files, for things like participant-level details), loading them reactively as needed.

Those files should be updated independently of the {gsmApp} package, since where they come from will be Gilead-specific (or, for other users of the open-source package, specific to their workflows). But we might want to export helper functions to help set them up (for example, functions that take data.frames, prep them for the app if needed, and then save them to the expected locations). Separately, we'll write Gilead-specific things to fetch the data from the database, and run those on whatever cadence makes sense for us.
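
A minimal sketch of what such a helper might look like, assuming the app reads one `.rds` file per dataset from a single directory. The name `save_app_data()`, the file layout, and the `.rds` format are all hypothetical; nothing like this is confirmed to exist in {gsmApp}.

```r
# Hypothetical helper (not part of {gsmApp}): write each named data.frame to
# the directory the app is assumed to read from, one .rds file per dataset.
save_app_data <- function(lData, strDir) {
  stopifnot(is.list(lData), !is.null(names(lData)), all(nzchar(names(lData))))
  dir.create(strDir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(strDir, paste0(names(lData), ".rds"))
  for (i in seq_along(lData)) {
    stopifnot(is.data.frame(lData[[i]]))
    saveRDS(lData[[i]], paths[[i]])
  }
  # Return the written paths, named by dataset, without printing them.
  invisible(stats::setNames(paths, names(lData)))
}

# Usage: a workflow-specific script would call this outside of the app,
# on whatever cadence makes sense. The data here is made up.
strDir <- file.path(tempdir(), "gsmApp-data")
paths <- save_app_data(
  list(dfResults = data.frame(GroupID = "0X001", Flag = 1)),
  strDir
)
```

The app itself would never call this; it would only read whatever the helper left behind.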

Alternatively, some data could be purely database-side (e.g., participant-level data) and fetched on demand... but I feel like that would make things much more complicated (users would have to supply data-fetcher functions + optional change-checker functions to gsmApp::run_app()). But that data gets compiled into the higher-level data, right? So, since we need to fetch it to build that, we might as well keep it app-ready at that point. I guess if the participant-level data is really large we might want to keep it restricted to the database, but otherwise things will be much more straightforward if we prep and save it for the app.

Does that all make sense? Are there any flaws in my understanding?

jonthegeek · 2024-07-23T15:26:49Z

jonthegeek
Jul 23, 2024
Maintainer Author

@jwildfire said: Let's convert this to a discussion over in GitHub and talk about it a bit there. Not sure I'll be able to write up a long post about my thinking before the design discussion, but I'll try. Worst case, we can talk it through as part of that meeting.

For now, I'll just say the data inputs for {gsmApp} should follow the specifications for the various data models (mapped/analysis/reporting) defined over in {gsm} as closely as possible. If those data models are not adequate for the {gsmApp} requirements, we should almost certainly consider updating the {gsm} data model as part of a v2.1 release.

As long as we follow those specs, I think the approach to loading the data can be designed to be fairly flexible.


jonthegeek · 2024-07-23T15:32:58Z

jonthegeek
Jul 23, 2024
Maintainer Author

I agree that the data should be created via the gsm workflows, but setting that up should be separate from general running of the app. The data should be processed at deploy-time (and then on whatever cadence we want to update the data), not every time a user launches the app or a new session begins. The app should expect the data to be saved where it wants it to be saved, and not know about anything that happens before that.
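
To make the "app only knows where the data lives" idea concrete, here is a rough sketch of an app-side check. The function name `find_app_data()`, the one-`.rds`-file-per-dataset layout, and the example dataset names passed in are assumptions for illustration only.

```r
# Hypothetical app-side check: the app only knows which files it expects and
# where; it knows nothing about how (or when) they were produced.
find_app_data <- function(strDir, chrExpected) {
  paths <- file.path(strDir, paste0(chrExpected, ".rds"))
  chrMissing <- chrExpected[!file.exists(paths)]
  if (length(chrMissing) > 0) {
    stop("Missing expected data files: ", paste(chrMissing, collapse = ", "))
  }
  stats::setNames(paths, chrExpected)
}

# Usage with a throwaway directory containing a single made-up dataset.
strDir <- file.path(tempdir(), "gsmApp-expected")
dir.create(strDir, showWarnings = FALSE)
saveRDS(data.frame(GroupID = "0X001"), file.path(strDir, "dfResults.rds"))
found <- find_app_data(strDir, "dfResults")
```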


taylorrodgers · 2024-07-24T11:02:01Z

taylorrodgers
Jul 24, 2024
Maintainer

gsmApp generates UI components in two distinct ways:

gsm data viz and table functions (including widgets)
gsmApp specific functions

This would also be a great way to divide up GitHub tasks.

Updating `gsm` data viz functions + data sources

The gsmApp makes extensive use of gsm widget functions for data visualizations and tables.

If we use the widget functions, here are the data sets we'll need:

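| Input | Widget_BarChart | Widget_GroupOverview | Widget_ScatterPlot | Widget_TimeSeries |
| --- | --- | --- | --- | --- |
| dfResults | x | x | x | x |
| lMetric | x | | x | x |
| dfMetrics | | x | | |
| dfGroups | x | x | x | x |
| dfBounds | | | x | |
| vThreshold | x | | | x |
| strOutcome | x | | | x |
| bAddGroupSelect | x | | x | x |
| strGroupLevel | | x | | |
| strGroupSubset | | x | | |
| strGroupLabelKey | | x | | |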
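
The table above can also be written down as a lookup, which makes it easy to ask what is still missing for a given widget. This is only a sketch: the list is transcribed from the table, and `missing_widget_inputs()` is a hypothetical helper, not a {gsm} or {gsmApp} function.

```r
# Inputs per widget, transcribed from the table above.
lWidgetInputs <- list(
  Widget_BarChart = c(
    "dfResults", "lMetric", "dfGroups",
    "vThreshold", "strOutcome", "bAddGroupSelect"
  ),
  Widget_GroupOverview = c(
    "dfResults", "dfMetrics", "dfGroups",
    "strGroupLevel", "strGroupSubset", "strGroupLabelKey"
  ),
  Widget_ScatterPlot = c(
    "dfResults", "lMetric", "dfGroups", "dfBounds", "bAddGroupSelect"
  ),
  Widget_TimeSeries = c(
    "dfResults", "lMetric", "dfGroups",
    "vThreshold", "strOutcome", "bAddGroupSelect"
  )
)

# Hypothetical helper: which inputs for a widget are not yet available?
missing_widget_inputs <- function(strWidget, lAvailable) {
  stopifnot(strWidget %in% names(lWidgetInputs))
  setdiff(lWidgetInputs[[strWidget]], names(lAvailable))
}

# Usage with placeholder data.frames.
chrMissing <- missing_widget_inputs(
  "Widget_ScatterPlot",
  list(dfResults = data.frame(), dfGroups = data.frame())
)
```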

Section on gsmApp function inputs

Attached below is a mapping of where the various data points are located in the new data model:
sources.xlsx


taylorrodgers · 2024-07-30T11:39:33Z

taylorrodgers
Jul 30, 2024
Maintainer

Here is a more concise list of required inputs:

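| New Data Source | Widget | gsmApp |
| --- | --- | --- |
| dfBounds | x | |
| dfGroups | x | x |
| dfInput | | x |
| dfMetrics | x | x |
| dfResults | x | |
| dfSummary | | x |
| lMapped$dfAE | | x |
| lMapped$dfENROLL | | x |
| lMapped$dfPD | | x |
| lMapped$dfSDRGCOMP | | x |
| lMapped$dfSTUDCOMP | | x |
| lMapped$dfSUBJ | | x |
| lMapped$QUERY | | x |
| lMetric | x | |
| strGroupLabelKey | x | |
| strGroupLevel | x | |
| strGroupSubset | x | |
| strOutcome | x | |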


jonthegeek · 2024-08-12T20:24:05Z

jonthegeek
Aug 12, 2024
Maintainer Author

Coming back to this now that I know a lot more about how things work.

We've discussed wanting to support multiple input modalities for the core data, from straightforward data.frames (what we have right now) to a database or other dynamically updated source. To support that, I think we should expect the inputs to come in as reactives. That way the user can decide whether their particular data is a file read via reactiveFileReader(), a database read via reactivePoll(), or a simple data.frame sent in as a basic reactive(). We can provide helpers for the core versions of these and, eventually (but probably not THAT far from the start), deal with data.frames and file paths automatically by wrapping them in a simple function that returns the data.frame or in reactiveFileReader(), respectively.

The reason to use reactives is that a running app can then update automatically when new data comes in, without having to be stopped/restarted (including if a database changes, via reactivePoll()). If we pass them in at launch (as inputs into run_app()), they'll be global to all users, so each user won't have to wait for any processing that has to occur; they'll process the first time they're accessed, and then remain in memory as long as the app is active for any user on that server (and/or update when triggered via reactivePoll(), etc.).
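
As a rough illustration of the "automatically deal with data.frames and file paths" idea, the sketch below normalises three kinds of input into a reactive. The helper name `as_reactive_data()`, its arguments, and the default of reading `.rds` files once a minute are assumptions; only `reactive()`, `is.reactive()`, and `reactiveFileReader()` are real {shiny} functions.

```r
library(shiny)

# Hypothetical helper: turn a data.frame, a file path, or an existing
# reactive into a reactive that returns the data.
as_reactive_data <- function(x, intervalMillis = 60000, readFunc = readRDS) {
  if (shiny::is.reactive(x)) {
    # Already reactive (for example, a user-supplied reactivePoll()).
    return(x)
  }
  if (is.data.frame(x)) {
    # Static data: wrap it so downstream code can always call it like a reactive.
    return(shiny::reactive(x))
  }
  if (is.character(x) && length(x) == 1) {
    # File path: re-read when the file changes. session = NULL means the
    # reader is not tied to any single user session.
    return(shiny::reactiveFileReader(intervalMillis, NULL, x, readFunc))
  }
  stop("`x` must be a data.frame, a single file path, or a reactive.")
}

# Usage outside of an app, with made-up data; isolate() reads the value once.
dfExample <- data.frame(GroupID = c("0X001", "0X002"), Flag = c(0, 1))
rExample <- as_reactive_data(dfExample)
dfRoundTrip <- shiny::isolate(rExample())
```

A database source would not need a special branch here: the user would build their own `reactivePoll()` and pass it in, and it would be returned unchanged.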

The special case (the whole reason that we're really thinking about all this) is participant data. We shouldn't load all of the participant data for every user, since most of it will never be used by app users (after it's processed by gsm into the summaries). I think we'll want this to be a list of reactive() (or, more likely, a list (named by siteID) of lists (named by SubjID) of reactive()s). That way each reactive only loads into RAM when it's accessed, but doesn't have to re-load after that.

Tested with this simple app:

```r
library(shiny)
library(lobstr)

# Globally define a list of reactives.
# Each one is only evaluated (and cached) the first time it is called.
outVals <- list()
outVals[["1"]] <- shiny::reactive(sample(1:100000, 100000, TRUE))
outVals[["2"]] <- shiny::reactive(sample(1:100000, 100000, TRUE))
outVals[["3"]] <- shiny::reactive(sample(1:100000, 100000, TRUE))
outVals[["4"]] <- shiny::reactive(sample(1:100000, 100000, TRUE))
outVals[["5"]] <- shiny::reactive(sample(1:100000, 100000, TRUE))

ui <- fluidPage(
  selectInput("id", "ID", 1:5),
  verbatimTextOutput("testOut"),
  verbatimTextOutput("sizeOut")
)

server <- function(input, output, session) {
  memUsed <- reactiveVal()

  # Record the size of the whole list whenever a different ID is selected.
  observeEvent(input$id, {
    memUsed(lobstr::obj_size(outVals))
  })

  # input$id is a character value, so it indexes outVals by name.
  output$testOut <- renderPrint(mean(outVals[[input$id]]()))
  output$sizeOut <- renderPrint(memUsed())
}

shinyApp(ui, server)
```

Memory use goes up each time you choose a new ID, but doesn't change if you return to a previous ID.

I'll need to play around with this with some real data to be SURE that this is the path we want to take, but I think it makes sense.

Again, we'll want to provide helpers to make this as straightforward as possible.
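
One possible shape for such a helper for the participant-level data, as a sketch only. It assumes one `.rds` file per participant laid out as `<dir>/<SiteID>/<SubjID>.rds`; that layout and the name `participant_reactives()` are hypothetical.

```r
library(shiny)

# Hypothetical helper: build a list (named by site) of lists (named by
# subject) of reactives, one per participant file.
participant_reactives <- function(strDir) {
  chrSites <- list.dirs(strDir, full.names = FALSE, recursive = FALSE)
  lapply(stats::setNames(chrSites, chrSites), function(strSite) {
    chrFiles <- list.files(
      file.path(strDir, strSite),
      pattern = "\\.rds$",
      full.names = TRUE
    )
    chrSubjects <- tools::file_path_sans_ext(basename(chrFiles))
    lapply(stats::setNames(chrFiles, chrSubjects), function(strPath) {
      # Nothing is read here. readRDS() runs the first time the reactive is
      # called, and the reactive keeps the result after that.
      shiny::reactive(readRDS(strPath))
    })
  })
}

# Usage with a throwaway directory and one made-up participant.
strDir <- file.path(tempdir(), "gsmApp-participants")
dir.create(file.path(strDir, "S01"), recursive = TRUE, showWarnings = FALSE)
dfSubj <- data.frame(SubjID = "P001", Visits = 3)
saveRDS(dfSubj, file.path(strDir, "S01", "P001.rds"))

lParticipants <- participant_reactives(strDir)
dfLoaded <- shiny::isolate(lParticipants[["S01"]][["P001"]]())
```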


jonthegeek · 2024-09-05T20:59:05Z

jonthegeek
Sep 5, 2024
Maintainer Author

I think it's worthwhile to be more specific about what we're using for each widget/table/etc. I'm going to use a shorthand-ish version of the gsm.template spec to work this out.

Tab: Study Overview

Module: Overview Table

```yaml
meta:
  output: red_kri | amber_kri
spec:
  dfResults:
    Flag:
      required: true
      type: numeric
```

```yaml
meta:
  output: site_overview_table
spec:
  dfResults:
    GroupID:
      required: true
      type: character
    Numerator:
      required: true
      type: numeric
    Denominator:
      required: true
      type: numeric
    Metric:
      required: true
      type: numeric
    Score:
      required: true
      type: numeric
    Flag:
      required: true
      type: numeric
    MetricID:
      required: true
      type: character
  dfMetrics:
    MetricID:
      required: true
      type: character
    Abbreviation:
      required: true
      type: character
    Metric:
      required: true
      type: character
    Numerator:
      required: false
      type: character
    Denominator:
      required: false
      type: character
    Score:
      required: false
      type: character
  dfGroups:
    GroupID:
      required: true
      type: character
    GroupLevel:
      required: true
      type: character
      value: strGroupLevel ("Site")
    Param:
      required: true
      type: character
    Value:
      required: true
      type: character
```

(stopping here for now to decide if this is the best way to do this...)
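
If we do go this way, a spec in this shape could be checked against the actual inputs with something like the sketch below. The spec is written as a plain R list here (a small subset of the one above), and `check_spec()` is a hypothetical name; it only handles the `required` and `type` fields, with `type` limited to `character` and `numeric`.

```r
# A small subset of the spec above, written as an R list.
lSpec <- list(
  dfResults = list(
    GroupID = list(required = TRUE, type = "character"),
    Flag = list(required = TRUE, type = "numeric")
  ),
  dfMetrics = list(
    MetricID = list(required = TRUE, type = "character"),
    Score = list(required = FALSE, type = "character")
  )
)

# Hypothetical checker: returns a character vector of problems (empty if none).
check_spec <- function(lData, lSpec) {
  lTypeChecks <- list(character = is.character, numeric = is.numeric)
  chrProblems <- character()
  for (strDf in names(lSpec)) {
    df <- lData[[strDf]]
    if (is.null(df)) {
      chrProblems <- c(chrProblems, paste0(strDf, ": missing"))
      next
    }
    for (strCol in names(lSpec[[strDf]])) {
      lCol <- lSpec[[strDf]][[strCol]]
      strLabel <- paste0(strDf, "$", strCol)
      if (!strCol %in% names(df)) {
        # Optional columns may be absent without being a problem.
        if (isTRUE(lCol$required)) {
          chrProblems <- c(chrProblems, paste0(strLabel, ": missing"))
        }
      } else if (!lTypeChecks[[lCol$type]](df[[strCol]])) {
        chrProblems <- c(chrProblems, paste0(strLabel, ": expected ", lCol$type))
      }
    }
  }
  chrProblems
}

# Usage with made-up data: Flag has the wrong type, Score is optional.
lData <- list(
  dfResults = data.frame(GroupID = "0X001", Flag = "1"),
  dfMetrics = data.frame(MetricID = "kri0001")
)
chrProblems <- check_spec(lData, lSpec)
```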
