support for placebo groups #14

djnavarro · 2025-07-14T01:45:08Z

The intended scope of this PR (once completed) is to allow users to specify placebo groups when building ER models, make decisions about whether placebo group cases should be included in the ER model, and support plots that are able to show (or remove) the placebo group responses irrespective of whether the model was built from those data.

Currently at an early draft stage; additional detail is in the comments.

closes #9

djnavarro · 2025-07-14T02:19:24Z

Hi @yoshidk6 - this is a very early sketch of what the PR might look like for #9. The idea here is that if we want to support scenarios like "user wants to plot the placebo group, but does not want those data being part of the model" then we need to support two new arguments at the modelling level:

- A var_placebo argument that specifies the name of an indicator variable marking whether each subject belongs to the placebo group
- An exclude_placebo argument that indicates whether the placebo data should be excluded when fitting the ER model

When exclude_placebo = TRUE, the only thing that happens is the corresponding rows are not passed to the Stan model; the full data set including the placebo data are retained within the ermod object, so that those rows are still accessible if the ermod object is passed to a plotting method that requires them.
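As a rough illustration of the behaviour described above (a base-R sketch, not the code in this PR): the full data set is kept on the stored object, and only the rows handed to the fitting routine are filtered. The column names, the `stored` list and the `rows_for_fit()` helper are all hypothetical.

```r
# Toy data: IS_PBO is a hypothetical placebo indicator column
df <- data.frame(
  EXPOSURE = c(0, 0, 12.5, 30.1, 48.9),
  RESPONSE = c(0, 1, 0, 1, 1),
  IS_PBO   = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Hypothetical helper: choose the rows that would be passed to the model
rows_for_fit <- function(data, var_placebo, exclude_placebo = TRUE) {
  if (!exclude_placebo) {
    return(data)
  }
  is_pbo <- as.logical(data[[var_placebo]])
  data[!is_pbo, , drop = FALSE]
}

# The stored object keeps the full data; only the fitting data are filtered
stored <- list(
  data = df,
  var_placebo = "IS_PBO",
  exclude_placebo = TRUE
)
fit_data <- rows_for_fit(
  stored$data, stored$var_placebo, stored$exclude_placebo
)

nrow(stored$data) # all rows, placebo included
nrow(fit_data)    # placebo rows dropped before fitting
```

Because `stored$data` is never modified, a plotting method could still reach the placebo rows later.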

As it stands, all I've done is add these two arguments for the ermod_bin case. I haven't added anything for other model classes, and I haven't added any new flexibility for the plot methods. It's extremely minimal!

However, it's enough that you can at least see how the approach would play out. The placebo_example.R script has a minimal example showing what happens when the data set contains a placebo group, under two scenarios: one where the user includes it in the model, one where it is ignored by the model.

Obviously this is nowhere near ready to merge, but I wanted to run it by you at this stage to see if this general approach makes sense to you? It would be a bit of work to apply this approach consistently across the various model types, but conceptually it seems to make sense to me

EDIT: also, agreed with your comments in the issues thread. As it currently stands it will not work with log-transformed exposures, and no attempt has been made to bin the placebo group separately from other cases; that would have to be handled in the plot methods. Happy to take a stab at implementing all this, if we are agreed on the general approach :)

yoshidk6 · 2025-07-14T02:39:50Z

Hi @djnavarro

Thanks so much for putting this initial sketch together and starting the conversation! I very much appreciate your proactive check-in on the design choices.
The general approach of adding arguments to separate model fitting from plotting data absolutely makes sense and directly addresses the core need.

Building on your idea, I was wondering what you'd think about consolidating the placebo-related logic into a single, more structured argument. This could make the functionality even more robust and extensible for future use cases.

The idea would be to introduce a single list argument, perhaps named option_placebo_handling.

ermod_bin(
  ...,
  option_placebo_handling = list(...)
)

This list would contain the options for defining and handling the placebo group:

- approach: A string specifying how the placebo group is identified. It could have options like:
  - "zero_exposure_as_placebo" (default): Automatically treats rows where the exposure variable is 0 as placebo. This handles a very common dose-response scenario without needing an extra column.
  - "var_placebo": Uses a dedicated indicator variable to identify the placebo group, as in your original proposal.
  - "none": No placebo group is specified (i.e. no data are excluded).
- var_placebo: A string giving the name of the placebo indicator variable. This would only be used when approach = "var_placebo". The column needs to be boolean or 0/1.
- include_placebo: A boolean (TRUE/FALSE) controlling whether the identified placebo group is included in the model fitting. We could set the default to FALSE, aligning with the primary goal of excluding it from the model.
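To make the proposed list concrete, here is a minimal base-R sketch of how such options might be resolved and applied. The helper names `resolve_placebo_options()` and `flag_placebo()` are hypothetical and are not taken from the PR; only the option names and values come from the list above.

```r
# Hypothetical helper: fill in defaults and validate the options list
resolve_placebo_options <- function(options = NULL) {
  defaults <- list(
    approach = "zero_exposure_as_placebo",
    var_placebo = NULL,
    include_placebo = FALSE
  )
  if (is.null(options)) {
    options <- list()
  }
  unknown <- setdiff(names(options), names(defaults))
  if (length(unknown) > 0) {
    stop("Unknown placebo option(s): ", paste(unknown, collapse = ", "))
  }
  out <- modifyList(defaults, options)
  valid <- c("zero_exposure_as_placebo", "var_placebo", "none")
  # Exact matching only: partial strings are rejected
  if (!is.character(out$approach) || length(out$approach) != 1 ||
        !out$approach %in% valid) {
    stop("`approach` must be one of: ", paste(valid, collapse = ", "))
  }
  if (out$approach == "var_placebo" && is.null(out$var_placebo)) {
    stop("`var_placebo` must be supplied when approach = \"var_placebo\"")
  }
  out
}

# Hypothetical helper: flag placebo rows according to the resolved options
flag_placebo <- function(data, var_exposure, options) {
  switch(options$approach,
    zero_exposure_as_placebo = data[[var_exposure]] == 0,
    var_placebo = as.logical(data[[options$var_placebo]]),
    none = rep(FALSE, nrow(data))
  )
}

opts <- resolve_placebo_options(list(include_placebo = TRUE))
flag_placebo(data.frame(DOSE = c(0, 10, 20)), "DOSE", opts)
```

Resolving the list in one place means every downstream function can assume all three fields are present and valid.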

Why this approach could be beneficial:

- It forces a clear choice on how the placebo group is defined ("zero_exposure_as_placebo" vs. "var_placebo"), which makes the user's intent unambiguous.
- It makes it easy to cover what is likely the most common case (exposure = 0 means placebo).
- If we ever conceive of a third way to handle placebos, we can add it to the approach argument without adding yet another parameter to the ermod_bin function.

Example Usage:

Scenario 1: Default behavior (placebo is the zero-dose group, excluded from model)
The user provides nothing extra; the defaults handle it.

fit <- ermod_bin(data = df, var_exposure = "DOSE", var_response = "RESPONSE")

Scenario 2: Using an indicator variable (excluded from model)

fit <- ermod_bin(
  data = df,
  var_exposure = "EXPOSURE",
  var_response = "RESPONSE",
  option_placebo_handling = list(
    approach = "var_placebo",
    var_placebo = "IS_PBO_GROUP"
  )
)

Scenario 3: Including the placebo group (defined by zero dose) in the model

fit <- ermod_bin(
  data = df,
  var_exposure = "DOSE",
  var_response = "RESPONSE",
  option_placebo_handling = list(include_placebo = TRUE)
)

This design also reinforces the principle that the ermod object becomes the "single source of truth." The plot() function would then read from this object to display the placebo data without needing any of these definitions again.

What are your thoughts on this alternative structure?

Other considerations

- We could consider showing a warning if the default is used without explicit input AND the data include exposures of 0 (not sure how easy it is to "detect" whether the default was used, though... maybe that's too much to ask).
- Another scenario where a warning might be warranted is when the exposure variable includes negative values (it is highly likely that the exposure was log-transformed).
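One possible way to detect that the default was used is to let the argument default to NULL and test for that. The sketch below assumes that convention; `check_placebo_inputs()` is a hypothetical helper written in base R, not part of the package.

```r
# Hypothetical check run before fitting; `options` is NULL when the user
# supplied nothing and the defaults are therefore in effect
check_placebo_inputs <- function(data, var_exposure, options = NULL) {
  exposure <- data[[var_exposure]]

  # Default in use and zero exposures present: tell the user what happens
  if (is.null(options) && any(exposure == 0, na.rm = TRUE)) {
    warning(
      "Rows with zero exposure are treated as placebo by default; ",
      "set the placebo handling options explicitly to silence this warning.",
      call. = FALSE
    )
  }

  # Negative exposures suggest a log-transformed exposure metric
  if (any(exposure < 0, na.rm = TRUE)) {
    warning(
      "Negative exposure values found; was the exposure log-transformed?",
      call. = FALSE
    )
  }

  invisible(data)
}
```

Calling `check_placebo_inputs(df, "DOSE")` on data containing zero doses would raise the first warning, while passing any explicit options list would not.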

djnavarro · 2025-07-14T02:48:36Z

Oh that's very nice. I like the list argument approach a lot: it nicely addresses the worry I'd had about placebo handling potentially leading to a proliferation of arguments when specifying the model. I'll adapt the PR to use that approach.

yoshidk6 · 2025-07-14T02:51:37Z

Glad you liked it :) It's coming from the approach I took in https://genentech.github.io/BayesERtools/reference/plot_er.html .
By the way, I used options_ rather than option_ there, so it's probably better to stick with options_

djnavarro · 2025-10-03T12:29:22Z

Hi @yoshidk6,

My sincere apologies for taking so long on this -- I've been busier than expected, and this turned out to be somewhat trickier than I thought it would be. Nevertheless, I've updated the PR so that all the dev_ermod_* functions take an options_placebo_handling list argument with the default behaviour we discussed upthread. There's still work to do (see limitations section below), but at last I've gotten it to the point that you might want to review it in its current state!

Best
Danielle

Key features

- The complete data set is always stored internally within ermod$data, even when the placebo data are not used in the models or plots, to ensure that the ER model remains the single source of truth.
- All model classes store the placebo handling options internally; for the ersim class they are stored as an attribute, in the same way that the original data are stored as an attribute.
- As much as possible, whenever the user asks for the placebo data to be ignored, the "dropping" occurs as late as possible: e.g., for model fitting, it happens only at the moment the data are passed to rstanarm or rstanemax; for plotting, it happens only at the moment that the ggplot object is built.

The reason for that last feature, rather than dropping/keeping the placebo data at the moment the ER model object is constructed, is that it leaves open the possibility that later on we can allow the user to specify one set of options at the model fitting stage (e.g., ignore placebo when fitting) but override these options at the plotting stage (e.g., display placebo data even though those didn't contribute to the model fit).

Actually, that last part would probably be easy to do given the way everything else is structured: I think all that's needed is for plot_er() itself to take an options_placebo_handling argument?
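A toy sketch of what such an override could look like, assuming the model object carries both the full data and the stored options. The `mod` list, its field names and `data_for_plot()` are hypothetical stand-ins and do not reflect the actual plot_er() implementation.

```r
# Toy stand-in for an ER model object: full data plus stored options
# (structure and field names are hypothetical)
mod <- list(
  data = data.frame(
    EXPOSURE = c(0, 0, 10, 20, 40),
    RESPONSE = c(0, 0, 0, 1, 1)
  ),
  var_exposure = "EXPOSURE",
  options_placebo_handling = list(include_placebo = FALSE)
)

# Hypothetical helper: data for a plot, filtered only at the last moment.
# The model's stored options are used unless the caller overrides them.
data_for_plot <- function(mod, options_placebo_handling = NULL) {
  opts <- mod$options_placebo_handling
  if (!is.null(options_placebo_handling)) {
    opts <- modifyList(opts, options_placebo_handling)
  }
  d <- mod$data
  if (!isTRUE(opts$include_placebo)) {
    # Zero exposure is treated as placebo in this sketch
    d <- d[d[[mod$var_exposure]] != 0, , drop = FALSE]
  }
  d
}

nrow(data_for_plot(mod))                               # stored options apply
nrow(data_for_plot(mod, list(include_placebo = TRUE))) # overridden for the plot
```

The override only works because the placebo rows were never removed from `mod$data` in the first place.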

Additional features

- I've modified the simulated d_sim_emax data set in two ways, to make it easier to use for testing: it has a placebo group, and it has an additional exposure metric. This would necessitate a few changes to the book, but those would be relatively small.
- The signature of the extract_data() generic function has changed: it now takes ... to pass arguments down to the methods. The reason for this is that extract_data.ermod(), extract_data.ersim() etc. now have a method argument to specify whether the data set to be returned should be the "raw" data (always includes placebo data) or the "processed" data (drops placebo if the user ignores it). I hadn't originally intended to make that part of the public API, but it turns out that the difference between extract_data(x, method = "raw") and extract_data(x, method = "processed") is useful for a lot of purposes internally, and might also be handy for the user.
- Expanding on that last point, for ermod objects specifically, you can also use method = "internal" to pull out the internal storage from within the stanemax or stanarm objects. I think this should be buried and not part of the public API. Currently I'm using it in the unit tests to make sure that when the user asks for the placebo data to be dropped, it doesn't get passed to stan, but that doesn't need to be user facing.
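The raw/processed distinction can be sketched with a small S3 example. The generic `extract_data_sketch()` and the `toy_ermod` class are hypothetical stand-ins; the real extract_data() methods in the PR may differ, including in which method value is the default.

```r
# Hypothetical generic: `...` lets methods accept extra arguments
extract_data_sketch <- function(x, ...) {
  UseMethod("extract_data_sketch")
}

# Hypothetical method with a `method` argument ("raw" assumed as default)
extract_data_sketch.toy_ermod <- function(x,
                                          method = c("raw", "processed"),
                                          ...) {
  method <- match.arg(method)
  # "raw" always returns everything; "processed" drops placebo rows
  # only when the stored options say the placebo group is ignored
  if (method == "raw" || isTRUE(x$include_placebo)) {
    return(x$data)
  }
  x$data[!x$is_placebo, , drop = FALSE]
}

toy <- structure(
  list(
    data = data.frame(EXPOSURE = c(0, 5, 15), RESPONSE = c(0, 0, 1)),
    is_placebo = c(TRUE, FALSE, FALSE),
    include_placebo = FALSE
  ),
  class = "toy_ermod"
)

nrow(extract_data_sketch(toy, method = "raw"))       # placebo kept
nrow(extract_data_sketch(toy, method = "processed")) # placebo dropped
```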

Limitations

- As already mentioned, right now the user has to apply the same placebo handling option to the modelling as to the plot. This could be made more flexible.
- The unit testing is fairly solid in terms of checking what data sets get passed to stan, but I haven't added any tests of the plot_er() behaviour yet. The closest I have to that right now is the placebo_example.R script in the "other" folder. You can use that to illustrate the scope of what the PR does right now, but it would be nice to incorporate some of that into the unit tests and/or documentation.
- Almost all the work up to this point has focused on managing the data: i.e., ensuring that the correct data get passed to stan, that the correct data get passed to the plots, and that the full data set is always stored even when the model and plot don't use the placebo data. The plot_er() function itself hasn't really gotten as much attention yet. For instance, for binary outcomes, if there's a placebo group to be plotted I think the default behaviour should be that the placebo group is always a distinct bin, with quantiles computed only from the treatment groups. That hasn't been implemented yet.
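For reference, the binning behaviour described in the last point (not yet implemented in the PR) might look roughly like this in base R. The data, the column names, the use of tertiles and the bin labels are all illustrative assumptions.

```r
# Toy data; zero exposure marks the placebo group in this sketch
d <- data.frame(
  EXPOSURE = c(0, 0, 0, 2, 4, 6, 8, 10, 12, 14, 16),
  RESPONSE = c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
)
is_pbo <- d$EXPOSURE == 0

# Quantile breaks computed from treated subjects only (tertiles here)
breaks <- quantile(d$EXPOSURE[!is_pbo], probs = c(0, 1 / 3, 2 / 3, 1))

# Bin treated subjects by exposure; placebo gets its own label
bin <- rep("Placebo", nrow(d))
bin[!is_pbo] <- paste0(
  "T",
  cut(
    d$EXPOSURE[!is_pbo],
    breaks = breaks,
    include.lowest = TRUE,
    labels = FALSE
  )
)
d$BIN <- factor(bin, levels = c("Placebo", "T1", "T2", "T3"))

# Observed response rate per bin
aggregate(RESPONSE ~ BIN, data = d, FUN = mean)
```

Computing the breaks from the treated subjects alone keeps the zero-exposure rows from dragging the lowest quantile bin down to zero.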

yoshidk6 · 2025-10-09T01:10:35Z

Thank you so much for the note and truly appreciate all of the efforts you have put in so far! I must admit I underestimated the extent of changes we needed to make. I want to take a bit more time exploring the code & putting together thoughts on this, but wanted to let you know I haven't forgotten about it. Thanks!

djnavarro · 2025-10-13T03:11:21Z

no rush. it's been nice to spend time thinking about this one, and I completely agree that it is a surprisingly complicated PR - I wasn't anticipating the need to do a deep dive into the data structures in the package, tbh! in retrospect it does make sense though, because placebo data has a strange relationship to ER analyses. happy to make whatever changes you think might be needed, but as i say... no rush :)

yoshidk6 · 2025-10-16T20:23:38Z

Hi Danielle, thank you very much again for the pull request update!

I've been thinking about this a lot over the past week, and I now suspect that integrating placebo handling into the model development functions might actually be more confusing to users than helpful.
Users would need to understand when and whether the placebo group data were used at each step (model development, simulation, visualization), and without a deeper understanding of the underlying architecture, that might not be apparent.

Instead of making this part of BayesERtools, I'm wondering if it would be best to showcase how to handle placebo group data in simulation and visualization on a dedicated page in BayesERbook.

I know this is what I proposed to do, and you spent a lot of time thinking about and implementing the changes in the codebase, so I feel really sorry about not arriving at these thoughts earlier.

I'd also love to hear your opinions about it, please let me know your thoughts.

djnavarro added 3 commits July 14, 2025 11:38

adds dummy data set with a placebo group

4ece817

adds placebo related args for ermod_bin

195ded1

adds illustrative placebo script

7960af6

djnavarro added 23 commits July 27, 2025 13:50

switches to options_placebo_handling arg

3531749

expands documentation for options_placebo_handling

3b0e0e9

adds placebo handling arguments to ermod classes and dev functions for linear models

33a95de

roxygenise

69a4d80

emax mod class and dev function take placebo handling argument

56da0ee

update print_coveff test

043dc56

removes incorrect check on placebo option

c6e5d6a

temporary test skip

d40891d

adds placebo group to the emax data

37c3d91

adds test for internal data storage across placebo settings

8d16309

use multiple metrics in emax data set

c4d05fb

use renamed exposure_1 in placebo test

163c12f

placebo handling data tests now cover metric selection functions

2e45d0f

placebo handling data tests now cover covariate selection functions

c8dd5b2

use placebo options in examples/test

680bb86

attempt covsel bug fix

98b46e6

apply same fix to emax

fd7ada9

test on ci

4fa0c3f

typo fix

59c708b

use placebo handling options in refmodel

53dcaa2

roxygenise

21ccf4c

don't use partial for placebo handling

62ecb95

don't use partial for placebo handling in dev_ermod_bin_cov_sel

29210b3

djnavarro added 25 commits September 17, 2025 21:37

linting fixes

95557f3

additional linting fixes

7abc55b

don't quote

4f5475e

strips back example code

f92cddc

placebo handling tests contain a no_error check

4126bb5

whitespace fixes for linter

f299333

extract_data.ermod permits inner data extraction

4da2618

tidies the internal_data tests

2e65112

updates examples

4025d15

eval_ermod tests supply placebo options explicitly

ba10894

ersim class stores placebo options

a6e387c

fixes dropped placebo options during exposure selection bug

7d919bf

extract_data has a method argument

e878f02

default exposure ranges in sim_er_curve and sim_er_curve_marg respects placebo options

805135c

linter whitespace

6202801

roxygenise

1ac3dd8

Merge branch 'Genentech:main' into placebo-plot

e5d9834

plot_er() now inherits placebo handling options

6b94151

adds to the examples

ba08288

removes unnecessary printing

5527e39

typo fix

e9630fc

supply defaults if options are NULL

7d523cf

tidies documentation

645fbba

partially reenable loo and kfold tests

31045fd

linter fixes

e1f59e4

djnavarro marked this pull request as ready for review October 3, 2025 12:29
