
@djnavarro
Contributor

The intended scope of this PR (once completed) is to allow users to specify placebo groups when building exposure-response (ER) models, to decide whether placebo cases should be included when fitting the ER model, and to support plots that can show (or hide) the placebo group responses irrespective of whether the model was built from those data.

Currently at an early draft stage; additional detail in the comments.

closes #9

@djnavarro
Contributor Author

djnavarro commented Jul 14, 2025

Hi @yoshidk6 - this is a very early sketch of what the PR might look like for #9. The idea here is that if we want to support scenarios like "user wants to plot the placebo group, but does not want those data being part of the model" then we need to support two new arguments at the modelling level:

  • A var_placebo argument naming an indicator variable that records whether each subject belongs to the placebo group
  • An exclude_placebo argument that indicates whether the placebo data should be excluded when fitting the ER model

When exclude_placebo = TRUE, the only thing that happens is that the corresponding rows are not passed to the Stan model; the full data set, including the placebo data, is retained within the ermod object, so those rows are still accessible if the ermod object is passed to a plotting method that requires them.
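A minimal base R sketch of that behaviour, assuming a hypothetical `fit_with_placebo_option()` helper and a stand-in `fitter` argument in place of the real Stan call (neither is part of BayesERtools): the full data are kept on the returned object, and only the filtered rows reach the fitter.

```r
# Hypothetical sketch (not BayesERtools code): placebo rows are dropped only
# when building the data handed to the fitter; the object keeps everything.
fit_with_placebo_option <- function(data, var_placebo, exclude_placebo = FALSE,
                                    fitter = function(d) nrow(d)) {
  is_placebo <- as.logical(data[[var_placebo]])
  fit_data <- if (exclude_placebo) data[!is_placebo, , drop = FALSE] else data
  structure(
    list(
      data = data,                  # full data set, placebo rows included
      fit = fitter(fit_data),       # stand-in for the Stan fit
      var_placebo = var_placebo,
      exclude_placebo = exclude_placebo
    ),
    class = "sketch_ermod"
  )
}

df <- data.frame(
  AUC      = c(0, 0, 10, 20, 40),
  RESPONSE = c(0, 1, 0, 1, 1),
  IS_PBO   = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

mod <- fit_with_placebo_option(df, "IS_PBO", exclude_placebo = TRUE)
nrow(mod$data)  # 5: all rows are retained on the object
mod$fit         # 3: only the non-placebo rows reached the fitter
```

Because the filtering happens inside the fitting step, a plot method given `mod` can still see the placebo rows in `mod$data`.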

As it stands, all I've done is add these two arguments for the ermod_bin case. I haven't added anything for other model classes, and I haven't added any new flexibility for the plot methods. It's extremely minimal!

However, it's enough that you can at least see how the approach would play out. The placebo_example.R script has a minimal example showing what happens when the data set contains a placebo group, under two scenarios: one where the user includes it in the model, one where it is ignored by the model.

Obviously this is nowhere near ready to merge, but I wanted to run it by you at this stage to see whether this general approach makes sense to you. It would be a bit of work to apply it consistently across the various model types, but conceptually it seems sound to me.

EDIT: also, I agree with your comments in the issue thread. As it currently stands, it will not work with log-transformed exposures, and no attempt has been made to bin the placebo group separately from other cases; that would have to be handled in the plot methods. Happy to take a stab at implementing all this if we agree on the general approach :)

@yoshidk6
Collaborator

yoshidk6 commented Jul 14, 2025

Hi @djnavarro

Thanks so much for putting this initial sketch together and starting the conversation! I very much appreciate your proactive check-in on the design choices.
The general approach of adding arguments to separate model fitting from plotting data absolutely makes sense and directly addresses the core need.

Building on your idea, I was wondering what you'd think about consolidating the placebo-related logic into a single, more structured argument. This could make the functionality even more robust and extensible for future use cases.

The idea would be to introduce a single list argument, perhaps named option_placebo_handling.

```r
ermod_bin(
  ...,
  option_placebo_handling = list(...)
)
```

This list would contain the options for defining and handling the placebo group:

  • approach: A string specifying how the placebo group is identified. It could have options like:
    • "zero_exposure_as_placebo" (Default): Automatically treats rows where the exposure variable is 0 as placebo. This handles a very common dose-response scenario without needing an extra column.
    • "var_placebo": Uses a dedicated indicator variable to identify the placebo group, as in your original proposal.
    • "none": No placebo group is specified (i.e., no data are excluded)
  • var_placebo: A string giving the name of the placebo indicator variable. This would only be used when approach = "var_placebo". The column needs to be logical (TRUE/FALSE) or 0/1.
  • include_placebo: A boolean (TRUE/FALSE) to control whether the identified placebo group is included in the model fitting. We could set the default to FALSE, aligning with the primary goal of excluding it from the model.
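To make the list argument concrete, here is a sketch of how the options could be resolved against defaults and then used to flag placebo rows. `default_placebo_options`, `resolve_placebo_options()` and `identify_placebo()` are hypothetical names; merging with `utils::modifyList()` is one way to let users supply only the fields they want to change, which is what lets a call pass just `include_placebo = TRUE`.

```r
# Hypothetical defaults mirroring the proposal above
default_placebo_options <- list(
  approach = "zero_exposure_as_placebo",
  var_placebo = NULL,
  include_placebo = FALSE
)

# Fill in unspecified fields from the defaults and validate `approach`
resolve_placebo_options <- function(user_options = list()) {
  opts <- utils::modifyList(default_placebo_options, user_options)
  opts$approach <- match.arg(
    opts$approach,
    c("zero_exposure_as_placebo", "var_placebo", "none")
  )
  if (opts$approach == "var_placebo" && is.null(opts$var_placebo)) {
    stop("`var_placebo` must be supplied when approach = \"var_placebo\"")
  }
  opts
}

# Return a logical vector flagging the placebo rows
identify_placebo <- function(data, var_exposure, opts) {
  switch(opts$approach,
    zero_exposure_as_placebo = data[[var_exposure]] == 0,
    var_placebo = as.logical(data[[opts$var_placebo]]),
    none = rep(FALSE, nrow(data))
  )
}

df <- data.frame(DOSE = c(0, 0, 50, 100), IS_PBO_GROUP = c(1, 1, 0, 0))
opts <- resolve_placebo_options(list(include_placebo = TRUE))
identify_placebo(df, "DOSE", opts)  # TRUE TRUE FALSE FALSE
```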

Why this approach could be beneficial:

  1. It forces a clear choice on how the placebo group is defined ("zero_exposure_as_placebo" vs. "var_placebo"), which makes the user's intent unambiguous.
  2. It makes it easy to cover what is likely the most common case (exposure = 0 means placebo).
  3. If we ever conceive of a third way to handle placebos, we can add it to the approach argument without adding yet another parameter to the ermod_bin function.

Example Usage:

Scenario 1: Default behavior (placebo is the zero-dose group, excluded from model)
The user provides nothing extra; the defaults handle it.

```r
fit <- ermod_bin(data = df, var_exposure = "DOSE", var_response = "RESPONSE")
```

Scenario 2: Using an indicator variable (excluded from model)

```r
fit <- ermod_bin(
  data = df,
  var_exposure = "EXPOSURE",
  var_response = "RESPONSE",
  option_placebo_handling = list(
    approach = "var_placebo",
    var_placebo = "IS_PBO_GROUP"
  )
)
```

Scenario 3: Including the placebo group (defined by zero dose) in the model

```r
fit <- ermod_bin(
  data = df,
  var_exposure = "DOSE",
  var_response = "RESPONSE",
  option_placebo_handling = list(include_placebo = TRUE)
)
```

This design also reinforces the principle that the ermod object becomes the "single source of truth." The plot() function would then read from this object to display the placebo data without needing any of these definitions again.

What are your thoughts on this alternative structure?

Other considerations

  • We could consider showing a warning if the default is used without explicit input (I'm not sure how easy it is to "detect" whether the default was used, though; maybe that's too much to ask) AND zero exposures are included?
  • Another scenario where we could show a warning is when the exposure variable includes negative values (a strong sign that the exposure was log-transformed).
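On the "detect the default" question: in R, `missing()` returns TRUE when the caller did not supply an argument, even if that argument has a default value, so this check may be simpler than it sounds. A sketch covering both warnings, using a hypothetical `check_placebo_inputs()` helper that also returns the messages invisibly so they are easy to inspect:

```r
# Hypothetical sketch: warn about implicit zero-exposure placebo handling
# and about negative exposures (possible log-transformation).
check_placebo_inputs <- function(data, var_exposure,
                                 options_placebo_handling = list()) {
  exposure <- data[[var_exposure]]
  msgs <- character(0)
  # missing() is TRUE when the caller did not supply the argument,
  # even though it has a default value
  if (missing(options_placebo_handling) && any(exposure == 0, na.rm = TRUE)) {
    msgs <- c(msgs, paste(
      "Rows with zero exposure are treated as placebo by default;",
      "set `options_placebo_handling` explicitly to confirm this."
    ))
  }
  if (any(exposure < 0, na.rm = TRUE)) {
    msgs <- c(msgs, "Negative exposures found; was the exposure log-transformed?")
  }
  for (m in msgs) warning(m, call. = FALSE)
  invisible(msgs)
}

d <- data.frame(DOSE = c(0, 10, 20), LOGAUC = c(-1, 0.5, 2))
check_placebo_inputs(d, "DOSE")          # warns: default used with zero exposures
check_placebo_inputs(d, "DOSE", list())  # silent: options supplied explicitly
```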

@djnavarro
Contributor Author

Oh that's very nice. I like the list argument approach a lot: it nicely addresses the worry I'd had about placebo handling potentially leading to a proliferation of arguments when specifying the model. I'll adapt the PR to use that approach.

@yoshidk6
Collaborator

yoshidk6 commented Jul 14, 2025

Glad you liked it :) It's coming from the approach I took in https://genentech.github.io/BayesERtools/reference/plot_er.html .
By the way, I used options_ rather than option_ there; it's probably better to stick with options_.

@djnavarro
Contributor Author

Hi @yoshidk6,

My sincere apologies for taking so long on this: I've been busier than expected, and this turned out to be somewhat trickier than I thought it would be. Nevertheless, I've updated the PR so that all the dev_ermod_* functions take an options_placebo_handling list argument with the default behaviour we discussed upthread. There's still work to do (see the limitations section below), but at last I've gotten it to a point where you might want to review it in its current state!

Best
Danielle

Key features

  • The complete data set is always stored internally within ermod$data, even when the placebo data are not used in the models or plot, to ensure that the ER model remains the single source of truth
  • All model classes store the placebo handling options internally: for the ersim class, they are stored as an attribute in the same way that the original data is stored as an attribute
  • Whenever the user asks for the placebo data to be ignored, the "dropping" occurs as late as possible: e.g., for model fitting, it happens only at the moment the data are passed to rstanarm or rstanemax; for plotting, it happens only at the moment the ggplot object is built.

The reason for that last feature is that, by not dropping or keeping the placebo data at the moment the ER model object is constructed, we leave open the possibility that later on the user could specify one set of options at the model fitting stage (e.g., ignore placebo when fitting) and override these options at the plotting stage (e.g., display placebo data even though those data didn't contribute to the model fit).

Actually, that last part would probably be easy to do given the way everything else is structured: I think all that's needed is for plot_er() itself to take an options_placebo_handling argument?
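A sketch of what such a plot-time override might look like, assuming a hypothetical `select_plot_data()` helper and a mock model object that stores the full data, a per-row placebo flag, and the options used at fitting time (none of these are the package's actual internals):

```r
# Hypothetical: pick the rows to display, using the options stored on the
# model unless the caller overrides them at plot time.
select_plot_data <- function(model, options_placebo_handling = NULL) {
  opts <- model$options_placebo_handling
  if (!is.null(options_placebo_handling)) {
    opts <- utils::modifyList(opts, options_placebo_handling)
  }
  if (isTRUE(opts$include_placebo)) {
    return(model$data)
  }
  model$data[!model$is_placebo, , drop = FALSE]
}

# Mock model object: placebo was excluded when fitting
mod <- list(
  data = data.frame(DOSE = c(0, 0, 50, 100), RESPONSE = c(0, 1, 1, 1)),
  is_placebo = c(TRUE, TRUE, FALSE, FALSE),
  options_placebo_handling = list(include_placebo = FALSE)
)

nrow(select_plot_data(mod))                                # 2: follows the fitting options
nrow(select_plot_data(mod, list(include_placebo = TRUE)))  # 4: overridden at plot time
```

Since the full data are always stored, the override needs no refitting; it only changes which rows the plot sees.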

Additional features

  • I've modified the simulated d_sim_emax data set in two ways to make it easier to use for testing: it now has a placebo group, and it has an additional exposure metric. This would necessitate a few changes to the book, but those would be relatively small.
  • The signature of the extract_data() generic function has changed: it now takes ... to pass arguments down to the methods. The reason is that extract_data.ermod(), extract_data.ersim(), etc. now have a method argument to specify whether the data set to be returned should be the "raw" data (always includes placebo data) or the "processed" data (drops placebo data if the user ignores them). I hadn't originally intended to make that part of the public API, but it turns out that the difference between extract_data(x, method = "raw") and extract_data(x, method = "processed") is useful for a lot of purposes internally, and might also be handy for the user.
  • Expanding on that last point: for ermod objects specifically, you can also use method = "internal" to pull out the internal storage from within the stanemax or rstanarm objects. I think this should be buried and not be part of the public API. Currently I'm using it in the unit tests to make sure that when the user asks for the placebo data to be dropped, those data don't get passed to Stan, but that doesn't need to be user-facing.
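A sketch of the generic-plus-method pattern described above. `extract_data_sketch()` and the `sketch_ermod` class are hypothetical stand-ins (not the package's `extract_data()`), and the `fit_data` element stands in for the data stored inside the fitted stanemax/rstanarm object:

```r
# Hypothetical generic: `...` lets each method define extra arguments
extract_data_sketch <- function(x, ...) {
  UseMethod("extract_data_sketch")
}

extract_data_sketch.sketch_ermod <- function(x,
                                             method = c("raw", "processed", "internal"),
                                             ...) {
  method <- match.arg(method)
  switch(method,
    raw = x$data,  # always includes placebo rows
    processed = if (x$include_placebo) x$data else x$data[!x$is_placebo, , drop = FALSE],
    internal = x$fit_data  # stand-in for data stored inside the Stan fit
  )
}

x <- structure(
  list(
    data = data.frame(AUC = c(0, 0, 10, 20)),
    is_placebo = c(TRUE, TRUE, FALSE, FALSE),
    include_placebo = FALSE,
    fit_data = data.frame(AUC = c(10, 20))
  ),
  class = "sketch_ermod"
)

nrow(extract_data_sketch(x))                        # 4 ("raw" is the default)
nrow(extract_data_sketch(x, method = "processed"))  # 2
```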

Limitations

  • As already mentioned, right now the user has to apply the same placebo handling option to the modelling as to the plot. This could be made more flexible.
  • The unit testing is fairly solid in terms of checking which data sets get passed to Stan, but I haven't added any tests of the plot_er() behaviour yet. The closest I have to that right now is the placebo_example.R script in the "other" folder. You can use that to illustrate the scope of what the PR does right now, but it would be nice to incorporate some of it into the unit tests and/or documentation.
  • Almost all the work up to this point has focused on managing the data: i.e., ensuring that the correct data get passed to Stan, that the correct data get passed to the plots, and that the full data set is always stored even when the model and plot don't use the placebo data. The plot_er() function itself hasn't gotten as much attention yet. For instance, for binary outcomes, if there's a placebo group to be plotted, I think the default behaviour should be that the placebo group is always a distinct bin, with quantiles computed only from the treatment groups. That hasn't been implemented yet.
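A sketch of that proposed default binning, assuming a hypothetical `bin_exposure_with_placebo()` helper: placebo rows always form their own bin, and the quantile breaks are computed only from treated exposures.

```r
# Hypothetical: placebo rows get their own bin; quantile breaks are computed
# from the treated (non-placebo) exposures only.
bin_exposure_with_placebo <- function(exposure, is_placebo, n_bins = 3) {
  treated <- exposure[!is_placebo]
  probs <- seq(0, 1, length.out = n_bins + 1)
  # unique() guards against tied quantiles, which cut() would reject
  breaks <- unique(stats::quantile(treated, probs = probs, names = FALSE))
  treated_bins <- cut(treated, breaks = breaks, include.lowest = TRUE)
  bins <- factor(rep("Placebo", length(exposure)),
                 levels = c("Placebo", levels(treated_bins)))
  bins[!is_placebo] <- as.character(treated_bins)
  bins
}

exposure <- c(0, 0, 5, 10, 15, 20, 25, 30)
bins <- bin_exposure_with_placebo(exposure, is_placebo = exposure == 0)
table(bins)  # two rows per bin, with "Placebo" as the first level
```

Computing the breaks without the placebo rows keeps a large zero-exposure group from collapsing the lowest treated quantiles onto zero.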

@djnavarro djnavarro marked this pull request as ready for review October 3, 2025 12:29
@yoshidk6
Collaborator

yoshidk6 commented Oct 9, 2025

Thank you so much for the note; I truly appreciate all of the effort you have put in so far! I must admit I underestimated the extent of the changes we needed to make. I want to take a bit more time exploring the code and putting together my thoughts on this, but wanted to let you know I haven't forgotten about it. Thanks!

@djnavarro
Contributor Author

no rush. it's been nice to spend time thinking about this one, and I completely agree that it is a surprisingly complicated PR - I wasn't anticipating the need to do a deep dive into the data structures in the package, tbh! in retrospect it does make sense though, because placebo data has a strange relationship to ER analyses. happy to make whatever changes you think might be needed, but as i say... no rush :)

@yoshidk6
Collaborator

yoshidk6 commented Oct 16, 2025

Hi Danielle, thank you very much again for the pull request update!

I've been thinking about this a lot over the past week, and I'm now wondering whether having the placebo handling integrated into the model development functionality might actually be more confusing than helpful to users.
Users need to understand when and whether the placebo group data were used in each step (model development, simulation, visualization), and without a deeper understanding of the underlying architecture, that might not be apparent.

Instead of making this part of BayesERtools, I'm wondering whether it would be best to showcase how to handle placebo group data in simulation and visualization on a dedicated page in BayesERbook.

I know this was my proposal in the first place, and you spent a lot of time thinking about and implementing the changes in the codebase, so I feel really sorry about not getting to these thoughts earlier.

I'd also love to hear your opinions about it, please let me know your thoughts.


Development

Successfully merging this pull request may close these issues.

Add placebo group in E-R plots

2 participants