Make simulate_data() for internal use and apply in the testthat files #92

Merged
merged 16 commits into statdivlab:main on Dec 4, 2024

Conversation

gthopkins
Collaborator

For anyone wishing to recreate the figures in the appendix using the old simulate_data() code, please visit the radEmu supplementary repository. That is where we now keep the version that was used to generate the figures in the corresponding manuscript.

@svteichman
Collaborator

Forgive me if you've already figured this out @gthopkins, but now that simulate_data() is no longer exported, you'll have to update lines 90-91 of the clustered data vignette to call radEmu:::simulate_data() instead of just simulate_data(). You may also want to choose an argument for mean_z that replicates the old version of the call, which used the old exponential-scale argument, mean_count_before_ZI.
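
A minimal sketch of that change, assuming mean_z sits on the log scale so that mean_z = log(mean_count_before_ZI); the value 50 and the omission of any other arguments are placeholders, not what the vignette actually uses:

# old exported call, with its mean argument on the exponential scale:
# dat <- simulate_data(mean_count_before_ZI = 50)

# new internal call; log() maps the old mean onto the scale of z
# (other arguments omitted here for brevity):
dat <- radEmu:::simulate_data(mean_z = log(50))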

@gthopkins
Collaborator Author

Thank you for the help @svteichman! I quickly realized that I need to call the function as radEmu:::simulate_data() in the vignette and the testing files, but I figured I would finish updating the tests before correcting this issue in the PR. I will consider whether I need to add more functionality besides the mean_z argument as I go through updating the vignette. Thank you for the great tip!

@gthopkins
Collaborator Author

@adw96 I have now changed the vast majority of the testthat code to use the simulate_data() function instead. With some minor adjustments to accommodate hard-coded thresholds, all but two tests still pass. However, there are two tests I truly do not understand, which I have temporarily commented out. There was not enough documentation in test-micro_wald.R for me to figure out what these two tests are meant to confirm.

The two tests in question are:

expect_equal(wald_result$coefficients$pval, 0.61, tolerance = 0.02)

and the same check in a separate simulation:

expect_equal(wald_result$coefficients$pval, 0.11, tolerance = 0.03)

Do you have any sense of why we would want to confirm that the p-values match these two arbitrary numbers? If not, could you connect me with someone who can clarify? Any p-value will be highly sensitive to how the data are generated, so I am not sure of the best way to proceed.
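
Purely for illustration (this is not something settled in the thread), a looser check would assert only properties of the p-value that hold regardless of how the data are generated:

# illustrative sketch, not an agreed-upon replacement for these tests:
# check that the Wald p-value is a well-defined probability rather than
# matching a hard-coded target that depends on the simulated data
expect_true(!is.na(wald_result$coefficients$pval))
expect_true(wald_result$coefficients$pval >= 0)
expect_true(wald_result$coefficients$pval <= 1)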

@gthopkins gthopkins changed the title Adjust simulate_data() to be for internal use only Make simulate_data() for internal use and apply in the testthat files Oct 28, 2024
@gthopkins
Collaborator Author

@adw96 I have added a "partially_verbose" argument to emuFit(), which allows users to track the progress of the algorithm (mostly the score tests) without the annotations given by emuFit_micro_penalized(), emuFit_micro(), micro_wald(), or score_test(). This means we omit lengthy technical messages like:

Max absolute difference in B since last augmented Lagrangian outer step: ___
Estimate deviates from feasibility by ___
Parameter u set to ___
Parameter rho set to ___
Iteration limit reached; exiting optimization.
Computing data augmentations for Firth penalty. For larger models, this may take some time.
Scaled norm of derivative ___

but we will still see informative updates like:

Centering rows of B with pseudo-Huber smoothed median with smoothing parameter 0.1.
Performing Wald tests and constructing CIs.
[1] "Running score test 1 of 46 (row of B k = 2; column of B j = 1)."
...
[1] "Running score test 46 of 46 (row of B k = 2; column of B j = 1)."

This amounts to quite a simple change. Please note that this is now a "compound" pull request, in that it contains all of my commits pertaining to (1) changing the testing files and (2) the partially_verbose argument.
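
Roughly, the gating follows the pattern in this sketch; the variable handling is illustrative, not the actual diff:

partially_verbose <- TRUE  # the new argument; verbose is the existing one
verbose <- FALSE

if (verbose) {
  # low-level technical output, omitted under partially_verbose alone
  message("Iteration limit reached; exiting optimization.")
}
if (verbose || partially_verbose) {
  # big-picture progress that both settings report
  message("Performing Wald tests and constructing CIs.")
}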

…g process, but we may remove p-value check soon.
@adw96 adw96 requested a review from svteichman November 27, 2024 16:12
@svteichman
Collaborator

svteichman commented Nov 27, 2024

@gthopkins overall this looks great! I see there is a merge conflict with "test-emuFit.R"; could you please pull the recent updates to the package and resolve this conflict?

I think that everything from the simulate_data() update looks great.

For the partially verbose option, could you fold it into the verbose argument instead of adding a new argument? I'm imagining that a user could input TRUE, FALSE, or "partial" (or something along those lines) for the verbose argument. Then you could add a check that verbose evaluates to one of these (and throw an error if it does not); internally, if the user inputs verbose = "partial", you could create a new variable partially_verbose, set verbose = TRUE, and use all of the code that you currently have. I recommend this just to reduce the number of arguments (emuFit() already has so many) and hopefully streamline the process a tiny bit for the user.
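
A sketch of that suggestion, using the names from this comment rather than the final implementation:

verbose <- "partial"  # example user input

# validate the combined argument; error on anything unexpected
if (!(isTRUE(verbose) || isFALSE(verbose) || identical(verbose, "partial"))) {
  stop('verbose must be TRUE, FALSE, or "partial"')
}

# translate "partial" into an internal flag, then reuse the verbose paths
partially_verbose <- identical(verbose, "partial")
if (partially_verbose) {
  verbose <- TRUE
}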

@gthopkins
Collaborator Author

@svteichman yes, I would be happy to make that change! Personally, I find very little value in the current verbose = TRUE output. For example, technical messages such as

Max absolute difference in B since last augmented Lagrangian outer step: ___
Estimate deviates from feasibility by ___
Parameter u set to ___
Parameter rho set to ___
Iteration limit reached; exiting optimization.
Computing data augmentations for Firth penalty. For larger models, this may take some time.
Scaled norm of derivative ___

are out of context and do not clarify much for me. I propose the following three settings:

  1. TRUE will correspond to big-picture messages, what we have been calling partially verbose
  2. FALSE will correspond to no messages at all
  3. "development" will display the all technical messages above in addition big-picture messages

What are your thoughts on this? I just think people are more likely to set "verbose = TRUE" as the immediate alternative to FALSE, but I do not think anyone would actually want the development messages.
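
At the call site, the proposal would look like the following sketch, where Y and X stand in for the user's abundance and design matrices:

fit <- emuFit(Y = Y, X = X, verbose = TRUE)           # big-picture messages only
fit <- emuFit(Y = Y, X = X, verbose = FALSE)          # no messages at all
fit <- emuFit(Y = Y, X = X, verbose = "development")  # technical + big-picture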

@svteichman
Collaborator

I think this is a great idea! Thanks for thinking through this. I use the verbose = TRUE messages, but I agree that a user will rarely need them.

@svteichman svteichman merged commit 846f792 into statdivlab:main Dec 4, 2024
4 checks passed