Issue 40: Add example test data #41

athowes · 2024-05-15T13:43:47Z

Description

This PR closes #40.

I have some remaining uncertainty about:

How accessible this data is (internal or external to the package)
Should the data be documented?
Does the set.seed work as intended? And how does it interact with simulate_gillespie() having its own set.seed argument?

Checklist

My PR is based on a package issue and I have explicitly linked it.
I have included the target issue or issues in the PR title in the form Issue(s) issue-numbers: PR title
I have read the contribution guidelines.
I have tested my changes locally.
I have added or updated unit tests where necessary.
I have updated the documentation if required.
My code follows the established coding standards.
I have added a news item linked to this PR.
I have reviewed CI checks for this PR and addressed them as far as I am able.

seabbs

I think this is fine and a reasonable thing to do, but note that CRAN can get funny about including data just for examples. As flagged in the issue, we might want to consider the alternative approach epinowcast uses, or provide some pushback as to why we should take this approach.

athowes · 2024-05-15T13:51:02Z

I've made the test data internal.

As for the seed issue, I've rerun it twice and the results are the same, so it seems fine.
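A minimal sketch of the kind of rerun check described above. The simulate_counts() function is a hypothetical stand-in for a stochastic simulator, not the package's simulate_gillespie(), and the seed value is arbitrary.

```r
# Hypothetical stand-in for a stochastic simulation step
simulate_counts <- function(n) {
  rpois(n, lambda = 5)
}

# Fix the seed, simulate, then repeat with the same seed
set.seed(101)
first_run <- simulate_counts(10)

set.seed(101)
second_run <- simulate_counts(10)

# TRUE when both runs drew identical values
identical(first_run, second_run)
```

If a simulator also sets a seed internally, the same comparison can be run on its output to see which seed takes effect.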

athowes · 2024-05-15T13:54:36Z

I'm pretty indifferent between it being in data/ and accessible, or internal as it is now. I don't know if there are other options.

athowes · 2024-05-15T14:51:51Z

Have reverted back to data being a part of the package, and added documentation. Ready for review @seabbs.

athowes · 2024-05-16T09:29:34Z

From https://r-pkgs.org/data.html:

If you want to store R objects and make them available to the user, put them in data/. This is the best place to put example datasets. All the concrete examples above for data in a package and data as a package use this mechanism. See Section 7.1.

Not a perfect fit, but arguably the test data could be helpful to a user.

If you want to store R objects for your own use as a developer, put them in R/sysdata.rda. This is the best place to put internal data that your functions need. See Section 7.2.

The test data are for our own use as developers. They're not "internal data that functions need" though.

If you want to store data in some raw, non-R-specific form and make it available to the user, put it in inst/extdata/. For example, readr and readxl each use this mechanism to provide a collection of delimited files and Excel workbooks, respectively. See Section 7.3.

No.

If you want to store dynamic data that reflects the internal state of your package within a single R session, use an environment. This technique is not as common or well-known as those above, but can be very useful in specific situations. See Section 7.4.

No.

If you want to store data persistently across R sessions, such as configuration or user-specific data, use one of the officially sanctioned locations. See Section 7.5.

No.

seabbs · 2024-05-16T09:43:22Z

@athowes and I had a f2f chat and concluded we should use simulated data in the tests via tests/testthat/setup.R and include the PNAS data for examples / real-world context (with lots of documentation to point back at the original source).
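A rough sketch of what simulating data in a testthat setup file could look like. The object name sim_obs, its columns, and the distributions are illustrative assumptions, not the contents of the actual setup.R.

```r
# tests/testthat/setup.R (sketch; names and distributions are hypothetical)
set.seed(101)

n_obs <- 20

# Simulated primary event times over a 30-day window
sim_obs <- data.frame(
  id = seq_len(n_obs),
  onset_time = sort(runif(n_obs, min = 0, max = 30))
)

# Add a non-negative delay to get a secondary event time
sim_obs$report_time <- sim_obs$onset_time + rexp(n_obs, rate = 0.5)
```

testthat runs setup files before the test files, so the tests can use an object created this way without shipping it in data/.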

athowes · 2024-05-16T10:12:56Z

Added simulated data in tests/testthat/setup.R
Added back the ebola data. Got the .xlsx as a direct download from the paper and put it in data-raw. Used data-raw/process_raw_data.R to lightly process it and save it to .rda. Documented the data in R/data.R
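A hedged sketch of the general shape of such a data-raw script, using readxl and janitor (the two packages this PR adds to Suggests). The function name, file path, and object name are hypothetical placeholders, not the real ones.

```r
# Sketch of a data-raw/ processing step; not the actual process_raw_data.R
process_raw_data <- function(path) {
  # Read the first sheet of the downloaded workbook
  raw <- readxl::read_excel(path)
  # Convert column names to consistent snake_case
  janitor::clean_names(raw)
}

# Usage (not run here: it needs the raw .xlsx file and a package project)
# ebola_data <- process_raw_data("data-raw/raw_data.xlsx")  # hypothetical path
# usethis::use_data(ebola_data, overwrite = TRUE)           # writes data/ebola_data.rda
```

Keeping the script in data-raw/ means the raw file and the processing steps stay in the repository, while only the saved .rda is exposed to users.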

R/data.R

seabbs

This all looks good. A few minor comments

data-raw/process_raw_data.R

athowes · 2024-05-16T10:38:55Z

Have fixed these comments I think.

seabbs

All good to go now except the minor linting issues

data-raw/process_raw_data.R

athowes · 2024-05-16T12:16:01Z

Failing lint due to "Variable and function names should not be longer than 30 characters."

Perhaps change sierra_leone_ebola_outbreak_data to sierra_leone_ebola_data.
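A quick check of the candidate names against the 30-character limit quoted in the lint message, using only base R. The limit is taken from that message; the linter configuration itself is not shown in this thread.

```r
# Limit quoted in the lint message
max_length <- 30L

# Names mentioned in this PR
candidates <- c(
  "sierra_leone_ebola_outbreak_data",
  "sierra_leone_ebola_data",
  "ebola_outbreak_sierra_leone"
)

# Count characters in each name and compare with the limit
name_lengths <- nchar(candidates)
data.frame(
  name = candidates,
  length = name_lengths,
  passes = name_lengths <= max_length
)
```

The first name has 32 characters and fails; the other two have 23 and 27 and pass.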

* Add example test data generation in data-raw and as .rda
* Make the test data internal
* Delete PNAS data (assume copied here from another package)
* Revert back to non-internal data
* Add data documentation
* Move simulated data generation from data-raw to tests/testthat
* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data
* Remove simulated data from data/
* Clean the ebola column names
* Document ebola data
* Add readxl and janitor to Suggests
* Add newline to end of file
* Add ask to cite and spacing change
* Change name to "ebola_outbreak_sierra_leone" and document roxygen2
* Fix linter issues
* Rename to sierra_leone_ebola_outbreak_data
* Use shorter ebola data name, and change code style

Former-commit-id: 402162a536b79273485492ed4af4ed3c5b760ed2 [formerly 3528376db71b7640f2f7976b5cc06855c792d1d9]
Former-commit-id: 57355982266f61ca1f7f81ed8b3a6f3c762a0477


athowes added 2 commits May 15, 2024 14:41

Add example test data generation in data-raw and as .rda

9ff7788

Make the test data internal

a5ee221

seabbs reviewed May 15, 2024

Delete PNAS data (assume copied here from another package)

0ab0bef

athowes added 2 commits May 15, 2024 15:46

Revert back to non-internal data

63cddc2

Add data documentation

d95a0ab

athowes added 5 commits May 16, 2024 10:48

Move simulated data generation from data-raw to tests/testthat

21e5abc

Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

4489fa6

Remove simulated data from data/

a9cb916

Clean the ebola column names

2e5fd91

Document ebola data

5da824b

athowes requested a review from seabbs May 16, 2024 10:14

seabbs enabled auto-merge May 16, 2024 10:22

seabbs force-pushed the add-test-data branch from 5da824b to bcf2196 Compare May 16, 2024 10:22

seabbs reviewed May 16, 2024

R/data.R (outdated)

seabbs reviewed May 16, 2024

R/data.R (outdated)

seabbs reviewed May 16, 2024

R/data.R

seabbs reviewed May 16, 2024

R/data.R (outdated)

seabbs requested changes May 16, 2024

data-raw/process_raw_data.R (outdated)

data-raw/process_raw_data.R (outdated)

athowes added 4 commits May 16, 2024 11:32

Add readxl and janitor to Suggests

86ea4eb

Add newline to end of file

1c86569

Add ask to cite and spacing change

27358c6

Change name to "ebola_outbreak_sierra_leone" and document roxygen2

e7410b3

athowes force-pushed the add-test-data branch from bcf2196 to e7410b3 Compare May 16, 2024 10:38

athowes requested a review from seabbs May 16, 2024 10:39

seabbs previously approved these changes May 16, 2024

Fix linter issues

ac676e1

athowes dismissed seabbs’s stale review via ac676e1 May 16, 2024 11:12

seabbs self-requested a review May 16, 2024 11:23

seabbs previously approved these changes May 16, 2024

seabbs force-pushed the add-test-data branch from ac676e1 to c18e94b Compare May 16, 2024 11:24

Rename to sierra_leone_ebola_outbreak_data

6b3ad96

athowes dismissed seabbs’s stale review via 6b3ad96 May 16, 2024 12:02

athowes force-pushed the add-test-data branch from c18e94b to 6b3ad96 Compare May 16, 2024 12:02

seabbs reviewed May 16, 2024

data-raw/process_raw_data.R (outdated)

seabbs previously approved these changes May 16, 2024

Use shorter ebola data name, and change code style

91e4e1b

athowes dismissed seabbs’s stale review via 91e4e1b May 16, 2024 12:25

seabbs approved these changes May 16, 2024

seabbs added this pull request to the merge queue May 16, 2024

Merged via the queue into main with commit d140527 May 16, 2024
3 checks passed

seabbs deleted the add-test-data branch May 16, 2024 12:50
