Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 40: Add example test data #41

Merged
merged 17 commits into from
May 16, 2024
Merged

Issue 40: Add example test data #41

merged 17 commits into from
May 16, 2024

Conversation

athowes
Copy link
Collaborator

@athowes athowes commented May 15, 2024

Description

This PR closes #40.

I have some remaining uncertainty about:

  1. How accessible this data is (internal or external to the package)
  2. Should the data be documented?
  3. Does the set.seed work as intended? What about that simulate_gillespie() has a set.seed argument?

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title in the for Issue(s) issue-numbers: PR title
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.

Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine and a reasonable thing to do but note that CRAN can get funny about including data just for examples. As flagged in the issue we might want to consider the alternative approach epinowcast users or provide some pushback as to why we should take this approach

@athowes
Copy link
Collaborator Author

athowes commented May 15, 2024

I've made the test data internal.

As for the seed issue, I've rerun it two times and the results are the same, so seems fine.

@athowes
Copy link
Collaborator Author

athowes commented May 15, 2024

I'm pretty indifferent about it being in data/ and accessible or this internal one. I don't know if there are other options.

@athowes
Copy link
Collaborator Author

athowes commented May 15, 2024

Have reverted back to data being a part of the package, and added documentation. Ready for review @seabbs.

@athowes
Copy link
Collaborator Author

athowes commented May 16, 2024

From https://r-pkgs.org/data.html:

If you want to store R objects and make them available to the user, put them in data/. This is the best place to put example datasets. All the concrete examples above for data in a package and data as a package use this mechanism. See section Section 7.1.

Not perfect fit but arguably the test data could be helpful to a user.

If you want to store R objects for your own use as a developer, put them in R/sysdata.rda. This is the best place to put internal data that your functions need. See section Section 7.2.

The test data are for own use as a developer. It's not "internal data that functions need" though.

If you want to store data in some raw, non-R-specific form and make it available to the user, put it in inst/extdata/. For example, readr and readxl each use this mechanism to provide a collection of delimited files and Excel workbooks, respectively. See section Section 7.3.

No.

If you want to store dynamic data that reflects the internal state of your package within a single R session, use an environment. This technique is not as common or well-known as those above, but can be very useful in specific situations. See section Section 7.4.

No.

If you want to store data persistently across R sessions, such as configuration or user-specific data, use one of the officially sanctioned locations. See section Section 7.5.

No.

@seabbs
Copy link
Contributor

seabbs commented May 16, 2024

@athowes and I had a f2f chat and concluded we should use simulated data in the tests via test/testthat/setup.R and include the PNAS data for examples / real-world context (with lots of documentation to point back at the original source).

@athowes
Copy link
Collaborator Author

athowes commented May 16, 2024

  • Added simulated data in test/testthat/setup.R
  • Added back ebola data. Got .xlsx direct download from paper in data-raw. Used data-raw/process_raw_data.R to lightly process and save to .rda. Documented data in R/data.R

@athowes athowes requested a review from seabbs May 16, 2024 10:14
@seabbs seabbs enabled auto-merge May 16, 2024 10:22
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good. A few minor comments

@athowes
Copy link
Collaborator Author

athowes commented May 16, 2024

Have fixed these comments I think.

@athowes athowes requested a review from seabbs May 16, 2024 10:39
seabbs
seabbs previously approved these changes May 16, 2024
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to go now except the minor linting issues

@seabbs seabbs self-requested a review May 16, 2024 11:23
seabbs
seabbs previously approved these changes May 16, 2024
seabbs
seabbs previously approved these changes May 16, 2024
@athowes
Copy link
Collaborator Author

athowes commented May 16, 2024

Failing lint due to "Variable and function names should not be longer than 30 characters."

sierra_leone_ebola_outbreak_data change to sierra_leone_ebola_data perhaps.

@seabbs seabbs added this pull request to the merge queue May 16, 2024
Merged via the queue into main with commit d140527 May 16, 2024
3 checks passed
@seabbs seabbs deleted the add-test-data branch May 16, 2024 12:50
seabbs pushed a commit that referenced this pull request Jan 10, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: 402162a536b79273485492ed4af4ed3c5b760ed2 [formerly 3528376db71b7640f2f7976b5cc06855c792d1d9]
Former-commit-id: 57355982266f61ca1f7f81ed8b3a6f3c762a0477
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: d140527
Former-commit-id: 4ec821ababa950c635bdc96ad43a3dc1def7f70a
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: d140527
Former-commit-id: 4ec821ababa950c635bdc96ad43a3dc1def7f70a
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: d140527
Former-commit-id: 4ec821ababa950c635bdc96ad43a3dc1def7f70a
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: 402162a536b79273485492ed4af4ed3c5b760ed2 [formerly 3528376db71b7640f2f7976b5cc06855c792d1d9]
Former-commit-id: 57355982266f61ca1f7f81ed8b3a6f3c762a0477
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: 402162a536b79273485492ed4af4ed3c5b760ed2 [formerly 3528376db71b7640f2f7976b5cc06855c792d1d9]
Former-commit-id: 57355982266f61ca1f7f81ed8b3a6f3c762a0477
Former-commit-id: 65806aa
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: d140527
Former-commit-id: 4ec821ababa950c635bdc96ad43a3dc1def7f70a
Former-commit-id: 193fd3c60e41e5a4c319d604b4cf71f2cb16a25e [formerly e8b98fe]
Former-commit-id: ae71e652bc5e2f008ccf024c951af8e5d91e98a9
seabbs pushed a commit that referenced this pull request Jan 21, 2025
* Add example test data generation in data-raw and as .rda

* Make the test data internal

* Delete PNAS data (assume copied here from another package)

* Revert back to non-internal data

* Add data documentation

* Move simulated data generation from data-raw to tests/testthat

* Add PNAS ebola data in raw xlsx format with script for saving it to rda in data

* Remove simulated data from data/

* Clean the ebola column names

* Document ebola data

* Add readxl and janitor to Suggests

* Add newline to end of file

* Add ask to cite and spacing change

* Change name to "ebola_outbreak_sierra_leone" and document roxygen2

* Fix linter issues

* Rename to sierra_leone_ebola_outbreak_data

* Use shorter ebola data name, and change code style

Former-commit-id: d140527
Former-commit-id: 4ec821ababa950c635bdc96ad43a3dc1def7f70a
Former-commit-id: 193fd3c60e41e5a4c319d604b4cf71f2cb16a25e [formerly e8b98fe]
Former-commit-id: ae71e652bc5e2f008ccf024c951af8e5d91e98a9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create data for use in tests
2 participants