[SCHEMATIC-214] Wrap pandas functions to support not including `None` with the NA values argument #1553
🔥 LGTM! let's wait for Gianna/Andrew to review this when they're back before merging in case there are things we aren't thinking of.
The changes here make sense to me. But I am wondering if we could add `None` as a valid value here in the data model: https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv#L9 and then create a manifest with the string `None` to better test the changes? Currently, we can't test with our existing data model because no attribute has "None" as a valid value.
I added to an existing test for this, let me know if this covers what you had in mind @linglp
Update example_test_nones.model.csv component and add new invalid manifest with nones
…/Sage-Bionetworks/schematic into schematic-210-pandas-remove-none
@andrewelamb, thanks for adding the integration test - please see my comments.
I've opened up #1556 to address the data model and component concerns. That PR just updates the existing data models and adds new test manifests. We can revert #1555 or modify the …
* add valid values to Patient attributes
* update data model
* add test manifests
* update test for new model
* update test for new valid value
…/Sage-Bionetworks/schematic into schematic-210-pandas-remove-none
To aid in the upcoming release due to long testing times, I did this: d0a8e15.
- Nit: Thanks for adding this in an existing test module, but please be sure to modify the docstring to fit the extra testing functions
- Nit: In my opinion, parametrize is best used when it's testing the same condition. For example, if there were many valid manifests to be tested, that is when I would personally use it to add all the valid manifest paths. In my opinion, it's more readable than trying to figure out the if-else AND parametrize around two testing conditions.
Great work everybody!
cc @andrewelamb.
Quality Gate passed
* add new tests
* add unit tests
* ran black
* Update schematic/models/validate_attribute.py
Co-authored-by: BryanFauble <[email protected]>
* added tests
* Update README.md
* Update README.md
* add unit tests
* run black
* Update README.md
* temp commit
* remove old tests
* [FDS-2386] Synapse entity tracking and code concurrency updates (#1505)
* [FDS-2386] Synapse entity tracking and code concurrency updates
* ran black
* Update CODEOWNERS
* updated data model type rules to include error param
* fix validate type attribute to use msg level param
* added error handling
* run black
* create Node class
* sat up Node class so that nodes with no displayName fields cause an error on creation
* ran black
* ran mypy
* added new configs for CLI tests
* added new manifests for testing CLI commands
* automate manual CLI tests
* ran black
* Update CODEOWNERS
* Update scan_repo.yml
* Update .github/CODEOWNERS
* Update .github/workflows/scan_repo.yml
* Attach additional telemetry data to OTEL traces (#1519)
* Attach additional telemetry data to OTEL traces
* feat: added tracing for cross manifest validation and file name validation (#1509)
* add tracing for GX validation
* temp commit
* Updating contribution doc to expect squash and merge (#1534)
* [FDS-2491] Integration tests for Schematic API Test plan (#1512) Integration tests for Schematic API Test plan
* [FDS-2500] Add Integration Tests for: Manifest Validation (#1516)
* Add Integration Tests for: Manifest Validation
* [FDS-2449] Lock `sphinx` version and update `poetry.lock` (#1530) Also install `typing-extensions` in the build
* manual test files now being saved in manifests folder
* manual test files now being saved in manifests folder
* remove lines to delete json files that were under git control
* ran black
* add try finally blocks to remove created files
* ran black
* add lines to remove created json files
* Update file annotation store process to require filename be present in order to annotate file
* add lines to remove created json files
* Revert "Update file annotation store process to require filename be present in order to annotate file" This reverts commit f57c718.
* Don't attempt to annotate the table
* add code in finally blocks to reset config to default values, when tests change them
* complete submit manifest command test
* ran black
* add test for bug case
* update test for table tidyness
* remove unused import
* remove etag column if already present when building temp file view
* catch all exceptions to switch to sequential mode
* update test for updated data
* Revert "update test for updated data" This reverts commit 255e3c0.
* Revert "catch all exceptions to switch to sequential mode" This reverts commit 68b0b24.
* catch ValueErrors as well
* Updates for integration test failures (#1537)
* Updates for integration test failures, Config file reset and scope changes
* add todos for removing config resets
* [FDS-2525] Authenticated export of telemetry data (#1527)
* Authenticated export of telemetry data, updating to HTTP otel library
* temp reduce tests
* restore tests
* uncomment tests
* redid how files are deleted, manual tests values are set
* ran black
* [SCHEMATIC-157] Make some dependencies required to avoid `schematic CLI` commands from potentially erroring when doing a pip install (#1540)
* Make otel flash non-optional
* Add dependencies as non-optional
* Include schematic_api for now (#1547)
* update toml version to 24.11.1 (#1548)
* [SCHEMATIC-193] Support exporting telemetry data from GH integration test runs (#1550)
* Support exporting telemetry data from GH run via access token retrieved via oauth2
* [SCHEMATIC-30, SCHEMATIC-200] Add version to click cli / use pathlib.Path module for checking cache size (#1542)
* Add version to click cli
* Add version
* Run black
* Reformat
* Fix
* Update schematic/schemas/data_model_parser.py
* Add test for check_synapse_cache_size
* Reformat
* Fix tests
* Remove unused parameter
* Install all-extras for now
* Make otel flash non-optional
* Update dockerfile
* Add dependencies as non-optional
* Update pyproject toml
* Fix trivy issue
* Add service version
* Run black
* Move all utils.general tests into separate folder
* Use pre-commit
* Add updates to contribution doc
* Fix
* Add service version to log provider
---------
Co-authored-by: BryanFauble <[email protected]>
* [SCHEMATIC-212] Prevent traces from being combined (#1552)
* Set instance id in github CI run, uninstrument flask auto during integration test run
* [SCHEMATIC-163] Catch error when manifest is generated and existing one doesn't have `entityId` (#1551)
* adds error handling
* adds unit tests for _get_file_entityIds
* updates error message
* adds entityid check to parent func
* updates docstring
* [SCHEMATIC-183] Use paths from file view for manifest generation (#1529) source manifest file paths from synapse fileviews at generation
* [SCHEMATIC-214] Wrap pandas functions to support not including `None` with the NA values argument (#1553)
* Wrap pandas functions to support not including `None` with the NA values argument
* Ignore types
* pylint issues
* ordering of ignore
* Add to integration test to cover none in a manifest
* Add additional test for manifest
* [SCHEMATIC-210] Add attribute to nones data model (#1555) Update example_test_nones.model.csv component and add new invalid manifest with nones
* first commit
* ran black
* add test for validateModelManifest
* [SCHEMATIC-214] change data model and component (#1556)
* add valid values to Patient attributes
* update data model
* add test manifests
* update test for new model
* update test for new valid value
* change test to use new manifests
* remove uneeded test file
* revert file
* revert file
* change tests to use new manifests
* remove uneeded manifests
* ran black
* add tests back in
* ran black
* revert manifest
* Split up valid and errored test as separate testing functions
* Remove unused import
---------
Co-authored-by: Gianna Jordan <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Thomas Yu <[email protected]>
* incremented packge version number
* Update publish.yml
* Update test.yml
* Update api_test.yml
* Update pdoc.yml
* Update version.py
* updates publish.yml (#1558) (#1561)
Co-authored-by: Brad Macdonald <[email protected]>
---------
Co-authored-by: BryanFauble <[email protected]>
Co-authored-by: Jenny V Medina <[email protected]>
Co-authored-by: Thomas Yu <[email protected]>
Co-authored-by: Lingling <[email protected]>
Co-authored-by: GiaJordan <[email protected]>
Co-authored-by: Brad Macdonald <[email protected]>
Co-authored-by: Gianna Jordan <[email protected]>
Problem:
When a `None` string is included in a manifest, the pandas function was converting it to a float not-a-number (NaN). This is due to a change in the pandas 2.0 release: "Added "None" to default na_values in read_csv() (GH 50286)".
Solution:
Take the default `na_value` objects and remove the `None` value from the list. Pass that list back into the function to replace the default `na_value` objects with this new list.
Testing:
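For readers skimming the thread, the approach described under Solution can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the wrapper name `read_csv_keep_none` and the constant `NA_VALUES_WITHOUT_NONE` are hypothetical, and the sketch assumes the default NA strings can be imported from the private module `pandas._libs.parsers` as `STR_NA_VALUES`, which is not a public API and may change between pandas versions.

```python
import io

import pandas as pd

# Assumption: STR_NA_VALUES lives in a private pandas module and holds the
# default strings that read_csv treats as missing values.
from pandas._libs.parsers import STR_NA_VALUES

# Copy the defaults and drop "None" so the literal string survives parsing.
# (Hypothetical name, for illustration only.)
NA_VALUES_WITHOUT_NONE = sorted(set(STR_NA_VALUES) - {"None"})


def read_csv_keep_none(path_or_buffer, **kwargs):
    """Hypothetical wrapper: read a CSV without treating "None" as missing."""
    # Supply the filtered list and switch off the built-in defaults, so that
    # only the strings in the filtered list are converted to NaN.
    kwargs.setdefault("na_values", NA_VALUES_WITHOUT_NONE)
    kwargs.setdefault("keep_default_na", False)
    return pd.read_csv(path_or_buffer, **kwargs)


# A tiny manifest-like CSV: one "None" string and one genuinely empty cell.
csv_text = "Patient ID,Diagnosis\n1,None\n2,\n"
wrapped_df = read_csv_keep_none(io.StringIO(csv_text))
print(wrapped_df)
```

With this sketch, the `None` cell should stay a string while the empty cell is still read as NaN, because the empty string remains in the filtered list. Using `setdefault` lets a caller pass their own `na_values` or `keep_default_na` if they need the original pandas behavior.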