Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update error message and docs for features argument to clarify CellProfiler default expectations and how to handle non-CellProfiler data #448

Merged
merged 15 commits into from
Sep 27, 2024

Conversation

axiomcura
Copy link
Member

Description

Thank you for your contribution to pycytominer!
Please succinctly summarize your proposed change.
What motivated you to make this change?

Summary

This PR addresses one of the reviewers’ comments:


" When I used the function pycytominer.normalize() on a dataset originating from my own work, I encountered an error message saying

“No CP features found. Are you sure this dataframe is from CellProfiler?”.

I could easily fix the problem because I know the properties of a CellProfiler dataset, and the error message was hinting at some “Metadata_” missing.

The fix was very easy, I just needed to add the prefix “Metadata_” to my metadata columns. My dataset comes from another commonly used proprietary software called Harmony. I understand that the authors are in favor of using open-source resources, but I think that the pycytominer tool could be useful in a much larger context, and take as input different types of datasets. In the readme file, that I followed to perform the normalization, I couldn’t find anywhere that the dataset has to come from CellProfiler. ..."


I replaced the ‘assert’ statement with an exception, as it’s considered bad practice to use assertions during runtime. Assertions can be disabled with the -o parameter, which means they aren’t reliable for handling errors in production code.
Here are some useful links:

Additionally, I updated the error message to clarify that pycytominer infers CellProfiler features by default and provided guidance on how users can manually specify their feature space when working with non-CellProfiler data.

Please also link to any relevant issues that your code is associated with.

This code is also an addition to #430

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@gwaybio
Copy link
Member

gwaybio commented Sep 23, 2024

@axiomcura - integration tests failed, please fix. Looks like the tests expect a specific error message which also needs updating

@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.72%. Comparing base (c6ad2a0) to head (5f4ac6e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #448   +/-   ##
=======================================
  Coverage   94.72%   94.72%           
=======================================
  Files          57       57           
  Lines        3145     3146    +1     
=======================================
+ Hits         2979     2980    +1     
  Misses        166      166           
Flag Coverage Δ
unittests 94.72% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@axiomcura
Copy link
Member Author

@gwaybio All tests have passed! Please review when you have a moment to confirm if it's safe to merge.

@gwaybio
Copy link
Member

gwaybio commented Sep 23, 2024

Thanks Erik! I see that the tests are passing now.

Re-reading the reviewer comment, it seemed that they have issues with the metadata parameter.

I'm thinking that in addition to clarifying the features exception, we should also do something to the metadata parameter here: https://github.com/cytomining/pycytominer/blob/main/pycytominer/cyto_utils/features.py#L113C1-L116C19

Perhaps we can do two more things in this PR to update error message in the metadata inference.

  1. Add an exception (and corresponding tests) for if metadata_features > 0
  2. Add a description to the metadata parameter docstring specifying exactly what metadata="infer" actually expects/does

I'll also tag @d33bs - Dave, after Erik addresses these two points, perhaps you can take a final look to see if both Erik and I overlooked something in making this change. Thanks!

@axiomcura
Copy link
Member Author

axiomcura commented Sep 24, 2024

@gwaybio - This is certainly doable, but I have a few clarifying questions. Could you please explain why the exception condition needs to be metadata_features > 0?

From my understanding, Pycytominer’s main functions default to metadata_features="infer", which always expects metadata features from CellProfiler. As a result, adding an exception for metadata_features > 0 in infer_cp_features() seems to introduce a contradictory logic. My guess would have been metadata_features == 0.

I might be misunderstanding something where the metadata_features > 0 exception should be implemented. I just want to make sure if I am thinking of this correctly before I implement! Thanks!

@gwaybio
Copy link
Member

gwaybio commented Sep 24, 2024

Great catch Erik! We should not check metadata features in the way I suggested above. We ask users to provide metadata features within each function (e.g., normalize()) call.

Perhaps the only change should be whereever we see meta_features="infer" in the docstring (e.g., here: https://github.com/cytomining/pycytominer/blob/main/pycytominer/normalize.py#L41)

We add some specification to clarify that the cases in which meta_features="infer" works as expected is not only when metadata features are prefixed by Metadata_ but that this occurs when using CellProfiler-derived input.

I think this is a relatively minor and straightforward fix.

  1. Find wherever meta_features="infer" is used.
  2. Modify the docstring slightly to indicate CellProfiler-derived default

@axiomcura
Copy link
Member Author

@gwaybio - I have updated the docstrings.

@gwaybio gwaybio changed the title updated error message Update error message and docs for features argument to clarify CellProfiler default expectations and how to handle non-CellProfiler data Sep 26, 2024
@kenibrewer
Copy link
Member

Ahh shucks... there's going to be a lot of merge conflicts between this branch and #396. I was going to finish that one off this weekend. Not sure which order we want to merge / resolve merge conflicts in.

@gwaybio
Copy link
Member

gwaybio commented Sep 26, 2024

@kenibrewer
Copy link
Member

Sounds good! Let's do that!

@gwaybio
Copy link
Member

gwaybio commented Sep 27, 2024

@d33bs - ready for you to take a final look, thanks!

Copy link
Member

@kenibrewer kenibrewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

This would have been very helpful for my first attempt at using pycytominer too!

Copy link
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, I left a few minor comments, otherwise LGTM!

pycytominer/consensus.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/features.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/features.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/features.py Outdated Show resolved Hide resolved
pycytominer/cyto_utils/write_gct.py Outdated Show resolved Hide resolved
pycytominer/operations/get_na_columns.py Show resolved Hide resolved
tests/test_cyto_utils/test_feature_infer.py Outdated Show resolved Hide resolved
tests/test_operations/test_correlation_threshold.py Outdated Show resolved Hide resolved
tests/test_operations/test_get_na_columns.py Outdated Show resolved Hide resolved
tests/test_operations/test_variance_threshold.py Outdated Show resolved Hide resolved
@axiomcura
Copy link
Member Author

@d33bs @gwaybio I’ve finished applying the requested changes. I’m currently waiting for the tests to pass, and once they do, it’s ready for merging. ( I am unable to merge)

@gwaybio gwaybio merged commit 2fd5ca3 into cytomining:main Sep 27, 2024
11 checks passed
@gwaybio
Copy link
Member

gwaybio commented Sep 27, 2024

merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants