Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parametrises column names #269

Merged
merged 7 commits into from
Mar 19, 2024
Merged

Parametrises column names #269

merged 7 commits into from
Mar 19, 2024

Conversation

ns-rse
Copy link
Collaborator

@ns-rse ns-rse commented Mar 14, 2024

Closes #267

Parametrises the column headers for input files (different versions of ftrs and maxquant, theoretical masses) and the columns PGFinder renames these to and subsequently derived columns based on these.

These values were hard-coded at various places through out the code, this makes updating them awkward and fickle. Instead the values are defined in the file pgfinder/config/columns.yaml and this is loaded to a Python dictionary. The reference to these values are then used at various points rather than hard coded values themselves.

Required tweaks to...

  • pgfinder/pgio.py
  • pgfinder/validation.py
  • pgfinder/matching.py

...as well as some of the test files which were used for comparison (including the regression tests) where the header columns have changed.

This should in theory make it somewhat easier to...

  1. Extend to new file formats when ftrs version changes.
  2. Make it easier to change PGFinder column names if its decided the current are not sufficiently descriptive.

Other changes

  • Tidying up doc strings switching to the newer format of str | Path rather than the older Union[str, Path]. This can be supported under Python 3.8 with from __future__ import annotations but Python 3.8 is due to reach End of Life in 2024-10-31.
  • Drop support for Python 3.8 as it will reach End of Life later this year and set the minimum version to 3.9 and adding support/testing for 3.11.

Tidying up doc strings we are switching to the newer format of `str | Path` rather than the older `Union[str,
Path]`. This can be supported under Python 3.8 with `from __future__ import annotations` but since Python 3.8 is due to
reach End of Life in [2024-10-31](https://endoflife.date/python) there is no harm in dropping support now and setting
the minimum version to 3.9 and adding support/testing for 3.11.
+ Documentation on output columns added.
+ Parametrises columns in `lib/pgfinder/config/columns.yaml` and loads these on initialisation.
+ Uses the columns defined and loaded in `pgfinder.find_pg` functions for loading ftrs and maxquant files.
+ Adds a dictionary for mapping and renaming consolidated variables.

Abstracting the information out of hard-coded variables into parameter files simplifies the process of extending to
future update file formats and makes managing changes (still to come in this PR) easier.

Will have to add to documentation how the configuration file works and perhaps also something to change it, although the
way things are going I don't think anyone will use this interactively and so this documentation would be for
developers (which doesn't make it any less important as the more we can help our future selves the better).
+ Updates `pgfinder/matching.py` to use the parametrised fields names.
+ Annotates `pgfinder/config/columns.yaml` with comments.
+ Updates regression test files (column headers are the only thing that has changed)
+ Updates tests for consolidation (column headers are the only thing that has changed)
+ Removes comment from test.
@ns-rse ns-rse added the io Input and Output label Mar 14, 2024
@TheLostLambda
Copy link
Member

Moving columns to a config file has always felt like the ideal solution! Happy to give this a look soon — is there a reason tests are failing still?

@ns-rse
Copy link
Collaborator Author

ns-rse commented Mar 14, 2024

Yes, I figured if I was going to change lots of things in lots of places it would be sensible to abstract it out and parameterise the values so its easier to change and extend in the future.

Tests passed locally (for once I checked before pushing and making the PR).

I'll have a look tomorrow.

Seems I was mistaken about the minimum Python version for the newer method of writing type hints.
@TheLostLambda
Copy link
Member

TheLostLambda commented Mar 14, 2024

Haha, just beat me to it! https://peps.python.org/pep-0604/ (minimum version of 3.10)

Copy link

codecov bot commented Mar 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.37%. Comparing base (329b88d) to head (4d49244).
Report is 2 commits behind head on master.

❗ Current head 4d49244 differs from pull request most recent head 2cf0534. Consider uploading reports for the commit 2cf0534 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #269      +/-   ##
==========================================
+ Coverage   70.25%   72.37%   +2.12%     
==========================================
  Files          11       10       -1     
  Lines         511      496      -15     
==========================================
  Hits          359      359              
+ Misses        152      137      -15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TheLostLambda
Copy link
Member

@ns-rse Unfortunately looks like some Windows demons to slay — I've tried a couple of times to re-run it, but with no luck. I'm also stuck without a Windows machine to replicate it outside of the CI!

After that's ironed out, I'll review!

@ns-rse
Copy link
Collaborator Author

ns-rse commented Mar 15, 2024

The error stems from pgfinder.gui.internal.uploaded_file() and the function with tempfile.TemporaryDirectory() as tempdir: these haven't been touched in the current Pull Request and work fine on other OS's so my crude guess at the moment is that it is something outside of this code base.

I don't see any way the changes this PR introduces could cause these errors, the with ... : statement is meant to implicitly close files when it is done using them, perhaps its an artifact of using tempdir "with"in the subsequent call to with open(file) as f: would be one possible cause at a guess but since it doesn't happen elsewhere its somewhat confusing.

@ns-rse ns-rse force-pushed the ns-rse/267-column-names branch 2 times, most recently from 6298479 to 4d49244 Compare March 15, 2024 12:02
@TheLostLambda
Copy link
Member

@ns-rse Not sure how you'd like to carry on from here — I'd agree it doesn't seem to be the fault of this diff. Would you want to drop the 3.11 testing for Windows from the CI in the meantime?

@ns-rse
Copy link
Collaborator Author

ns-rse commented Mar 18, 2024

@TheLostLambda Tricky, Python and OS versions are defined via matrixes, so changing that makes it fiddly. We could over-ride the merging rules of requiring all tests to pass and write it up as an issue to be addressed in due course perhaps.

@TheLostLambda
Copy link
Member

@ns-rse I'm happy enough with that! I'll review this and we can get it merged then!

Copy link
Member

@TheLostLambda TheLostLambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, with just a few bits of documentation that I could answer questions for, some commented bits of code that should probably be deleted, and a non-essential note or two!

I'll approve if you want to tidy those last little bits and then merge things in!

lib/docs/data_dictionary.md Outdated Show resolved Hide resolved
lib/docs/data_dictionary.md Outdated Show resolved Hide resolved
lib/docs/data_dictionary.md Outdated Show resolved Hide resolved
lib/docs/data_dictionary.md Outdated Show resolved Hide resolved
lib/docs/data_dictionary.md Outdated Show resolved Hide resolved
lib/pgfinder/matching.py Outdated Show resolved Hide resolved
lib/pgfinder/matching.py Outdated Show resolved Hide resolved
lib/pgfinder/matching.py Outdated Show resolved Hide resolved
lib/pgfinder/matching.py Outdated Show resolved Hide resolved
ascending=[True, False],
key=lambda k: abs(k) if is_numeric_dtype(k) else k,
inplace=True,
kind="stable",
)
group.reset_index(drop=True, inplace=True)

# The hard coded 'Delta ppm' and 'Intensity column names aren't yet handled dynamically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems somewhat similar with the "Inferred structure ..." and "Intensity ..." ones which are only partially dynamic. If this is a faff to sort out though, then there is no need to worry about it for now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't address these as it would have required making pgfinder.input a dictionary so that the Intensity column could be referenced and would have required another entry under pgfinder to hold the value of Delta ppm.

I'll do those in a separate PR though tomorrow as there is no point in leaving the job half-done. Thanks for picking me up on this.

Thanks for feedback from @TheLostLambda

Also ordered pre-commit hooks.
@ns-rse ns-rse merged commit 736694b into master Mar 19, 2024
9 of 10 checks passed
@ns-rse ns-rse deleted the ns-rse/267-column-names branch March 19, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Input and Output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revise column names to be unique after consolidation
3 participants