fix(isProteinCoding): get all overlapping genes from the variant index #948

Open: 10 commits to be merged into base branch dev

ireneisdoomed
Contributor

✨ Context

This PR is related to the isProteinCoding L2G feature and these 2 issues:

This PR largely fixes an issue in the feature matrix where a gene's protein-coding annotation was wrongly attributed.

🛠 What does this PR implement

Reimplementation of is_protein_coding_feature_logic. Instead of using the TSS location in the gene index to determine which gene the sentinel variant overlaps with, we now:

  • Use the intersecting genes extracted by VEP that are present in the variant index. This covers all genes within 500 kb upstream and downstream of the variant.
  • Use all variants in the locus instead of only the lead.
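
The two points above can be illustrated with a minimal plain-Python sketch. It is not the actual implementation in this PR: the `OverlappingGene` record, the dictionary-shaped `variant_index`, the function name, the example IDs, and the 1.0/0.0 encoding are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OverlappingGene:
    """Hypothetical record: one gene reported by VEP for a variant."""

    gene_id: str
    biotype: str


def is_protein_coding_sketch(
    locus_variant_ids: list[str],
    variant_index: dict[str, list[OverlappingGene]],
) -> dict[str, float]:
    """Map every gene overlapping any variant in the locus to 1.0 or 0.0."""
    features: dict[str, float] = {}
    # Iterate over all variants in the locus, not only the lead variant.
    for variant_id in locus_variant_ids:
        # Variants absent from the index contribute no genes.
        for gene in variant_index.get(variant_id, []):
            value = 1.0 if gene.biotype == "protein_coding" else 0.0
            # Keep the maximum if a gene is seen more than once.
            features[gene.gene_id] = max(features.get(gene.gene_id, 0.0), value)
    return features


# Illustrative data only; IDs and biotypes are made up.
example_index = {
    "1_100_A_T": [OverlappingGene("ENSG_A", "protein_coding")],
    "1_200_G_C": [
        OverlappingGene("ENSG_B", "lncRNA"),
        OverlappingGene("ENSG_A", "protein_coding"),
    ],
}
example_locus = ["1_100_A_T", "1_200_G_C", "1_300_C_G"]
print(is_protein_coding_sketch(example_locus, example_index))
# → {'ENSG_A': 1.0, 'ENSG_B': 0.0}
```

Note that ENSG_B only receives a value because the non-lead variant `1_200_G_C` is considered; a lead-only lookup would not have produced a row for it.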

Tests have been adjusted accordingly.

🙈 Missing

This solution is still not ideal: we might still be missing credible set/gene annotations for genes that lie outside the defined window. In that case, we could still wrongly annotate genes as non-protein-coding.

  • Run the feature matrix step and evaluate whether this is still a problem in practice

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

github-actions bot added the bug, size-M, and Dataset labels on Dec 13, 2024