extract sequences from pdb files with BioPandas #5

stephprince · 2024-04-05T17:04:32Z

Motivation

Currently, sequence extraction with the Biopython SeqIO parser produces FASTA files with incorrectly shortened sequences. This seems to occur because SeqIO ignores out-of-order residues, and residues that pdb-fixer adds to the beginning of the PDB file are given out-of-order sequence numbers (see 3M8J_A.pdb for an example).

The proposed changes instead use BioPandas to extract the sequence information from the PDB files.
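
For reviewers unfamiliar with BioPandas, the approach can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `extract_sequences` and `to_fasta` and the `>{name}_{chain}` header format are hypothetical, and it assumes BioPandas is installed. `PandasPdb.amino3to1()` walks the ATOM records in file order, so residues are kept even when their sequence numbers are out of order.

```python
def extract_sequences(pdb_path):
    """Return {chain_id: one-letter sequence} for a PDB file (hypothetical helper)."""
    # Imported lazily so the FASTA helper below works without BioPandas installed.
    from biopandas.pdb import PandasPdb

    ppdb = PandasPdb().read_pdb(pdb_path)
    # amino3to1() yields one row per residue with 'chain_id' and
    # 'residue_name' (already converted to one-letter codes) columns.
    one_letter = ppdb.amino3to1()
    return {
        chain: "".join(group["residue_name"])
        for chain, group in one_letter.groupby("chain_id", sort=False)
    }


def to_fasta(sequences, name):
    """Format {chain_id: sequence} as FASTA text (header format is an assumption)."""
    return "".join(f">{name}_{chain}\n{seq}\n" for chain, seq in sequences.items())
```

A call such as `to_fasta(extract_sequences("/path/to/input.pdb"), "input")` would then give the text to write to the output file.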

How to test the behavior?

extract-seq -f /path/to/input.pdb -o /path/to/output.fasta
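
One way to sanity-check the result is to read the output FASTA back and compare sequence lengths against what is expected for the structure. The helper below is a stdlib-only sketch; `read_fasta` is a hypothetical name and is not part of this package.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Start a new record; the header is everything after '>'.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records
```

For example, after running the command above, `read_fasta(open("/path/to/output.fasta").read())` returns the extracted sequences, and the length of each can be checked with `len`.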

Checklist

Did you update CHANGELOG.md with your changes?
Have you checked our Contributing document?
Have you ensured the PR clearly describes the problem and the solution?
Is your contribution compliant with our coding style? This can be checked by running ruff from the source directory.
Have you checked to ensure that there aren't other open Pull Requests for the same change?
Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

codecov · 2024-04-05T17:11:26Z

Codecov Report

Attention: Patch coverage is 97.36842%, with 1 line in your changes missing coverage. Please review.

Project coverage is 94.38%. Comparing base (aab0700) to head (fc3e6ea).

Files	Patch %	Lines
tests/test_extract_seq.py	96.15%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main       #5       +/-   ##
===========================================
+ Coverage   80.95%   94.38%   +13.42%     
===========================================
  Files           2        3        +1     
  Lines          63       89       +26     
===========================================
+ Hits           51       84       +33     
+ Misses         12        5        -7


stephprince added 4 commits April 4, 2024 17:31

update seqextraction to use biopandas

47dcd2d

update pdb bash script to use metfish method

956df2d

add tests for seq extraction

f2e3841

fix formatting

68ad840

stephprince requested review from smallfishabc and ajtritt April 5, 2024 17:04

update requirements file

fc3e6ea

smallfishabc approved these changes Apr 5, 2024

smallfishabc merged commit bb2b20d into main Apr 5, 2024
10 checks passed
