Question regarding align_trim.pl #99

ARastrojo · 2022-01-12T23:37:41Z

Hi all,

I am using align_trim.py script to remove primers from the alignment. In me case, we have performed SARS2 amplification with v3 primer set, and then sequenced with illumina (Miseq).

I was expecting align_trim.py to trims primers from aligned reads only if reads starts or ends in the same positions as a primers starts or ends, and both reads and primers are in the same orientation (I think it has no sense to remove a reverse primer from a forward read, as this forward read can come from and overlapped amplicon). But after executing the program, we can see that all positions in the alignment were a primers is mapped are remove, and all bases up to the end of the read (when the read is forward and the primers is reverse for instance), among other strange things, it could be long to explain...

Here is a image of a couple of reads (in a very sad region, with very few mapped reads)

So, I do not know is I am losing something is the program, or if in nanopore reads, where I do not have any experience, this behavior of the program is the expected.

Best,

Alberto

BioWilko · 2022-01-28T11:13:32Z

Hi Alberto

align_trim is designed for full amplicon unpaired read data and relies on that type of data to function properly. There is version of it I have adapted to work with paired and fragmented illumina data which you can find here but I'm still validating this so it comes with absolutely zero guarantees.

charlesfoster · 2022-02-10T01:26:01Z

Hi @BioWilko,

Based on your comment above, am I correct in assuming that artic's align_trim also does not function optimally on Nanopore data if a rapid barcoding kit has been used, since amplicons can be fragmented? We exclusively use the rapid-barcoding kit here instead of the ligation sequencing kit(s). I've been playing around with alternatives to align_trim in a snakemake workflow, and currently I'm trialling using samtools ampliconclip instead.

Thanks,
Charles

BioWilko · 2022-02-10T15:16:45Z

That is correct however I also have a version of align_trim I've modified to work in the case of fragmented nanopore data available here if you utilise the --fragmented flag however as before this comes with zero guarantees, use at your own risk etc etc but based on my very limited testing it seems to work.

charlesfoster · 2022-02-11T00:29:57Z

Thanks for the reply. I gave the script a crack by (1) dropping it in as a replacement in a clone of the artic-ncov2019 conda environment, and (2) creating a new conda environment after cloning your fork of the fieldbioinformatics repo. The align_trim step errored out for me. No biggie, but just in case you find it useful as an edge case and if you want to continue modifying align_trim, the error was as follows:

Traceback (most recent call last):
  File "/home/vrl/miniconda3/envs/artic-clone/bin/align_trim", line 8, in <module>
    sys.exit(main())
  File "/home/vrl/miniconda3/envs/artic-clone/lib/python3.6/site-packages/artic/align_trim.py", line 379, in main
    go(args)
  File "/home/vrl/miniconda3/envs/artic-clone/lib/python3.6/site-packages/artic/align_trim.py", line 232, in go
    p1 = find_primer(bed, segment.reference_start, "+", args)
  File "/home/vrl/miniconda3/envs/artic-clone/lib/python3.6/site-packages/artic/align_trim.py", line 45, in find_primer
    key=itemgetter(0),
ValueError: min() arg is an empty sequence

I get a bit lost in some of the functions, so I'm not sure what the issue is.

BioWilko · 2022-02-11T10:11:38Z

Would you be willing to share the fastqs for the data you used here? It would be super handy for getting to the bottom of the issue

Edit: Oh you used conda, that will cause problems since I haven't packaged the fork, you should follow the compile from source instructions to use this fork properly.

charlesfoster · 2022-02-21T02:10:18Z

Apologies for my slow reply. I previously installed your fork with the following steps:

git clone https://github.com/BioWilko/fieldbioinformatics.git
cd fieldbioinformatics
mamba env create -f environment.yml # I changed the name of the env to artic-clone first
conda activate artic-clone
python setup.py install

Forgive me, but I can't see any different instructions for compiling from source, have I missed something?

I checked for the --fragmented flag with align trim -h:

usage: align_trim [-h] [--normalise NORMALISE] [--report REPORT] [--start]
                  [--no-read-groups] [--verbose] [--remove-incorrect-pairs]
                  [--fragmented]
                  bedfile

Trim alignments from an amplicon scheme.

positional arguments:
  bedfile               BED file containing the amplicon scheme

optional arguments:
  -h, --help            show this help message and exit
  --normalise NORMALISE
                        Subsample to n coverage per strand
  --report REPORT       Output report to file
  --start               Trim to start of primers instead of ends
  --no-read-groups      Do not divide reads into groups in SAM output
  --verbose             Debug mode
  --remove-incorrect-pairs
  --fragmented

I used the following command to trim the primers:

align_trim --fragmented --normalise 10000 nCoV-2019.scheme.bed --remove-incorrect-pairs --report test.alignreport.txt < test.sorted.bam

Same error as above.

However, when I tried to replicate this problem today on a different computer, I could not do so. When I follow the link you provided above, there is no longer a --fragmented command line argument. Now I'm thoroughly confused!

In any case, the problematic reads and bamfile can be found here: https://unsw-my.sharepoint.com/:f:/g/personal/z3533036_ad_unsw_edu_au/ErSNhdfncMRDvddJFQhV0N4BPB4Y3LRp9jfsaCFIUagz6g?e=M4o0Q8

BioWilko · 2022-02-22T11:39:24Z

Sorry that's on me entirely, I had to revert my repo for a version bump PR. However chris wright at ONT has written an amplicon overlap based version of align_trim which should suit your purposes which you can find here: https://github.com/epi2me-labs/fieldbioinformatics/tree/align_trim

jonbra · 2024-09-05T12:34:04Z

Hi, I think I might have the same problem. Tried to use the version you linked to from epi2me, but got the same problem.
I am using Nanopore data from the Nanopore rapidkit. And we are sequencing a small virus genome, ca. 3kb, and it's (partially) circular. The majority of reads are skipped as "not correctly paired" and the output of align_trim is completely empty. In the end I get a longshot error: "Max read coverage set to 0. printing empty VCF file".

BioWilko · 2024-09-05T12:43:40Z

Fieldbioinformatics doesn't support circular genomes (yet) but I don't know what the specific issue would be in this case, my assumption would be that only amplicons crossing the reference boundaries would be an issue....

If you're having trouble with the epi2me version of fieldbioinformatics I suggest you raise the issues there since it differs in a few key ways, but if all you want is rapid barcoding support then I suggest you try the pre-release of fieldbioinformatics 1.4.0, this does support rapid barcoded data but not circular genomes.

jonbra · 2024-09-06T12:04:14Z

Thanks @BioWilko for the quick response! I managed to run the 1.4.0 release, but still have similar issues. More than 60000 reads map in the first round, but it seems that everything is removed by align_trim because the reads are not correctly paired. Could it be something wrong with my primer file?

BioWilko · 2024-09-06T12:06:36Z

It sounds like it, if you can share your reads, reference, and bed file I can have a look and give you a better answer

jonbra · 2024-09-06T12:36:31Z

Thank you so much for the help!
Do you have access to download this file?
https://www.dropbox.com/scl/fi/ep0r3g5gqz3nbqs2lrbwt/hbv_artic.tar?rlkey=w0dmecvnuhz4fe3l57nb9msky&st=nrjvxr7l&dl=0

jonbra · 2024-09-06T12:39:17Z

This is the command I used and the 1.4.0 release:
artic minion --model r1041_e82_400bps_hac_g615 --normalise 500 --threads 6 --scheme-directory ../primer-schemes/ --bed ../primer-schemes/Ringlander/V4/Ringlander.scheme.bed --ref ../primer-schemes/Ringlander/V4/Ringlander.reference.fasta --read-file /home/jonr/Prosjekter/HBV/work/32/5d92f661e2c02eb1ca4c03d808e57a/HBV010bc01.fastq.gz MINION_TEST

BioWilko · 2024-09-06T13:11:42Z

Ah, here's your problem, the bedfile is malformed:

NC_003977.2     3094    3112    HBV_3131_LEFT   1       +
NC_003977.2     333     355     HBV_353_RIGHT   1       -
NC_003977.2     301     321     HBV_299_LEFT    2       +
NC_003977.2     670     694     HBV_692_RIGHT   2       -
NC_003977.2     636     657     HBV_634_LEFT    1       +
NC_003977.2     1000    1021    HBV_1019_RIGHT  1       -
NC_003977.2     960     985     HBV_958_LEFT    2       +
NC_003977.2     1299    1315    HBV_1313_RIGHT  2       -
NC_003977.2     1265    1285    HBV_1263_LEFT   1       +
NC_003977.2     1611    1629    HBV_1627_RIGHT  1       -
NC_003977.2     1593    1610    HBV_1590_LEFT   2       +
NC_003977.2     1943    1967    HBV_1965_RIGHT  2       -
NC_003977.2     1908    1932    HBV_1906_LEFT   1       +
NC_003977.2     2307    2331    HBV_2329_RIGHT  1       -
NC_003977.2     2053    2072    HBV_2049_LEFT   2       +
NC_003977.2     2474    2496    HBV_2500_RIGHT  2       -
NC_003977.2     2450    2474    HBV_2454_LEFT   1       +
NC_003977.2     2835    2856    HBV_2860_RIGHT  1       -
NC_003977.2     2807    2831    HBV_2811_LEFT   2       +
NC_003977.2     3166    2       HBV_2_RIGHT     2       -

The number in the middle of the primer name is the amplicon number and they need to match e.g. HBV_1_LEFT and HBV_1_RIGHT fieldbioinformatics requires matching names to know which primers are used to amplify which amplicon.

If you want to see the bedfile specifications you can see them here: https://github.com/ChrisgKent/primal-page, fieldbioinformatics supports any of these formats so long as it is used consistently.

jonbra · 2024-09-06T13:23:51Z

Ah... I see. Thanks!! These are custom primers we designed without primalscheme. I tried to format everything corectly, but didn't notice this 😳

jonbra · 2024-09-06T13:24:12Z

The pipeline works now

BioWilko · 2024-09-06T13:25:28Z

Unfortunately it still won't be able to handle amplicons crossing the reference boundary, that functionality will hopefully be added soon!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question regarding align_trim.pl #99

Question regarding align_trim.pl #99

ARastrojo commented Jan 12, 2022

BioWilko commented Jan 28, 2022

charlesfoster commented Feb 10, 2022

BioWilko commented Feb 10, 2022 •

edited

Loading

charlesfoster commented Feb 11, 2022

BioWilko commented Feb 11, 2022 •

edited

Loading

charlesfoster commented Feb 21, 2022

BioWilko commented Feb 22, 2022

jonbra commented Sep 5, 2024

BioWilko commented Sep 5, 2024

jonbra commented Sep 6, 2024

BioWilko commented Sep 6, 2024

jonbra commented Sep 6, 2024

jonbra commented Sep 6, 2024 •

edited

Loading

BioWilko commented Sep 6, 2024 •

edited

Loading

jonbra commented Sep 6, 2024

jonbra commented Sep 6, 2024

BioWilko commented Sep 6, 2024

Question regarding align_trim.pl #99

Question regarding align_trim.pl #99

Comments

ARastrojo commented Jan 12, 2022

BioWilko commented Jan 28, 2022

charlesfoster commented Feb 10, 2022

BioWilko commented Feb 10, 2022 • edited Loading

charlesfoster commented Feb 11, 2022

BioWilko commented Feb 11, 2022 • edited Loading

charlesfoster commented Feb 21, 2022

BioWilko commented Feb 22, 2022

jonbra commented Sep 5, 2024

BioWilko commented Sep 5, 2024

jonbra commented Sep 6, 2024

BioWilko commented Sep 6, 2024

jonbra commented Sep 6, 2024

jonbra commented Sep 6, 2024 • edited Loading

BioWilko commented Sep 6, 2024 • edited Loading

jonbra commented Sep 6, 2024

jonbra commented Sep 6, 2024

BioWilko commented Sep 6, 2024

BioWilko commented Feb 10, 2022 •

edited

Loading

BioWilko commented Feb 11, 2022 •

edited

Loading

jonbra commented Sep 6, 2024 •

edited

Loading

BioWilko commented Sep 6, 2024 •

edited

Loading