about -noautoindex #24

vpbrendel · 2021-01-06T19:24:06Z

Hi Gordon et al.,
I have been trying to run gth in parallel using a combination of -noautoindex and -intermediate. I only got it to work in a roundabout way, because I could not figure out how to build a proper index with either mkvtree or a first run of gth. The problem I ran into was the presence/absence of .dna in the index file names. Below is a script showing how I got around it. Surely there must be a more elegant solution?
Happy New Year, Volker

#!/bin/bash

NUMPRC=24
GENOME=IRBB7unm.fa
CDNAFILE=IRBB7trinityTranscripts.fa

# We'll run a toy spliced alignment (only the first two lines of the
# cDNA file) to create the genome index:

head -2 ${CDNAFILE} > tmpcdna
gth -genomic ${GENOME} -cdna tmpcdna -species rice

# ... if everything worked as planned, we should now have the genome
# index files and can go ahead with the real work in parallel.
#
# However, the genome index files created this way carry an extra .dna
# tag, so they are not recognized when gth is next run with the
# -noautoindex option. As a workaround, we rename the index files to get
# rid of the .dna tag. Then gth -noautoindex works (it seems to copy the
# index files it needs, again using the .dna tag, but now we seem to have
# a working index ...).

ls -1 ${GENOME}.dna* > tmpcmda
# Escape the dot so sed matches a literal ".dna", not "<any char>dna"
sed -e 's/\.dna//' tmpcmda > tmpcmdb
sed -i -e "s/^/mv /" tmpcmda
paste tmpcmda tmpcmdb | bash
gth -noautoindex -genomic ${GENOME} -cdna tmpcdna -species rice
\rm tmpcmda tmpcmdb tmpcdna*

gt splitfasta -numfiles ${NUMPRC} ${CDNAFILE}

for cdnafile in ${CDNAFILE}.*
do
    gth -noautoindex -intermediate -xmlout -gzip -o gth.${cdnafile}.gz \
        -genomic ${GENOME} -cdna ${cdnafile} -species rice &
done
wait
echo "... gth intermediate run done"

gthconsensus -o gth.TranscriptsOnIRBB7 gth.${CDNAFILE}.*.gz
echo "... gthconsensus run done"
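The renaming step (the `ls`/`sed`/`paste` pipeline with temporary command files) can probably be written more simply as a loop using bash parameter expansion. This is a minimal sketch, assuming, as the post describes, that every index file is named `${GENOME}.dna<suffix>` and that dropping `.dna` produces the names `-noautoindex` looks for. `strip_dna_tag` is a hypothetical helper name, and the example suffix in the comments is illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename "<genome>.dna<suffix>" index files to
# "<genome><suffix>", i.e. drop the .dna tag right after the genome name.
# Assumes the naming scheme described in the post; e.g. an (illustrative)
# file IRBB7unm.fa.dna.suf would become IRBB7unm.fa.suf.
strip_dna_tag() {
  local genome=$1 f
  for f in "${genome}".dna*; do
    # If nothing matched, bash leaves the pattern unexpanded; skip it.
    [ -e "$f" ] || continue
    # ${f#prefix} removes the literal prefix "<genome>.dna"; the remainder
    # (e.g. ".suf") is appended back onto the genome file name.
    mv -- "$f" "${genome}${f#"${genome}".dna}"
  done
}
```

With the function defined, the lines from `ls -1` through `paste ... | bash` would become a single `strip_dna_tag ${GENOME}`. Unlike `s/.dna//`, which edits the first match anywhere in the name, this removes `.dna` only where it directly follows the genome file name.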

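One caveat with the parallel loop: a bare `wait` returns 0 once all background jobs have finished, even if some of them exited with an error, so a failed gth chunk would go unnoticed until gthconsensus runs. This is a minimal sketch that waits on each job's PID individually and counts failures. `run_all_and_check` is a hypothetical helper, and the command passed to it stands in for the per-chunk gth call.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "$cmd <input>" in the background for every input,
# then wait for each job by PID and count the ones that exited non-zero.
# (A bare `wait` would return 0 here even if some jobs failed.)
run_all_and_check() {
  local cmd=$1 input pid pids="" failed=0
  shift
  for input in "$@"; do
    "$cmd" "$input" &
    pids="$pids $!"          # remember each background job's PID
  done
  for pid in $pids; do       # unquoted on purpose: split the PID list
    if ! wait "$pid"; then
      echo "job with PID $pid failed" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]        # succeed only if every job succeeded
}
```

In the script, one could wrap the per-chunk gth command line in a small function that takes the chunk file name as `$1`, then call `run_all_and_check` with that function's name followed by `${CDNAFILE}.*` in place of the `for ... done` / `wait` block. Like the original loop, this still starts all chunks at once.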