First of all, a wonderful collection of tools and a pipeline!! I have had trouble with some of the installation, but I am just chopping and cutting the pipeline to fit my needs.
A quick query about the VirFinder step. I know you want to filter on q-value < 0.01 and length > 1000 bp. I have run the VirFinder R script and got the output folder (inspected with head P10_virfinder.tsv), and I just want to double-check that the filtering awk script is correct.
The subsequent awk script is then this (ignore all my '"'"' quoting; it comes from a loop script that generates jobs for all of my samples).
cat '"$PALPAL"'_virfinder.tsv | awk -F'\t' '"'"'{ if ($4 <= 0.01) print }'"'"' | awk -F'_' '"'"'{ if ($4 >= 1000) print }'"'"' | cut -f2 | sed "s/\"//g" > '"$PALPAL"'_virfinder_filtered_data.txt
This script first applies awk to the fourth tab-separated column ($4), which is the p-value. Should this not be $5, since the < 0.01 filter is meant for the q-value?
Secondly, the second awk extracts the fourth underscore-separated field, which, looking at the output, doesn't seem to be correct. If you want length > 1000, would it not be better to use awk -F'\t' '"'"'{ if ($2 >= 1000) print }'"'"'
Looking at the VirFinder GitHub, it seems the original results format would work with your awk script, but I think a newer update may have changed this?
I used the mamba install from the MuDoGeR install scripts but edited it a wee bit (mamba create -n virfinder_env -c bioconda r-virfinder), so I believe my local version and the MuDoGeR install version would be the same.
Happy to provide more info if needed, and I apologise in advance if I am barking up the wrong tree, but I was doing some QC and testing and noticed that the awk was not appropriate for the output file generated by VirFinder :).
Ben
Also, one additional thing I have just realised: you need to add 1 to each column index, because awk also counts the row-name column that write.table puts in the output. It may therefore be prudent to set row.names=F in the R script; otherwise the awk would need $6 <= 0.01 and $3 >= 1000 for q-value and length respectively.
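For reference, here is a minimal sketch of the filter with the shifted column numbers. It assumes the file was written by write.table with tab separators and default row names, so each data row has one more field than the header. The sample rows and contig names are made up for illustration, and this is not the pipeline's own script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample mimicking write.table(..., sep="\t") with row names:
# the header has 5 fields, each data row has 6 (leading row-name column).
sample="$(mktemp)"
printf '"name"\t"length"\t"score"\t"pvalue"\t"qvalue"\n' > "$sample"
printf '"1"\t"contig_A"\t2500\t0.98\t0.0004\t0.002\n' >> "$sample"
printf '"2"\t"contig_B"\t800\t0.97\t0.0005\t0.003\n' >> "$sample"
printf '"3"\t"contig_C"\t3000\t0.40\t0.2\t0.5\n' >> "$sample"

# Skip the header (NR > 1), keep rows with q-value <= 0.01 ($6) and
# length >= 1000 ($3), then print the contig name ($2) without quotes.
filtered="$(awk -F'\t' 'NR > 1 && $6 <= 0.01 && $3 >= 1000 { gsub(/"/, "", $2); print $2 }' "$sample")"
echo "$filtered"
rm -f "$sample"
```

If row.names=F were set in the R script instead, the same filter would use $5 for q-value, $2 for length and $1 for the name.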