Export format problem with taxonkit and csvtk #100

Closed
AlenaYoung opened this issue Aug 15, 2024 · 1 comment

Comments

@AlenaYoung

Hi,

I'm trying to get taxonomy information with taxonkit and csvtk, using -t so that I can import the result into R. However, both R and Excel seem to have trouble importing the result. Some lines, such as those containing (et al.2015) and Sedi, are not imported correctly.
In R, the imported result either ends up entirely on one line (read.csv) or has more rows and columns than the actual result (read.table).

My script is shown below:
taxonkit lineage taxid.txt -j 120 \
    | taxonkit reformat -r NA -R 0 -j 120 \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
    | csvtk pretty -t -o taxid_out.csv

My R script is shown below:
test2 <- read.table("taxid_out.csv",header = TRUE)

The output file I get is as follows.
taxid_out.csv

Any help will be much appreciated.
Thank you in advance,

Alena

@shenwei356
Owner

shenwei356 commented Aug 15, 2024

csvtk pretty formats a table for human-readable display in the terminal; its output is no longer a tab- or comma-delimited file. Drop the last step and redirect the output to a file instead:

$ taxonkit lineage <(echo 9606)  \
    | taxonkit reformat -r NA -R 0  \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
> taxid_out.csv

$ cat taxid_out.csv
taxid   kingdom phylum  class   order   family  genus   species
9606    Eukaryota       Chordata        Mammalia        Primates        Hominidae       Homo    Homo sapiens
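To see why the delimiter matters for R, here is a minimal sketch that needs only printf and awk (not taxonkit or csvtk). It takes the sample row shown above and counts its fields two ways: split on TABs, as in the file written without `csvtk pretty`, and split on any run of whitespace, which is what R's `read.table()` does with its default `sep = ""`. The variable names are illustrative only.

```shell
#!/bin/sh
# Sample row copied from the taxid_out.csv output above; fields are separated by TABs.
row=$(printf '9606\tEukaryota\tChordata\tMammalia\tPrimates\tHominidae\tHomo\tHomo sapiens')

# Split on TAB only: "Homo sapiens" stays a single field.
tab_fields=$(printf '%s\n' "$row" | awk -F'\t' '{print NF}')

# Default awk splitting on runs of spaces/tabs, similar to read.table's default sep = "":
# "Homo sapiens" becomes two fields.
ws_fields=$(printf '%s\n' "$row" | awk '{print NF}')

echo "tab-delimited fields:    $tab_fields"   # 8
echo "whitespace-split fields: $ws_fields"    # 9
```

Because "Homo sapiens" contains a space, whitespace splitting yields one extra column, so the data no longer lines up with the 8-name header. Reading the file as tab-delimited, e.g. with `read.delim("taxid_out.csv")` or `read.table("taxid_out.csv", header = TRUE, sep = "\t")`, should keep multi-word names in one column; names containing quote characters may additionally need `quote = ""`.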

$ csvtk pretty -t taxid_out.csv -S grid
+-------+-----------+----------+----------+----------+-----------+-------+--------------+
| taxid | kingdom   | phylum   | class    | order    | family    | genus | species      |
+=======+===========+==========+==========+==========+===========+=======+==============+
| 9606  | Eukaryota | Chordata | Mammalia | Primates | Hominidae | Homo  | Homo sapiens |
+-------+-----------+----------+----------+----------+-----------+-------+--------------+

btw, -j 120 does not help; according to the help text, 4 threads are enough:

  -j, --threads int       number of CPUs. 4 is enough (default 4)
