Questions about using cayman #8

Open
Xinpeng021001 opened this issue Sep 10, 2024 · 7 comments

Comments

@Xinpeng021001

Hi,

Thank you for the excellent tool! I'm trying to use it as described in the bioRxiv preprint, but I have some questions:

  1. For the bwa index: the paper mentions that you used the non-human-gut dataset, but on Zenodo I found the gut dataset as well.
  2. Creating the index with bwa is quite slow. Should we create an individual index for each dataset, or combine them into a single index?
  3. Could you please provide the code used for the plots in the paper? The link https://git.embl.de/grp-zeller/cazy_gut_microbiome/ cannot be opened.

Thank you for your time and help!

Best Regards,
Xinpeng

@cschu
Member

cschu commented Sep 10, 2024

Dear Xinpeng,

Thank you for your interest in cayman!

  1. Indeed, the non-human-gut catalogues were annotated in addition to the human gut one.
  2. Both ways work, but the original idea is to create an individual index for each catalogue. In theory, one could profile against the complete GMGC, but that would not perform very well. If you find the bwa indexing slow, you could use a very large value for the -K parameter (as discussed here); however, that is usually not necessary for smaller catalogues.
  3. Unfortunately, the link in the preprint is out of date. The repo can be found here.

Best,
Christian
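
As an illustration of the "one index per catalogue" approach described above, here is a minimal Python sketch that only builds and prints one `bwa index` command per catalogue FASTA. The helper name `bwa_index_command` and the output directory `bwa_indices` are assumptions made for this example and are not part of cayman; the two file names are the ones quoted later in this thread. `-p` is the standard `bwa index` option for the index prefix.

```python
import shlex
from pathlib import Path


def bwa_index_command(catalogue_fasta, index_dir="bwa_indices"):
    """Build (but do not run) a `bwa index` command for one catalogue FASTA.

    `index_dir` is a hypothetical output directory; it must exist before
    the command is actually executed.
    """
    fasta = Path(catalogue_fasta)
    name = fasta.name
    # Drop the ".gz" and ".fna" suffixes to derive a per-catalogue index prefix.
    for suffix in (".gz", ".fna"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    prefix = Path(index_dir) / name
    return ["bwa", "index", "-p", prefix.as_posix(), fasta.as_posix()]


# Catalogue file names as quoted in this thread.
catalogues = [
    "GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz",
    "GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz",
]

for fasta in catalogues:
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(bwa_index_command(fasta)))
```

Keeping a separate prefix per catalogue means each environment can later be profiled against its own index without rebuilding anything.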

@Xinpeng021001
Author

Dear Christian,

Thank you for your reply! So should we build the index from the non-human-gut catalogues, or do you recommend a different catalogue for each environment? For example, if I'm annotating a human gut sample, should I follow the paper and use the non-human-gut catalogue, or use the annotated human gut catalogue? I have the same question for other environments.

Best Regards,
Xinpeng

@cschu
Member

cschu commented Sep 10, 2024

Dear Xinpeng,

For a human gut environment, you'd use a bwa index created from GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz (gene_catalogues.zip) and the cazy annotations in GMGC10.human-gut.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv (gene_catalogue_annotations.zip).

For, say, soil, you'd create a bwa index from GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz and use the annotations in GMGC10.soil.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv

And so on. Different environments should be profiled using the closest fitting catalogue.

Best,
Christian
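
The pairing described above (one catalogue FASTA plus its matching annotation table per environment) can be sketched as a small lookup. This is only an illustration: the dictionary `CATALOGUES` and the function `files_for_environment` are hypothetical names, and only the two environments whose file names are spelled out in this thread are included. File names for other environments should be taken from the Zenodo archives rather than guessed.

```python
# Environment -> catalogue FASTA and CAZy annotation table, copied verbatim
# from this thread. Other environments are deliberately left out.
CATALOGUES = {
    "human-gut": {
        "fasta": "GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz",
        "annotations": "GMGC10.human-gut.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv",
    },
    "soil": {
        "fasta": "GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz",
        "annotations": "GMGC10.soil.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv",
    },
}


def files_for_environment(environment):
    """Return the FASTA/annotation pair registered for an environment."""
    try:
        return CATALOGUES[environment]
    except KeyError:
        known = ", ".join(sorted(CATALOGUES))
        raise ValueError(
            f"No catalogue registered for {environment!r}; known: {known}"
        ) from None


pair = files_for_environment("soil")
print(pair["fasta"])
print(pair["annotations"])
```

Raising an error for unregistered environments, instead of falling back to a default, avoids silently profiling a sample against a poorly fitting catalogue.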

@Xinpeng021001
Author

Dear Christian,

Thank you for your reply! I’ll redo the index part. Thank you!

Best Regards,
Xinpeng

@Xinpeng021001
Author

Dear Christian,

Good afternoon! I hope this message finds you well. With your help, I have finished the indexing and annotation steps. I am now trying to do the substrate annotation based on those results and the substrate curation table. Is there a script or program provided for this, or do I need to write it myself? I noticed that some CAZyme families have multiple substrates; how should I assign substrates to such families when working with the RPKM results? I really appreciate your help!

Best Regards,
Xinpeng

@qrducarmon
Collaborator

Dear Xinpeng,

Thanks a lot for using our tool! Indeed, it is inevitable that some CAZymes have multiple substrates. Since you noticed this, I assume you have found our substrate table. You can find examples of how to deal with this on our GitHub page, which contains all the code and files needed to reproduce the preprint/paper figures (https://github.com/zellerlab/cayman_paper). For example, if a CAZyme is annotated with several substrates, you could include it in the GSEA analysis for each of those substrates. If you have a specific research question you'd like to tackle, please let me know and I can hopefully provide a more targeted answer.

Best,
Quinten
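
The suggestion above, letting a multi-substrate CAZyme appear in the set for each of its substrates, can be sketched as follows. This is not the code from the cayman_paper repository; the function name `build_substrate_sets` and the family and substrate labels are placeholders invented for the example, and the real mapping would come from the substrate curation table.

```python
from collections import defaultdict


def build_substrate_sets(family_to_substrates):
    """Invert a family -> substrates mapping into substrate -> set of families.

    A family annotated with several substrates is added to every matching set,
    so it can take part in the enrichment analysis for each substrate.
    """
    substrate_sets = defaultdict(set)
    for family, substrates in family_to_substrates.items():
        for substrate in substrates:
            substrate_sets[substrate].add(family)
    return dict(substrate_sets)


# Placeholder labels, not real CAZy families or curated substrates.
family_to_substrates = {
    "FAM_A": ["substrate_1"],
    "FAM_B": ["substrate_1", "substrate_2"],  # multi-substrate family
    "FAM_C": ["substrate_2"],
}

substrate_sets = build_substrate_sets(family_to_substrates)
for substrate in sorted(substrate_sets):
    print(substrate, sorted(substrate_sets[substrate]))
```

Each resulting set can then be used as one gene set in a GSEA-style analysis over the per-family abundance results. Because shared families make the sets overlap, the per-substrate results are not independent of each other.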

@Xinpeng021001
Author

Dear Quinten,

Thank you for your help! I'm sorry I didn't reply to you in time. I'll take another look at the code in the GitHub repository you linked.

Best Regards,
Xinpeng
