How to get gzipped tab-separated-values (tabix file) from bedmethyl file generated by modkit, for down stream analysis by nanomethviz R package? #312

Open
ralanany opened this issue Dec 10, 2024 · 6 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments


ralanany commented Dec 10, 2024

I have 10 bedMethyl files for 10 patients (one bedMethyl file per patient). I want to generate the tabix file (the input for the NanoMethViz R package) for downstream processing. I want the following format. Could you advise how to convert from the bedMethyl format to the tabix format?

  sample             chr   pos       strand statistic read_name
1 B6Cast_Prom_1_bl6  chr11 101463573 *      -0.33     6cc38b35-6570-4b44-a1e3-2605fcf2ffe8
2 B6Cast_Prom_1_bl6  chr11 101463573 *      -1.87     787f5f43-d144-4e15-ab7d-6b1474083389
3 B6Cast_Prom_1_bl6  chr11 101463573 *      -4.19     c7ee7fb4-a915-4da7-9f36-da6ed5e68af2
4 B6Cast_Prom_1_bl6  chr11 101463573 *       0.10     bff8b135-0296-4495-9354-098242ea8cc4
5 B6Cast_Prom_1_cast chr11 101463573 *      -0.38     11fe130b-8d48-4399-a9fa-2ca2860fa355
6 B6Cast_Prom_1_cast chr11 101463573 *      -0.84     502fef95-c2f2-46ad-9bc5-fb3fc80b4245

Contributor

ArtRand commented Dec 11, 2024

Hello @ralanany,

I think you can probably format the bedMethyl output this way. Do you know what the statistic column is? From a quick read of the nanomethviz docs, I couldn't tell. @Shians maybe you know?

Author

ralanany commented Dec 12, 2024

Thank you for your reply. The package documentation says:

We currently support output from
Nanopolish
f5c
Megalodon

The Nanopolish output format is shown below. That output can be converted to a tabix-indexed, bgzipped file with the create_tabix_file function in the NanoMethViz package. Unfortunately, the modkit bedMethyl file has a different layout. My question is how to convert the modkit bedMethyl output to tabix, or to something like the Nanopolish output.

  chromosome strand start     end       read_name
1 chr1       -      127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
2 chr11      -      115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
3 chr11      +      69150806  69150814  34f9ee3e-4b27-4d2d-a203-4067f0662044
4 chr1       +      170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
5 chrY       -      4082060   4082060   f68940f6-4236-4f0f-9af7-a81b5c2911b6
6 chr8       +      120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312

  log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
1 -5.91         -100.38            -94.47               1
2 -8.07         -115.21            -107.13              1
3 -1.65         -183.12            -181.47              1
4  2.74         -112.14            -114.88              1
5 -1.78         -135.09            -133.32              1
6  5.02         -129.31            -134.33              1

  num_motifs sequence
1 1          CATTACGTTTC
2 1          AACTTCGTTGA
3 2          GGTCACGGGAATCCGGTTC
4 1          AGAAGCGCTAA
5 1          CTCACCGTATA
6 1          TCTGACGTTGA
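
Judging only from the six rows quoted above, log_lik_ratio looks like log_lik_methylated minus log_lik_unmethylated, which may be what a per-read "statistic" column is meant to hold. The thread does not confirm this; the following Python sketch simply checks the relationship on the quoted numbers.

```python
# Values copied from the Nanopolish rows quoted above:
# (log_lik_ratio, log_lik_methylated, log_lik_unmethylated)
nanopolish_rows = [
    (-5.91, -100.38, -94.47),
    (-8.07, -115.21, -107.13),
    (-1.65, -183.12, -181.47),
    (2.74, -112.14, -114.88),
    (-1.78, -135.09, -133.32),
    (5.02, -129.31, -134.33),
]


def max_abs_difference(rows):
    """Largest gap between the reported ratio and (methylated - unmethylated)."""
    return max(abs(ratio - (meth - unmeth)) for ratio, meth, unmeth in rows)


worst = max_abs_difference(nanopolish_rows)
# The gap stays within the rounding of two-decimal values.
print(f"largest difference: {worst:.3f}")
```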


Shians commented Dec 12, 2024

I recently tried to implement direct import of modkit bedmethyl. Looking at your data, I don't think it quite lines up with my expected columns. Could you let me know what command in modkit you used and what version?

Shians/NanoMethViz@4f81810

Repeated issue: Shians/NanoMethViz#49

Author

ralanany commented Dec 12, 2024

Thanks for your reply.

Here is the format of the bedMethyl file generated by this command:

modkit pileup path/to/reads.bam output/path/pileup.bed --cpg --ref path/to/reference.fasta

chr1 10468 10469 h 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0
chr1 10468 10469 m 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0

The column names for this file are described here: https://github.com/nanoporetech/modkit
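
To make the two rows above easier to read, here is a minimal Python sketch that attaches names to the 18 whitespace-separated fields. The names are paraphrased from the modkit documentation and should be treated as assumptions; check them against the version of modkit you ran.

```python
# Field names paraphrased from the modkit bedMethyl description (assumed, not verified
# against a specific modkit version).
BEDMETHYL_COLUMNS = [
    "chrom", "start", "end", "mod_code", "score", "strand",
    "thick_start", "thick_end", "color", "n_valid_cov", "percent_modified",
    "n_mod", "n_canonical", "n_other_mod", "n_delete", "n_fail",
    "n_diff", "n_nocall",
]

# The two rows posted in this thread.
EXAMPLE = """\
chr1 10468 10469 h 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0
chr1 10468 10469 m 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0
"""


def parse_bedmethyl_line(line):
    """Split one bedMethyl line and pair each field with its assumed name."""
    fields = line.split()
    if len(fields) != len(BEDMETHYL_COLUMNS):
        raise ValueError(f"expected {len(BEDMETHYL_COLUMNS)} fields, got {len(fields)}")
    return dict(zip(BEDMETHYL_COLUMNS, fields))


records = [parse_bedmethyl_line(line) for line in EXAMPLE.splitlines()]
for record in records:
    print(record["chrom"], record["start"], record["mod_code"], record["n_valid_cov"])
```

Note that every field is a per-site count or summary. There is no read identifier in the row, so a per-read table cannot be rebuilt from this file alone.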


Shians commented Dec 12, 2024

NanoMethViz is intended to be used with read-level information. Pileup output is therefore not compatible, because it aggregates read-level information into site-level information. If you instead run modkit extract full, I believe you should be able to import the data directly into tabix format.
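
As an illustration of why a pileup cannot be turned back into read-level data, here is a small Python sketch of the aggregation step. The read names, positions, and the `pileup` helper are hypothetical and are not modkit's implementation.

```python
from collections import defaultdict

# Hypothetical read-level calls: (read_id, chrom, position, is_modified)
read_level_calls = [
    ("read-a", "chr1", 10468, False),
    ("read-b", "chr1", 10468, True),
    ("read-c", "chr1", 10468, True),
    ("read-a", "chr1", 10500, False),
]


def pileup(calls):
    """Collapse per-read calls into per-site counts, discarding read identity."""
    counts = defaultdict(lambda: {"n_mod": 0, "n_valid": 0})
    for _read_id, chrom, pos, is_modified in calls:
        site = counts[(chrom, pos)]
        site["n_valid"] += 1
        site["n_mod"] += int(is_modified)
    return dict(counts)


site_level = pileup(read_level_calls)
for (chrom, pos), site in sorted(site_level.items()):
    print(chrom, pos, site["n_mod"], site["n_valid"])
```

Once the counts are summed, nothing records which read contributed which call, so the per-read rows that NanoMethViz expects have to come from a read-level export instead.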

Author

ralanany commented Dec 12, 2024

Thank you again for your reply, Shians.
I ran the recommended command with the BAM file as input. The output is a TSV, but it is still different from the Nanopolish output.
Here is the file content:

read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length mod_qual mod_code base_qual ref_kmer query_kmer canonical_base modified_primary_base inferred flag
7ee32bc3-3bc2-4e05-8293-b478eae576c7 167 348147 chr1 + + + 38 13 912 0.15820313 h 11 . AGCGT C C false 0
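
For anyone who wants to reshape this output by hand, here is a rough Python sketch that maps rows with the columns shown above onto the sample / chr / pos / strand / statistic / read_name layout from the first post. Several things in it are assumptions the thread does not confirm: that the statistic can be taken as the log-odds of mod_qual, that ref_position can be used without a coordinate offset, and that ref_strand is the right strand column. The sample name and the second input row are made up for illustration.

```python
import csv
import io
import math

# Column names as they appear in the output posted above.
EXTRACT_HEADER = [
    "read_id", "forward_read_position", "ref_position", "chrom", "mod_strand",
    "ref_strand", "ref_mod_strand", "fw_soft_clipped_start", "fw_soft_clipped_end",
    "read_length", "mod_qual", "mod_code", "base_qual", "ref_kmer", "query_kmer",
    "canonical_base", "modified_primary_base", "inferred", "flag",
]

# First row is the one posted in this thread; the second is hypothetical.
EXAMPLE_ROWS = [
    ["7ee32bc3-3bc2-4e05-8293-b478eae576c7", "167", "348147", "chr1", "+", "+", "+",
     "38", "13", "912", "0.15820313", "h", "11", ".", "AGCGT", "C", "C", "false", "0"],
    ["hypothetical-read-2", "50", "348100", "chr1", "+", "+", "+",
     "0", "0", "800", "0.9", "h", "20", ".", "AGCGT", "C", "C", "false", "0"],
]

tsv_text = "\n".join("\t".join(row) for row in [EXTRACT_HEADER] + EXAMPLE_ROWS) + "\n"


def to_read_level_rows(text, sample, mod_code, eps=1e-6):
    """Return (sample, chr, pos, strand, statistic, read_name) tuples sorted by position."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for rec in reader:
        if rec["mod_code"] != mod_code:
            continue
        # Clamp the probability so the log-odds stays finite (assumed statistic).
        p = min(max(float(rec["mod_qual"]), eps), 1.0 - eps)
        statistic = math.log(p / (1.0 - p))
        rows.append((sample, rec["chrom"], int(rec["ref_position"]),
                     rec["ref_strand"], statistic, rec["read_id"]))
    # tabix indexing needs the file sorted by chromosome and position.
    rows.sort(key=lambda r: (r[1], r[2]))
    return rows


converted = to_read_level_rows(tsv_text, sample="patient_01", mod_code="h")
for row in converted:
    print("\t".join(str(value) for value in row))
```

The printed lines would still need to be bgzip-compressed and tabix-indexed before use. If the NanoMethViz import mentioned earlier in the thread handles this output directly, that is the safer route.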

ArtRand added the question label (Looking for clarification on inputs and/or outputs) on Dec 13, 2024.