Prevent overlapping repeats #18

Open
palakpsheth opened this issue Jul 16, 2023 · 5 comments

Comments

@palakpsheth

Any suggestions on how to prevent overlapping repeat units? For example, I only want unique repeats that do not overlap or contain any other repeats.

thanks!

@yangao07
Collaborator

Can you give a brief specific example?

@palakpsheth
Author

@yangao07 apologies for the delay. Here is an example:

readName repN copyNum readLen start end consLen aveMatch fullLen subPos consSeq
0674db42-f074-4189-bb90-d7c75129499f rep0 5.1 10156 104 1019 179 100 0 117,296,475,654,828,1006 CTATGTGAAAACTTTTTGATTATGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATCTTCGGCGTGTCTTACGACGAGTACAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAGGTAAGAAA
0674db42-f074-4189-bb90-d7c75129499f rep1 3.7 10156 1013 1667 176 96.3 0 1025,1202,1375,1555 CAGAATGAAATTCTTCCACTGTGCTTAATTTTACCCATAATCAAAAAGTTTTCACATAGTTTCTTACCTCTTCTAGTTGGCATGCTTTGATGACGCTTCTGTATCTGTACTCATCATGACACACGAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCCAGGAAAACTGAGAA
0674db42-f074-4189-bb90-d7c75129499f rep2 3.5 10156 1673 2293 179 99.3 0 1686,1863,2042,2222 CTATGTGAAAACTTTTTGATTATGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATCTTCGGCGTGTCTTACGACGAGTACAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAGGTAAGAAA
0674db42-f074-4189-bb90-d7c75129499f rep3 5.4 10156 2287 3236 179 98.3 0 2296,2473,2638,2813,2991,3170 ATAGTTTCTTACCTCTTCTAGTTGGCATGCTTTGATGACGCTTCTGTATCTGTACTCGTCGTAAGACACGCCGAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCCAGGAAAACTGAGAACAGAATGAAATTCTTCCACTGTGCTTAATTTTACCCATAATCAAAAAGTTTTCAC
0674db42-f074-4189-bb90-d7c75129499f rep4 9.7 10156 2984 4133 112 99.9 0 3164,3276,3388,3500,3612,3724,3836,3948,4059 TTTTCACATAGTTTCTTACCTCTTCTAGTTGGCATGCTTTGATGACGCTTCTGTATCTGTACTCGTCGTAAGATTACCCTCTGAAGGCTCCAGTTCTCCCATAATCAAAAAG
0674db42-f074-4189-bb90-d7c75129499f rep5 8.8 10156 4791 5772 112 93.3 0 4821,4929,5046,5154,5266,5377,5497,5607,5710 CATCAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAATCTTACGACAAGAGTACAGGATACAGAAG
0674db42-f074-4189-bb90-d7c75129499f rep6 5.4 10156 5688 6639 179 97.6 0 5714,5882,6056,6235,6414,6593 CATCAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATCTTCGGCGTGTCTTACGACGAGTACAGATACAGAAGCGT
0674db42-f074-4189-bb90-d7c75129499f rep7 3.5 10156 6633 7251 179 98.7 0 6645,6823,7001,7179 CAGAATGAAATTCTTCCACTGTGCTTAATTTTACCCATAATCAAAAAGTTTTCACATAGTTTCTTACCTCTTCTAGTTGGCATGCTTTGATGACGCTTCTGTATCTGTACTCGTCGTAAGACACGCCGAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCCAGGAAAACTGAGAA
0674db42-f074-4189-bb90-d7c75129499f rep8 3.7 10156 7248 7924 180 98.9 0 7257,7440,7620,7799 GAGTACAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGGGTAAAATTAGAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATCTTCGGCGTGTCTTACGAC
0674db42-f074-4189-bb90-d7c75129499f rep9 5.1 10156 7918 8836 179 99.4 0 7927,8105,8285,8462,8641,8819 ATAGTTTCTTACCTCTTCTAGTTGGCATGCTTTGATGACGCTTCTGTATCTGTACTCGTCGTAAGACACGCCGAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCCAGGAAAACTGAGAACAGAATGAAATTCTTCCACTGTGCTTAATTTTACCCATAATCAAAAAGTTTTCAC
0674db42-f074-4189-bb90-d7c75129499f rep10 9.5 10156 9023 10154 112 99.3 0 9202,9313,9425,9537,9649,9761,9873,9985,10096 GAGTACAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAATCTTACGAC
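
For anyone who wants to post-process this output, here is a minimal parsing sketch in Python. It assumes each record is one whitespace-separated line with the 11 columns named in the header above. The `Repeat` type, the `parse_record` helper, and the field types (for example, `fullLen` as an integer) are illustrative assumptions, not part of the tool.

```python
from typing import List, NamedTuple

# Column names copied from the header row of the posted output.
COLUMNS = ("readName repN copyNum readLen start end consLen "
           "aveMatch fullLen subPos consSeq").split()


class Repeat(NamedTuple):
    """One output record (hypothetical container; field types are assumed)."""
    read_name: str
    rep_id: str
    copy_num: float
    read_len: int
    start: int
    end: int
    cons_len: int
    ave_match: float
    full_len: int
    sub_pos: List[int]
    cons_seq: str


def parse_record(line: str) -> Repeat:
    """Split one whitespace-separated record into typed fields."""
    f = line.split()
    if len(f) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(f)}")
    return Repeat(
        f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
        int(f[6]), float(f[7]), int(f[8]),
        [int(p) for p in f[9].split(",")],  # subPos is a comma-separated list
        f[10],
    )


# The rep4 record from the example above.
line = (
    "0674db42-f074-4189-bb90-d7c75129499f rep4 9.7 10156 2984 4133 112 99.9 0 "
    "3164,3276,3388,3500,3612,3724,3836,3948,4059 "
    "TTTTCACATAGTTTCTTACCTCTTCTAGTTGGCATGCTTTGATGACGCTTCTGTATCTGTACTCGTCGTAAG"
    "ATTACCCTCTGAAGGCTCCAGTTCTCCCATAATCAAAAAG"
)
rec = parse_record(line)
print(rec.rep_id, rec.start, rec.end, rec.cons_len)
```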

@yangao07
Collaborator

I am not sure what you want exactly.
For this example, which part of the result do you need/not need?

@palakpsheth
Author

Hi @yangao07, if we look at rep0, rep1, and rep2: rep0 and rep2 have the same consensus sequence and period length (179). rep1 starts at 1013, inside the start-end range of rep0 (104-1019), and so breaks up the repeat between rep0 and rep2. What would be good to see is a mode that excludes any overlapping start-end repeat regions, choosing among them by the longest period length or the most unit coverage.

The expected period for the above is 179.
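
To make the request concrete, here is a rough post-processing sketch in Python of the kind of filter described above. It is not an existing option of the tool. The `Repeat` type and `pick_non_overlapping` function are hypothetical names, coordinates are assumed to be inclusive, and the ranking (longest `consLen` first, then largest `end - start` span) is just one possible reading of "longest period length or most unit coverage".

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    """Minimal view of one output row (hypothetical container)."""
    name: str      # repN column
    start: int     # start column
    end: int       # end column
    cons_len: int  # consLen column, i.e. the period


def pick_non_overlapping(repeats: List[Repeat]) -> List[Repeat]:
    """Greedy filter: rank by period, then by covered span, and keep a
    repeat only if it does not overlap any repeat already kept."""
    ranked = sorted(repeats,
                    key=lambda r: (r.cons_len, r.end - r.start),
                    reverse=True)
    kept: List[Repeat] = []
    for r in ranked:
        # Inclusive coordinates assumed: sharing even one base is an overlap.
        if all(r.end < k.start or r.start > k.end for k in kept):
            kept.append(r)
    return sorted(kept, key=lambda r: r.start)


# First five rows of the example (repN, start, end, consLen).
rows = [
    Repeat("rep0", 104, 1019, 179),
    Repeat("rep1", 1013, 1667, 176),
    Repeat("rep2", 1673, 2293, 179),
    Repeat("rep3", 2287, 3236, 179),
    Repeat("rep4", 2984, 4133, 112),
]
selected = pick_non_overlapping(rows)
print([r.name for r in selected])  # ['rep0', 'rep3']
```

On these five rows the strict filter keeps only rep0 and rep3, because neighbouring calls share a few bases (rep2 ends at 2293 while rep3 starts at 2287). A small overlap tolerance would be a natural extension if that is too aggressive.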

@yangao07
Collaborator

-l/--longest will output only the tandem repeat that covers the longest sequence. That may work for your case.
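
As a quick check against the rows posted earlier in this thread, the Python sketch below picks the single record with the largest `end - start` span. Reading "covers the longest sequence" as the largest span is an assumption; the flag's exact criterion is not stated in this thread, and the tool's real output with `-l` may differ. The `longest_span` helper is a hypothetical name.

```python
from typing import List, Tuple

# (repN, start, end, consLen) taken from the example output in this thread.
ROWS: List[Tuple[str, int, int, int]] = [
    ("rep0", 104, 1019, 179),
    ("rep1", 1013, 1667, 176),
    ("rep2", 1673, 2293, 179),
    ("rep3", 2287, 3236, 179),
    ("rep4", 2984, 4133, 112),
    ("rep5", 4791, 5772, 112),
    ("rep6", 5688, 6639, 179),
    ("rep7", 6633, 7251, 179),
    ("rep8", 7248, 7924, 180),
    ("rep9", 7918, 8836, 179),
    ("rep10", 9023, 10154, 112),
]


def longest_span(rows: List[Tuple[str, int, int, int]]) -> Tuple[str, int, int, int]:
    """Return the row whose start-end range is widest (assumed criterion)."""
    return max(rows, key=lambda r: r[2] - r[1])


best = longest_span(ROWS)
print(best[0], best[2] - best[1], best[3])  # rep4 1149 112
```

Under that assumption the widest record here is rep4, whose period is 112 rather than the expected 179, so it is worth verifying the result on the real read.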
