rNatMau1 EAR #121

Merged
merged 2 commits into from
Jan 9, 2025

Conversation

gitcruz
Collaborator

@gitcruz gitcruz commented Nov 26, 2024

Assembly review request

  • ToLID: rNatMau1
  • Species: Natrix maura
  • Project: ERGA-BGE
  • Affiliation: CNAG Barcelona

Contributor

erga-ear-bot bot commented Nov 26, 2024

Hi @gitcruz, thanks for sending the EAR of Natrix maura.
I added the corresponding tag to the PR and will contact a supervisor and a reviewer ASAP.

Contributor

erga-ear-bot bot commented Nov 26, 2024

Hi @tbrown91, do you agree to supervise this assembly?
Please reply to this message with only OK to acknowledge.

@tbrown91
Collaborator

ok

Contributor

erga-ear-bot bot commented Nov 26, 2024

*****
EAR Reviewer Selection Process
Date: 2024-11-26 10:49

All Eligible Candidates:

Github ID     | Full Name       | Institution | Total Reviews | Last Review | Active | Busy | Calling Score | Adjusted Score
----------------------------------------------------------------------------------------------------------------------------
ldemirdj      | Lola Demirdjian | Genoscope   | 1             | 2024-10-23  | Y      | N    | 1016          | 1066          
EmilieTeo     | Emilie Teodori  | Genoscope   | 1             | 2024-11-04  | Y      | N    | 1016          | 1066          
CaroB-M       | Caroline Menguy | Genoscope   | 2             | 2024-11-05  | Y      | N    | 1016          | 1066          
auryjm        | Jean-Marc Aury  | Genoscope   | 2             | 2024-11-13  | Y      | N    | 1015          | 1065          
bistace       | Benjamin Istace | Genoscope   | 3             | 2024-10-29  | Y      | N    | 1015          | 1065          
tommathers    | Tom Mathers     | Sanger      | 3             | 2024-09-30  | Y      | N    | 1006          | 1056          
SarahPelan    | Sarah Pelan     | Sanger      | 3             | 2024-10-04  | Y      | N    | 1006          | 1056          
epaule        | Michael Paulini | Sanger      | 3             | 2024-10-16  | Y      | N    | 1006          | 1056          
joannacollins | Jo Collins      | Sanger      | 3             | 2024-10-23  | Y      | N    | 1006          | 1056          
DomAbsolon    | Dom Absolon     | Sanger      | 3             | 2024-11-15  | Y      | N    | 1006          | 1056          
additive3     | Jo Wood         | Sanger      | 4             | 2024-11-04  | Y      | N    | 1005          | 1050          
tbrown91      | Tom Brown       | IZW         | 9             | 2024-10-31  | Y      | N    | 993           | 988           
diegomics     | Diego De Panis  | IZW         | 8             | 2024-10-22  | Y      | N    | 991           | 986           

Selected reviewer: Lola Demirdjian (ldemirdj)
The decision was based on:
- different institution ('Genoscope')
- active ('Y')
- not busy ('N')
- oldest review and fewest reviews among the finalists (1066)
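
The decision criteria listed above could be sketched roughly as follows. This is only an illustrative reading of the bot's report, not its actual implementation: the `Candidate` class, the `select_reviewer` function, the comparison against the requester's affiliation, and the tie-break order (fewest reviews first, then oldest last review) are all assumptions. Only a subset of the table is reproduced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One row of the candidate table (hypothetical structure)."""
    github_id: str
    institution: str
    total_reviews: int
    last_review: date
    active: bool
    busy: bool
    adjusted_score: int


def select_reviewer(candidates, requester_institution: str) -> Optional[Candidate]:
    """Pick a reviewer using the criteria named in the report.

    Assumed logic: keep active, non-busy candidates from a different
    institution, take those with the highest adjusted score as finalists,
    then prefer fewest reviews and the oldest last review.
    """
    eligible = [
        c for c in candidates
        if c.active and not c.busy and c.institution != requester_institution
    ]
    if not eligible:
        return None
    top_score = max(c.adjusted_score for c in eligible)
    finalists = [c for c in eligible if c.adjusted_score == top_score]
    return min(finalists, key=lambda c: (c.total_reviews, c.last_review))


# Subset of the table above; values copied from the report.
CANDIDATES = [
    Candidate("ldemirdj", "Genoscope", 1, date(2024, 10, 23), True, False, 1066),
    Candidate("EmilieTeo", "Genoscope", 1, date(2024, 11, 4), True, False, 1066),
    Candidate("CaroB-M", "Genoscope", 2, date(2024, 11, 5), True, False, 1066),
    Candidate("tommathers", "Sanger", 3, date(2024, 9, 30), True, False, 1056),
    Candidate("tbrown91", "IZW", 9, date(2024, 10, 31), True, False, 988),
]

if __name__ == "__main__":
    chosen = select_reviewer(CANDIDATES, requester_institution="CNAG")
    print(chosen.github_id if chosen else "no eligible reviewer")
```

Under these assumptions the three candidates with an adjusted score of 1066 are the finalists, and the tie-break favours the one with a single review dated 2024-10-23, which matches the reviewer named in the report.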

Contributor

erga-ear-bot bot commented Nov 26, 2024

Hi @ldemirdj, do you agree to review this assembly?
Please reply to this message only with Yes or No by 02-Dec-2024 at 15:49 CET

@ldemirdj
Collaborator

Yes

@erga-ear-bot erga-ear-bot bot requested a review from ldemirdj November 26, 2024 10:51
Contributor

erga-ear-bot bot commented Nov 26, 2024

Thanks for agreeing!
I appointed you as the EAR reviewer.
I will keep your status as Busy until you finish this review.
Please check the Wiki if you need to refresh something. (and remember that you must download the EAR PDF to be able to click on the link to the contact map file!)
Contact the PR assignee for any issues.

@tbrown91
Collaborator

Hi @gitcruz thanks for sending the EAR for review. Could you briefly describe the savestates that are included in the download link?

Thank you

@gitcruz
Collaborator Author

gitcruz commented Nov 26, 2024 via email

@tbrown91
Collaborator

perfect, thank you

Contributor

erga-ear-bot bot commented Dec 3, 2024

Ping @tbrown91,
One week without any movements on this PR!

@gitcruz
Collaborator Author

gitcruz commented Dec 4, 2024

Dear @ldemirdj and @tbrown91,

Did you find some time to review this assembly?

I would appreciate some feedback on it.

Thanks,
Fernando

@ldemirdj
Collaborator

ldemirdj commented Dec 4, 2024

Hi @gitcruz,

Sorry for the delay, and thank you for sharing the EAR report. I have reviewed it, and the metrics for this assembly look good. I appreciate your efforts on this genome.

I noticed that you relocated a large number of contigs tagged as unloc, and I agree with this conservative approach. However, I made some adjustments by repositioning a few smaller contigs to the ends of their corresponding scaffolds, which could also be tagged as unloc. This mainly affects SUPER_1, 2, 4, Z, 5, 6, and W. These constitute the majority of my edits, but I can send you my save_state file if that would be helpful.

I agree with your identification of scaffolds Z and W, but I observed a strange pattern in scaffold Z with a spike in coverage. Do you have any insights into this? Additionally, it seems that the scaffolds from SUPER_8 to SUPER_16 interact with other scaffolds. Would you have an explanation for this?

I am still reviewing the map, but these are my initial remarks. Looking forward to your feedback and @tbrown91’s as well!

Best regards,

Lola

SUPER_Z

@gitcruz
Collaborator Author

gitcruz commented Dec 4, 2024

Hi @ldemirdj,

Thanks for your response. Please see my replies below.

Yes, please share the savestate with the small unlocs you’ve found. That will be very useful. I think you should be able to place it inside the folder I shared with you. Please let me know if that works.

The region you showed in the SUPER_Z snapshot has a mappability problem around a gap (no mappings using mq 40). It is a repetitive region rich in heterochromatin (quite frequent in snake W chromosomes and some parts of the Z). In fact, 9,094 bp are represented by tandem repeats. In addition, it actually bridges two contigs with autosomal coverage, similar to the PAR region in mammals. With regard to repeats in sex chromosomes, I can refer you to this paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC5793158/

SUPER_8 to SUPER_16: These superscaffolds clearly correspond to the expected number of microchromosomes for this species. The microchromosomes exhibit high degrees of interchromosomal interaction, particularly with other microchromosomes (as we see in the rNatMau1 assembly and saw previously in other snake assemblies such as rHemHip1.1). See this reference: https://pmc.ncbi.nlm.nih.gov/articles/PMC7947875/

Hope this answered your questions.

Best regards,
Fernando

@ldemirdj
Collaborator

ldemirdj commented Dec 4, 2024

Hi Fernando,

Thank you for your detailed responses and the references. I have uploaded the savestate with the small unlocs I found to the folder you shared. Please let me know if you encounter any issues accessing it.

Regarding SUPER_Z and the mappability issue, the information about the repetitive heterochromatin region is very useful. I’ll review the referenced paper to better understand its implications for sex chromosome organization. For the scaffolds (SUPER_8 to SUPER_16), I’ll also dive into the reference you shared to explore the patterns observed in other snake assemblies.

Thank you again for the clear explanations. I’ll follow up if I have additional questions after going through the papers.

Best regards,

Lola

@gitcruz
Collaborator Author

gitcruz commented Dec 4, 2024

Hi Lola,

Thanks for the savestate with your review!!!!

Here are my comments:

  1. There are several attempts to put together unlocs that obviously interact with each other but are hard to localize and orient.
  2. Personally, I don’t agree with trying to “localize” SUPER_unloc_1 (see the snapshot) because there is a sharp lack of contacts along the diagonal…

Screenshot 2024-12-04 at 17 36 12

Whereas if you leave it as it was, it looks like this:

Screenshot 2024-12-04 at 17 42 14

In this case I'd opt for the conservative choice of leaving it as unloc

I’d like to know @tbrown91's opinion on these edits and his own input on the rNatMau1 assembly shared by CNAG.

Again, thanks a lot for your review,
Fernando
P.S. This comment was edited because my previous comment on the SUPER pieces was wrong, sorry.

@gitcruz
Collaborator Author

gitcruz commented Dec 5, 2024

Dear @ldemirdj,

Please forget my previous comments on the edits. I realized that those were simply automatic scaffold renamings in the AGP. Just reply to my two comments above.

Thanks!!!!

@tbrown91
Collaborator

tbrown91 commented Dec 5, 2024

Morning both,

I am generally more in the conservative camp. I understand placing the unloc sequence from scaffold 1 there, but based on these maps I agree that I can't be 100% certain.

I would really suggest also creating maps in higlass so that you can get a better resolution view of the map and then make the changes in pretext.

I had a go at trying to sort out the W. Again, I would rather be looking at this in higlass, but hopefully it's a step in the right direction. I think at least the two "halves" are assembled. I don't know if some of these PAR-like regions should also belong at the start of the W (e.g. super_z 136.5Mb-146.3Mb), but for now I think it is ok.

I included the save state ....tom.savestate_1 in the folder

image

The rest of the genome looks really nice. The scaffolding is definitely at the standard we are striving to reach in ERGA and the project. This would perhaps be a nice R&D case for chromosome assembly and scaffolding in the future

@gitcruz
Collaborator Author

gitcruz commented Dec 5, 2024

Hi @ldemirdj and @tbrown91,

Thanks for your review.

Regarding the reorganization of the SUPER_W unlocs, I understand the effort but I am not sure what the intention is. The original W (5.3 Mb) is now towards the end. Is the idea to paint everything together, or should the previous unlocs remain as they were before? I'd like to know before tagging and painting the assembly again. I am not sure if you suggest painting the block around this original W and leaving the rest as unlocs.

I agree that the little scaffold_136 is part of the W. It interacts strongly with SUPER_W_unloc_12, so I moved it there for now. Good catch!

Good to know, we agree on the SUPER_1 unlocs as they were originally.

Tom, we do not use HiGlass frequently. But I suspect that the W chromosome would have been better assembled with HiFi reads than with ONT...

Finally, I am a bit lost trying to keep track of the edits: Lola made 43 and Tom 84. @tbrown91, were yours done on top of Lola's save_state? I personally like the assembly as it is now. But I do have doubts about what the suggestion is for painting the W and which scaffolds we could leave as unloc.

Best,
Fernando

@tbrown91
Collaborator

tbrown91 commented Dec 5, 2024

he he he

Yes, I worked on Lola's savestate, but moved the unloc from super_1 back into the unloc region.

It's my feeling that the W can be painted into one chromosome - what do you think?

@talioto
Collaborator

talioto commented Dec 5, 2024

Can I chip in? All this unlocalized stuff on super 1 and 5 was introduced after rescuing W sequence from purge_dups. It's repetitive or haplotypic. I've started identifying haplotigs but it's a pain. I would have left it out from the assembly to begin with and just kept the W sequence. Perhaps we can sort through it and keep some, but to me, I think it's more work than it's worth.

As far as localizing all the W sequence into a superscaffold, I would still be a little conservative. Maybe we can put together something, but I am not convinced we can localize it all.

@gitcruz
Collaborator Author

gitcruz commented Dec 5, 2024

Hi all,

I agree with Tyler regarding the W. I think the scaffolds are hard to order and orient, and it would be very misleading for downstream analyses to use it as it is now. Perhaps I would paint the group from the previous W_unloc_13 up to W_unloc_11 (right corner of the image), leaving the rest as unloc (including the previous SUPER_W). What do you think?

Screenshot 2024-12-05 at 13 40 28

I also now see that scaffold_148, placed in super_4, looks like a small haplotig.

Best.

@gitcruz
Collaborator Author

gitcruz commented Dec 5, 2024

Hi again,

I take that back: from SUPER_W up to W_unloc_10 we could group these scaffolds into a larger sequence block.

SUPER_W_2024-12-05 at 14 13 39

@tbrown91
Collaborator

tbrown91 commented Dec 5, 2024

ok, sounds to me like things are moving in the right direction

@gitcruz can you generate a new image, stats and EAR once you have finished with the W chromosome? I think after one round we should be good to go

@ldemirdj
Collaborator

ldemirdj commented Dec 5, 2024

Hi Fernando, Tom, and Tyler,

Thanks for all the feedback and discussion—it’s really helpful to get everyone’s perspectives.

I understand the challenges of localizing SUPER_unloc_1, and I agree that in this case, a conservative approach (leaving it as unlocalized) is likely the best option.

Fernando, I see your concerns about the reorganization of the SUPER_W unlocs. If the goal is to paint the W as a single block, it’s important to clarify which scaffolds should be included and which might remain unlocalized. That said, I share Tyler’s perspective: while it might be possible to consolidate parts of the W, I believe we should proceed conservatively. We might manage to piece some sequences together, but I’m not confident we can localize everything with certainty. I’ll wait for your new image, stats and EAR, Fernando, for a final review.

Thanks again, everyone!

Best,

Lola

@gitcruz
Collaborator Author

gitcruz commented Dec 11, 2024

Dear all,

SUPER_W: I have been reorganizing the W scaffolds a bit, and we think we have a section of 16 Mb with solid diagonal contacts that can be painted together, plus 36 Mb made up of unlocs (hard to place after the main block, or with discontinuous contacts along the diagonal with respect to it).

SUPER_1 and SUPER_5: we distinguished several haplotigs from the other unlocs based on coverage.

The reviewed contact map has been painted and tagged accordingly. @ldemirdj and @tbrown91, could you have a look at "rNatMau1.reviewed.savestate_1" before I generate the final EAR?

If it's ok with you, we could get the approval and start uploading the assembly to ENA, as we did for other genomes. Of course we can wait for the final EAR to merge the PR.

Thanks,
Fernando

@tbrown91
Collaborator

Thanks @gitcruz I will take a look through this afternoon. @ldemirdj do you have time today or tomorrow to have a look at the map?

@ldemirdj
Collaborator

Thanks @gitcruz, I'll take a look tomorrow.

@gitcruz
Collaborator Author

gitcruz commented Dec 12, 2024

Thanks both

@tbrown91
Collaborator

W is looking much nicer. That PAR-like region is really causing some difficulties with some of the scaffolds. I know there are a lot of unlocs, but I think what you've produced is very good.

@ldemirdj if you are happy please "Approve" the PR

Contributor

erga-ear-bot bot commented Dec 16, 2024

Thanks @ldemirdj for the review.
I will add a new reviewed species for you to the table when @tbrown91 merges the PR ;)

Congrats on the assembly @gitcruz!
Please make sure that the fasta file to upload to ENA is generated based on the final reviewed version of the assembly.

After @tbrown91's confirmation, you can start with the assembly submission to save time.
The PR will be merged only when the final version of the EAR pdf is available.

@tbrown91
Collaborator

Thank you for the review @ldemirdj

@gitcruz please go ahead with the upload and generation of a new EAR. I will merge the PR once we have the new pdf.

Thank you again both

@gitcruz
Collaborator Author

gitcruz commented Dec 16, 2024

Thank you very much @ldemirdj and @tbrown91

I will start the computations and generate the final EAR asap.

Best regards,
Fernando

@gitcruz
Collaborator Author

gitcruz commented Dec 18, 2024

Hi @tbrown91,

I uploaded the assembly to ENA. I just realized that for this genome we did not use Omni-C but Arima High Coverage HiC. I used the right label for the upload, "Arima v2", and I will fix the final EAR report with the correct information (i.e., Arima-HiC).

I will try to upload it before the Christmas break if the mappings finish, but it will take time to map 900M pairs. Please ping me after the holidays in case I forget.

Best regards,
Fernando

@tbrown91
Collaborator

Feliz Navidad

@gitcruz
Collaborator Author

gitcruz commented Dec 18, 2024

Thanks! Merry Christmas @tbrown91 and @ldemirdj

@ldemirdj
Collaborator

Merry Christmas @gitcruz and @tbrown91 !

@gitcruz
Collaborator Author

gitcruz commented Dec 18, 2024

Thanks Lola! I fixed the tag in my greetings! I was tagging another reviewer before, sorry.

Contributor

erga-ear-bot bot commented Dec 25, 2024

Ping @tbrown91,
One week without any movements on this PR!

Contributor

erga-ear-bot bot commented Jan 1, 2025

Ping @tbrown91,
One week without any movements on this PR!

@tbrown91
Collaborator

tbrown91 commented Jan 6, 2025

Happy new year all.

A small nudge to @gitcruz to see how many of us are back to work at this point

Contributor

erga-ear-bot bot commented Jan 9, 2025

Attention @tbrown91, the EAR PDF was updated.

@gitcruz
Collaborator Author

gitcruz commented Jan 9, 2025

Happy new year @tbrown91 !

I uploaded the final EAR report, which contains the final pretext map link for rNatMau1.1.

Please merge this branch.

Thanks,
Fernando

@tbrown91
Collaborator

tbrown91 commented Jan 9, 2025

Ace, thanks @gitcruz happy new year!

@tbrown91 tbrown91 merged commit 1190783 into ERGA-consortium:main Jan 9, 2025
1 check passed
@gitcruz gitcruz deleted the gitcruz-rNatMau1 branch January 9, 2025 16:39
4 participants