Add Pipistrellus hanaki #84

Merged
merged 5 commits into from
Nov 11, 2024

SMangenot
Contributor

Assembly review request

  • ToLID: mPipHan1
  • Species: Pipistrellus hanaki
  • Project: ERGA-BGE
  • Affiliation: Genoscope

Contributor

erga-ear-bot bot commented Oct 3, 2024

Hi @SMangenot, thanks for sending the EAR of Pipistrellus hanaki.
I added the corresponding tag to the PR and will contact a supervisor and a reviewer ASAP.

@erga-ear-bot erga-ear-bot bot added the ERGA-BGE label Oct 3, 2024
Contributor

erga-ear-bot bot commented Oct 3, 2024

Hi @tbrown91, do you agree to supervise this assembly?
Please reply to this message with only OK to acknowledge.

@tbrown91
Collaborator

tbrown91 commented Oct 3, 2024

ok

Contributor

erga-ear-bot bot commented Oct 3, 2024

*****
EAR Reviewer Selection Process
Date: 2024-10-03 07:05

All Eligible Candidates:

Github ID  | Full Name       | Institution | Total Reviews | Last Review | Active | Busy | Calling Score | Adjusted Score
-------------------------------------------------------------------------------------------------------------------------
talioto    | Tyler Alioto    | CNAG        | 2             | 2024-09-30  | Y      | N    | 1004          | 1054          
epaule     | Michael Paulini | Sanger      | 2             | 2024-09-05  | Y      | N    | 1002          | 1052          
DomAbsolon | Dom Absolon     | Sanger      | 2             | 2024-09-23  | Y      | N    | 1002          | 1052          
additive3  | Jo Wood         | Sanger      | 3             | 2024-06-20  | Y      | N    | 1001          | 1051          
tommathers | Tom Mathers     | Sanger      | 3             | 2024-09-30  | Y      | N    | 1001          | 1051          
tbrown91   | Tom Brown       | IZW         | 8             | 2024-07-05  | Y      | N    | 994           | 989           
diegomics  | Diego De Panis  | IZW         | 7             | 2024-07-05  | Y      | N    | 992           | 987           

Selected reviewer: Tyler Alioto (talioto)
The decision was based on:
- different institution ('CNAG')
- active ('Y')
- not busy ('N')
- highest adjusted calling score in this particular selection (1054)
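
The decision criteria listed by the bot can be sketched as a filter followed by a maximum. This is only an illustration of the stated criteria, not the bot's actual code: the `Candidate` class and `select_reviewer` function are hypothetical names, the way the adjusted score is computed is not shown here, and reading "different institution" as "different from the submitter's affiliation (Genoscope)" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    github_id: str
    institution: str
    active: bool
    busy: bool
    adjusted_score: int


def select_reviewer(candidates, excluded_institutions):
    """Return the active, non-busy candidate from another institution
    with the highest adjusted score, or None if nobody qualifies."""
    eligible = [
        c for c in candidates
        if c.active and not c.busy and c.institution not in excluded_institutions
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.adjusted_score)


# Values copied from the table above.
candidates = [
    Candidate("talioto", "CNAG", True, False, 1054),
    Candidate("epaule", "Sanger", True, False, 1052),
    Candidate("DomAbsolon", "Sanger", True, False, 1052),
    Candidate("additive3", "Sanger", True, False, 1051),
    Candidate("tommathers", "Sanger", True, False, 1051),
    Candidate("tbrown91", "IZW", True, False, 989),
    Candidate("diegomics", "IZW", True, False, 987),
]

# Assumption: the institution to avoid is the submitter's (Genoscope).
selected = select_reviewer(candidates, excluded_institutions={"Genoscope"})
print(selected.github_id)
```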

Contributor

erga-ear-bot bot commented Oct 3, 2024

Hi @talioto, do you agree to review this assembly?
Please reply to this message only with Yes or No by 10-Oct-2024 at 09:05 CET

@tbrown91
Collaborator

tbrown91 commented Oct 3, 2024

@SMangenot Could you please add BUSCO scores from a more appropriate lineage, for example mammalia or laurasiatheria? I am concerned about the number of duplicated genes, and that they didn't seem to decrease even though you removed quite a number of sequences during curation.

Hi @tbrown91, here's the new EAR report with the BUSCO scores from the laurasiatheria lineage.
Contributor

erga-ear-bot bot commented Oct 4, 2024

The researcher has updated the EAR PDF. Please review the assembly @tbrown91.

@talioto
Collaborator

talioto commented Oct 9, 2024 via email

@erga-ear-bot erga-ear-bot bot requested a review from talioto October 9, 2024 13:29
Contributor

erga-ear-bot bot commented Oct 9, 2024

Thanks for agreeing!
I appointed you as the EAR reviewer.
I will keep your status as Busy until you finish this review.
Please check the Wiki if you need to refresh something. (and remember that you must download the EAR PDF to be able to click on the link to the contact map file!)
Contact the PR assignee for any issues.

@tbrown91
Collaborator

Hi @talioto, have you had a chance to look through the assembly? We need to get this submitted by the end of the month unless there are major issues.

Collaborator

@talioto talioto left a comment

I was not quite done, but here are my notes so far:

Contact density is in general very low, making decisions somewhat difficult. It's too bad the library was not sequenced to higher depth.

Telomeric and subtelomeric sequence is often incorporated but there is a lot left in the chaff/shrapnel at the end with non-specific signal. I don't know how much to trust the placement of this stuff. This is why my review has taken some time.
SUPER_1: some faint telomeric sequence in the middle around 91.1-91.8 Mb. BUT, it seems to be contiged and contacts support this being some relic signal from a chromosome fusion. Perhaps nothing to do here.

Join SUPER_7 and SUPER_5? The signal is similar to that seen in SUPER_1. There is a telomeric repeat region, though.

SUPER_8: 6.5-7 Mb contig repeat that is maybe misplaced. Better to unloc it.

SUPER_14: 0-3.8 Mb. I don't think the signal is specific enough to keep this attached to this chromosome. Based on the pattern of contacts to other subtelomeric regions, it seems like this is the odd one out. I would place it in the chaff.

SUPER_15: beginning subtelomeric region. I'm not sure I trust YaHS in putting this together.

SUPER_17: interior telomeric region around 32-26 Mb. Not sure how to handle.

"Y" should probably be X. It is a male specimen: https://www.ebi.ac.uk/biosamples/samples/SAMEA115120470 so I assume XY. Other species in the genus have a 104 Mb X and 4 Mb Y, or a 106 Mb X and 6 Mb Y.
"Y": interior telomeric region 54.3-61.8 Mb. I don't think there's enough support to keep it in the middle of the X. In fact I think this is the actual Y sequence. Keep the part next to it that has contacts with the X and is higher coverage. This is likely part of the PAR. See my savestate.

Perhaps minimap2 alignments to other species in the genus would help sort some of this out.
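
A minimal sketch of how such alignments could be summarised, assuming the assembly has been aligned to a related genome with minimap2 and the output is in PAF format. It sums the matching bases (PAF column 10) per scaffold and reference sequence and reports the best-supported reference for each scaffold. The function name `best_targets`, the `min_mapq` threshold, and the reference names (`chr5`, `chr7`, `chr9`) in the toy records are hypothetical; only the column layout comes from the PAF format.

```python
from collections import defaultdict


def best_targets(paf_lines, min_mapq=20):
    """Map each query scaffold to the reference sequence with the most
    matching bases, ignoring alignments below min_mapq."""
    matches = defaultdict(int)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        qname, tname = fields[0], fields[5]
        n_match, mapq = int(fields[9]), int(fields[11])
        if mapq < min_mapq:
            continue
        matches[(qname, tname)] += n_match

    best = {}
    for (qname, tname), total in matches.items():
        if qname not in best or total > best[qname][1]:
            best[qname] = (tname, total)
    return best


# Toy PAF records (made-up coordinates, for illustration only).
example = [
    "SUPER_5\t2000\t0\t900\t+\tchr5\t5000\t100\t1000\t850\t900\t60",
    "SUPER_5\t2000\t900\t1010\t+\tchr5\t5000\t1000\t1110\t100\t110\t60",
    "SUPER_5\t2000\t1100\t1450\t-\tchr7\t4000\t0\t350\t300\t350\t60",
    "SUPER_7\t1500\t0\t750\t+\tchr5\t5000\t2000\t2750\t700\t750\t60",
    "SUPER_7\t1500\t800\t1500\t+\tchr9\t3000\t0\t700\t600\t700\t5",
]

for scaffold, (target, total) in sorted(best_targets(example).items()):
    print(scaffold, target, total)
```

In this toy input two scaffolds map mainly to the same reference sequence, which is the kind of evidence that could support or reject a proposed join.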

@talioto
Collaborator

talioto commented Oct 14, 2024

@tbrown91
Collaborator

Thank you for the review @talioto!

@SMangenot can you please look through Tyler's suggested changes and see if they make sense in the context of the Hi-C map? I wonder if looking at synteny to other Pipistrellus genomes would also help. For example, these 4 are all assembled into chromosomes, including kuhlii, which is listed as scaffold: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=27671&reference_only=true

@talioto
Collaborator

talioto commented Oct 20, 2024

It would be good to know if there's been any progress on this. We're in a time crunch here. Could use this genome span.

@ldemirdj
Collaborator

Hi @talioto,

I'll reach out to Sophie for an update, but she's still working on the map. We should have news soon, and we'll be able to submit it before the October 31st deadline.

Thanks!

Lola

A new update for Pipistrellus hanaki
I made the corrections according to your remarks, but after comparing with the reference genome I did not join SUPER_7 and SUPER_5, or SUPER_14 and SUPER_15.
@erga-ear-bot erga-ear-bot bot requested a review from talioto October 28, 2024 07:37
Contributor

erga-ear-bot bot commented Oct 28, 2024

The researcher has updated the EAR PDF. Please review the assembly @talioto.

@ldemirdj
Collaborator

Hi everyone,

Given tomorrow's deadline, it won’t be possible to submit this genome on time. Sophie and I are awaiting Tyler's feedback for a final review, and we will submit it once that's completed 👍 .

Thank you for your understanding.

Best,

Lola

@talioto
Collaborator

talioto commented Oct 30, 2024

Man, this one is not easy, but I assume you did some alignments to other Pipistrellus bats. Is the X colinear? Are there any Ys to align to? I'm not sure the piece labeled Y is the Y. I broke off the little bit that matches the X and placed it in the gap in the middle of X.
SUPER_13: from 53.6 to the end, I would break it off and unplace it in the chaff.

Here's a link to folder where I have the pretextmap and a savestate.

@ldemirdj
Collaborator

ldemirdj commented Oct 30, 2024

Yes, it's a hard genome. The message was mostly to keep Tom informed. Sophie will answer your question. Thanks @talioto.

@additive3
Collaborator

additive3 commented Oct 30, 2024

My 2 cents on the Y chromosome.
What is currently annotated as Y looks to be just satellite, perhaps centromeric, and it looks like it should be placed in SUPER_3 (at a guess, ~91.4 Mb).
I agree that there is a small bit that clips off and is X centromere (also scaffold_56).

So Y... looking at the map, I would suggest that scaffold_32, scaffold_37, scaffold_58, scaffold_34 (in that order) are it.

Hi-C coverage is really not high enough, but that is for another discussion.

@additive3
Collaborator

[Screenshot, 2024-10-30: Y chrom.]

Contributor

erga-ear-bot bot commented Nov 5, 2024

Attention @talioto, the EAR PDF was updated.

@tbrown91
Collaborator

tbrown91 commented Nov 5, 2024

Hi @SMangenot Thank you for the new EAR. Could you please detail here the changes that you have made? I'm finding it a little difficult to go through the conversation here and find everything.

Thanks

@SMangenot
Contributor Author

The Y chromosome was wrong; I made a mistake in my last map.
I aligned the X chromosome against a reference genome and it now looks correct.
I followed @talioto's instructions for SUPER_13.
I've organized scaffold_32, scaffold_37, scaffold_58, scaffold_34 (SUPER_22) to reconstruct the Y chromosome, but the alignments against a reference don't seem conclusive.

@additive3
Collaborator

I wouldn't necessarily expect to see alignment between Y chromosomes, especially from different species.
While gene content is likely conserved, copy number, structure and composition are likely quite different.

A new map with the Y chromosome tagged
Contributor

erga-ear-bot bot commented Nov 7, 2024

Attention @talioto, the EAR PDF was updated.

@tbrown91
Collaborator

tbrown91 commented Nov 8, 2024

Thanks @SMangenot

@talioto @additive3 let's try to get this one finalised. I don't see much more room for improvement

Collaborator

@talioto talioto left a comment

Go ahead. Not much else to improve. We really need higher Hi-C coverage from Genoscope so that scaffolding and curation are more reliable and take less time. The agreed-on target is 50x coverage minimum. For bad libraries we go to 100x.
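
As a rough illustration of what these targets mean for sequencing, coverage is taken here as total sequenced bases divided by genome size. The genome size and read length below are placeholder assumptions, not values from this assembly, and the function names are hypothetical.

```python
def required_bases(genome_size_bp, target_coverage):
    """Total bases to sequence to reach the target coverage."""
    return genome_size_bp * target_coverage


def coverage(total_bases, genome_size_bp):
    """Coverage obtained from a given amount of sequence."""
    return total_bases / genome_size_bp


GENOME_SIZE_BP = 2_000_000_000  # assumption: placeholder 2 Gb genome
READ_PAIR_BP = 2 * 150          # assumption: paired-end 150 bp reads

for target in (50, 100):
    bases = required_bases(GENOME_SIZE_BP, target)
    pairs = bases // READ_PAIR_BP
    print(f"{target}x: {bases / 1e9:.0f} Gb, about {pairs / 1e6:.0f} M read pairs")
```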

Contributor

erga-ear-bot bot commented Nov 10, 2024

Thanks @talioto for the review.
I will add a new reviewed species for you to the table when @tbrown91 merges the PR ;)

Congrats on the assembly @SMangenot!
Please make sure that the fasta file to upload to ENA is generated based on the final reviewed version of the assembly.

After @tbrown91 confirmation, you can start with the assembly submission to save time.
The PR will be merged only when the final version of the EAR pdf is available.

@diegomics
Collaborator

diegomics commented Nov 10, 2024

Hi @SMangenot, out of curiosity, do you know why HiC throughput was so low?

@tbrown91 tbrown91 merged commit a95e7ed into ERGA-consortium:main Nov 11, 2024
2 checks passed
6 participants