TreeValGal status and issues #553

Open
nekrut opened this issue Oct 8, 2024 · 2 comments

Comments

@nekrut
Collaborator

nekrut commented Oct 8, 2024

(from @fubar2)

TreeValGal/EBPhib workflows: October 9 status

1. Current state:

  • Two workflows that could be combined into one.
    • Need a new name
    • Both running on the EU and VGP Galaxy servers
  • Basic tracks:
  • Dimers, coverage, SV:
  • New, with little exposure so far to curators or the target biologist audience, so refinement from user feedback is needed to make them more useful.

Question: Should these two WF JBrowse2 track lists be combined into one large ~50-track browser? JBrowse2 can cope if most tracks are hidden in the track menu (the unchecked boxes in the track menus at the side). Both workflows currently use Sniffles to make an SV track, and two of the mapping/coverage tracks are also duplicated.

2. Blockers

  • Big shout-out to @bgruening @jennaj @natefoo @mvdbeek for helping make all the important ones on EU and VGP go away.
  • RepeatMasker is not stable on VGP. A map-reduce solution proposed by @mvdbeek works for the reduce step, but the map step (running RepeatMasker over a collection) overwhelms something.
  • RepeatModeler -> TF models -> RepeatMasker may be broken too. Ideally we would get both working if resources are available, but this is optional for now.

3. What needs to be done for production?

  • IWC submission being prepared as a PR (Ross and Bjoern)
  • A process to gather the inputs needed to run each new assembly:
    • Two assembled haplotypes (with the same contig names);
    • Optional NCBI taxon ID and downloaded gene/RNA/protein FASTA files;
    • Optional haplotype or reference FASTA files from two or more closely related species for sequence similarity mapping;
    • Run RepeatModeler for RepeatMasker, if both can be made to work.
  • An SOP for gathering all inputs, executing the workflows, distributing JBrowse2 zip files to GenomeArk, and sending notifications to the Fediverse and every other possible channel. Bjoern mentioned some additional ideas for the SOP to minimise redundant work and maximise benefit.
  • Minimise redundant VGP Galaxy storage demand as part of the SOP
    • Possibly data libraries for the big files, compressed and uncompressed, with lots of pointers.
  • Outreach is key to get biologists to try them and provide comments.
    • Push demonstrations to genomeArk assemblies
    • Publicise the URI
    • Make GTN tutorials on JBrowse2 and the WFs
  • Need input on the developer and user documentation linked above
  • Need suggestions for improvements, bug reports and feedback from curators
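The input requirement that the two assembled haplotypes share contig names could be checked before the workflows are launched. A minimal Python sketch, assuming plain (uncompressed) FASTA input and that a contig name is the first whitespace-delimited word of each `>` header; the function names are hypothetical and not part of either workflow:

```python
from typing import Iterable, Set, Tuple


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect contig names (first word after '>') from FASTA text lines."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                # Assumption: the contig name is the first whitespace-delimited token.
                ids.add(header.split()[0])
    return ids


def compare_haplotypes(
    hap1_lines: Iterable[str], hap2_lines: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (names only in hap1, names only in hap2); two empty sets mean the names match."""
    ids1 = fasta_ids(hap1_lines)
    ids2 = fasta_ids(hap2_lines)
    return ids1 - ids2, ids2 - ids1


# Hypothetical usage (file names are placeholders):
# with open("hap1.fasta") as a, open("hap2.fasta") as b:
#     only_hap1, only_hap2 = compare_haplotypes(a, b)
```

Returning both set differences rather than a single boolean makes it easy to report exactly which contigs need renaming before an assembly is submitted.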

4. Track lists for the two workflows - potentially combined?

Screenshots of the WF1 and WF2 track lists.
@fubar2
Member

fubar2 commented Oct 9, 2024

@Delphine-L:
It would be easier if this were a branch of IWC rather than a fork!
I finally figured out how to switch my IWC clone to your fork, but I cannot push any updates to it without being given access:

fatal: Authentication failed for 'https://github.com/Delphine-L/iwc/'

@Delphine-L
Contributor

@fubar2 I invited you to my fork; you should be able to push now.

Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

3 participants