[NTR] Deepening Azimuth annotations #1258

Closed
2 tasks
paolaroncaglia opened this issue Sep 10, 2021 · 15 comments
Labels
HuBMAP Needed/useful for HuBMAP new term request

Comments

@paolaroncaglia
Contributor

paolaroncaglia commented Sep 10, 2021

Placeholder. From the HCA/SCEA/HuBMAP curators meeting (Sep 8th):
@dosumis assigned this to me: "https://azimuth.hubmapconsortium.org/ - looks at non neural for potential deepening via adding new."
so I'll exclude Human - Motor Cortex and Mouse - Motor Cortex.
Also, leave Human - Lung aside for now, as Joshua Fortriede may be working on that.

  • Identify area(s) of work
    Update: Human - Pancreas would be a good reference tissue to start from, as it only has 1 level of annotation at present, with 13 entries that all have a mapping to CL, so I could easily review the mappings, inspect for presence of CL subtypes etc. (At the curators call on Dec 8th, we agreed that I'll work on the Pancreas dataset in Jan 2022.) Following pancreas, I'd list in order of growing difficulty, based on annotation levels, absence of mappings and number of entries: kidney, bone marrow + PBMC (both have immune focus, so would need a resident expert to at least review), and fetal development.
  • Then email reference group before starting edits (also, ask if they could please provide the data contained in the annotation levels in a tabular/spreadsheet format rather than just online)

Update/note for self from the HCA/SCEA/HuBMAP curators meeting (Nov 10th): see Ellen's comments there for additional info.

@paolaroncaglia paolaroncaglia self-assigned this Sep 10, 2021
@paolaroncaglia paolaroncaglia added the HuBMAP Needed/useful for HuBMAP label Sep 10, 2021
@paolaroncaglia paolaroncaglia assigned ghost and unassigned paolaroncaglia Feb 22, 2022
@ghost

ghost commented Mar 7, 2022

@paolaroncaglia, thank you for advising me of this ticket. If you and @dosumis have recommendations on how to proceed with this work, I would like to hear your ideas so I can get a better sense of the current goal.

For the pancreas set, for example, is the goal to enhance the current entries in CL with, perhaps, refined definitions, additional synonyms and axioms if they can be derived from the Azimuth references? As you wrote above, the cell types referenced already exist in CL.

@paolaroncaglia
Contributor Author

Hi @bvarner-ebi , if I remember correctly, @dosumis suggested that, before starting, it may be good to check if the data/mappings linked above are the most up-to-date. You may want to add this topic to the agenda for our monthly curators call on Wednesday, please, so we can check who the best contacts are (in case they've changed). We can then discuss priorities and strategies. If we can't discuss on Wed of course we can chat offline. Thanks!

@emquardokus

emquardokus commented Mar 8, 2022

@paolaroncaglia @bvarner-ebi
Here's the Azimuth link for pancreas: https://azimuth.hubmapconsortium.org/references/#Human%20-%20Pancreas
Here's what you should look for in the Azimuth annotations. I know these issues exist because our group has had many discussions with Rahul Satija's group, which is the other mapping component of HuBMAP.

All of the Azimuth datasets were annotated with existing CL terms whether they were an EXACT match or not.
Example: if there are cell subtypes, but only the parent class exists, both cell types would be annotated with the EXACT same parent CL ID, despite being molecularly unique.

For pancreas, looking at the annotations, the first column (abbreviated name) and second column (extended name) often differ from the third column (CL rdfs:label) and could indicate a unique cell subtype of this class of cells.

It looks like there are "cell state" differences indicated for stellate cells activated or quiescent, but both mapped to pancreatic stellate cell. Bigger discussion is how to represent cell states in CL.

| n | label | extended label | OBO ontology ID | markers |
|---|---|---|---|---|
| 1 | activated_stellate | activated stellate | pancreatic stellate cell | COL1A1, COL1A2, COL6A3, COL3A1, TIMP3, TIMP1, CTHRC1, SFRP2, BGN, LUM |
| 2 | quiescent_stellate | quiescent stellate | pancreatic stellate cell | RGS5, C11orf96, FABP4, CSRP2, IL24, ADIRF, NDUFA4L2, GPX3, IGFBP4, ESAM |
| 3 | cycling | cycling | pancreatic endocrine cell | UBE2C, TOP2A, CDK1, BIRC5, PBK, CDKN3, MKI67, CDC20, CCNB2, CDCA3 |
| 4 | gamma | gamma islet cell | pancreatic PP cell | PPY, AQP3, MEIS2, ID2, GPC5-AS1, CARTPT, PRSS23, ETV1, PPY2, TUBB2A |

NOTES: Opportunities to enhance CL
Rows 1-3 in the table above involve cell state based on differential gene expression data; I do not believe CL addresses "cell state".
Row 4 in the table above is an opportunity to add more to the definition: "gamma islet cell" could be added as a synonym, since it doesn't exist yet.
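
A minimal sketch of how the two checks described in this comment could be run over the annotation rows: grouping Azimuth labels by the CL term they map to (to spot several labels sharing one parent term), and listing rows whose extended label differs from the CL label. The rows are the four pancreas entries quoted in this comment; the split of row 4 into "gamma" / "gamma islet cell" / "pancreatic PP cell" is an assumed reading of the table, and the function names are hypothetical, not part of any Azimuth or CL tooling.

```python
from collections import defaultdict

# (label, extended label, CL label) for the four pancreas rows quoted above.
# Row 4's column split is an assumption about how the table should be read.
ROWS = [
    ("activated_stellate", "activated stellate", "pancreatic stellate cell"),
    ("quiescent_stellate", "quiescent stellate", "pancreatic stellate cell"),
    ("cycling", "cycling", "pancreatic endocrine cell"),
    ("gamma", "gamma islet cell", "pancreatic PP cell"),
]


def normalise(name):
    """Lower-case, replace underscores and collapse whitespace for comparison."""
    return " ".join(name.replace("_", " ").lower().split())


def shared_cl_targets(rows):
    """Return CL labels that more than one Azimuth label is mapped to."""
    groups = defaultdict(list)
    for label, _extended, cl_label in rows:
        groups[cl_label].append(label)
    return {cl: labels for cl, labels in groups.items() if len(labels) > 1}


def label_mismatches(rows):
    """Return (extended label, CL label) pairs where the two names differ."""
    return [
        (extended, cl_label)
        for _label, extended, cl_label in rows
        if normalise(extended) != normalise(cl_label)
    ]


if __name__ == "__main__":
    print(shared_cl_targets(ROWS))
    print(label_mismatches(ROWS))
```

A shared CL target or a label mismatch is only a prompt for manual review (possible subtype, cell state, or missing synonym), not evidence that a new term is needed.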

@dosumis
Contributor

dosumis commented Mar 8, 2022

> All of the Azimuth datasets were annotated with existing CL terms whether they were an EXACT match or not.
> Example: if there are cell subtypes, but only the parent class exists, both cell types would be annotated with the EXACT same parent CL ID, despite being molecularly unique.

That's pretty standard use of ontologies. The two subtypes are potentially new CL terms, but we'd need more info before adding. Adding on markers alone from scRNAseq is not impossible, but has the problem that markers tend to be study & analysis technique specific.

> It looks like there are "cell state" differences indicated for stellate cells activated or quiescent, but both mapped to pancreatic stellate cell. Bigger discussion is how to represent cell states in CL.

My preferred solution would be to have a standard way for annotators to post-compose these by combining with appropriate GO terms e.g. referring to cell cycle and activation processes.
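
As a rough illustration of what such post-composition might look like as data, here is a sketch in which an annotation pairs one CL term with zero or more GO process terms. The class and function names are hypothetical, terms are referred to by label only, and nothing here reflects an agreed annotation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostComposedAnnotation:
    """A CL cell type combined with optional GO process terms (labels only)."""

    cell_type: str          # CL term label
    processes: tuple = ()   # GO term labels describing the cell state

    def describe(self):
        if not self.processes:
            return self.cell_type
        return f"{self.cell_type} ({' and '.join(self.processes)})"


def annotate(cell_type, *processes):
    """Build an annotation; process labels are de-duplicated and sorted."""
    return PostComposedAnnotation(cell_type, tuple(sorted(set(processes))))


if __name__ == "__main__":
    activated = annotate("pancreatic stellate cell", "cell activation")
    cycling = annotate("pancreatic endocrine cell", "cell cycle")
    print(activated.describe())
    print(cycling.describe())
```

The point of the design is that two annotations can share the same CL term while remaining distinguishable by their process terms, so no new pre-composed CL class is required for each state.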

> The first column (abbreviated name) and second column (extended name) often differ from the third column (CL rdfs:label) and could indicate a unique cell subtype of this class of cells.

Yep. My aim was to leverage this to either suggest new annotations or find potential missing CL terms. Looking at the rather meagre annotations on Pancreas, there's really no opportunity to do that. Level 2/3 annotations on the kidney have examples.

These should be cross-referenced with the other potential new cell types coming from HuBMAP.

@dosumis
Contributor

dosumis commented Mar 9, 2022

> Adding on markers alone from scRNAseq is not impossible, but has the problem that markers tend to be study & analysis technique specific.

From discussion on CL for scRNAseq call - we decided terms like this are better in PCL. @bvarner-ebi to talk to @shawntanzk about how to go about adding new terms to PCL.

@ghost

ghost commented Mar 17, 2022

Spreadsheet consolidating Azimuth terms in progress:
https://docs.google.com/spreadsheets/d/1-ovkQX93B9KVPYJqTuPN9NvPv3pfordMTsRnE9swIY4/edit#gid=1044169210

@ghost

ghost commented Mar 21, 2022

Hi, @shawntanzk, is there a different process in place to add terms to PCL than CL? Are the edits made in pcl-edit.owl?

@shawntanzk
Contributor

@bvarner-ebi basically yes, though we currently don't really use our edit file. I guess it depends on how you want to handle this. If it is just a couple of terms, then adding them to -edit might be sensible (if this is the case, give me a bit: we made some changes to PCL IDs that I've not yet updated in the ID range). If you are using a template or adding a lot of terms, it might be wise to follow what BDSO does and create a separate repo that handles all the backend stuff. From there we can add a make goal to download and incorporate it as a component (currently there are no terms in the PCL edit file because everything is imported).

@shawntanzk
Contributor

@bvarner-ebi I've adjusted the ID range accordingly already; happy to assign you a range.

@ghost

ghost commented Mar 21, 2022

Thanks, @shawntanzk! Sure. I might be doing some manual, direct adds related to the Azimuth data.
Additionally, I think we are considering submissions coming in via CAP to be added to PCL (instead of CL directly). Is that still in consideration, @dosumis? If so, would we set up an id range for 'CAP'?

@dosumis
Contributor

dosumis commented Mar 21, 2022

> I think we are considering submissions coming in via CAP to be added to PCL (instead of CL directly).

I think we probably need to decide on a case-by-case basis (or at least by sets of terms and how they are defined). So it would be good to have a range in place ASAP.

@shawntanzk
Contributor

I've currently set up a range for @bvarner-ebi. It's only 1000 terms now (I think I will extend everyone's range to 5000 terms to be safe). Should I assign CAP a separate range, and how big should that be?
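
A small sketch of the bookkeeping behind this question, assuming ranges are simple inclusive integer intervals handed out back to back. The holder names, the starting ID of 0 and both function names are hypothetical; the actual PCL range values are not stated in this thread.

```python
def allocate_ranges(names, size, start=0):
    """Assign each name a contiguous, non-overlapping inclusive ID range."""
    ranges = {}
    next_id = start
    for name in names:
        ranges[name] = (next_id, next_id + size - 1)
        next_id += size
    return ranges


def has_overlap(ranges):
    """Return True if any two inclusive (first, last) ranges share an ID."""
    ordered = sorted(ranges.values())
    return any(
        prev_last >= next_first
        for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    # Hypothetical holders, each given 5000 IDs.
    fresh = allocate_ranges(["curator_a", "CAP"], size=5000)
    print(fresh, has_overlap(fresh))

    # Widening an existing 1000-ID range in place collides with its neighbour.
    widened = {"curator_a": (0, 4999), "curator_b": (1000, 1999)}
    print(has_overlap(widened))
```

The second case is the one to watch when extending everyone from 1000 to 5000 IDs: ranges that were allocated back to back cannot all simply grow in place without being re-laid-out or checked for overlap.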

@github-actions

This issue has not seen any activity in the past 6 months; it will be closed automatically in one year from now if no action is taken.

@shawntanzk
Contributor

Working on it

@dosumis
Contributor

dosumis commented Aug 1, 2023

superseded by #1911

Projects
Status: Done

4 participants