
links to specific DBs/Ontologies #55

Closed
lgeistlinger opened this issue Feb 3, 2021 · 34 comments
Labels
enhancement New feature or request priority necessary for early utility

Comments

@lgeistlinger
Collaborator

lgeistlinger commented Feb 3, 2021

@lwaldron @tosfos recently I thought about how I typically read a Wikipedia article. Often you read the text of the article and follow the hyperlinks of terms that have their own wiki page if you'd like to know more. For example, if I read the Wikipedia page on the human microbiome, I might follow the embedded hyperlinks to other Wikipedia pages to better understand what, e.g., metagenomics means.

I wonder whether we can support similar features on our study/experiment/signature/taxon pages. That is, we would, e.g., embed links in the abstract of a study, where instances of our bugsigdb vocabulary (specific body sites, conditions, statistical tests, experimental platforms, ...) would link to the corresponding pages on Uberon, EFO, Wikipedia, ...; that would of course apply not only to the abstract of a study but also to the corresponding fields on, e.g., an experiment page.

Programmatically that would involve a two-column table that links all terms of our "bugsigdb vocabulary" to either specific DBs/Ontologies or (per default) Wikipedia itself:

TERM,LINK
feces,http://purl.obolibrary.org/obo/UBERON_0001988
acute myeloid leukemia,http://www.ebi.ac.uk/efo/EFO_0000222
Linear Regression,https://en.wikipedia.org/wiki/Linear_regression
Australia,https://en.wikipedia.org/wiki/Australia
RT-qPCR,https://en.wikipedia.org/wiki/Real-time_polymerase_chain_reaction
...
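A minimal Python sketch of how such a TERM,LINK sheet might be loaded and queried, assuming the CSV layout above. The function names are hypothetical, and the Wikipedia fallback in link_for is only an illustration of the "per default" rule: it guesses a URL from the term, and the guessed page may not exist.

```python
import csv
import io
from typing import Dict

# Excerpt of the proposed two-column vocabulary sheet (TERM,LINK).
VOCAB_CSV = """TERM,LINK
feces,http://purl.obolibrary.org/obo/UBERON_0001988
acute myeloid leukemia,http://www.ebi.ac.uk/efo/EFO_0000222
Linear Regression,https://en.wikipedia.org/wiki/Linear_regression
"""

WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"


def load_vocabulary(text: str) -> Dict[str, str]:
    """Map each term (lower-cased) to its curated external link."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["TERM"].strip().lower(): row["LINK"].strip() for row in reader}


def link_for(term: str, vocab: Dict[str, str]) -> str:
    """Return the curated link for a term, else a guessed Wikipedia URL.

    Hypothetical default: Wikipedia titles use underscores for spaces,
    but nothing here checks that the guessed page exists.
    """
    curated = vocab.get(term.strip().lower())
    if curated is not None:
        return curated
    return WIKIPEDIA_BASE + term.strip().replace(" ", "_")


vocab = load_vocabulary(VOCAB_CSV)
print(link_for("Feces", vocab))      # curated UBERON link
print(link_for("Australia", vocab))  # falls back to a Wikipedia URL
```

Keying on lower-cased terms keeps lookups tolerant of capitalization differences between the sheet and the wiki fields.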

The second step would be to screen through abstracts and the individual fields of, e.g., experiment pages and associate the terms with the links provided in the table above.
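As a rough sketch of that screening step, assuming exact (case-insensitive, whole-word) matching and a vocabulary dict of lower-cased term to URL: the annotate function below is hypothetical and emits MediaWiki external-link markup ([URL label]), which is one possible output format, not necessarily what the wiki would use.

```python
import re
from typing import Dict


def annotate(text: str, vocab: Dict[str, str]) -> str:
    """Wrap exact, whole-word vocabulary matches in MediaWiki external links.

    vocab maps lower-cased terms to URLs. Longer terms are tried first so a
    multi-word term wins over a shorter term it contains. Matching ignores
    case; the link label keeps the casing used in the text.
    """
    if not vocab:
        return text
    terms = sorted(vocab, key=len, reverse=True)
    # Lookarounds instead of \b so that terms ending in punctuation,
    # e.g. "influenza A (H1N1)", still match as whole words.
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )

    def to_link(match):
        label = match.group(0)
        return "[" + vocab[label.lower()] + " " + label + "]"

    return pattern.sub(to_link, text)
```

Longest-first ordering matters because Python's regex alternation takes the first alternative that matches, not the longest one.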

Of course, that could also apply to, e.g., occurrences of specific microbes in an abstract, which could then link to the corresponding bugsigdb taxon pages.

What do you think about that?

@lgeistlinger lgeistlinger added the question Further information is requested label Feb 3, 2021
@tosfos
Collaborator

tosfos commented Feb 4, 2021

> embed links in the abstract of a study, where instances of our bugsigdb vocabulary (specific body sites, conditions, statistical tests, experimental platforms, ...) would link to the corresponding pages on Uberon, EFO, Wikipedia, ...; that would of course not only apply to the abstract of a study, but also to the corresponding fields on, e.g., an experiment page.

Adding links to an abstract is a bit tricky since we don't actually store that text; we pull it from NCBI. It's still doable though, but we should think about how we detect the words. For example, do we want to link plurals? There is an extension that allows you to create your own glossary (which in turn can link to these external sources) and it will automatically provide markup to the pages so users can see the definitions. We may have discussed this on the last call.

Adding them to corresponding fields should be very simple.

> that could also apply to, e.g., occurrences of specific microbes in an abstract that could then link to the corresponding bugsigdb taxon pages.

That would probably work similarly to the abstract links for a vocabulary.

@lwaldron
Member

lwaldron commented Feb 4, 2021

I like the idea!

@lgeistlinger
Copy link
Collaborator Author

lgeistlinger commented Feb 4, 2021

> Adding them to corresponding fields should be very simple.

So we could just focus on this for a start, and leave the abstract out for the moment.

> but we should think about how we detect the words. For example, do we want to link plurals?

Yes, and the same goes for synonyms. Again, in the beginning we could use exact matches only, but expanding beyond that would mean defining a number of synonyms/aliases for each term via an additional column in the vocabulary sheet:

TERM,LINK,ALIAS
feces,http://purl.obolibrary.org/obo/UBERON_0001988,"stool,faeces,fecal material"
acute myeloid leukemia,http://www.ebi.ac.uk/efo/EFO_0000222,"acute myelogenous leukemia,AML,Acute Myeloblastic Leukemia"
...

These synonyms can be partly pulled from the corresponding EFO and UBERON pages themselves.
If @tosfos thinks it's feasible, and @lwaldron would like to support it, I could go ahead and create the TERM-LINK-ALIAS sheet and would pass it on to @tosfos for linking things on the wiki.
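A minimal sketch of reading the proposed TERM-LINK-ALIAS sheet into one record per term, assuming the quoted, comma-separated ALIAS cell shown above; load_terms and the record layout are hypothetical.

```python
import csv
import io
from typing import Dict

SHEET = '''TERM,LINK,ALIAS
feces,http://purl.obolibrary.org/obo/UBERON_0001988,"stool,faeces,fecal material"
acute myeloid leukemia,http://www.ebi.ac.uk/efo/EFO_0000222,"acute myelogenous leukemia,AML,Acute Myeloblastic Leukemia"
'''


def load_terms(text: str) -> Dict[str, dict]:
    """Return {term: {"link": url, "aliases": [alias, ...]}} from the sheet.

    The quoted ALIAS cell is read as a single CSV field and then split on
    commas, as in the example rows above.
    """
    terms = {}
    for row in csv.DictReader(io.StringIO(text)):
        aliases = [a.strip() for a in (row.get("ALIAS") or "").split(",") if a.strip()]
        terms[row["TERM"].strip()] = {"link": row["LINK"].strip(), "aliases": aliases}
    return terms


terms = load_terms(SHEET)
print(terms["feces"]["aliases"])  # the three feces synonyms
```

Splitting on commas only works while no synonym itself contains a comma.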

@tosfos
Collaborator

tosfos commented Feb 5, 2021

It's feasible.

@lgeistlinger lgeistlinger self-assigned this Feb 9, 2021
@lgeistlinger lgeistlinger added the enhancement New feature or request label Feb 9, 2021
@lgeistlinger lgeistlinger changed the title can we support additional wiki-like features? links to specific DBs/Ontologies Feb 9, 2021
@lgeistlinger lgeistlinger added question Further information is requested and removed question Further information is requested labels Feb 9, 2021
@tosfos
Collaborator

tosfos commented Feb 16, 2021

Checking on the status of this request, please.

@lgeistlinger
Collaborator Author

lgeistlinger commented Feb 16, 2021

Sorry, I got trapped in a couple of other things. I will have something for you here over the course of this week.

@lgeistlinger
Collaborator Author

@tosfos: is it helpful if I split the vocabulary table into thematic sections as given by the fields of the experiment data model? That is, should I group links by body site, condition, statistical test, ..., or just put everything in one big table?

@tosfos
Collaborator

tosfos commented Feb 18, 2021

I guess groups is easier for us.

@lgeistlinger
Collaborator Author

@tosfos I have here two such files for condition and body site to have something to play around with for a start:

condition.txt
bodysite.txt

If you say that's all you need, I'll proceed with producing such files also for the other experiment fields.

@tosfos
Collaborator

tosfos commented Feb 22, 2021

Will do. Just noting that these files have no "Alias" field, but we'll keep in mind that we should support this ability in case it's needed in the future.

@lgeistlinger
Collaborator Author

Yes, the ALIAS field will become relevant once we start screening abstracts. For the experiment fields, we expect exact matches only.

@tosfos
Collaborator

tosfos commented Mar 17, 2021

Since we plan to expand this feature, we're creating pages for each of the ontology terms. The page stores a definition, external link to the ontology and aliases. Please see the way we implemented it here (condition and body site), and let me know what you think.

@lgeistlinger
Collaborator Author

Interesting. Maybe I am missing something, but why not just link to the external DBs as we do, e.g., for PubMed IDs or NCBI Taxonomy IDs?

@tosfos
Collaborator

tosfos commented Mar 17, 2021

We are displaying the link in a similar way:

The circle next to the "feces" text is a link to the external DB. (Should we change the icon?)

But we need a more complex system to support these 2 fields. The Pubmed and NCBI IDs always link to the same place. So all we need to know is the ID and we can construct the URL from that. If we know the NCBI ID, we always know that we can link by simply doing:
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=<NCBI ID GOES HERE>

But for terms like Condition, we need to store an individual URL for each term. For example, these conditions link to different sites:
"HIV/AIDS pre-exposure prophylaxis","http://purl.obolibrary.org/obo/GSSO_001787"
"human papilloma virus infection","http://www.ebi.ac.uk/efo/EFO_0001668"

So we need a page (or some sort of table) to store data about each term. Also, there was some discussion above about supporting aliases and glossary functions, so we added this ability as well.

Please let me know if any changes are needed.
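The distinction drawn above can be sketched in a few lines of Python: identifier-based resources need only a shared URL template, while ontology terms need a URL stored per term. The names below are hypothetical; the URLs are the ones quoted in this comment.

```python
from typing import Dict, Optional

# Identifier-based resources: one URL template serves every record,
# so storing the ID alone is enough to build the link.
NCBI_TAXONOMY_TEMPLATE = (
    "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id={taxid}"
)


def ncbi_taxonomy_link(taxid: int) -> str:
    return NCBI_TAXONOMY_TEMPLATE.format(taxid=taxid)


# Ontology terms: each term may live in a different ontology (GSSO, EFO, ...),
# so there is no shared template and the URL has to be stored per term.
CONDITION_LINKS: Dict[str, str] = {
    "HIV/AIDS pre-exposure prophylaxis": "http://purl.obolibrary.org/obo/GSSO_001787",
    "human papilloma virus infection": "http://www.ebi.ac.uk/efo/EFO_0001668",
}


def condition_link(term: str) -> Optional[str]:
    """Return the stored URL for a condition term, or None if unknown."""
    return CONDITION_LINKS.get(term)
```

In the wiki this per-term table is what the Glossary pages provide; the dict here just stands in for that storage.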

@lgeistlinger
Collaborator Author

Ah I see!
I think it makes sense. And I guess we are going to fill these pages via a bulk import just as we did for the other pages?

As a side note:

@tosfos
Collaborator

tosfos commented Apr 8, 2021

We added the requested icons, as you can see here: https://bugsigdb.org/Study_100/Experiment_2


Note that the condition does not always come from EFO. For example, the Condition spreadsheet lists:
"air pollution","http://purl.obolibrary.org/obo/ENVO_02500037"

@tosfos
Collaborator

tosfos commented Apr 8, 2021

> And I guess we are going to fill these pages via a bulk import just as we did for the other pages?

Yes, we can. Should we import these .txt files? Are they complete?

@lgeistlinger
Collaborator Author

lgeistlinger commented Apr 12, 2021

> We added the requested icons, as you can see here: https://bugsigdb.org/Study_100/Experiment_2

Cool, I like it!

> Note the condition does not always come from efo. For example, the Condition spreadsheet lists:
> "air pollution","http://purl.obolibrary.org/obo/ENVO_02500037"

Yes, these are cross-references from within EFO (i.e., EFO is an umbrella ontology that links out to other primary ontologies for certain terms), so using the EFO icon for those is still fine.

> Should we import these .txt files? Are they complete?

I'll pull the aliases for the terms from EFO and UBERON and provide you the complete files in the above defined 3-column specification. I'll report back. We should then also start moving beyond body site and condition.

@ftzohra22
Collaborator

Just wanted to note that most of the body sites and conditions are still not linked. For example:
https://bugsigdb.org/Study_255/Experiment_3 (body site and condition not linked)
https://bugsigdb.org/Study_218 (condition not linked)

@lgeistlinger
Copy link
Collaborator Author

lgeistlinger commented May 7, 2021

Sorry, I have been swamped with a couple of other things, but @tosfos, I am now providing the 3-column files for condition and body site, including the aliases/synonyms pulled directly from EFO (for condition) and UBERON (for body site):

condition.csv
bodysite.csv

Note: the synonyms for a specific term can contain all kinds of special characters, so I had to improvise a little on the separator in the ALIAS column (the 3rd column in the above files).
Conceptually this is a list of synonyms for each term, but because commas and semicolons occur within some synonyms, I used "$;" to separate the individual synonyms within the list. Let me know if I can be clearer on that.
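A minimal sketch of splitting such an ALIAS cell, assuming the cell has already been read as one string; split_aliases is a hypothetical helper. Splitting only on the two-character "$;" sequence keeps commas and single semicolons inside a synonym intact.

```python
from typing import List

ALIAS_SEP = "$;"


def split_aliases(field: str, sep: str = ALIAS_SEP) -> List[str]:
    """Split a "$;"-separated ALIAS cell into individual synonyms.

    Empty pieces and exact duplicates are dropped; case variants
    (e.g. "Seizure disorder" vs "seizure disorder") are kept.
    """
    aliases: List[str] = []
    seen = set()
    for piece in field.split(sep):
        piece = piece.strip()
        if piece and piece not in seen:
            seen.add(piece)
            aliases.append(piece)
    return aliases


cell = "Epileptic fit$;Epileptic fits, NOS$;Seizure disorder$;seizure disorder"
print(split_aliases(cell))
```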

> Should we import these .txt files? Are they complete?

Yes, we can now go ahead with this.

@tosfos
Collaborator

tosfos commented May 9, 2021

Will do.

@tosfos
Collaborator

tosfos commented May 11, 2021

Noting that the links to the external resources will fill in as Glossary pages are added.

Is there something wrong with condition.csv? There are strange entries like:

[X]Other epilepsy$;[X]Other epilepsy (disorder)$;EF - Epileptic fit$;EP - Epilepsy$;Epilectic attack, NOS$;EPILEP NEC W/O INTR EPIL$;EPILEP NOS W/O INTR EPIL$;epilepsy$;Epilepsy (disorder)$;Epilepsy and recurrent seizures$;EPILEPSY NEC W INTR EPIL$;Epilepsy NOS$;Epilepsy NOS (disorder)$;EPILEPSY NOS W INTR EPIL$;epilepsy syndrome$;Epilepsy, NOS$;Epilepsy, unspecified$;Epilepsy, unspecified, with intractable epilepsy$;Epilepsy, unspecified, without mention of intractable epilepsy$;Epileptic$;Epileptic attack$;Epileptic attack, NOS$;Epileptic convulsions$;Epileptic convulsions, NOS$;Epileptic disorder$;Epileptic disorder, NOS$;Epileptic fit$;Epileptic fits$;Epileptic fits, NOS$;Epileptic Seizure$;Epileptic seizure (finding)$;Epileptic seizures$;Epileptic seizures, NOS$;epileptic syndrome$;Generalised convulsion$;Generalised fit$;Generalised seizure$;Generalized convulsion$;Generalized fit$;Generalized seizure$;Generalized seizure (finding)$;Other forms of epilepsy$;Other forms of epilepsy (disorder)$;Other forms of epilepsy and recurrent seizures$;Other forms of epilepsy NOS$;Other forms of epilepsy NOS (disorder)$;Other forms of epilepsy, with intractable epilepsy$;Other forms of epilepsy, without mention of intractable epilepsy$;Seizure disorder$;seizure disorder$;Seizure disorder (disorder)

bodysite.csv doesn't look great either. Also, for the alias field, we should only store legitimate aliases, not errors like "Acetaldehyd". If there are errors in the dataset, they should really just be fixed.

Question: How should we treat synonyms? Do we store them along with the primary term? Or only store the primary term?

@lgeistlinger
Collaborator Author

lgeistlinger commented May 12, 2021

These are the synonyms that you get from EFO (condition) and from UBERON (body site) for these terms.

See, e.g., the synonym section at the top of the EFO page for the term "epilepsy", which corresponds to your example entry.

I agree that they are not perfect, but sorting out "good" and "bad" synonyms will require manual intervention, and here @lwaldron might want to weigh in on whether he has the manpower for that.

> Question: How should we treat synonyms? Do we store them along with the primary term? Or only store the primary term?

I am not clear on the implications here. Can you clarify?

@lgeistlinger
Collaborator Author

Hi @tosfos,

I met with @lwaldron today and discussed how to further proceed on this.
Synonyms like "Acetaldehyd" (for the term "acetaldehyde") and "[X]Other epilepsy" (for the term "epilepsy") are present on the corresponding EFO pages and turn up in scientific publications when googling these terms. Although cryptic, they are valid synonyms that occur in PubMed articles.

That means we currently don't think they arose by mistake or need further curation.

As to your question on how to store them, we thought that synonyms do not need to be exposed to the user but can be used internally for screening terms (and their synonyms) against other content in BugSigDB (such as other fields and abstracts).

@lwaldron
Member

lwaldron commented Jun 10, 2021

In addition to the examples Fatima provided above, I noticed again that some body sites and conditions have red links to create a new page even though they are valid ontology terms. Just to write them down:

And an older one https://bugsigdb.org/Study_89 - Body site: meconium, Condition: cesarean section

https://bugsigdb.org/Property:Body_site already contains all these body sites, "throat" "nasopharynx" and "meconium".

https://bugsigdb.org/Property:Condition already contains "cesarean section" (candidate EFO additions will be "influenza A (H1N1)" and "COVID-19")

Some conditions like obesity have a page with some information already, e.g. https://bugsigdb.org/Study_100 links to https://bugsigdb.org/Obesity which was created by WikiWorks743 (March 31 2021), along with the link to the ontology. This is great, can we have this for all ontology terms?

With lots of curators starting now, I'd say we need to establish what to do about Body_site and Condition. As an interim measure, should we add all the allowed ontology terms through their Values pages, then worry about linking later? Currently I have a hard time distinguishing between a term that is not among the allowed values, vs just not having a page on the wiki.

@lwaldron lwaldron added the priority necessary for early utility label Jun 10, 2021
@lwaldron
Member

(I added the "priority" label just to bump it up, but it doesn't need to jump the queue ahead of the drill-down, search, and front page features)

@tosfos
Collaborator

tosfos commented Jun 11, 2021

> have red links to create a new page even though they are valid ontology terms

There are 2 separate things we do for terms. One is to make it a "valid" term. The other is to create a Glossary page for it, which allows adding a definition, link or synonyms. Terms that are valid but don't yet have a Glossary page show up as red links in order to allow you to create that Glossary page.

We do plan on importing the Links and aliases from the CSV. If you can provide definitions we will import these as well. Or feel free to create individual definitions through the wiki. Should we create a Help page to make this easier?

> This is great, can we have this for all ontology terms?

Absolutely! That is the goal. We created the Obesity page as a demonstration.

> As an interim measure, should we add all the allowed ontology terms through their Values pages, then worry about linking later?

I think so. The link/definition is a bonus to have in there, but (I assume) not necessary.

> Currently I have a hard time distinguishing between a term that is not among the allowed values, vs just not having a page on the wiki.

That is true. We should be showing a warning icon for invalid values, and also show those values as plain text so that people don't create Glossary pages for them.

Please let me know how to proceed.
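The three states described above (invalid, valid without a Glossary page, valid with one) can be sketched as follows. This is a hypothetical illustration that returns wikitext: invalid values become plain text wrapped in MediaWiki's nowiki tag plus a made-up warning marker, and allowed values become internal links, which MediaWiki itself renders in red while the page is missing.

```python
from enum import Enum
from typing import Set


class TermStatus(Enum):
    INVALID = "not an allowed value"
    VALID_NO_GLOSSARY = "allowed, no Glossary page yet"
    VALID_WITH_GLOSSARY = "allowed, Glossary page exists"


def classify(value: str, allowed: Set[str], glossary_pages: Set[str]) -> TermStatus:
    if value not in allowed:
        return TermStatus.INVALID
    if value in glossary_pages:
        return TermStatus.VALID_WITH_GLOSSARY
    return TermStatus.VALID_NO_GLOSSARY


def render(value: str, allowed: Set[str], glossary_pages: Set[str]) -> str:
    """Return wikitext for a field value.

    Invalid values are plain text, so nobody is invited to create a
    Glossary page for them; allowed values are internal wiki links.
    """
    if classify(value, allowed, glossary_pages) is TermStatus.INVALID:
        return "<nowiki>" + value + "</nowiki> (warning: not an allowed value)"
    return "[[" + value + "]]"
```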

@lwaldron
Member

lwaldron commented Jul 9, 2021

I think the main action items here are:

  1. import the full Uberon Anatomy Ontology for body site (so that all terms autocomplete),
  2. import the full Experimental Factor Ontology for condition (so that all terms autocomplete),
  3. link terms out to Uberon or Experimental Factor Ontology

A wishlist lower-priority item would be to utilize the hierarchy of these ontologies in browsing features, similarly to how we already do for taxonomy. For example, selecting the "oral cavity" body site would return all the oral cavity subsites, rather than having these just be independent terms (such as "saliva" and "hard palate"). This is now issue #89.
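A minimal sketch of hierarchy-aware browsing, assuming the ontology has been reduced to a child-to-parents map. The fragment below is hand-made for illustration and is not taken from UBERON; the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Illustrative, hand-made hierarchy fragment (child -> parents); NOT from UBERON.
PARENTS: Dict[str, List[str]] = {
    "saliva": ["oral cavity"],
    "hard palate": ["oral cavity"],
    "tongue": ["oral cavity"],
    "dorsum of tongue": ["tongue"],
    "feces": ["gut"],
}


def build_children(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the child -> parents map into parent -> children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    return children


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """All terms below `term`, found breadth-first (each visited once)."""
    found: Set[str] = set()
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def matches_body_site(selected: str, site: str, children: Dict[str, List[str]]) -> bool:
    """True if an experiment's body site is the selected term or lies below it."""
    return site == selected or site in descendants(selected, children)
```

Tracking visited terms keeps the traversal correct when a term has several parents, which is common in these ontologies.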

Using synonyms seems lower priority; the only use that immediately comes to mind would be to make search more powerful by identifying relevant pages when someone searches for a synonym, even if the synonyms are never displayed.
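A minimal sketch of that search use, assuming synonyms are kept internally as a term-to-aliases map: every primary term and synonym is indexed case-insensitively and resolves to the primary term, so synonyms never need to be shown. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def build_search_index(aliases_by_term: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every primary term and synonym (case-folded) to its primary term.

    A primary term always wins over a synonym; if two terms share a
    synonym, the first term seen keeps it.
    """
    index: Dict[str, str] = {}
    for term, aliases in aliases_by_term.items():
        index[term.casefold()] = term
        for alias in aliases:
            index.setdefault(alias.casefold(), term)
    return index


def resolve_query(query: str, index: Dict[str, str]) -> Optional[str]:
    """Return the primary term a search string refers to, or None."""
    return index.get(query.strip().casefold())
```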

@lwaldron
Member

@lgeistlinger can you add all the EFO and Uberon ontology terms to the autocomplete fields? I noticed that we have a misspelled condition "irritable bowel sydrome" in 9 studies, an error from having an administrator manually add each new ontology term on an as-needed basis at https://bugsigdb.org/Help:Admin. I think this can be accomplished just by creating a complete list, one ontology name per line, and copy-pasting it into the box there. Then we will find "sydrome" and any other errors on the cleanup page.
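Once the complete list is in place, a cleanup pass could flag values in use that are not allowed terms and suggest near matches. The sketch below uses Python's standard difflib.get_close_matches; find_suspect_values and the cutoff are hypothetical choices. It compares each value against every allowed term, so it suits an occasional batch cleanup rather than interactive use.

```python
import difflib
from typing import Dict, Iterable, List


def find_suspect_values(
    used_values: Iterable[str], allowed: List[str], cutoff: float = 0.85
) -> Dict[str, List[str]]:
    """Map each used value that is not an allowed term to close allowed matches."""
    allowed_set = set(allowed)
    suspects: Dict[str, List[str]] = {}
    for value in used_values:
        if value in allowed_set or value in suspects:
            continue
        suspects[value] = difflib.get_close_matches(value, allowed, n=3, cutoff=cutoff)
    return suspects


allowed_terms = ["irritable bowel syndrome", "obesity", "epilepsy"]
print(find_suspect_values(["irritable bowel sydrome", "obesity"], allowed_terms))
```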

@lgeistlinger
Collaborator Author

Sure. Will provide.

@lgeistlinger
Collaborator Author

I added all 27,175 EFO terms from

> efo
Ontology with 27175 terms

format-version: 1.2
data-version: http://www.ebi.ac.uk/efo/releases/v3.33.0/efo.owl
ontology: http://www.ebi.ac.uk/efo/efo.owl

to the admin field for condition.

I also added all 14,107 UBERON terms from

> uberon
Ontology with 14107 terms

format-version: 1.2
data-version: releases/2020-09-16
default-namespace: uberon
ontology: uberon

to the admin field for body site.
This seems to have given bugsigdb.org some processing work, as I see the following message:

Change propagation updates are pending (3051 jobs estimated) and it is recommended to wait with modifications to a property until the process has been finalized to prevent intermediary interruptions or contradictory specifications.

A side effect of this might be that I currently cannot edit the information of an experiment: I get a blank page in Firefox and an HTTP ERROR 500 in Chrome when I try to edit the information of Experiment 1 of Study 1. This might resolve on its own once the pending propagation updates have completed.
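For reference, one way to produce such a one-name-per-line list is to read the ontology's OBO flat-file release directly. The reader below is a simplified sketch written for illustration, not the tool used above: it only looks at [Term] stanzas, skips obsolete terms, and ignores OBO escapes, trailing modifiers and qualifiers.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def term_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield the tag/value pairs of each [Term] stanza in an OBO file."""
    stanza: Optional[Dict[str, List[str]]] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            if stanza is not None:
                yield stanza
            # Only [Term] stanzas are collected; [Typedef] etc. are skipped.
            stanza = {} if line == "[Term]" else None
        elif stanza is not None and ":" in line:
            tag, value = line.split(":", 1)
            stanza.setdefault(tag.strip(), []).append(value.strip())
    if stanza is not None:
        yield stanza


def term_names(lines: Iterable[str], id_prefix: Optional[str] = None) -> List[str]:
    """Sorted, de-duplicated names of non-obsolete terms.

    id_prefix (e.g. "UBERON:") optionally restricts the list to one namespace;
    leave it as None for EFO, which also contains terms from other ontologies.
    """
    names = set()
    for st in term_stanzas(lines):
        if st.get("is_obsolete", ["false"])[0] == "true":
            continue
        if id_prefix and not st.get("id", [""])[0].startswith(id_prefix):
            continue
        if "name" in st:
            names.add(st["name"][0])
    return sorted(names)


# Usage (hypothetical file name): print one term name per line.
# with open("uberon.obo", encoding="utf-8") as fh:
#     print("\n".join(term_names(fh)))
```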

@tosfos
Collaborator

tosfos commented Aug 24, 2021

Noting that besides the form loading issues, the number of allowed values in these fields exceeds what the (older) Drilldown (#74) feature can handle. We disabled these fields for the initial (no-filter) load so that the drilldown loads properly. Once any filter is selected, these fields become available.

#89 may solve this problem.

@tosfos
Copy link
Collaborator

tosfos commented Nov 17, 2021

Can this be closed? I think the Glossary has solved these issues.

@lgeistlinger
Collaborator Author

Yes, we'll continue the discussion under #92.
