links to specific DBs/Ontologies #55
Comments
Adding links to an abstract is a bit tricky since we don't actually store that text; we pull it from NCBI. It's still doable though, but we should think about how we detect the words. For example, do we want to link plurals? There is an extension that allows you to create your own glossary (which in turn can link to these external sources) and it will automatically provide markup to the pages so users can see the definitions. We may have discussed this on the last call. Adding them to corresponding fields should be very simple.
That would probably work similarly to the abstract links for a vocabulary. |
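To make the plural question concrete, here is a minimal sketch of a matcher that accepts a vocabulary term plus a naive "s"/"es" plural. The function names and the sample vocabulary are hypothetical, and irregular plurals would still need explicit aliases.

```python
import re

def term_pattern(term: str) -> re.Pattern:
    """Compile a case-insensitive pattern for `term` and its simple plurals."""
    # Naive pluralisation: an optional "s" or "es" suffix on the whole term.
    # Irregular plurals are NOT covered and would need explicit aliases.
    return re.compile(r"\b" + re.escape(term) + r"(?:es|s)?\b", re.IGNORECASE)

def find_terms(text: str, vocabulary: list[str]) -> list[tuple[str, str]]:
    """Return (vocabulary term, matched text) pairs in order of appearance."""
    hits = []
    for term in vocabulary:
        for match in term_pattern(term).finditer(text):
            hits.append((match.start(), term, match.group(0)))
    hits.sort()
    return [(term, matched) for _, term, matched in hits]

# Hypothetical vocabulary and text, for illustration only.
example_hits = find_terms(
    "Stool samples and one biopsy of the colon.",
    ["stool sample", "colon"],
)
```

Word boundaries keep "colon" from matching inside a longer word such as "colonic"; whether that is the desired behaviour is a design decision.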
I like the idea! |
So we could just focus on this for a start, and leave the abstract out for the moment.
Yes, it's the same with synonyms. Again, in the beginning we could use exact matches only, but expanding beyond that would mean defining a number of synonyms/aliases for each term via an additional column in the vocabulary sheet:
These synonyms can partly be pulled from the corresponding EFO and UBERON pages themselves. |
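A rough sketch of how such an alias column could be used: every alias is mapped back to its primary term, so a lookup on either form resolves to the same entry. The row layout (term, link, aliases), the example.org links and the aliases themselves are assumptions for illustration, not values taken from EFO or UBERON.

```python
def build_alias_index(rows):
    """Map every lower-cased term and alias to its primary term."""
    index = {}
    for term, _link, aliases in rows:
        # A primary term always maps to itself, even if an earlier row
        # listed the same string as an alias.
        index[term.lower()] = term
        for alias in aliases:
            # Among aliases, the first mapping wins; primary terms are
            # never overwritten by an alias.
            index.setdefault(alias.lower(), term)
    return index

def resolve(value, index):
    """Return the primary term for a term or alias, or None if unknown."""
    return index.get(value.strip().lower())

# Hypothetical rows: (term, link, aliases). Illustrative only.
rows = [
    ("feces", "https://example.org/feces", ["stool"]),
    ("obesity", "https://example.org/obesity", ["adiposity"]),
]
alias_index = build_alias_index(rows)
```

With exact matching only, the alias lists would simply be left empty and the same lookup still works.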
It's feasible. |
Checking on the status of this request, please. |
Sorry, I got trapped in a couple of other things. I will have something for you here over the course of this week. |
@tosfos: would it help if I split the vocabulary table into thematic sections as given by the fields of the experiment data model? That is, should I group links by body site, condition, statistical test, ..., or just put everything in one big table? |
I guess groups is easier for us. |
@tosfos I have here two such files for condition and body site to have something to play around with for a start: If you say that's all you need, I'll proceed with producing such files also for the other experiment fields. |
Will do. Just noting that these files have no "Alias" field, but we'll keep in mind that we should support this ability in case it's needed in the future. |
yes the |
Since we plan to expand this feature, we're creating pages for each of the ontology terms. The page stores a definition, external link to the ontology and aliases. Please see the way we implemented it here (condition and body site), and let me know what you think. |
Interesting. Maybe I am missing something, but why not just link to the external DBs, as we do e.g. for Pubmed IDs or NCBI Taxonomy IDs? |
We are displaying the link in a similar way: The circle next to the "feces" text is a link to the external DB. (Should we change the icon?) But we need a more complex system to support these 2 fields. The Pubmed and NCBI IDs always link to the same place. So all we need to know is the ID and we can construct the URL from that. If we know the NCBI ID, we always know that we can link by simply doing: But for terms like Condition, we need to store an individual URL for each term. For example, these conditions link to different sites: So we need a page (or some sort of table) to store data about each term. Also, there was some discussion above about supporting aliases and glossary functions, so we added this ability as well. Please let me know if any changes are needed. |
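The difference between the two cases can be sketched as follows. The PubMed URL template is an assumption used for illustration and may not be what the site uses; the per-term table and its example.org links are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumed URL template, for illustration only.
PUBMED_TEMPLATE = "https://pubmed.ncbi.nlm.nih.gov/{id}/"

# Hypothetical per-term table: each term carries its own URL because
# different terms may live in different external resources.
TERM_LINKS = {
    "term a": "https://example.org/ontology-one/A_0001",
    "term b": "https://example.org/ontology-two/B_0002",
}

def id_link(identifier):
    """ID-based fields: the URL is derived from the identifier alone."""
    return PUBMED_TEMPLATE.format(id=quote(str(identifier)))

def term_link(term):
    """Term-based fields: the URL has to be looked up per term."""
    return TERM_LINKS.get(term.lower())
```

The first function needs no stored data, while the second only works for terms that someone has entered into the table, which is the role the per-term pages play.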
Ah I see! As a side note:
|
We added the requested icons, as you can see here: https://bugsigdb.org/Study_100/Experiment_2 Note the condition does not always come from efo. For example, the Condition spreadsheet lists: |
Yes, we can. Should we import these .txt files? Are they complete? |
Cool, I like it!
Yes these are cross-references from within EFO (ie EFO is an umbrella ontology that links out to other primary ontologies for certain terms) - so using the efo icon for those is still fine.
I'll pull the aliases for the terms from EFO and UBERON and provide you the complete files in the above defined 3-column specification. I'll report back. We should then also start moving beyond body site and condition. |
just wanted to note that most of the body sites and conditions are still not linked. For example: |
Sorry, I have been swamped with a couple of other things, but @tosfos, I am now providing the 3-column files for condition and body site, also including the aliases/synonyms pulled directly from EFO (for condition) and UBERON (for body site): Note: the synonyms for a specific term can contain all kinds of special characters, so I had to improvise a little bit on the separator in the
Yes we can now go ahead with this. |
Will do. |
Noting that the links to the external resources will fill in as Glossary pages are added. Is there something wrong with condition.csv? There are strange entries like:
bodysite.csv doesn't look great either. Also, for the alias field, we should only store legitimate aliases, not errors like "Acetaldehyd". If there are errors in the dataset, they should really just be fixed. Question: How should we treat synonyms? Do we store them along with the primary term? Or only store the primary term? |
These are the synonyms that you get from EFO (condition) and from UBERON (body site) for these terms. See e.g. the synonym section at the top of the EFO page for the term "epilepsy" here, which corresponds to your example entry. I agree that they are not perfect, but it will require manual intervention to sort out "good" and "bad" synonyms, and here maybe @lwaldron might want to weigh in on whether he has the manpower for that.
I am not clear on the implications here. Can you clarify? |
Hi @tosfos, I met with @lwaldron today and discussed how to proceed on this. That means that we currently don't think they have arisen by mistake or need further curation. As to your question on how to store them, we thought that synonyms do not need to be exposed to the user but can be used internally for screening terms (and their synonyms) against other content in BugSigDB (such as other fields and abstracts). |
In addition to the examples Fatima provided above, I noticed again that some body sites and conditions have red links to create a new page even though they are valid ontology terms. Just to write them down:
And an older one: https://bugsigdb.org/Study_89 - Body site: meconium, Condition: cesarean section
https://bugsigdb.org/Property:Body_site already contains all these body sites, "throat", "nasopharynx" and "meconium". https://bugsigdb.org/Property:Condition already contains "cesarean section" (candidate EFO additions will be "influenza A (H1N1)" and "COVID-19").
Some conditions like obesity have a page with some information already, e.g. https://bugsigdb.org/Study_100 links to https://bugsigdb.org/Obesity which was created by WikiWorks743 (March 31 2021), along with the link to the ontology. This is great, can we have this for all ontology terms?
With lots of curators starting now, I'd say we need to establish what to do about Body_site and Condition. As an interim measure, should we add all the allowed ontology terms through their Values pages, then worry about linking later? Currently I have a hard time distinguishing between a term that is not among the allowed values, vs just not having a page on the wiki. |
(I added the "priority" label just to bump it up, but it doesn't need to jump the queue ahead of the drill-down, search, and front page features) |
There are 2 separate things we do for terms. One is to make it a "valid" term. The other is to create a Glossary page for it, which allows adding a definition, link or synonyms. Terms that are valid but don't yet have a Glossary page show up as red links in order to allow you to create that Glossary page. We do plan on importing the Links and aliases from the CSV. If you can provide definitions we will import these as well. Or feel free to create individual definitions through the wiki. Should we create a Help page to make this easier?
Absolutely! That is the goal. We created the Obesity page as a demonstration.
I think so. The link/definition is a bonus to have in there, but (I assume) not necessary.
That is true. We should be showing a warning icon for invalid values, and also show those values as plain text so that people don't create Glossary pages for them. Please let me know how to proceed. |
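The three display states described above could be told apart roughly like this. This is only a sketch: the enum, the function name and the example sets are hypothetical and do not reflect how the wiki implements it.

```python
from enum import Enum

class TermStatus(Enum):
    LINKED = "valid term with a Glossary page"
    RED_LINK = "valid term without a Glossary page"
    INVALID = "not among the allowed values"

def classify(term, allowed_values, glossary_pages):
    """Decide how a field value should be rendered."""
    if term not in allowed_values:
        # Show a warning icon and plain text, so that nobody creates a
        # Glossary page for an invalid value.
        return TermStatus.INVALID
    if term in glossary_pages:
        return TermStatus.LINKED
    # Valid, but the Glossary page still has to be created.
    return TermStatus.RED_LINK

# Hypothetical example sets, for illustration only.
allowed = {"obesity", "meconium"}
glossary = {"obesity"}
```

Keeping validity and page existence as two separate checks is what lets a curator distinguish a red link from a value that is not allowed at all.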
I think the main action items here are:
A lower-priority wishlist item would be to utilize the hierarchy of these ontologies in browsing features, similarly to how we already do for taxonomy. For example, selecting the "oral cavity" body site would return all the oral cavity subsites, rather than having these just be independent terms (such as "saliva" and "hard palate"). This is now issue #89. Using synonyms seems lower priority; their only use that immediately comes to mind would be to make search more powerful by identifying relevant pages if someone searches for a synonym, even if the synonyms are never displayed. |
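As a sketch of the hierarchy idea, the subsites of a selected term can be collected with a breadth-first walk over child-to-parent relations. The toy hierarchy below is hypothetical and assumes a single parent per term, whereas real ontologies such as UBERON allow several.

```python
from collections import defaultdict, deque

def descendants(root, parent_of):
    """Return all terms below `root`, given a child -> parent mapping."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    found, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # guard against accidental cycles
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found

# Toy hierarchy for illustration only; not taken from UBERON.
toy_parent_of = {
    "saliva": "oral cavity",
    "hard palate": "oral cavity",
    "oral cavity": "digestive system",
}
subsites = sorted(descendants("oral cavity", toy_parent_of))
```

A browse filter on "oral cavity" would then match experiments annotated with the term itself or with any of the returned subsites.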
@lgeistlinger can you add all the EFO and Uberon ontology terms to the autocomplete fields? I noticed that we have a misspelled condition "irritable bowel sydrome" in 9 studies, an error from having an administrator manually add each new ontology term on an as-needed basis at https://bugsigdb.org/Help:Admin. I think this can be accomplished just by creating a complete list, one ontology name per line, and copy-pasting it into the box there. Then we will find "sydrome" and any other errors on the cleanup page. |
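Once a complete list exists, a check along these lines could flag values that are not in it and suggest the closest allowed term. This is a sketch using Python's standard `difflib`; the function name, the cutoff and the example lists are assumptions.

```python
import difflib

def find_invalid(used_values, allowed_values, cutoff=0.8):
    """Map each value missing from the allowed list to its closest match."""
    allowed = sorted(set(allowed_values))
    allowed_set = set(allowed)
    report = {}
    for value in sorted(set(used_values)):
        if value not in allowed_set:
            # Up to one suggestion whose similarity ratio is >= cutoff;
            # an empty list means no allowed term is close enough.
            report[value] = difflib.get_close_matches(
                value, allowed, n=1, cutoff=cutoff
            )
    return report

# Hypothetical lists, for illustration only.
report = find_invalid(
    ["obesity", "irritable bowel sydrome"],
    ["irritable bowel syndrome", "obesity"],
)
```

Suggestions are only hints: a curator would still confirm each correction before the studies are updated.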
Sure. Will provide. |
I added all 27,175 EFO terms from
to the admin field for condition. I also added all 14,107 UBERON terms from
to the admin field for body site.
message. A side effect of this might be that I currently cannot edit the information of an experiment, as I am left with a blank page in Firefox and an |
Noting that besides the form loading issues, the number of allowed values in these fields exceeds what the (older) Drilldown (#74) feature could tolerate. We disabled these fields for the initial (no filter) load so that the drilldown would load properly. Once any filter is selected, these fields are available. #89 may solve this problem. |
Can this be closed? I think the Glossary has solved these issues. |
Yes, we will continue the discussion under #92. |
@lwaldron @tosfos recently I thought about how I typically read a Wikipedia article. Often enough you read the text of the article and, if you'd like to know more, follow the hyperlinks of terms that have another wiki page. For example, if I read the Wikipedia page on the human microbiome, I might follow the embedded hyperlinks to other Wikipedia pages to better understand what e.g. metagenomics means.
I wonder whether we can support similar features on our study/experiment/signature/taxon pages? That is, we would e.g. embed links in the abstract of a study, where instances of our bugsigdb vocabulary (specific body sites, conditions, statistical tests, experimental platforms, ...) would link to the corresponding pages on Uberon, EFO, Wikipedia, ...; that would of course apply not only to the abstract of a study, but also to the corresponding fields on e.g. an experiment page.
Programmatically that would involve a two-column table that links all terms of our "bugsigdb vocabulary" to either specific DBs/Ontologies or (per default) Wikipedia itself:
The second step would be to screen through abstracts and the individual fields of eg experiment pages and accordingly associate the terms with the links provided in the table above.
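The two steps could be sketched as below: a term-to-link table, then a single pass over the text that wraps each vocabulary term in a link. The table entries and example.org URLs are hypothetical, and the wiki-style `[URL label]` output is just one assumed target format.

```python
import re

def link_terms(text, links):
    """Wrap each vocabulary term found in `text` in a wiki-style link."""
    if not links:
        return text
    # Longest terms first, so that a multi-word term wins over a shorter
    # term contained in it.
    terms = sorted(links, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): url for term, url in links.items()}

    def replace(match):
        # Keep the original spelling as the label; look up the URL by
        # the lower-cased term.
        matched = match.group(0)
        return "[{} {}]".format(lookup[matched.lower()], matched)

    # A single pass, so inserted URLs are never re-scanned for terms.
    return pattern.sub(replace, text)

# Hypothetical two-column table, for illustration only.
vocabulary_links = {
    "feces": "https://example.org/feces",
    "oral cavity": "https://example.org/oral-cavity",
}
linked = link_terms("Feces and oral cavity swabs.", vocabulary_links)
```

This uses exact matching only; plurals and synonyms would need additional entries in the table or a more tolerant pattern.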
Of course, that could also apply to e.g. occurrences of specific microbes in an abstract, which could then link to the corresponding bugsigdb taxon pages.
What do you think about that?