queries on ontology-backed fields, add "isa" operator? #234

schristley · 2019-08-27T21:47:38Z

I brought this up in email, but creating an issue so it doesn't get lost.

This issue came to me the other day and I don't think we've talked about it yet, or at least not in detail. We might want to add this to our agenda to discuss in a future WG call.

Let's take cell_subset as an example. If a user performs a query and indicates "B cell" (CL_0000236), they can mean two possible things. One, they want data which is exactly cell_subset == "B cell". This is the current behavior of the API. Alternatively, they may want data that is B cell or any of its subtypes. Therefore, if a repertoire has cell_subset as "naive B cell" (CL_0000788), this repertoire will not be included in the former query but will be included in the latter query.

There currently isn't an easy way for a user to specify that latter query. Right now they would need to gather a list of all the subtypes of "B cell" and construct a large OR expression to capture all of them. That seems pretty onerous and error-prone for the user.

One suggestion is to add an "isa" operator which conveys this meaning. So the former query, cell_subset == "B cell", indicates exact match, while the latter query, cell_subset isa "B cell", indicates B cell or any of its subtypes.

Defining that additional query operator is easy. The challenge is how repositories would implement it. If they are using an RDF triple store (which nobody is now) then it's easy, but your typical SQL or NoSQL databases have a harder time. They would have to do a similar thing of constructing a large OR expression to capture all the subtypes (or other ideas just as ugly). This also means the repository needs to know about the ontology so that it can gather together the appropriate terms.
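To make the "large OR expression" concrete, here is a minimal sketch of how a repository might rewrite an "isa" filter into an OR of exact-match filters. The "isa" operator itself is only a proposal in this issue, and the `PARENTS` map, the `EX_` placeholder IDs, and the function names are assumptions made for illustration; only CL_0000236 and CL_0000788 come from the discussion.

```python
# Toy child -> parents map (hypothetical). A real repository would load
# this from the ontology. EX_* IDs are placeholders, not real CL terms.
PARENTS = {
    "CL_0000788": ["CL_0000236"],  # naive B cell -> B cell
    "EX_0000001": ["CL_0000788"],  # placeholder subtype of naive B cell
    "EX_0000002": [],              # placeholder unrelated term
}


def descendants(term_id, parents=PARENTS):
    """Return term_id plus every term that reaches it through parent links."""
    found = {term_id}
    changed = True
    while changed:
        changed = False
        for child, child_parents in parents.items():
            if child not in found and any(p in found for p in child_parents):
                found.add(child)
                changed = True
    return found


def expand_isa(flt, parents=PARENTS):
    """Rewrite a hypothetical 'isa' filter into an 'or' of '=' filters."""
    if flt["op"] != "isa":
        return flt
    field = flt["content"]["field"]
    ids = sorted(descendants(flt["content"]["value"], parents))
    return {
        "op": "or",
        "content": [
            {"op": "=", "content": {"field": field, "value": term}}
            for term in ids
        ],
    }


expanded = expand_isa(
    {"op": "isa", "content": {"field": "sample.cell_subset.id", "value": "CL_0000236"}}
)
```

The expansion grows with the number of subtypes, which is the cost the comment above is worried about.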

schristley · 2019-08-27T21:48:08Z

@bussec made this response by email:

I agree, that's an important point. My (naive) expectation was that the current behavior would already be "isa"-ish. IIRC IEDB solved this problem for the species taxonomy by storing all nodes (i.e. between the ontology root and the annotated term) in a separate field, so that they can search across it... but I assume that this is what you consider "ugly" ;-)
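A rough sketch of the "store all nodes in a separate field" idea, as I understand it from the description above. This is not IEDB's actual implementation; the `cell_subset_path` field name, the `EX_ROOT` placeholder, and the helper functions are assumptions for illustration.

```python
def ancestors_inclusive(term_id, parents):
    """Return term_id followed by every ancestor reachable via parent links."""
    seen = []
    stack = [term_id]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.append(term)
            stack.extend(parents.get(term, []))
    return seen


def annotate(record, parents):
    """Copy a record and add the hypothetical 'cell_subset_path' field."""
    out = dict(record)
    out["cell_subset_path"] = ancestors_inclusive(record["cell_subset_id"], parents)
    return out


def isa_match(record, term_id):
    """An 'isa' query becomes a membership test on the stored path."""
    return term_id in record["cell_subset_path"]


# Toy ontology fragment; EX_ROOT is a placeholder for whatever sits above B cell.
parents = {"CL_0000788": ["CL_0000236"], "CL_0000236": ["EX_ROOT"]}
naive = annotate({"cell_subset_id": "CL_0000788"}, parents)
bcell = annotate({"cell_subset_id": "CL_0000236"}, parents)
```

The trade-off is that the stored paths have to be recomputed whenever the ontology changes.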

schristley · 2019-08-27T21:49:01Z

Bjoern Peters had a follow-up:

The standard approach we use is to store transitive closure (https://en.wikipedia.org/wiki/Transitive_closure) of the taxonomy; essentially a table that has two columns storing 'parent id, child id' pairs. Queries for all children of a given parent can be integrated into standard SQL then and are lightning fast if the table is properly indexed even for very large taxonomies.
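A minimal sketch of the transitive closure table using Python's built-in sqlite3 module. The table and column names, the `EX_` placeholder term, and the choice to include reflexive (term, term) pairs so that the queried term matches itself are all assumptions for illustration, not a description of any existing repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE repertoire (repertoire_id TEXT, cell_subset_id TEXT);
    CREATE TABLE closure (parent_id TEXT, child_id TEXT);
    CREATE INDEX closure_parent ON closure (parent_id, child_id);
    """
)
conn.executemany(
    "INSERT INTO repertoire VALUES (?, ?)",
    [("r1", "CL_0000236"), ("r2", "CL_0000788"), ("r3", "EX_0000002")],
)
# One row per (ancestor, descendant) pair. Reflexive pairs are an assumption
# made here so that a query for "B cell" also returns exact matches.
conn.executemany(
    "INSERT INTO closure VALUES (?, ?)",
    [
        ("CL_0000236", "CL_0000236"),
        ("CL_0000236", "CL_0000788"),
        ("CL_0000788", "CL_0000788"),
        ("EX_0000002", "EX_0000002"),
    ],
)


def isa_query(conn, term_id):
    """Return repertoire IDs annotated with term_id or any of its descendants."""
    rows = conn.execute(
        """
        SELECT r.repertoire_id
        FROM repertoire AS r
        JOIN closure AS c ON c.child_id = r.cell_subset_id
        WHERE c.parent_id = ?
        ORDER BY r.repertoire_id
        """,
        (term_id,),
    )
    return [row[0] for row in rows]
```

With this layout, `isa_query(conn, "CL_0000236")` is a single indexed join rather than a long OR list.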

bcorrie · 2019-12-06T16:18:40Z

This came up in our local group discussion, as we are embarking on an implementation around ontologies, both at the user interface/gateway level as well as at the service query level.

So for confirmation, we have the following in ADC API v1:

The query {"op":"=", "content":{"field":"sample.cell_subset.value", "value":"B cell"}} searches for an exact match on the "B cell" string in the value of the ontology field cell_subset.
The query {"op":"=", "content":{"field":"sample.cell_subset.id", "value":"CL_0000236"}} searches for an exact match on the "CL_0000236" string in the id of the ontology field cell_subset.
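To illustrate the exact-match semantics of the two v1 filters above, here is a small sketch that evaluates an "=" filter against in-memory records. The record layout (a `sample` list holding a `cell_subset` object with `id` and `value`) is inferred from the dotted field names, and the helper functions are hypothetical; this is not how any repository is known to implement it.

```python
def field_values(record, path):
    """Collect the values found at a dotted path, descending into lists."""
    nodes = [record]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    values = []
    for node in nodes:
        values.extend(node if isinstance(node, list) else [node])
    return values


def matches_eq(record, flt):
    """True if any value at the filter's field equals the filter's value."""
    content = flt["content"]
    return content["value"] in field_values(record, content["field"])


by_id = {"op": "=", "content": {"field": "sample.cell_subset.id", "value": "CL_0000236"}}
by_value = {"op": "=", "content": {"field": "sample.cell_subset.value", "value": "B cell"}}

b_cell_rep = {"sample": [{"cell_subset": {"id": "CL_0000236", "value": "B cell"}}]}
naive_rep = {"sample": [{"cell_subset": {"id": "CL_0000788", "value": "naive B cell"}}]}
```

Under exact matching, `naive_rep` is not returned by either filter, which is the gap the proposed "isa" operator is meant to close.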

We will work on the definition of an "isa" operator in ADC API v2 for taxonomy/ontology-based terms, which would capture the more powerful concept of finding all ontology entities that lie beneath the queried ontology node.

Is that correct?

bcorrie · 2021-05-20T16:51:22Z

@schristley @bussec with the recent ontology sprint finishing, wondering if we can renew this discussion?

My previous comment above makes sense to me, should we try and move this forward? I think this is more of a definition/semantics thing as the spec doesn't need to change. It is the expected result of the query that needs to be defined.

And then of course our repositories need to implement it 8-)

bcorrie · 2021-05-20T16:53:02Z

Hmm, I thought there already was an "isa" operator, but there is not. So we do need to add it.

bussec · 2021-06-03T00:08:39Z

@bcorrie Looking at this again I think one thing that we need to clarify is which relation in an ontology we would follow. As far as I can see OBO uses subClassOf (http://www.w3.org/2000/01/rdf-schema#subClassOf).

bcorrie · 2021-06-03T16:14:51Z

Good point... Not sure how variable that is and how many ontologies have complex relationships. Has anyone checked??? I have kind of assumed that most of our Ontologies (or the way we think of them) are considered Trees and therefore a subClassOf relationship probably makes sense (or would suffice). Not sure if we need to specify a relationship (can we assume one?) and, if we need to, how do we do it???

schristley · 2021-06-04T03:04:54Z

> @bcorrie Looking at this again I think one thing that we need to clarify is which relation in an ontology we would follow. As far as I can see OBO uses subClassOf (http://www.w3.org/2000/01/rdf-schema#subClassOf).

That's the correct relation if you are talking terms that are Class'es, and that's true for all the biomedical ontologies that I'm familiar with. Not all biomedical ontologies are Trees though, but they are DAGs (i.e. multiple class inheritance).
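Since the ontologies are DAGs rather than trees, a traversal along subClassOf has to cope with a term that has more than one parent. A minimal sketch, using made-up single-letter term names and a hypothetical list of (child, parent) pairs in place of real subClassOf triples:

```python
from collections import defaultdict, deque

# Hypothetical (child, parent) subClassOf pairs. "D" has two parents,
# so this is a DAG and not a tree.
SUBCLASS_OF = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "D")]


def descendants(root, edges):
    """Breadth-first walk down subClassOf links, visiting each term once."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # skip terms already reached via another parent
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


all_under_a = descendants("A", SUBCLASS_OF)
```

The `seen` set is what keeps "D" (and everything below it) from being returned twice when it is reachable through both "B" and "C".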

bcorrie · 2023-11-13T20:39:00Z

@schristley we are targeting v2.0 release for AIRR meeting in June. This issue seems to gel well with AIRR Knowledge Commons efforts, but likely this won't hit that deadline. I am suggesting we move this out of the ADC v2.0 Milestone to ADC v2.1 (https://github.com/airr-community/airr-standards/milestone/9). Any objections?

schristley · 2023-11-13T21:33:49Z

> @schristley we are targeting v2.0 release for AIRR meeting in June. This issue seems to gel well with AIRR Knowledge Commons efforts, but likely this won't hit that deadline. I am suggesting we move this out of the ADC v2.0 Milestone to ADC v2.1 (https://github.com/airr-community/airr-standards/milestone/9). Any objections?

@bcorrie I guess that's going with the idea that the API version is updated even though there are no API changes, just the schema is changing. I still have mixed feelings about that, but I see pros/cons to both sides. Anyways, regarding the specific question, no objections. Also the PR #550 mentions adding the distinct operator too, not sure if that's a separate issue, if not probably should create one as I expect #550 is too old and will be deleted at some point?

bcorrie · 2023-12-05T23:16:35Z

@schristley if we leave this in v2.0, it boils down to both VDJServer and iReceptor Turnkey implementing it. I am leaning towards leaving it in v2.0, as anything beyond v2.0 is very nebulous. If v2.0 is released in June at the AIRR Meeting, then we would want to have this implemented in the repositories shortly after that. I think it would be good to have this in v2.0 and implemented in the ADC within a short time frame after that. Thoughts? I think this is doable for iReceptor Turnkey.

schristley · 2023-12-07T19:26:25Z

@bcorrie That's reasonable to me. I think it's doable in the time frame under the assumption this only involves updating the data in the /repertoire endpoint. I've already started discussions with James Overton as part of AKC work about gathering ontologies (what they call Source of Terminologies - SOT) to support operations such as this. Our thought was to start with the airr-standards ontologies. There are different techniques that can be used to handle the query depending upon the database technology.

bcorrie · 2024-02-14T00:51:30Z

Moving this to a non v2.0 tag, as based on discussions around AKC I think we want to do this properly rather than rush for v2.0

schristley added the ADC API V1 AIRR Data Commons API V1 label Aug 27, 2019

schristley added ADC API V2 AIRR Data Commons API V2 and removed ADC API V1 AIRR Data Commons API V1 labels Oct 10, 2019

schristley mentioned this issue May 26, 2020

Consider an API query on ontologies #407

Closed

schristley added this to the ADC V2 milestone Jan 17, 2022

bcorrie mentioned this issue Feb 14, 2024

Behavior of API queries against properties in lists of records #623

Open

bcorrie modified the milestones: ADC 2.0, ADC 2.1 Feb 14, 2024
