First pass at SpecimenProcessing class #12
Trying to wrap my head around this in relation to the ADC. Is the intent of this to say that a specific If we are primarily capturing a chain of |
The current model has a |
Yes, the modelling on this PR would use many individual instances of the new On the other hand, if you always want to describe a common specimen processing protocol that's applied to all specimens from all participants in an investigation, then we should model it in a different way. My two initial suggestions would be defining subtypes of I don't know which approach is the right one for your data, but I'm sure we can find something that works. |
This one. This is also how the data is organized in AIRR. There can be commonality across specimens, but implementing that introduces complexity that we don't really need. Semantically things don't change, it's just an optimization to save space and reduce data duplication. |
@jamesaoverton any objections to me jumping in and making changes on this branch to Assay to try and incorporate the Or maybe I should work through this pull request? #10 since it is changing Assay. Alternatively, I can wait for #10 to be merged and then create a new branch/pull request. I think we probably want to generalize |
It looks like #10 is close to merging, so maybe best to wait for @schristley to approve and merge and then create a separate branch/pull request for Assay changes. |
#10 is ready to merge. I'm hoping @schristley can review and hit the 'Merge' button.
That's fine with me. |
We need to think a bit about the granularity of sample processing, like we did with sequencing assays. I see that there are:
The challenge for these is that I think the AIRR Standard has different fields that are relevant to different processing steps. So it would seem to me that we might want individual AKC classes for these??? Flow cytometry is the best example I suspect. And I think that many of the fields that are assigned to the AIRR-seq Assay are actually fields that should be assigned to something like a Recall the AIRR Standard has |
For example, I would be tempted to create:
|
Actually this class should probably be a
This might be done with Flow Cytometry but possibly not... |
I merged in main and resolved the conflicts, I think I did it correctly 8-) |
At first glance, that seems reasonable to me. The slots/fields are the most important pieces. Then we can organize them into classes that either are similar to AIRR, based upon ontology terms, or some combination of both. I want to avoid creating too big of a scope though. Before we go off creating lots of classes, I'd like some feedback about the granularity of the specimen processing coming from ImmPort and/or HIPC. We can certainly focus on what's needed for AIRR-seq, which might require just minimal definitions for a few classes. Specimen processing is also where we expect to have a lot of overlap with Christian's NF4DImmuno where they would have a much richer set of classes. |
A first pass at classes driven by the ADC info.
I have added three sample processing classes as described above, These map pretty naturally to both the AIRR objects as well as OBI In my recent looking at both ImmPort and ImmuneSpace, I wasn't able to find this type of information in their study metadata, but I may have been looking in the wrong place. At least through the web portals for both, it is very hard to find either material processing entities or assays that are AIRR-seq related... |
I also moved some of the fields out of the Not being an expert in running either It is easy to undo this if we decide this is not a good way to go. |
@schristley @jamesaoverton conversion of these classes is working from ADC -> AKC. Do you see any issues or require anything more from the IEDB side? Any comments on the above? Is this "good enough" for now? It captures pretty well all of the AIRR fields. Example output from the ADC to AKC conversion is in the Google Drive Study Analysis folder. Should we merge and close this as a working first implementation? |
@jamesaoverton as @schristley suggested, I suppose we should look at the HIPC/ImmPort/ImmuneSpace models for specimen processing as well. Are those available somewhere? |
OK, here is a concrete example: https://immunespace.org/query/study/SDY888 - one of Bjoern's papers. 8-) This study has T-cell repertoire sequencing, but there is no information about any of this in the metadata on ImmuneSpace for this study as far as I can tell. There doesn't appear to be a repertoire sequencing assay, and no information about any of the sample processing that was carried out. There is a detailed cell sort in this study (which would go into So ImmuneSpace certainly has studies that did AIRR-seq, but it does not seem to be capturing any of that information in the study metadata. Is this intentional or am I missing something? |
Here is another that James referenced in another issue. |
Indeed... I have been trying to use both ImmPort and ImmuneSpace to find AIRR-seq data sets, and it is almost impossible. You have to essentially search study metadata for keywords and hope the authors used them somewhere in their abstract or title to get a hit. I have actually confirmed this with ImmPort tech support 8-( So it is intentional that ImmuneSpace doesn't support AIRR-seq studies and AIRR-seq data (at least at this time)? |
HIPC has made a modest change with their single cell template. As you can see in this example, there is a "characteristic" called "library type" that has the value "scBCR-seq". There are other values for TCR and bulk. The issue is that you cannot search those characteristics using NCBI's API. You have to download the SOFT or XML format and search within that. |
On ImmPort there are 4 assays that one might find relevant:
Which results in 8 studies. |
So this implies to me (as @jamesaoverton suggested in #12 (comment)) that HIPC/ImmPort/ImmuneSpace won't be adding much to our SpecimenProcessing - so maybe we can merge and close this? |
As we just discussed on the call, ImmuneSpace is planning to include some specimen processing information, but we don't have any yet. (So what I said above wasn't correct concerning ImmuneSpace.) I don't know exactly what we'll need, but I'm happy for you guys to push the design with your immediate needs. The only question I have right now is: how is the order of the specimen processing steps being captured? I don't see it in the LinkML here, but I might be missing something. Or maybe you don't want to capture that order? |
@schristley @jamesaoverton @bpeters42 it sounds to me like we are fine to use the AIRR standard sample processing as the starting point for the HIPC data model in this case. Currently, there is no such processing in the HIPC model. We have added If these get added to the HIPC model, then the HIPC model will have a sample processing data model that can be directly mapped back to the ADC which is I think what is desirable if I understood the discussion correctly. So - I think we can merge?? 8-) |
@jamesaoverton in the AIRR Standard there is no order other than that implied in the order they appear in the spec. Not sure if that is intentional or not, I would need someone who has actually done some sample processing to comment 8-) How do you show order in a list in LinkML? |
A merge of the ADC and IEDB examples.
I just pushed the combined JSON file, which is a merged AIRRKnowledgeCommons object across all 11 ADC studies and the example IEDB data. So we have 12 investigations along with all of the other data. |
I expect we will care about order inside ImmuneSpace, but I'll discuss that with Bjoern, and that shouldn't block this PR. LinkML represents multivalued fields as JSON arrays, so the simplest way to capture order might be a |
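To make the point about JSON arrays concrete, here is a minimal Python sketch of how step order could ride along with a multivalued slot. Only the slot name specimen_processing comes from this PR; the id, type, description and step_index fields are hypothetical, and this is one possible reading, not the agreed design.

```python
import json

# Hypothetical Assay record. Only the slot name "specimen_processing" is from
# this PR; "id", "type" and "description" are invented for illustration.
assay_json = """
{
  "id": "assay-1",
  "specimen_processing": [
    {"type": "cell_isolation", "description": "PBMC isolation"},
    {"type": "cell_sorting", "description": "sort for B cells"},
    {"type": "library_preparation", "description": "library prep"}
  ]
}
"""

assay = json.loads(assay_json)

# JSON arrays are ordered, so list position can stand in for step order.
steps = assay["specimen_processing"]
ordered_types = [step["type"] for step in steps]

# Alternative: record the order explicitly (hypothetical "step_index" field)
# so it survives re-sorting or export to formats without ordered lists.
explicit = [dict(step, step_index=i) for i, step in enumerate(steps, start=1)]
```

Relying on list position keeps the schema small; an explicit index costs one extra slot but does not depend on every tool preserving array order.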
@schristley I think this is ready to merge, I am slowly adding more things on this branch, but they are not really related to specimen processing. Each AKC object instance now has a source_uris slot that points back to the list of ADC_REPERTOIRE:XXX URIs from which the object was created (See airr-knowledge/issues#55 and airr-knowledge/issues#56). See https://github.com/airr-knowledge/ak-schema/tree/specimen-processing/examples/adc for examples. |
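As a rough illustration of the source_uris idea, the sketch below groups repertoire URIs under the object they gave rise to. It is not the actual ADC to AKC converter: the repertoire and study identifiers are invented, and building one investigation per study is an assumption made only for the example.

```python
from collections import defaultdict

# Hypothetical input: (repertoire URI, study ID) pairs. The "ADC_REPERTOIRE:"
# prefix is from the thread; the identifiers themselves are invented.
rows = [
    ("ADC_REPERTOIRE:rep-001", "study-A"),
    ("ADC_REPERTOIRE:rep-002", "study-A"),
    ("ADC_REPERTOIRE:rep-003", "study-B"),
]

# Collect every repertoire URI that contributed to each study.
uris_by_study = defaultdict(list)
for uri, study_id in rows:
    uris_by_study[study_id].append(uri)

# One object per study, each pointing back to the repertoires it came from.
investigations = [
    {"id": study_id, "source_uris": uris}
    for study_id, uris in uris_by_study.items()
]
```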
First pass at a SpecimenProcessing class to address airr-knowledge/issues#58.
This adds a SpecimenProcessing class and a multivalued specimen_processing slot to Assay. The SpecimenProcessing class should have more slots, as required.
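For readers who want to picture the shape of this change, here is a small Python sketch of the two classes. It is not generated from the LinkML schema: the class names and the specimen_processing slot come from this PR, while assay_id, processing_type and description are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpecimenProcessing:
    # Placeholder slots; the PR notes that the real class should gain more
    # slots as required. Both names below are hypothetical.
    processing_type: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Assay:
    assay_id: str  # hypothetical identifier slot
    # Multivalued slot: zero or more processing steps per assay.
    specimen_processing: List[SpecimenProcessing] = field(default_factory=list)


assay = Assay(assay_id="assay-1")
assay.specimen_processing.append(
    SpecimenProcessing(processing_type="cell sorting")
)
```

Each Assay gets its own list, so this matches the "many individual instances" reading discussed in the thread rather than a shared protocol object.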