Clone and tree schema feedback / debrief #333
Thanks, @eharkins.
1. I think we should find a way to resolve this so that there is a field for this in the schema, rather than relying on a custom field. I suspect the count of (unique) sequences in the clone is going to be a very common field, so we should reserve a name/definition for it. (Related: #161)
2-3.
5. Yeah, that's always a problem with these big PR threads. We usually just use labels to organize issues, but I'm sure we could start using the Projects board. It would give us more granularity without the mess of having a bunch of extra labels. The problem with in-line comments is that when people commit changes/fixes, the comments lose their context. They are great for small, quick changes, but for larger discussions the code and comments get out of sync (we saw this a few times when working on Clones/Trees). |
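To make the "(unique)" distinction concrete: a clone's raw member count and its count of distinct sequences can differ, which is why a reserved definition matters. Below is a minimal sketch, assuming a hypothetical clone record that lists its member sequences as plain strings; the record layout is illustrative only, and the function is simply named after the `unique_seqs_count` field discussed in this thread, not taken from any schema implementation.

```python
from typing import List


def unique_seqs_count(sequences: List[str]) -> int:
    """Return the number of distinct sequence strings among a clone's members."""
    return len(set(sequences))


# Hypothetical clone record: four member sequences, two of which are identical.
clone = {
    "clone_id": "clone-7",
    "sequences": ["ACGTAC", "ACGTAC", "ACGTTC", "ACGAAC"],
}

total = len(clone["sequences"])                 # counts every member
unique = unique_seqs_count(clone["sequences"])  # collapses identical sequences
print(total, unique)  # 4 3
```

Whether identical sequences should be collapsed (and whether that comparison uses the nucleotide sequence, the amino acid sequence, or something else) is exactly the kind of choice a reserved field definition would pin down.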
In that case, can you use the respective
Let's give it a try? I made a lineage project and added a few issues to it. We'll figure out if it's more burden than value by using it... |
I think @metasoarous and I added
was too strict given our own or others' use of potentially non-unique ids in |
#246 is relevant (long). |
@eharkins Can you explain what "dataset" is in Olmsted terms? Is it the same as a single study? Is it orthogonal to a study and just a set of repertoires? For metadata like the study, subject, sample processing, etc., the You could however, if you were interested, allow Olmsted to utilize AIRR repertoire metadata as an option in place of your internal schema, i.e. support both. There are quite a few studies (published data, not examples) loaded up in the data repositories that you could use. Alternatively, we could convert one of your datasets into the AIRR format, which might be more useful as you would know the expected output and functionality. |
I brought this up elsewhere, but what about having an optional alias called |
@schristley An Olmsted dataset is not strictly defined in terms of a study. It can contain many subjects and samples, so I would say
is probably accurate.
This is the reason why we kept
Can you clarify what metadata you are referring to? |
I'm not familiar enough with Olmsted, but as you are mentioning subjects, samples, and so forth, I'm assuming that Olmsted is storing information about them? For example, in AIRR the subject has an id, but it also has a species taxonomy code, a biological sex, an age, etc. That's the subject metadata, and there's similar metadata for samples and so on. |
Yes, we have metadata about those entities as well, but, as you said, it doesn't adhere to AIRR standards in those cases. We could certainly aim to do this in the future! It wasn't as much of a priority for us as Clones and Trees, since we're usually dealing with samples from a single study using a single sample-processing setup. |
@javh asked:
`unique_seqs_count` we kept because the concept of an individual sequence as a rearrangement isn't well defined in Olmsted, and given that context, the wording "sequences" seemed like it might be more intuitive to new users. We as partis users might also confuse `rearrangement` with an entire clonal family, since `events` in partis are described as:
`ident` we kept since we don't enforce unique ids otherwise.
`id` fields in the AIRR context look like `<entity>.<entity>_id`, the exact reason for which I can't remember, but I think it is for DB querying reasons. This doesn't make as much sense in our context, where we don't need to do any querying and might care more about being able to use code that takes the `id` field of any object, which requires all the id fields to use the same key across objects. At the end of the day, this doesn't seem like a very big deal.
Thanks so much to everyone who participated in helping define a schema for Clones and Trees! This will help Olmsted be more widely useful and will hopefully be helpful for other tools and contexts as well.
Tagging some folks from our team to make sure they get to add their thoughts, if they have any: @matsen @psathyrella