Clone updates #586
As per #543
@javh @scharch would one of you be able to update the "description" field in the spec for these two counts? I don't know what the right wording should be, as per the discussion here: #543 (comment). The old descriptions (which are still in the spec) are:
airr-standards/specs/airr-schema.yaml Line 3763 in 74e2733
I think maybe the clone_count is OK, but the umi_count I am pretty sure needs an update. If you want you could add duplicate_count and a description as well... 8-) |
Thanks, @bcorrie. I pushed some updates for us to hammer on. I also added One of the things we need to figure out is whether
I'm certain we talked about this before - I'll try to find where later. |
I put the description for |
@schristley, I don't think the original description for
In both cases, it's a best-effort to approximate cell count, but it's also not cell count nor raw counts. |
@javh Okay, but your description makes it sound like it's just a count of rearrangement records. The original intent #471 of Non-normalized absolute count of the inferred number of members (immune cells) in this clone. |
"just a count of rearrangement records" will often be correct. Whether it's total sequences or unique sequences will depend upon whether duplicates were removed before calculating "Non-normalized", "inferred" and "immune cells" are all problematic language. |
Some calculations to cover for
These are the most common ones that occur to me; excluding statistical inference methods (model fitting, unseen species correction, etc). My wording didn't cover all these cases either... |
That may be so, but if it's not always true then it's trouble because somebody like Brian is going to read it literally and implement it that way everywhere without considering cases when it isn't true.
How about something like this: Absolute count of the size (number of members) of this clone in the repertoire. This could simply be the number of sequences (Rearrangement records) observed in this clone, or it may be a more sophisticated analysis calculation intertwined with an experimental protocol. Absolute count is provided versus a frequency so that downstream analysis tools can perform their own normalization. The standard frequency can be calculated by dividing by the clone_count sum for all clones in the repertoire. |
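For illustration, the normalization described in the last sentence of that proposed wording could be sketched as below. This is only a sketch under stated assumptions: the function name `clone_frequencies`, the dictionary layout, and the clone identifiers are hypothetical and are not part of the AIRR spec. The only thing taken from the discussion is the idea of dividing each `clone_count` by the sum of `clone_count` over all clones in the repertoire.

```python
from typing import Dict


def clone_frequencies(clone_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert absolute clone_count values into relative frequencies.

    clone_counts maps a (hypothetical) clone identifier to its absolute
    clone_count within a single repertoire.
    """
    # Denominator: the clone_count sum for all clones in the repertoire.
    total = sum(clone_counts.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero repertoire.
        return {clone_id: 0.0 for clone_id in clone_counts}
    return {clone_id: count / total for clone_id, count in clone_counts.items()}


# Hypothetical example data: three clones in one repertoire.
counts = {"clone_1": 6, "clone_2": 3, "clone_3": 1}
freqs = clone_frequencies(counts)
```

Keeping the stored value as an absolute count and normalizing on demand, as here, leaves downstream tools free to apply a different normalization if they prefer.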
@schristley, I like it. That seems to cover everything better than the original wording and my wording. We can probably drop the last sentence, as we don't need to provide suggestions for how to perform relative abundance calculations that aren't covered by a standard field. How about this edit?
|
We already started making changes for #161, so we should make the other changes as well. |
NOTE: This PR does not address #317 |
@javh there is a Travis CI check that is not passing. It shows "Expected but waiting" - is that expected? |
As required by consistency checks.
Really it checks white space!!! 8-)
Trying to get consistency to pass, does it really require the nullable property?
Finally - synced all the specs so they pass the consistency check 8-) Hope I didn't screw up any definitions in the process - @javh you are on for review, can you double check... |
Looks good to me. Thanks, @bcorrie. I'll merge this. That weird Travis "expected but waiting" thing should be gone in every PR now. Also, we need to get in touch with 10x about the |
Closes #543
Closes #161