Clone updates #586

bcorrie · 2022-02-12T00:16:59Z

Closes #543
Closes #161

As per #543

bcorrie · 2022-02-12T00:20:38Z

@javh @scharch would one of you be able to update the "description" field in the spec for these two counts? I don't know what the right wording should be, as per the discussion here: #543 (comment)

The old descriptions (which are still in the spec) are:

        umi_count:
            type: integer
            description: Number of Rearrangement records (sequences) included in this clone.
        clone_count:
            type: integer
            description: Non-normalized absolute count of the number of members (immune cells) in this clone.

airr-standards/specs/airr-schema.yaml, line 3763 in 74e2733:

umi_count:

I think maybe the clone_count is OK, but the umi_count I am pretty sure needs an update.

If you want you could add duplicate_count and a description as well... 8-)

javh · 2022-02-21T18:40:20Z

Thanks, @bcorrie. I pushed some updates for us to hammer on. I also added umi_count to Rearrangement and changed some of the wording in there.

One of the things we need to figure out is whether clone_count should be the count of:

Total sequences.
Unique sequences.
Either of the above, depending upon tool and analysis context.

I'm certain we talked about this before - I'll try to find where later.

schristley · 2022-02-22T01:42:08Z

I put the description for clone_count back to its original, which is to imply that the clonal assignment tool, or companion tool, decides what is the clonal abundance (count).

javh · 2022-02-22T17:17:53Z

@schristley, I don't think the original description for clone_count is correct (which is why I changed it). Two examples where this is plainly wrong are:

Classical bulk BCR sequencing clone sizes, which are often calculated as the number of unique variants.
Adaptive's data, which does not provide non-normalized counts, only counts normalized to their standard curves.

In both cases, it's a best-effort approximation of cell count, but it is neither a cell count nor a raw count.

schristley · 2022-02-22T17:40:02Z

> @schristley, I don't think the original description for clone_count is correct (which is why I changed it). Two examples where this is plainly wrong are:

@javh Okay, but your description makes it sound like it's just a count of rearrangement records. The original intent (#471) of clone_abundance, which got renamed to clone_count, was to record what the analysis tool (and/or experimental protocol) inferred to be the clone size. "Non-normalized absolute" is meant to indicate that a frequency shouldn't be provided, as other analyses might want to perform their own normalization. Is this better?

Non-normalized absolute count of the inferred number of members (immune cells) in this clone.

javh · 2022-02-22T17:44:55Z

"just a count of rearrangement records" will often be correct. Whether it's total sequences or unique sequences will depend upon whether duplicates were removed before calculating clone_count.

"Non-normalized", "inferred" and "immune cells" are all problematic language.

javh · 2022-02-22T17:53:37Z

Some calculations to cover for clone_count:

1. Total number of Rearrangements.
2. Number of unique variants within the clone. Same as above if you've removed duplicates.
3. Total number of unique cell_id. In practice this is the same as (1), but with cell_id grouping.
4. Total number of raw reads and Adaptive's read count corrected by their standard curve. These two are the same thing w/ and w/out PCR amplification bias correction.

These are the most common ones that occur to me; excluding statistical inference methods (model fitting, unseen species correction, etc).
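As a rough illustration of how the first three calculations listed above can differ on the same data, here is a minimal sketch. It assumes Rearrangement records are available as plain dictionaries carrying the AIRR fields clone_id, sequence and cell_id; the function name `summarize_clones`, the output keys and the sample records are hypothetical and not part of the spec. Read-count based calculations (raw reads, standard-curve correction) are not covered.

```python
from collections import defaultdict


def summarize_clones(rearrangements):
    """Compute three candidate clone sizes per clone_id.

    `rearrangements` is an iterable of dicts with the keys
    clone_id, sequence and (optionally) cell_id.
    The function name and output keys are illustrative only.
    """
    total = defaultdict(int)         # (1) every Rearrangement record
    unique_seqs = defaultdict(set)   # (2) distinct sequence variants
    cells = defaultdict(set)         # (3) distinct cell_id values

    for rec in rearrangements:
        clone = rec["clone_id"]
        total[clone] += 1
        unique_seqs[clone].add(rec["sequence"])
        # Bulk data typically has no cell_id, so skip missing values.
        if rec.get("cell_id") is not None:
            cells[clone].add(rec["cell_id"])

    return {
        clone: {
            "total_rearrangements": total[clone],
            "unique_sequences": len(unique_seqs[clone]),
            "unique_cells": len(cells[clone]),
        }
        for clone in total
    }


# Made-up records for illustration.
records = [
    {"clone_id": "c1", "sequence": "ACGT", "cell_id": "cellA"},
    {"clone_id": "c1", "sequence": "ACGT", "cell_id": "cellB"},
    {"clone_id": "c1", "sequence": "ACGA", "cell_id": "cellB"},
    {"clone_id": "c2", "sequence": "TTTT", "cell_id": None},
]
summary = summarize_clones(records)
```

For clone c1 the three numbers come out as 3, 2 and 2, which is the ambiguity a single clone_count description has to accommodate.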

My wording didn't cover all these cases either...

schristley · 2022-02-22T18:03:28Z

> "just a count of rearrangement records" will often be correct. Whether it's total sequences or unique sequences will depend upon whether duplicates were removed before calculating clone_count.

That may be so, but if it's not always true then it's trouble because somebody like Brian is going to read it literally and implement it that way everywhere without considering cases when it isn't true.

> "Non-normalized", "inferred" and "immune cells" are all problematic language.

How about something like this:

Absolute count of the size (number of members) of this clone in the repertoire. This could simply be the number of sequences (Rearrangement records) observed in this clone, or it may be a more sophisticated analysis calculation intertwined with an experimental protocol. Absolute count is provided versus a frequency so that downstream analysis tools can perform their own normalization. The standard frequency can be calculated by dividing by the clone_count sum for all clones in the repertoire.
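The frequency calculation mentioned in the last sentence of that proposed wording could be sketched as follows. This assumes the clone_count values for one repertoire have already been collected into a dictionary keyed by clone_id; the function name `clone_frequencies` and the sample counts are hypothetical.

```python
def clone_frequencies(clone_counts):
    """Divide each clone_count by the sum over all clones in the repertoire.

    `clone_counts` maps clone_id -> clone_count (absolute counts).
    Returns a dict mapping clone_id -> relative frequency.
    """
    total = sum(clone_counts.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero repertoire.
        return {clone_id: 0.0 for clone_id in clone_counts}
    return {clone_id: count / total for clone_id, count in clone_counts.items()}


# Made-up counts for illustration.
counts = {"c1": 6, "c2": 3, "c3": 1}
freqs = clone_frequencies(counts)
```

Because the stored value is an absolute count, a downstream tool remains free to substitute a different normalization for this simple division.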

javh · 2022-02-22T19:15:47Z

@schristley, I like it. That seems to cover everything better than the original wording and my wording. We can probably drop the last sentence, as we don't need to provide suggestions for how to perform relative abundance calculations that aren't covered by a standard field.

How about this edit?

Absolute count of the size (number of members) of this clone in the repertoire. This could simply be the number of sequences (Rearrangement records) observed in this clone, the number of distinct cell barcodes (unique cell_id values), or a more sophisticated calculation appropriate to the experimental protocol. Absolute count is provided instead of a frequency so that downstream analysis tools can perform their own normalization.

schristley · 2022-02-22T19:27:48Z

We already started making changes for #161 so we should make the other changes as well.

scharch · 2022-02-22T19:30:56Z

NOTE: This PR does not address #317

bcorrie · 2022-04-22T16:43:04Z

@javh there is a Travis CI check that is not passing. It shows "Expected but waiting" - is that expected?

bcorrie · 2022-04-22T17:22:16Z

Finally - synced all the specs so they pass consistency check 8-)

Hope I didn't screw up any definitions in the process - @javh you are on for review, can you double check...

javh · 2022-04-25T16:13:47Z

Looks good to me. Thanks, @bcorrie. I'll merge this.

That weird Travis "expected but waiting" thing should be gone in every PR now.

Also, we need to get in touch with 10x about the umi_count field being in v1.4.

Change counts

74e2733

As per #543

Update _count field language and add umi_count to Rearrangement.

c4b6e8a

javh and others added 2 commits February 21, 2022 10:40

Merge branch 'master' into clone-update

7b53fa7

change description back to original

88facd7

schristley added 2 commits February 22, 2022 13:20

update description

a9d7c1b

update description

dd606fc

javh self-requested a review April 18, 2022 18:28

javh added this to the AIRR v1.4.0 milestone Apr 18, 2022

bcorrie added 3 commits April 22, 2022 09:40

Merge branch 'master' into clone-update

6186094

Sync schemas

67ff6df

Merge branch 'clone-update' of https://github.com/airr-community/airr…

33f9e01

…-standards into clone-update

bcorrie added 6 commits April 22, 2022 16:43

Sync spec

5616524

Sync with AIRR v2 spec

595adf5

Update to be synced with v2 spec

667d6ef

As required by consistency checks.

Update nullable attribute

481de11

Removed spaces

6579a86

Really it checks white space!!! 8-)

Adding nullable

cf6fad5

Trying to get consistency to pass, does it really require the nullable property?

More nullable...

a750ac1

javh approved these changes Apr 25, 2022

javh mentioned this pull request Apr 25, 2022

Update tools authors on relevant v1.4 changes #604

Closed

1 task

javh merged commit 746e55e into master Apr 25, 2022

javh deleted the clone-update branch April 25, 2022 16:19
