Question about interpretation of noise ceiling #388
Replies: 2 comments
-
I wrote some quick responses to your particular questions below.
More generally, you could read Nili et al. 2014 and Schuett et al. 2023 on
the noise ceiling.
From Nili et al. 2014:
Importantly, the bar graph includes an estimate of the noise ceiling. The
noise ceiling is the expected RDM correlation achieved by the (unknown)
true model, given the noise in the data. An estimate of the noise ceiling
is important for assessing to what extent the failure of a model to reach
an RDM correlation close to 1 is caused by a deficiency of the model or by
the limitations of the experiments (e.g. high measurement noise and/or
limited amount of data). If the best model does not reach the noise
ceiling, we should seek a better model. If the best model reaches the noise
ceiling, but the ceiling is far below 1, we should improve our experimental
technique, so as to gain sensitivity to enable us to detect any remaining
deficiencies of our model.
The noise ceiling is indicated by a gray horizontal bar, whose upper and
lower edges correspond to upper- and lower-bound estimates on the
group-average correlation with the RDM predicted by the unknown true model.
Note that there is a hard upper limit to the average correlation with the
single-subject reference-RDM estimates that any RDM can achieve for a given
data set. Intuitively, the RDM maximizing the group-average correlation
lies at the center of the cloud of single-subject RDM estimates. Where
exactly this “central” RDM falls depends on the chosen correlation type.
For the Pearson correlation, we first z-transform the single-subject RDMs.
For the Spearman correlation, we rank-transform the RDMs. After this
transformation, the squared Euclidean distance is proportional to the
respective correlation distance. This motivates averaging of the
single-subject RDMs to find the RDM that minimizes the average of the
squared Euclidean distances and, thus, maximizes the average correlation
(see Text S1
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3990488/#pcbi.1003553.s007> for
the proof). For Kendall's τA, we average the rank-transformed
single-subject RDMs and use an iterative procedure to find the RDM that has
the maximum average correlation to the single-subject RDMs.
The average RDM (computed after the appropriate transform for each
correlation type) can be thought of as an estimate of the true model's RDM.
This estimate is overfitted to the single-subject RDMs. Its average
correlation with the latter therefore overestimates the true model's
average correlation, thus providing an upper bound. To estimate a lower
bound, we employ a leave-one-subject-out approach. We compute each
single-subject RDM's correlation with the average of the other subjects'
RDMs. This prevents overfitting and underestimates the true model's average
correlation because the amount of data is limited, thus providing a lower
bound on the ceiling.
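(To make this recipe concrete: after the z- or rank-transform all subject RDM vectors have the same norm, so for a fixed-norm candidate RDM the average squared Euclidean distance is a decreasing affine function of the average correlation; the average RDM, rescaled if needed, therefore maximizes the average correlation, and rescaling leaves correlations unchanged. Below is a minimal sketch of the Spearman-based upper and lower bounds. It is our illustration rather than the toolbox's actual code, and it assumes `subject_rdms` is an (n_subjects, n_pairs) array of vectorized single-subject RDMs.)

```python
# Minimal sketch (illustration only): upper and lower bounds of the noise
# ceiling for the Spearman RDM correlation, as described above.
import numpy as np
from scipy.stats import rankdata, spearmanr


def noise_ceiling_spearman(subject_rdms):
    """Return (upper, lower) bound estimates of the noise ceiling."""
    n_subjects = subject_rdms.shape[0]
    # rank-transform each subject's RDM vector (the transform appropriate
    # for the Spearman correlation)
    ranked = np.array([rankdata(d) for d in subject_rdms])
    upper, lower = [], []
    for i in range(n_subjects):
        # upper bound: correlate each subject with the average of ALL
        # rank-transformed RDMs (overfitted, since subject i is included)
        upper.append(spearmanr(subject_rdms[i], ranked.mean(axis=0)).correlation)
        # lower bound: correlate with the average of the OTHER subjects'
        # rank-transformed RDMs (leave-one-subject-out)
        others = ranked[np.arange(n_subjects) != i].mean(axis=0)
        lower.append(spearmanr(subject_rdms[i], others).correlation)
    return np.mean(upper), np.mean(lower)
```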
From Schuett et al. 2023:
Noise ceiling for model performance
In addition to comparing models to each other, we compare models to a noise
ceiling and to chance performance. The noise ceiling provides an estimate
of the performance the true (data-generating) model would achieve. A model
that approaches the noise ceiling (i.e. is not significantly below the
noise ceiling) cannot be statistically rejected. We would need more data to
reveal any remaining shortcomings of the model. The noise ceiling is not 1:
even the true group RDM would not perfectly predict all subjects’ RDMs,
because of the intersubject variability and the noise affecting the RDM
estimates. We estimate an upper and a lower bound for the true model’s
performance (Nili et al., 2014
<https://elifesciences.org/articles/82566#bib83>). The upper bound is
constructed by computing the RDM which performs best among all possible
RDMs. Obviously, no model can perform better than this best RDM, so it
provides a true upper bound. To estimate a lower bound, we use
leave-one-out crossvalidation, computing the best performing RDM for all
but one of the subjects and evaluating on the held-out subject. We can
understand the upper and lower bound of the noise ceiling as
uncrossvalidated and crossvalidated estimates of the performance of an
overly flexible model that contains the true model. The uncrossvalidated
estimate is expected to be higher than the true model’s performance because
it is overfitted. The crossvalidated estimate is expected to be lower than
the true model’s performance because it is compromised by the noise and
subject-sampling variability in the data.
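As an illustration of this last point, here is a toy simulation with purely synthetic data and made-up parameters, reusing the hypothetical `noise_ceiling_spearman` sketch above. Because the true RDM is known in the simulation, its performance can be compared to the two bounds; in expectation it falls between them.

```python
# Toy check: the uncrossvalidated bound should tend to exceed, and the
# crossvalidated bound should tend to fall below, the true model's performance.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_subjects, n_pairs = 10, 190                  # e.g. RDMs over 20 conditions
true_rdm = rng.random(n_pairs)                 # the (normally unknown) true RDM
# single-subject RDMs = true RDM + independent measurement/subject noise
subject_rdms = true_rdm + 0.5 * rng.standard_normal((n_subjects, n_pairs))

# performance of the true model: average Spearman correlation with the
# single-subject RDMs
true_perf = np.mean([spearmanr(true_rdm, d).correlation for d in subject_rdms])

upper, lower = noise_ceiling_spearman(subject_rdms)  # sketch above
print(f"lower {lower:.3f} | true model {true_perf:.3f} | upper {upper:.3f}")
```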
For most RDM comparators, the best performing RDM can be derived
analytically as a mean after adequate normalization of the single subject
RDMs. For cosine similarity, they are normalized to unit norm. For Pearson
correlation, the RDM vectors are normalized to zero mean and unit standard
deviation. For the whitened measures the normalization is based on the norm
induced by the noise precision instead; that is, subject RDM vectors $d$ are
divided by $\sqrt{d^\top \Sigma^{-1} d}$ instead of the standard Euclidean norm $\sqrt{d^\top d}$.
For the Spearman correlation, subject RDM vectors are first transformed to
ranks.
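A minimal sketch of these per-comparator normalizations (our own illustration with assumed names, not the toolbox's code path; `rdms` is an (n_subjects, n_pairs) array of vectorized subject RDMs and `sigma` stands for an assumed noise covariance of the dissimilarity estimates):

```python
# Best-performing group RDM as the mean of appropriately normalized subject RDMs.
import numpy as np
from scipy.stats import rankdata


def best_group_rdm(rdms, method="cosine", sigma=None):
    if method == "cosine":
        # normalize each subject RDM vector to unit Euclidean norm
        normed = rdms / np.linalg.norm(rdms, axis=1, keepdims=True)
    elif method == "corr":
        # normalize to zero mean and unit standard deviation (Pearson)
        centered = rdms - rdms.mean(axis=1, keepdims=True)
        normed = centered / centered.std(axis=1, keepdims=True)
    elif method == "whitened_cosine":
        # norm induced by the noise precision: sqrt(d^T Sigma^{-1} d)
        precision = np.linalg.inv(sigma)
        norms = np.sqrt(np.einsum("ij,jk,ik->i", rdms, precision, rdms))
        normed = rdms / norms[:, None]
    elif method == "spearman":
        # rank-transform each subject RDM vector
        normed = np.array([rankdata(d) for d in rdms])
    else:
        raise ValueError(f"unknown method: {method}")
    # the best-performing RDM is the mean of the normalized subject RDMs
    return normed.mean(axis=0)
```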
For Kendall’s τa, there is no efficient method to find the optimal RDM for
a dataset, which is one of the reasons for using the Spearman rank
correlation for RDM comparisons. If Kendall τ based inference is chosen
nonetheless, the problem can be solved approximately by applying techniques
for Kemeny–Young voting (Ali and Meilă, 2012
<https://elifesciences.org/articles/82566#bib4>) or by simply using the
average ranks, which is a reasonable approximation, especially if the rank
transformed RDMs are similar across subjects. In the toolbox, we currently
use this approximation without further adjustment.
For the lower bound, we use leave-one-out crossvalidation over subjects. To
do this, each subject is once selected as the left-out subject and the best
RDM to fit all other subjects is computed. The expected average performance
of this RDM is a lower bound on the true model’s performance, because
fitting all distances independently is technically a very flexible model,
which performs the same generalization as the tested models. Like all other
models, it should thus perform worse than or equal to the correct model.
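For concreteness, here is a sketch of that leave-one-subject-out lower bound, with cosine similarity chosen as the comparator (again an illustration under assumed names, not the toolbox's implementation):

```python
# Leave-one-subject-out lower bound with cosine similarity as the comparator.
import numpy as np


def cosine_sim(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def lower_bound_cosine(rdms):
    n_subjects = rdms.shape[0]
    scores = []
    for i in range(n_subjects):
        train = rdms[np.arange(n_subjects) != i]
        # best-performing RDM for the remaining subjects: mean of their
        # unit-norm-normalized RDM vectors (see the sketch above)
        best = (train / np.linalg.norm(train, axis=1, keepdims=True)).mean(axis=0)
        # evaluate that RDM on the held-out subject
        scores.append(cosine_sim(best, rdms[i]))
    return np.mean(scores)
```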
When flexible models are used, such that crossvalidation over conditions is
performed, the computation of noise ceilings needs to take this into
account (Storrs et al., 2014
<https://elifesciences.org/articles/82566#bib100>). Essentially, the
computation of the noise ceilings is then restricted to the test sets of
the crossvalidation, which takes into account which parts of the RDMs are
used for evaluation.
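In code terms, this restriction simply amounts to masking the RDM vectors down to the evaluated dissimilarities before computing the bounds; a sketch, reusing the hypothetical `noise_ceiling_spearman` from above, with `test_pair_mask` an assumed boolean vector marking the dissimilarity pairs that belong to the crossvalidation test set:

```python
# keep only the dissimilarity pairs that are used for model evaluation
masked_rdms = subject_rdms[:, test_pair_mask]
upper, lower = noise_ceiling_spearman(masked_rdms)
```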
On Thu, Feb 1, 2024 at 12:38 PM, Xujin Chris Liu wrote:
> Hi, when interpreting model comparisons, how should we interpret the noise
> ceiling? Sorry for reposting if this is discussed in one of your papers.
> Would appreciate some ideas here.
> For example, let's say we have 4 models. Model A's similarity is
> significantly lower than the lower noise ceiling, model B's similarity has a
> mean equal to the lower noise ceiling, and model C has a mean higher than
> the lower noise ceiling but not the upper noise ceiling.
these can happen.
> Finally, model D is higher than the upper noise ceiling.
this cannot happen. it's a hard upper bound.
> My first question is about comparing different models. Is the following
> interpretation of these 4 models correct: in general, we can say model
> A < B < C in terms of their ability to capture the variance in the data.
no. the noise ceiling is irrelevant to model comparisons. consider the
model-comparative inferential results to compare models.
> But if I'm not mistaken, we need to be careful about comparing C or D
> since D is higher than the upper noise ceiling.
D is impossible.
> Alternatively, would the correct interpretation be that you can say B, C,
> and D are better than A but not among themselves?
> My second question is about stating whether a model is an appropriate
> model. Is it a fair statement to say that in this example, models B, C, and D
> are all as good of a model as we can get, since reaching the noise ceiling
> implies we've explained all there is to explain?
yes, any model not significantly below the lower bound of the noise ceiling
could be thought of as not significantly at odds with the data. it cannot
be decisively rejected.
this can happen for any combination of two different reasons:
- the models are good.
- the data are bad.
if you reduce the amount of data, the noise ceiling drops and eventually no
model will be significantly below the noise ceiling anymore -- no matter
how qualitatively different the models are.
if you have multiple models winning in the sense that they are not
significantly below the noise ceiling (and not significantly different from
each other in terms of predictive performance), it means that you need more
data and/or a better experiment (e.g. different conditions) to discern
shortcomings of the models and to see which of the models are better than
which others.
Answer selected by Aceticia
-
just to clarify: a model not significantly below the lower bound is not "as
good of a model as we can get"; it is just as good as required to explain
the data.
the worse the data, the easier it is to reach the lower bound of the noise
ceiling.
if you haven't reached the noise ceiling, then it still makes sense to use
the data to look for better models.
once you have a model reaching the noise ceiling, you might still look for
other models that also reach the noise ceiling, but you might also want to
consider new experiments that reveal remaining shortcomings of your winning
models.
Nikolaus Kriegeskorte, PhD
Professor of Psychology and Neuroscience
Affiliated member, Department of Electrical Engineering
Director of Cognitive Imaging, Zuckerman Mind Brain Behavior Institute
Columbia University
-
Hi, when interpreting model comparisons, how should we interpret the noise ceiling? Sorry for reposting if this is discussed in one of your papers. Would appreciate some ideas here.
For example, let's say we have 4 models. Model A's similarity is significantly lower than the lower noise ceiling, model B's similarity has a mean equal to the lower noise ceiling, and model C has a mean higher than the lower noise ceiling but not the upper noise ceiling. Finally, model D is higher than the upper noise ceiling.
My first question is about comparing different models. Is the following interpretation of these 4 models correct: in general, we can say model A < B < C in terms of their ability to capture the variance in the data. But if I'm not mistaken, we need to be careful about comparing C or D since D is higher than the upper noise ceiling. Alternatively, would the correct interpretation be that you can say B, C, and D are better than A but not among themselves?
My second question is about stating whether a model is an appropriate model. Is it a fair statement to say that in this example, models B, C, and D are all as good of a model as we can get, since reaching the noise ceiling implies we've explained all there is to explain?