Question about interpretation of noise ceiling #388
Replies: 2 comments
-
I wrote some quick responses to your particular questions below.
More generally, you could read Nili et al. 2014 and Schuett et al. 2023 on
the noise ceiling.
From Nili et al. 2014:
Importantly, the bar graph includes an estimate of the noise ceiling. The
noise ceiling is the expected RDM correlation achieved by the (unknown)
true model, given the noise in the data. An estimate of the noise ceiling
is important for assessing to what extent the failure of a model to reach
an RDM correlation close to 1 is caused by a deficiency of the model or by
the limitations of the experiments (e.g. high measurement noise and/or
limited amount of data). If the best model does not reach the noise
ceiling, we should seek a better model. If the best model reaches the noise
ceiling, but the ceiling is far below 1, we should improve our experimental
technique, so as to gain sensitivity to enable us to detect any remaining
deficiencies of our model.
The noise ceiling is indicated by a gray horizontal bar, whose upper and
lower edges correspond to upper- and lower-bound estimates on the
group-average correlation with the RDM predicted by the unknown true model.
Note that there is a hard upper limit to the average correlation with the
single-subject reference-RDM estimates that any RDM can achieve for a given
data set. Intuitively, the RDM maximizing the group-average correlation
lies at the center of the cloud of single-subject RDM estimates. Where
exactly this “central” RDM falls depends on the chosen correlation type.
For the Pearson correlation, we first z-transform the single-subject RDMs.
For the Spearman correlation, we rank-transform the RDMs. After this
transformation, the squared Euclidean distance is proportional to the
respective correlation distance. This motivates averaging of the
single-subject RDMs to find the RDM that minimizes the average of the
squared Euclidean distances and, thus, maximizes the average correlation
(see Text S1
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3990488/#pcbi.1003553.s007> for
the proof). For Kendall's τA, we average the rank-transformed
single-subject RDMs and use an iterative procedure to find the RDM that has
the maximum average correlation to the single-subject RDMs.
The average RDM (computed after the appropriate transform for each
correlation type) can be thought of as an estimate of the true model's RDM.
This estimate is overfitted to the single-subject RDMs. Its average
correlation with the latter therefore overestimates the true model's
average correlation, thus providing an upper bound. To estimate a lower
bound, we employ a leave-one-subject-out approach. We compute each
single-subject RDM's correlation with the average of the other subjects'
RDMs. This prevents overfitting and underestimates the true model's average
correlation because the amount of data is limited, thus providing a lower
bound on the ceiling.
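(To make this recipe concrete: after the z- or rank-transform all subject RDM vectors have the same norm, so for a fixed-norm candidate RDM the average squared Euclidean distance is a decreasing affine function of the average correlation; the average RDM, rescaled if needed, therefore maximizes the average correlation, and rescaling leaves correlations unchanged. Below is a minimal sketch of the Spearman-based upper and lower bounds. It is our illustration rather than the toolbox's actual code, and it assumes `subject_rdms` is an (n_subjects, n_pairs) array of vectorized single-subject RDMs.)

```python
# Minimal sketch (illustration only): upper and lower bounds of the noise
# ceiling for the Spearman RDM correlation, as described above.
import numpy as np
from scipy.stats import rankdata, spearmanr


def noise_ceiling_spearman(subject_rdms):
    """Return (upper, lower) bound estimates of the noise ceiling."""
    n_subjects = subject_rdms.shape[0]
    # rank-transform each subject's RDM vector (the transform appropriate
    # for the Spearman correlation)
    ranked = np.array([rankdata(d) for d in subject_rdms])
    upper, lower = [], []
    for i in range(n_subjects):
        # upper bound: correlate each subject with the average of ALL
        # rank-transformed RDMs (overfitted, since subject i is included)
        upper.append(spearmanr(subject_rdms[i], ranked.mean(axis=0)).correlation)
        # lower bound: correlate with the average of the OTHER subjects'
        # rank-transformed RDMs (leave-one-subject-out)
        others = ranked[np.arange(n_subjects) != i].mean(axis=0)
        lower.append(spearmanr(subject_rdms[i], others).correlation)
    return np.mean(upper), np.mean(lower)
```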
From Schuett et al. 2023:
Noise ceiling for model performance
In addition to comparing models to each other, we compare models to a noise
ceiling and to chance performance. The noise ceiling provides an estimate
of the performance the true (data-generating) model would achieve. A model
that approaches the noise ceiling (i.e. is not significantly below the
noise ceiling) cannot be statistically rejected. We would need more data to
reveal any remaining shortcomings of the model. The noise ceiling is not 1:
even the true group RDM would not perfectly predict all subjects’ RDMs,
because of the intersubject variability and the noise affecting the RDM
estimates. We estimate an upper and a lower bound for the true model’s
performance (Nili et al., 2014
<https://elifesciences.org/articles/82566#bib83>). The upper bound is
constructed by computing the RDM which performs best among all possible
RDMs. Obviously, no model can perform better than this best RDM, so it
provides a true upper bound. To estimate a lower bound, we use
leave-one-out crossvalidation, computing the best performing RDM for all
but one of the subjects and evaluating on the held-out subject. We can
understand the upper and lower bound of the noise ceiling as
uncrossvalidated and crossvalidated estimates of the performance of an
overly flexible model that contains the true model. The uncrossvalidated
estimate is expected to be higher than the true model’s performance because
it is overfitted. The crossvalidated estimate is expected to be lower than
the true model’s performance because it is compromised by the noise and
subject-sampling variability in the data.
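As an illustration of this last point, here is a toy simulation with purely synthetic data and made-up parameters, reusing the hypothetical `noise_ceiling_spearman` sketch above. Because the true RDM is known in the simulation, its performance can be compared to the two bounds; in expectation it falls between them.

```python
# Toy check: the uncrossvalidated bound should tend to exceed, and the
# crossvalidated bound should tend to fall below, the true model's performance.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_subjects, n_pairs = 10, 190                  # e.g. RDMs over 20 conditions
true_rdm = rng.random(n_pairs)                 # the (normally unknown) true RDM
# single-subject RDMs = true RDM + independent measurement/subject noise
subject_rdms = true_rdm + 0.5 * rng.standard_normal((n_subjects, n_pairs))

# performance of the true model: average Spearman correlation with the
# single-subject RDMs
true_perf = np.mean([spearmanr(true_rdm, d).correlation for d in subject_rdms])

upper, lower = noise_ceiling_spearman(subject_rdms)  # sketch above
print(f"lower {lower:.3f} | true model {true_perf:.3f} | upper {upper:.3f}")
```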
For most RDM comparators, the best performing RDM can be derived
analytically as a mean after adequate normalization of the single subject
RDMs. For cosine similarity, they are normalized to unit norm. For Pearson
correlation, the RDM vectors are normalized to zero mean and unit standard
deviation. For the whitened measures the normalization is based on the norm
induced by the noise precision instead; that is, subject RDM vectors $d$ are
divided by $\sqrt{d^\top \Sigma^{-1} d}$ instead of the standard Euclidean norm $\sqrt{d^\top d}$.
For the Spearman correlation, subject RDM vectors are first transformed to
ranks.
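A minimal sketch of these per-comparator normalizations (our own illustration with assumed names, not the toolbox's code path; `rdms` is an (n_subjects, n_pairs) array of vectorized subject RDMs and `sigma` stands for an assumed noise covariance of the dissimilarity estimates):

```python
# Best-performing group RDM as the mean of appropriately normalized subject RDMs.
import numpy as np
from scipy.stats import rankdata


def best_group_rdm(rdms, method="cosine", sigma=None):
    if method == "cosine":
        # normalize each subject RDM vector to unit Euclidean norm
        normed = rdms / np.linalg.norm(rdms, axis=1, keepdims=True)
    elif method == "corr":
        # normalize to zero mean and unit standard deviation (Pearson)
        centered = rdms - rdms.mean(axis=1, keepdims=True)
        normed = centered / centered.std(axis=1, keepdims=True)
    elif method == "whitened_cosine":
        # norm induced by the noise precision: sqrt(d^T Sigma^{-1} d)
        precision = np.linalg.inv(sigma)
        norms = np.sqrt(np.einsum("ij,jk,ik->i", rdms, precision, rdms))
        normed = rdms / norms[:, None]
    elif method == "spearman":
        # rank-transform each subject RDM vector
        normed = np.array([rankdata(d) for d in rdms])
    else:
        raise ValueError(f"unknown method: {method}")
    # the best-performing RDM is the mean of the normalized subject RDMs
    return normed.mean(axis=0)
```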
For Kendall’s τa, there is no efficient method to find the optimal RDM for
a dataset, which is one of the reasons for using the Spearman rank
correlation for RDM comparisons. If Kendall τ based inference is chosen
nonetheless, the problem can be solved approximately by applying techniques
for Kemeny–Young voting (Ali and Meilă, 2012
<https://elifesciences.org/articles/82566#bib4>) or by simply using the
average ranks, which is a reasonable approximation, especially if the rank
transformed RDMs are similar across subjects. In the toolbox, we currently
use this approximation without further adjustment.
For the lower bound, we use leave-one-out crossvalidation over subjects. To
do this, each subject is once selected as the left-out subject and the best
RDM to fit all other subjects is computed. The expected average performance
of this RDM is a lower bound on the true model’s performance, because
fitting all distances independently is technically a very flexible model,
which performs the same generalization as the tested models. Like all other
models, it should thus perform worse than or equal to the correct model.
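For concreteness, here is a sketch of that leave-one-subject-out lower bound, with cosine similarity chosen as the comparator (again an illustration under assumed names, not the toolbox's implementation):

```python
# Leave-one-subject-out lower bound with cosine similarity as the comparator.
import numpy as np


def cosine_sim(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def lower_bound_cosine(rdms):
    n_subjects = rdms.shape[0]
    scores = []
    for i in range(n_subjects):
        train = rdms[np.arange(n_subjects) != i]
        # best-performing RDM for the remaining subjects: mean of their
        # unit-norm-normalized RDM vectors (see the sketch above)
        best = (train / np.linalg.norm(train, axis=1, keepdims=True)).mean(axis=0)
        # evaluate that RDM on the held-out subject
        scores.append(cosine_sim(best, rdms[i]))
    return np.mean(scores)
```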
When flexible models are used, such that crossvalidation over conditions is
performed, the computation of noise ceilings needs to take this into
account (Storrs et al., 2014
<https://elifesciences.org/articles/82566#bib100>). Essentially, the
computation of the noise ceilings is then restricted to the test sets of
the crossvalidation, which takes into account which parts of the RDMs are
used for evaluation.
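In code terms, this restriction simply amounts to masking the RDM vectors down to the evaluated dissimilarities before computing the bounds; a sketch, reusing the hypothetical `noise_ceiling_spearman` from above, with `test_pair_mask` an assumed boolean vector marking the dissimilarity pairs that belong to the crossvalidation test set:

```python
# keep only the dissimilarity pairs that are used for model evaluation
masked_rdms = subject_rdms[:, test_pair_mask]
upper, lower = noise_ceiling_spearman(masked_rdms)
```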
On Thu, Feb 1, 2024 at 12:38 PM, Xujin Chris Liu wrote:
> Hi, when interpreting model comparisons, how should we interpret the noise
> ceiling? Sorry for reposting if this is discussed in one of your papers.
> Would appreciate some ideas here.
> For example, let's say we have 4 models. Model A's similarity is
> significantly lower than the lower noise ceiling, model B's similarity has a
> mean equal to the lower noise ceiling, and model C has a mean higher than
> the lower noise ceiling but not the upper noise ceiling.
these can happen.
> Finally, model D is higher than the upper noise ceiling.
this cannot happen. it's a hard upper bound.
> My first question is about comparing different models. Is the following
> interpretation of these 4 models correct: in general, we can say model
> A < B < C in terms of their ability to capture the variance in the data.
no. the noise ceiling is irrelevant to model comparisons. consider the
model-comparative inferential results to compare models.
> But if I'm not mistaken, we need to be careful about comparing C or D
> since D is higher than the upper noise ceiling.
D is impossible.
> Alternatively, would the correct interpretation be that you can say B, C,
> and D are better than A but not among themselves?
> My second question is about stating whether a model is an appropriate
> model. Is it a fair statement to say that in this example, models B, C, and D
> are all as good of a model as we can get, since reaching the noise ceiling
> implies we've explained all there is to explain?
yes, any model not significantly below the lower bound of the noise ceiling
could be thought of as not significantly at odds with the data. it cannot
be decisively rejected.
this can happen for any combination of two different reasons:
- the models are good.
- the data are bad.
if you reduce the amount of data, the noise ceiling drops and eventually no
model will be significantly below the noise ceiling anymore -- no matter
how qualitatively different the models are.
if you have multiple models winning in the sense that they are not
significantly below the noise ceiling (and not significantly different from
each other in terms of predictive performance), it means that you need more
data and/or a better experiment (e.g. different conditions) to discern
shortcomings of the models and to see which of the models are better than
which others.
Answer selected by Aceticia
-
just to clarify: a model not significantly below the lower bound is not "as
good of a model as we can get"; it is just as good as required to explain
the data.
the worse the data, the easier it is to reach the lower bound of the noise
ceiling.
if you haven't reached the noise ceiling, then it still makes sense to use
the data to look for better models.
once you have a model reaching the noise ceiling, you might still look for
other models that also reach the noise ceiling, but you might also want to
consider new experiments that reveal remaining shortcomings of your winning
models.
Nikolaus Kriegeskorte, PhD
Professor of Psychology and Neuroscience
Affiliated member, Department of Electrical Engineering
Director of Cognitive Imaging, Zuckerman Mind Brain Behavior Institute
Columbia University
-
Hi, when interpreting model comparisons, how should we interpret the noise ceiling? Sorry for reposting if this is discussed in one of your papers. Would appreciate some ideas here.
For example, let's say we have 4 models. Model A's similarity is significantly lower than the lower noise ceiling, model B's similarity has a mean equal to the lower noise ceiling, and model C has a mean higher than the lower noise ceiling but not the upper noise ceiling. Finally, model D is higher than the upper noise ceiling.
My first question is about comparing different models. Is the following interpretation of these 4 models correct: in general, we can say model A < B < C in terms of their ability to capture the variance in the data. But if I'm not mistaken, we need to be careful about comparing C or D since D is higher than the upper noise ceiling. Alternatively, would the correct interpretation be that you can say B, C, and D are better than A but not among themselves?
My second question is about stating whether a model is an appropriate model. Is it a fair statement to say that in this example, models B, C, and D are all as good of a model as we can get, since reaching the noise ceiling implies we've explained all there is to explain?