Extract the trials on which humans (a) consistently succeed, (b) perform close to chance, or (c) systematically fail. These can be passed to curiophysics for interestingness annotation.
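One possible way to do the three-way split is sketched below. It assumes a binary task (chance accuracy of 0.5), per-response records of the form `(trial_id, correct)`, and an arbitrary margin of 0.15 around each reference point. The function name, the record layout, and the thresholds are all hypothetical and would need to be adapted to the actual human data.

```python
from collections import defaultdict


def categorize_trials(responses, chance=0.5, margin=0.15):
    """Label each trial by mean human accuracy.

    responses: iterable of (trial_id, correct) pairs, where correct is 0/1 or bool.
    chance, margin: assumed values, not taken from the project.
    Returns {trial_id: (label, accuracy)}.
    """
    # trial_id -> [number correct, number of responses]
    totals = defaultdict(lambda: [0, 0])
    for trial_id, correct in responses:
        totals[trial_id][0] += int(correct)
        totals[trial_id][1] += 1

    categories = {}
    for trial_id, (n_correct, n_total) in totals.items():
        acc = n_correct / n_total
        if acc >= 1 - margin:
            label = "consistent_success"
        elif acc <= margin:
            label = "systematic_failure"
        elif abs(acc - chance) <= margin:
            label = "near_chance"
        else:
            label = "intermediate"  # falls between the three groups of interest
        categories[trial_id] = (label, acc)
    return categories
```

A fixed margin ignores how many participants saw each trial; with few responses per trial, a binomial test against chance may be a better criterion for group (b).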
More detailed error analysis: on which scenarios / instances did humans and models diverge the most?
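As a starting point for the divergence question, a minimal sketch: rank trials by the absolute gap between mean human accuracy and a model's accuracy (or predicted probability of the correct answer) on the same trial. The dictionaries keyed by trial ID and the name `rank_divergence` are assumptions made for illustration; the same function could be applied to per-scenario averages instead of per-trial values.

```python
def rank_divergence(human_acc, model_acc, top_k=5):
    """Return the top_k (trial_id, gap) pairs with the largest human-model gap.

    human_acc, model_acc: {trial_id: value in [0, 1]} (hypothetical layout).
    Only trials present in both dictionaries are compared.
    """
    shared = human_acc.keys() & model_acc.keys()
    gaps = {t: abs(human_acc[t] - model_acc[t]) for t in shared}
    # Largest gap first.
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```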
Visualize some model predictions: can we tell why some of the vision models fail or succeed on particular trials?
Which models’ behavior was most similar to which other models’?
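One simple way to approach the model-to-model comparison is the pairwise Pearson correlation of per-trial outputs, sketched below. It assumes each model's predictions are stored as a list in the same trial order; the `predictions` layout and both function names are hypothetical. Correlation is only one choice of similarity measure: for binary responses, an agreement statistic such as Cohen's kappa may be more appropriate.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def pairwise_similarity(predictions):
    """predictions: {model_name: [per-trial output, ...]}, all in the same trial order.

    Returns {(model_a, model_b): correlation} for every unordered pair of models.
    """
    return {
        (a, b): pearson(predictions[a], predictions[b])
        for a, b in combinations(sorted(predictions), 2)
    }
```

Sorting the resulting dictionary by value gives the most and least similar model pairs.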