Allow filtering samples by compound expressions including multiple scorers #1073

andrei-apollo · 2025-01-03T22:12:52Z

This is a reincarnation of #911

This PR contains:

What is the current behavior?

Samples can be filtered based on simple conditions including one scorer.

What is the new behavior?

Samples can be filtered by compound expressions like result == "C" and steps <= 10. Additionally, samples can be filtered based on input and target texts.

Expression parsing via filtrex. Supports arithmetic, basic math functions, Python-style boolean operations, chained comparisons.
Filter input via CodeMirror. Supports syntax highlighting and autocompletion.
The filter expression can include any scorer, not just the selected one.
Clicking on a score adds it to the filter. Moreover, for simple categorical scores the UI will automatically suggest expressions like result == "C".
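As a rough illustration of the compound filtering described above, the sketch below applies a predicate to sample summaries whose scores come from more than one scorer. The predicate is a hand-written stand-in for what an expression compiler such as filtrex would produce from result == "C" and steps <= 10. The sample shape (`input`, `target`, `scores`), the case-insensitive matching, and the helper names `buildScope` and `filterSamples` are assumptions made for illustration, not the PR's actual code.

```javascript
// Hypothetical sample summaries; the field names are assumptions for illustration.
const samples = [
  { id: 1, input: "Two parallel lines", target: "C", scores: { result: "C", steps: 8 } },
  { id: 2, input: "A triangle", target: "C", scores: { result: "C", steps: 14 } },
  { id: 3, input: "Parallel circuits", target: "I", scores: { result: "I", steps: 5 } },
];

// Build the variables and helper functions that an expression may reference.
// Case-insensitive substring matching is an assumption of this sketch.
function buildScope(sample) {
  const contains = (text, needle) =>
    String(text).toLowerCase().includes(String(needle).toLowerCase());
  return {
    ...sample.scores,
    input_contains: (needle) => contains(sample.input, needle),
    target_contains: (needle) => contains(sample.target, needle),
  };
}

// Stand-in for the compiled form of: result == "C" and steps <= 10
const predicate = (scope) => scope.result === "C" && scope.steps <= 10;

// Keep only the samples for which the predicate holds.
function filterSamples(items, test) {
  return items.filter((sample) => Boolean(test(buildScope(sample))));
}

const matchingIds = filterSamples(samples, predicate).map((s) => s.id);
console.log(matchingIds);
```

Because the scope is built from every score on the sample, the predicate can reference any scorer, not only the one currently selected.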

Auxiliary changes:

Merged scorer and score selectors. Did this to keep tool panel width in check now that the filter field is wider.
Moved filter and scorer list to the right so that they are nicely aligned.
Made scorer list collision-proof. Now if two scorers define scores with the same name, the scorers panel will use dot notation to disambiguate, e.g. score.foo vs other_score.foo.
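The dot-notation disambiguation can be sketched as follows. This is a minimal illustration under assumed data shapes (a list of scorers, each with the score names it defines); the function name `scoreVariableNames` is hypothetical and not taken from the PR.

```javascript
// Hypothetical input: each scorer lists the score names it defines.
function scoreVariableNames(scorers) {
  // Count how many scorers define each score name.
  const counts = new Map();
  for (const { scores } of scorers) {
    for (const name of scores) {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  // Use the bare name when it is unique, otherwise "scorer.score".
  const names = [];
  for (const { scorer, scores } of scorers) {
    for (const name of scores) {
      names.push(counts.get(name) > 1 ? `${scorer}.${name}` : name);
    }
  }
  return names;
}

const variableNames = scoreVariableNames([
  { scorer: "score", scores: ["foo", "bar"] },
  { scorer: "other_score", scores: ["foo"] },
]);
console.log(variableNames); // [ 'score.foo', 'bar', 'other_score.foo' ]
```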

Does this PR introduce a breaking change?

No.

Other information:

Next steps:

I would like to also allow filtering by sample metadata and full-text search over the transcript. This is now easy to do from the UI perspective, but this would require loading the entire samples, not just the summaries.
Consider if the tool order could be improved. I find it a little confusing that the filter is to the right of the scorer selector, yet does not depend on it. Not sure how best to fix this, because I want to keep the filter aligned with the scorer list.

This PR is a work in progress. In particular, I remember that @dragonstyle suggested applying the filter only on Enter. This hasn't been done yet. I'm also still figuring out some corner cases with different score types. Still, @dragonstyle, if you could take a look at the current state I would appreciate your feedback. Do you think this is moving in the right direction?

dragonstyle · 2025-01-06T15:32:39Z

This is looking really great to me, and I love that we'll be able to support much more robust filtering! I believe that the right autocomplete experience can make this nearly as easy to use as the simple selector. I have some suggestions to get there:

When the control is focused (and empty), we should show autocomplete suggestions for the first 'segment' (the user is still free to type, but will typically see the scorer names as suggestions, for example). Each time the user completes a 'segment' of the expression, I think we should automatically prompt for the next segment (e.g. once I have a scorer name, we should suggest the various "==", "<", etc. Once that is selected, if the scorer is categorical, we could suggest values). This will make the simple case of filtering by scorer just about as simple as it is now. It will also make the feature very discoverable, as users will see options each step of the way. (Sometimes showing autocomplete for the next 'segment' won't be possible, but when we can do so reasonably, I think we should.)
The click scorer name is a pretty obscure affordance and I don't think we should rely on it (I'd like to see this removed and just solve for discovery using the filter input itself).
If you agree we can remove the scorer link affordance, I think we could move the filter to the left side of the scorer selector- I don't think its proximity to the scorer list is important in this case.
As you noted, filtering as each key is pressed is very disruptive (since all samples will always just 'disappear' until the expression is complete). I very much think we need to use enter, a wait / debounce, background evaluation to ensure it is a 'complete expression' or some other affordance to 'accept' the filter rather than filtering immediately upon key down.
I would make the (i) icon a (?) icon.
For advanced features (awesome!) like input_contains, I think we should autocomplete to the function with the cursor in between the parens if possible (e.g. input_contains(|))
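The segment-by-segment prompting suggested above might look roughly like the sketch below. It is only an illustration: the scorer descriptions, the operator list, and the function names `nextSuggestions` and `functionCompletion` are assumptions, and the whitespace tokenization is deliberately naive (it would break on string values that contain spaces).

```javascript
// Hypothetical scorer descriptions; "categories" exists only for categorical scores.
const knownScorers = {
  choice: { categories: ["C", "I"] },
  steps: {},
};

const OPERATORS = ["==", "!=", "<", "<=", ">", ">="];

// Suggest the next segment given the text typed so far.
// Segments cycle through: scorer name, operator, value, connective.
function nextSuggestions(text, known) {
  const tokens = text.trim() === "" ? [] : text.trim().split(/\s+/);
  const position = tokens.length % 4;
  if (position === 0) return Object.keys(known);
  if (position === 1) return OPERATORS;
  if (position === 2) {
    // The scorer name sits two tokens back, just before the operator.
    const scorer = known[tokens[tokens.length - 2]];
    return scorer?.categories?.map((c) => JSON.stringify(c)) ?? [];
  }
  return ["and", "or"];
}

// Text to insert for a function completion, plus the cursor offset inside it,
// so that the caret lands between the parentheses: input_contains(|)
function functionCompletion(name) {
  const insert = `${name}()`;
  return { insert, cursorOffset: insert.length - 1 };
}

console.log(nextSuggestions("", knownScorers));
console.log(nextSuggestions("choice ==", knownScorers));
console.log(functionCompletion("input_contains"));
```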

Don't mean to flood with feedback, but this is definitely getting there and I'm looking forward to merging it!!

andrei-apollo · 2025-01-09T17:36:50Z

Thanks for the feedback! Implemented all suggestions, please take a look.

dragonstyle · 2025-01-09T18:10:00Z

This is looking amazing! A few suggestions that are hopefully just minor tweaks...

Can we delay showing the expression error until the user has pressed Enter or in some way indicated that they are done? I think we're showing the red error wrapper too aggressively (and the red squiggles seem like plenty unless the user actually runs an expression which results in an error).
Currently, selecting an entry adds the selected text, but the user then needs to press space between steps to get the next suggestion. It would be sweet if, when the user selects an option from the menu, we just offered the next step directly. Do you think it's possible to enable the equivalent of this:

focus

select choice (press enter)

select equal (press enter)

select "I" (arrow, then enter)
Question - could we make the delete case just apply immediately since we 'know' that is a complete expression? Or is this too inconsistent?
For long expressions, I am seeing scroll bars (using Safari) which are disruptive...

dragonstyle · 2025-01-09T18:11:33Z

(These check failures are not related to this PR and are related to ruff dependency version changes. They are now fixed on main so if you rebase against main they should go away - sorry!)

andrei-apollo · 2025-01-09T19:25:01Z

Good idea! That error message was annoying. Done.
Done.
Feels rather inconsistent to me, to be honest.
Hmm. Weird. For me it works fine in Safari as well:

What version do you have? Does the scrollbar always look like this or only sometimes?

dragonstyle · 2025-01-09T19:26:38Z

What version do you have? Does the scrollbar always look like this or only sometimes?

Version 18.1.1 (20619.2.8.11.12).

I only see it once I make the expression long and scroll with the mouse...

dragonstyle · 2025-01-09T19:49:39Z

One other question - rather than show the green feedback treatment once an expression is complete, maybe we should just apply the expression at that point since we know it will work?

It would still result in some changes to filtering in cases where the filters didn't narrow the set (e.g. or), but I think that would be worth it to get an even smoother experience.

andrei-apollo · 2025-01-09T22:21:08Z

My Safari version is slightly different (Version 18.2 (20620.1.16.11.8)), don't know if it's related or not. To be honest, debugging this kind of failure without being able to reproduce it would be quite hard. I decided to just remove the scroll bar. It's quite unusual for single-line text inputs to have scroll bars anyway.

andrei-apollo · 2025-01-09T22:24:09Z

Agreed. Applying the filter expression immediately but only when it's valid seems like a good approach. Changed the behavior and added color-coding, which is hopefully noticeable enough to make the current state clear, but not so much as to be distracting.
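The "apply immediately, but only when valid" behavior can be sketched as a small state holder. This is an illustration under stated assumptions: `compile` is a toy stand-in for a real expression compiler (it accepts only `<name> <op> <number>`), and keeping the previous filter active while the text is incomplete is one possible policy, not necessarily what the PR does.

```javascript
// Toy compiler stand-in: accepts only "<name> <op> <number>" and throws otherwise.
function compile(expression) {
  const match = /^\s*(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$/.exec(expression);
  if (!match) throw new Error(`Cannot parse: ${expression}`);
  const [, name, op, raw] = match;
  const value = Number(raw);
  const compare = {
    "<": (a) => a < value,
    "<=": (a) => a <= value,
    ">": (a) => a > value,
    ">=": (a) => a >= value,
    "==": (a) => a === value,
  }[op];
  return (scope) => compare(scope[name]);
}

// Re-evaluate on every edit, but swap in the new filter only when it compiles.
function createFilterState() {
  let active = () => true; // no filter: every sample passes
  return {
    update(expression) {
      if (expression.trim() === "") {
        active = () => true;
        return "empty";
      }
      try {
        active = compile(expression);
        return "applied";
      } catch {
        return "pending"; // assumed policy: leave the previous filter in place
      }
    },
    matches: (scope) => active(scope),
  };
}

const filterState = createFilterState();
console.log(filterState.update("steps <"));     // incomplete expression
console.log(filterState.update("steps <= 10")); // valid expression
console.log(filterState.matches({ steps: 8 }));
```

The returned status string is what a UI could map to visual feedback, for example graying out a filter that is not currently applied.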

dragonstyle · 2025-01-09T22:34:32Z

This is a great improvement over our current filtering.

Nits:

I personally find the green outline maybe a bit overkill (perhaps we could just reflect the error or incomplete states and treat success states as just having no feedback). That said, I can definitely live with this approach if you feel strongly that the green is helpful.
I noticed that the popup options can sometimes be a bit aggressive. I'm not sure what the rule to filter this would be (or if there is a consistent rule to be applied), but I notice that for an expression like choice == "I" or input_contains("parallel"), if I go back to edit the 'or' to 'and', it will pop up choices after I complete the 'and'.
One tiny thing that might be a side effect of the clickable scorers - I think the duration can go back to the far right and the scorers in the middle now that they don't need proximity to the filter.

I haven't looked closely at the code itself - LMK if you think that is ready to go and I can take a look (or just ping me whenever you think it's good to go).

dragonstyle · 2025-01-11T20:47:25Z

Also note - good feature suggestion for error filtering here:

https://inspectcommunity.slack.com/archives/C080ET25C81/p1736618476283839

andrei-apollo · 2025-01-13T12:43:52Z

Removed the green outline. It was an experiment, I wasn't sure how I feel about it myself. What is really important IMO is to give some indication whether the filter is applied or not. But graying out inactive filters is sufficient for that.

Restored the secondary bar item order. Sorry I forgot to revert this when moving the filter to the left.

I thought you wanted aggressive popups :) But maybe I misunderstood what you were suggesting earlier. It's true that now you would get a completion suggestion after or. I would like to note that it happens only if you auto-complete the or with an Enter, which is not something you really need to do. So my mental model for the current implementation is: “whenever you press Enter, you want the system to suggest something to you afterwards”. But again, I agree that the current completions are aggressive. If you think so too, how about we change the logic to never auto suggest in the middle of an expression?

I think we have a lot of opportunities to add more features to the new filter: integrate epoch filter, filter by sample metadata, add full-text transcript search, etc. I've added the one you suggested to my list, but we can do it later, right? No need to do everything in this PR, I think

dragonstyle · 2025-01-13T13:26:00Z

But again, I agree that the current completions are aggressive. If you think so too, how about we change the logic to never auto suggest in the middle of an expression?

Sounds perfect to me!

I think we have a lot of opportunities to add more features to the new filter: integrate epoch filter, filter by sample metadata, add full-text transcript search, etc. I've added the one you suggested to my list, but we can do it later, right? No need to do everything in this PR, I think

+100!

andrei-apollo · 2025-01-13T14:16:49Z

Disabled automatic suggestions in the middle (except after a dot). WDYT?
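One way to express this rule is a small predicate over the editor text and cursor position. This is a hedged sketch: the function name `shouldAutoSuggest` and the exact definition of "in the middle" (any non-whitespace text after the cursor) are assumptions, not the PR's implementation.

```javascript
// Decide whether to open the completion popup automatically after an edit.
// Assumed rule: only at the end of the text, or right after a dot.
function shouldAutoSuggest(text, cursor) {
  const afterDot = cursor > 0 && text[cursor - 1] === ".";
  const atEnd = text.slice(cursor).trim() === "";
  return afterDot || atEnd;
}

const expr = 'choice == "I" and input_contains("parallel")';
console.log(shouldAutoSuggest(expr, 17));          // cursor right after "and"
console.log(shouldAutoSuggest("choice ", 7));      // cursor at the end
console.log(shouldAutoSuggest("score. == 1", 6));  // cursor right after the dot
```

Explicitly requested completion (for example via a keyboard shortcut) would bypass this check, since the rule only governs popups that open on their own.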

dragonstyle · 2025-01-13T15:13:12Z

Looks great to me. One final tiny nit - since the filter is often the first focusable element on the page, it gets focus by default (which triggers showing the autocompletion). What would you think about filtering this case (just using isTrusted to limit the auto display of the completion to user generated focus)?

Taking a quick pass through the code now.

andrei-apollo · 2025-01-13T15:36:45Z

Good catch. It was actually worse than that: for some reason I decided that the filter field should be focused whenever the filter changes externally, so the call was coming from inside the house. Fixed. And added an isTrusted check just in case, although it doesn't seem to be needed at this point.
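A guard along these lines could look like the sketch below. `isTrusted` is a standard DOM `Event` property that is false for events created and dispatched by script; whether it distinguishes every kind of programmatic focus depends on the browser, which may be why the check did not turn out to be strictly necessary. The function name and the "only when empty" condition are assumptions.

```javascript
// Open completions on focus only for trusted events and an empty field.
// `event` is expected to be a DOM FocusEvent (or anything with `isTrusted`).
function shouldShowCompletionsOnFocus(event, text) {
  return event.isTrusted === true && text.trim() === "";
}

// Plain objects stand in for DOM events in this sketch.
console.log(shouldShowCompletionsOnFocus({ isTrusted: true }, ""));
console.log(shouldShowCompletionsOnFocus({ isTrusted: false }, ""));
```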

dragonstyle

Looks great having gone through the code. One open question and a few styling things specific to vscode. Could chat anytime in realtime if helpful!

src/inspect_ai/_view/www/src/samples/tools/SelectScorer.mjs

src/inspect_ai/_view/www/src/samples/tools/filters.mjs

dragonstyle · 2025-01-13T17:58:00Z

src/inspect_ai/_view/www/App.css

+[data-tooltip] {
+  position: relative;
+}
+[data-tooltip]:hover::after {


I think this needs:

background: var(--bs-body-bg); opacity: 1;

The body-bg is set dynamically in the case of VSCode (to match the vscode theme). Best test there is to open a log in vscode and toggle theme (usually just selecting a dark theme is enough to uncover issues)

Beyond the tooltip, I think we need to set up the overall theme style (attached image shows what I'm talking about). Currently we 'forward' vscode colors into css vars as needed in App.css here:

body[class^="vscode-"] {
  --bs-border-radius: 0;
  --bs-border-radius-lg: 0;
  --bs-body-bg: var(--vscode-editor-background);
  --bs-card-bg: var(--vscode-editor-background);
  --bs-table-bg: var(--vscode-editor-background);
  --bs-light-bg-subtle: var(--vscode-sideBar-background);
  --bs-light-border-subtle: var(--vscode-sideBarSectionHeader-border);
  --bs-body-color: var(--vscode-editor-foreground);
  --bs-table-color: var(--vscode-editor-foreground);
  --bs-accordion-btn-color: var(--vscode-editor-foreground);
  --bs-emphasis-color: var(--vscode-editor-foreground);
  --bs-navbar-brand-color: var(--vscode-editor-foreground);
  --bs-navbar-brand-hover-color: var(--vscode-editor-foreground);
  --bs-code-color: var(--vscode-editorInfo-foreground);
  --bs-light: var(--vscode-sideBar-background);
  --bs-btn-bg: var(--vscode-peekViewTitle-background);
  --bs-primary: var(--vscode-banner-iconForeground);
  --bs-nav-pills-link-active-bg: var(--vscode-banner-iconForeground);
  --bs-secondary: var(--vscode-breadcrumb-foreground);
  --bs-secondary-bg: var(--vscode-list-inactiveSelectionBackground);
  --bs-border-color: var(--vscode-editorGroup-border);
  --bs-card-border-color: var(--vscode-editorGroup-border);
  --bs-warning-bg-subtle: var(--vscode-inputValidation-warningBackground);
  --bs-warning-text-emphasis: var(--vscode-input-foreground);
  --inspect-find-background: var(--vscode-editorWidget-background);
  --inspect-find-foreground: var(--vscode-editorWidget-foreground);
  --inspect-input-background: var(--vscode-input-background);
  --inspect-input-foreground: var(--vscode-input-foreground);
  --inspect-input-border: var(--vscode-input-border);
  --inspect-diff-add-color: var(--vscode-diffEditor-insertedTextBackground);
  --inspect-diff-remove-color: var(--vscode-diffEditor-removedTextBackground);
}

The process of discovering vscode colors is more art than science as they are not well documented. I usually just use the 'Open Webview Developer Tools' command within vscode, browse the variables available to the viewer, and try to select colors that seem correct and have names that seem likely to be semantically related to what they'll represent in inspect.

Happy to chat more real time if useful...

Sorry meant to include this screenshot as well...

Yes, good point! I replaced hard-coded values with CSS variables and fixed the dark theme. At least I hope I did.
Unfortunately I could not find the variables for the built-in selection colors and for the BS focus outline color (--inspect-focus-border-color), so I had to specify those manually. But pretty much everything else is taken from the theme now.

dragonstyle · 2025-01-13T18:28:46Z

src/inspect_ai/_view/www/src/samples/tools/SampleFilter.mjs

+  { tag: tags.number, color: "#164" },
+  { tag: tags.keyword, color: "#708" },
+  { tag: tags.function(tags.variableName), color: "#00c" },
+]);


We should be using CSS vars (if we can) to allow vscode overrides to work properly. Alternatively, we could select middle ground hues of the colors (e.g. the red is currently pretty good in both light and dark, but the others are dark enough that they are difficult in dark themes).

Used built-in token colors. They already have dark theme variations, which is nice. I slightly dislike the fact that there is no highlighting for strings in the light theme, but this is a very minor issue. We could always go back to this later and tweak some more if we want.

andrei-apollo force-pushed the main branch from c768a95 to 86736cc Compare January 3, 2025 22:19

jjallaire requested a review from dragonstyle January 4, 2025 00:47

andrei-apollo closed this Jan 9, 2025

andrei-apollo force-pushed the main branch from c879e4e to faf3b7d Compare January 9, 2025 16:27

andrei-apollo reopened this Jan 9, 2025

andrei-apollo marked this pull request as ready for review January 9, 2025 17:32

andrei-apollo added 10 commits January 9, 2025 19:12

Combine scorer and score selectors

3525aac

Filter expressions for samples

20902d8

Uses filtrex to support compound expressions that allow to filter samples by multiple scores at a time.

Add input_contains and target_contains samples filter predicates

a4b26f6

Use CodeMirror for sample filter input

55d905e

Revert score clickability

d3b9454

Smarter filter autocompletion and better score selector

780200f

More robust logic to ensure single line

d4c6042

Apply filter on Enter

90989a3

Automatically insert space after completion when filter expression is likely to continue

d7fb76a

Show filter error text only on Enter

05a4221

andrei-apollo force-pushed the main branch from 3617f14 to 05a4221 Compare January 9, 2025 19:18

andrei-apollo added 3 commits January 9, 2025 22:17

Don't insert space after completing booleans in filter expression

d7511e1

Disable scroll bar on the filter input

cce0268

Apply valid filter expression immediately

cf485a4

Move scorers back to the middle; remove active filter highlight

5095f96

Don't trigger filter completion automatically in the middle of expression

4ee20bc

andrei-apollo force-pushed the main branch from b3fea42 to 4ee20bc Compare January 13, 2025 15:12

Fix spurious autofocusing of the filter field

66dddbd

Remove unused CSS

25c4ca3

dragonstyle reviewed Jan 13, 2025

andrei-apollo added 3 commits January 13, 2025 22:24

Adapt filter editor to dark theme; use css variables for colors

632a676

Revert scorer selector changes

0c921a9

Remove todo

8fe57e6
