Allow filtering samples by compound expressions including multiple scorers #1073

Open · wants to merge 20 commits into base: main


@andrei-apollo (Contributor) commented Jan 3, 2025

This is a reincarnation of #911

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior?

Samples can be filtered based on simple conditions including one scorer.

What is the new behavior?

Samples can be filtered by compound expressions like result == "C" and steps <= 10. Additionally, samples can be filtered based on input and target texts.

  • Expression parsing via filtrex. Supports arithmetic, basic math functions, Python-style boolean operations, chained comparisons.
  • Filter input via CodeMirror. Supports syntax highlighting and autocompletion.
  • The filter expression can include any scorer, not just the selected one.
  • Clicking on a score adds it to the filter. Moreover, for simple categorical scores the UI will automatically suggest expressions like result == "C".
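To make the expression semantics above concrete, here is a toy evaluator. It is a hypothetical illustration, not filtrex and not the code in this PR, covering only the subset used in the examples: `and`/`or`/`not`, comparisons with Python-style chaining, parentheses, numbers, double-quoted strings, and variable names (dotted ones such as `score.foo` included) looked up in a plain object assumed to hold each sample's scores.

```javascript
// Toy evaluator for filter expressions such as: result == "C" and steps <= 10
// Illustration only (the PR delegates parsing to filtrex). Supports or/and/not,
// == != < <= > >= (chained as in Python), parentheses, numbers, double-quoted
// strings without escapes, and identifiers (dots allowed) looked up in `vars`.
const CMP = {
  "==": (a, b) => a === b,
  "!=": (a, b) => a !== b,
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
};
const KEYWORDS = ["and", "or", "not"];

function tokenize(src) {
  // Sticky regex: each match must start exactly at `pos`.
  const re = /\s*(?:(\d+(?:\.\d+)?)|"([^"]*)"|([A-Za-z_][\w.]*)|(==|!=|<=|>=|<|>|\(|\)))/y;
  const tokens = [];
  let pos = 0;
  while (src.slice(pos).trim() !== "") {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (!m) throw new SyntaxError(`Unexpected character near position ${pos}`);
    pos = re.lastIndex;
    if (m[1] !== undefined) tokens.push({ type: "num", value: Number(m[1]) });
    else if (m[2] !== undefined) tokens.push({ type: "str", value: m[2] });
    else if (m[3] !== undefined) tokens.push({ type: "id", value: m[3] });
    else tokens.push({ type: "op", value: m[4] });
  }
  return tokens;
}

function evaluate(src, vars) {
  const tokens = tokenize(src);
  let i = 0;
  const peek = () => tokens[i];
  const isWord = (w) => peek()?.type === "id" && peek().value === w;

  // Precedence, lowest to highest: or < and < not < comparison < primary.
  function orExpr() {
    let left = andExpr();
    while (isWord("or")) {
      i++;
      const right = andExpr();
      left = left || right;
    }
    return left;
  }
  function andExpr() {
    let left = notExpr();
    while (isWord("and")) {
      i++;
      const right = notExpr();
      left = left && right;
    }
    return left;
  }
  function notExpr() {
    if (isWord("not")) {
      i++;
      return !notExpr();
    }
    return comparison();
  }
  function comparison() {
    // a < b <= c is evaluated as (a < b) and (b <= c).
    let left = primary();
    let result = true;
    let chained = false;
    while (peek()?.type === "op" && Object.hasOwn(CMP, peek().value)) {
      const op = tokens[i++].value;
      const right = primary();
      result = result && CMP[op](left, right);
      left = right;
      chained = true;
    }
    return chained ? result : left;
  }
  function primary() {
    const tok = tokens[i++];
    if (!tok) throw new SyntaxError("Unexpected end of expression");
    if (tok.type === "num" || tok.type === "str") return tok.value;
    if (tok.type === "op" && tok.value === "(") {
      const value = orExpr();
      const close = tokens[i++];
      if (close?.type !== "op" || close.value !== ")") throw new SyntaxError("Expected )");
      return value;
    }
    if (tok.type === "id" && !KEYWORDS.includes(tok.value)) {
      if (!Object.hasOwn(vars, tok.value)) throw new ReferenceError(`Unknown name: ${tok.value}`);
      return vars[tok.value];
    }
    throw new SyntaxError(`Unexpected token: ${tok.value}`);
  }

  const value = orExpr();
  if (i < tokens.length) throw new SyntaxError(`Unexpected token: ${tokens[i].value}`);
  return value;
}
```

For example, `evaluate('result == "C" and steps <= 10', { result: "C", steps: 7 })` returns `true`, and a half-typed `result ==` throws a `SyntaxError`, which is the "incomplete expression" state the UI has to handle.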

Auxiliary changes:

  • Merged scorer and score selectors. Did this to keep tool panel width in check now that the filter field is wider.
  • Moved filter and scorer list to the right so that they are nicely aligned.
  • Made scorer list collision-proof. Now if two scorers define scores with the same name, the scorers panel will use dot notation to disambiguate, e.g. score.foo vs other_score.foo.
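The collision-proof naming can be sketched as a two-pass function: count how many scorers define each score name, then qualify only the colliding ones. The input shape (`{ scorer, scores }`) and the name `scoreLabels` are hypothetical, not taken from the PR:

```javascript
// Hypothetical sketch: label each score with its bare name unless another
// scorer defines a score with the same name, in which case use "scorer.score".
function scoreLabels(scorers) {
  // How many distinct scorers define each score name?
  const counts = new Map();
  for (const { scores } of scorers) {
    for (const name of new Set(scores)) {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  // Qualify only the colliding names.
  return scorers.flatMap(({ scorer, scores }) =>
    scores.map((name) => ({
      scorer,
      name,
      label: counts.get(name) > 1 ? `${scorer}.${name}` : name,
    })),
  );
}

const labels = scoreLabels([
  { scorer: "score", scores: ["foo", "bar"] },
  { scorer: "other_score", scores: ["foo"] },
]);
console.log(labels.map((l) => l.label)); // [ 'score.foo', 'bar', 'other_score.foo' ]
```

Only `foo` is qualified, so the common single-scorer case keeps short names.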

Does this PR introduce a breaking change?

No.

Other information:

Next steps:

  • I would like to also allow filtering by sample metadata and full-text search over the transcript. This is now easy to do from the UI perspective, but this would require loading the entire samples, not just the summaries.
  • Consider if the tool order could be improved. I find it a little confusing that the filter is to the right of the scorer selector, yet does not depend on it. Not sure how best to fix this, because I want to keep the filter aligned with the scorer list.

This PR is a work in progress. In particular, I remember that @dragonstyle suggested only applying the filter on Enter. This hasn't been done yet. I'm also still figuring out some corner cases with different score types. Still, @dragonstyle, if you could take a look at the current state I would appreciate your feedback. Do you think this is moving in the right direction?

@dragonstyle
Collaborator

This is looking really great to me, and I love that we'll be able to support much more robust filtering! I believe that the right autocomplete experience can make this nearly as easy to use as the simple selector. I have some suggestions to get there:

  1. When the control is focused (and empty), we should show autocomplete suggestions for the first 'segment' (the user is still free to type, but will typically see the scorer names as suggestions, for example). Each time the user completes a 'segment' of the expression, I think we should automatically prompt for the next segment. For example, once I have a scorer name, we should suggest the various "==", "<", etc. Once that is selected, if the scorer is categorical, we could suggest values. This will make the simple case of filtering by scorer just about as simple as it is now. It will also make the syntax very discoverable, as users will see options each step of the way. (Showing autocomplete for the next 'segment' may not always be possible, but when we can do so reasonably, I think we should.)

  2. Clicking a scorer name is a pretty obscure affordance and I don't think we should rely on it (I'd like to see this removed and just solve for discovery using the filter input itself).

  3. If you agree we can remove the scorer link affordance, I think we could move the filter to the left side of the scorer selector; I don't think its proximity to the scorer list is important in this case.

  4. As you noted, filtering as each key is pressed is very disruptive (since all samples will always just 'disappear' until the expression is complete). I strongly think we need to use Enter, a wait/debounce, background evaluation to ensure it is a 'complete expression', or some other affordance to 'accept' the filter rather than filtering immediately on key down.

  5. I would make the (i) icon a (?) icon.

  6. For advanced features (awesome!) like input_contains, I think we should autocomplete to the function with the cursor between the parens if possible (e.g. input_contains(|)).
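Suggestions 1 and 6 amount to a small decision table keyed on the last complete token. A minimal sketch, assuming whitespace-separated tokens and a hypothetical `scorers` map of `{ categorical, values }` records; `suggestNext` and `applyCompletion` are illustrative names, not functions from this PR or from CodeMirror:

```javascript
// Hypothetical sketch of "suggest the next segment" (items 1 and 6):
// inspect the tokens before the cursor and decide what to offer next.
const OPERATORS = ["==", "!=", "<", "<=", ">", ">="];
const isScorer = (scorers, name) => Object.hasOwn(scorers, name);

// scorers: { [name]: { categorical: boolean, values?: string[] } }
function suggestNext(textBeforeCursor, scorers) {
  const tokens = textBeforeCursor.trim().split(/\s+/).filter(Boolean);
  const last = tokens.at(-1);
  const prev = tokens.at(-2);

  // Empty input or after a boolean keyword: start a new condition.
  if (last === undefined || ["and", "or", "not"].includes(last)) {
    return [...Object.keys(scorers), "input_contains()"];
  }
  // After a scorer name: offer comparison operators.
  if (isScorer(scorers, last)) return OPERATORS;
  // After "<categorical scorer> <operator>": offer its known values, quoted.
  if (OPERATORS.includes(last) && isScorer(scorers, prev) && scorers[prev].categorical) {
    return (scorers[prev].values ?? []).map((v) => JSON.stringify(v));
  }
  return [];
}

// Insert a completion; for functions such as "input_contains()" leave the
// cursor between the parentheses.
function applyCompletion(text, completion) {
  const sep = text === "" || text.endsWith(" ") ? "" : " ";
  const newText = text + sep + completion;
  const cursor = completion.endsWith("()") ? newText.length - 1 : newText.length;
  return { text: newText, cursor };
}
```

A real implementation would also complete partial words and respect string literals containing spaces; the sketch only shows the step-by-step prompting idea.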

Don't mean to flood with feedback, but this is definitely getting there and I'm looking forward to merging it!!

@andrei-apollo reopened this Jan 9, 2025
@andrei-apollo andrei-apollo marked this pull request as ready for review January 9, 2025 17:32
@andrei-apollo
Contributor Author

Thanks for the feedback! Implemented all suggestions, please take a look.

@dragonstyle
Collaborator

This is looking amazing! A few suggestions that are hopefully just minor tweaks...

  1. Can we delay showing the expression error until the user has pressed Enter or in some way indicated that they are done? I think we're showing the red error wrapper too aggressively (the red squiggles seem like plenty unless the user actually runs an expression that results in an error).

  2. Currently, selecting an entry adds the selected text, but then the user needs to press space between steps to get the next suggestion. It would be sweet if, when the user selects an option from the menu, we just offered the next step directly. Do you think it's possible to enable the equivalent of this:

    focus
    [Screenshot 2025-01-09 at 1 00 28 PM]

    select choice (press Enter)
    [Screenshot 2025-01-09 at 1 00 43 PM]

    select equal (press Enter)
    [Screenshot 2025-01-09 at 1 01 05 PM]

    select "I" (arrow, then Enter)
    [Screenshot 2025-01-09 at 1 07 10 PM]

  3. Question: could we make the delete case just apply immediately, since we 'know' that is a complete expression? Or is this too inconsistent?

  4. For long expressions, I am seeing scroll bars (using Safari), which are disruptive...

    [Screenshot 2025-01-09 at 12 56 46 PM]

@dragonstyle (Collaborator) commented Jan 9, 2025

(These check failures are not related to this PR and are related to ruff dependency version changes. They are now fixed on main so if you rebase against main they should go away - sorry!)

@andrei-apollo
Contributor Author

  1. Good idea! That error message was annoying. Done.
  2. Done.
  3. Feels rather inconsistent to me, to be honest.
  4. Hmm. Weird. For me it works fine in Safari as well:
    [image]
    What version do you have? Does the scrollbar always look like this or only sometimes?

@dragonstyle
Collaborator

What version do you have? Does the scrollbar always look like this or only sometimes?

Version 18.1.1 (20619.2.8.11.12).

I only see it once I make the expression long and scroll with the mouse...

@dragonstyle
Collaborator

One other question - rather than show the green feedback treatment once an expression is complete, maybe we should just apply the expression at that point since we know it will work?

It would still result in some changes to filtering in cases where the filters didn't narrow the set (e.g. or), but I think that would be worth it to get an even smoother experience.

@andrei-apollo
Contributor Author

My Safari version is slightly different (Version 18.2 (20620.1.16.11.8)), don't know if it's related or not. To be honest, debugging this kind of failure without being able to reproduce it would be quite hard. I decided to just remove the scroll bar. It's quite unusual for single-line text inputs to have scroll bars anyway.

@andrei-apollo
Contributor Author

Agreed. Applying the filter expression immediately but only when it's valid seems like a good approach. Changed the behavior and added color-coding, which is hopefully noticeable enough to make the current state clear, but not so much as to be distracting.
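A minimal sketch of "apply only when valid", assuming a `compile` function that returns a predicate for a valid expression and throws otherwise (a wrapper around the filtrex compiler could play that role). `createFilterController` and its `state` values are hypothetical names; the returned state is the kind of signal the UI could use for color-coding or graying out:

```javascript
// Hypothetical sketch: re-compile on every edit, but only replace the active
// filter when the new text compiles; otherwise keep the last valid filter.
function createFilterController(compile) {
  const matchAll = () => true;
  let active = { text: "", predicate: matchAll };

  return {
    // Returns a state the UI can use for styling ("empty" | "valid" | "invalid").
    update(text) {
      if (text.trim() === "") {
        active = { text: "", predicate: matchAll };
        return { state: "empty", applied: true };
      }
      try {
        const predicate = compile(text);
        active = { text, predicate };
        return { state: "valid", applied: true };
      } catch (err) {
        // Incomplete or erroneous expression: keep filtering with `active`.
        return { state: "invalid", applied: false, message: err.message };
      }
    },
    // Filter sample summaries with the last expression that compiled.
    filter(samples) {
      return samples.filter((sample) => active.predicate(sample));
    },
  };
}
```

Keeping the last valid predicate is what stops samples from disappearing while an expression is half-typed; runtime errors inside a compiled predicate are not handled here.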

@dragonstyle
Collaborator

This is a great improvement over our current filtering.

Nits:

  • I personally find the green outline maybe a bit overkill (perhaps we could just reflect the error or incomplete states and treat success states as having no feedback). That said, I can definitely live with this approach if you feel strongly that the green is helpful.

  • I noticed that the popup options can sometimes be a bit aggressive. I'm not sure what the rule to filter this would be (or if there is a consistent rule to be applied), but I notice that for an expression like choice == "I" or input_contains("parallel"), if I go back to edit the 'or' to 'and', it will pop up choices after I complete the 'and'.

  • One tiny thing that might be a side effect of the clickable scorers: I think the duration can go back to the far right and the scorers to the middle, now that they don't need to be next to the filter.

I haven't looked closely at the code itself. LMK if you think it is ready to go and I can take a look (or just ping me whenever you think it's good to go).

@dragonstyle
Collaborator

Also note - good feature suggestion for error filtering here:

https://inspectcommunity.slack.com/archives/C080ET25C81/p1736618476283839

@andrei-apollo
Contributor Author

Removed the green outline. It was an experiment; I wasn't sure how I felt about it myself. What is really important IMO is to give some indication of whether the filter is applied or not, and graying out inactive filters is sufficient for that.

Restored the secondary bar item order. Sorry I forgot to revert this when moving the filter to the left.

I thought you wanted aggressive popups :) But maybe I misunderstood what you were suggesting earlier. It's true that now you get a completion suggestion after or. I would like to note that this happens only if you auto-complete the or with Enter, which is not something you really need to do. So my mental model for the current implementation is: “whenever you press Enter, you want the system to suggest something afterwards”. But again, I agree that the current completions are aggressive. If you think so too, how about we change the logic to never auto-suggest in the middle of an expression?

I think we have a lot of opportunities to add more features to the new filter: integrate epoch filter, filter by sample metadata, add full-text transcript search, etc. I've added the one you suggested to my list, but we can do it later, right? No need to do everything in this PR, I think

@dragonstyle
Collaborator

But again, I agree that the current completions are aggressive. If you think so too, how about we change the logic to never auto-suggest in the middle of an expression?

Sounds perfect to me!

I think we have a lot of opportunities to add more features to the new filter: integrate epoch filter, filter by sample metadata, add full-text transcript search, etc. I've added the one you suggested to my list, but we can do it later, right? No need to do everything in this PR, I think

+100!

@andrei-apollo
Contributor Author

Disabled automatic suggestions in the middle (except after a dot). WDYT?
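The rule described here (no automatic suggestions mid-expression, except right after a dot) fits in a few lines. A hypothetical sketch, assuming the editor can report the full text and the cursor offset; `shouldAutoSuggest` is an illustrative name, not a function from this PR:

```javascript
// Hypothetical rule: open completions automatically only when the cursor is
// at the end of the expression, or directly after a "." (e.g. "score.").
function shouldAutoSuggest(text, cursor) {
  if (text.slice(0, cursor).endsWith(".")) return true;
  // Anything other than whitespace after the cursor means we are mid-expression.
  return text.slice(cursor).trim() === "";
}
```

With this rule, editing the `or` in `choice == "I" or input_contains("parallel")` no longer pops up choices, while typing at the end still does.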

@dragonstyle
Collaborator

Looks great to me. One final tiny nit - since the filter is often the first focusable element on the page, it gets focus by default (which triggers showing the autocompletion). What would you think about filtering this case (just using isTrusted to limit the auto display of the completion to user generated focus)?

Taking a quick pass through the code now.

@andrei-apollo
Contributor Author

Good catch. It was actually worse than that: for some reason I decided that the filter field should be focused whenever the filter changes externally, so the call was coming from inside the house. Fixed. I also added the isTrusted check just in case, although it doesn't seem to be needed at this point.
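A hypothetical sketch of the focus guard, combining the isTrusted check with the "focused and empty" condition from the first round of feedback. One caveat worth stating: `isTrusted` is false for events created by script and sent with `dispatchEvent()`, but focus events the browser fires for a scripted `element.focus()` may still be trusted, so not focusing the field programmatically (the fix described in this comment) remains the main remedy. The function name is illustrative:

```javascript
// Hypothetical guard: auto-open completions on focus only for trusted events
// and only while the filter is empty.
// Possible wiring with CodeMirror (assumption, not the PR's code):
//   if (shouldOpenCompletionsOnFocus(event, view.state.doc.toString())) {
//     startCompletion(view); // command from @codemirror/autocomplete
//   }
function shouldOpenCompletionsOnFocus(event, text) {
  return event.isTrusted === true && text.trim() === "";
}
```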

@dragonstyle (Collaborator) left a comment

Looks great, having gone through the code. One open question and a few styling things specific to vscode. Happy to chat in real time anytime if helpful!

src/inspect_ai/_view/www/src/samples/tools/filters.mjs (Outdated, resolved)
[data-tooltip] {
position: relative;
}
[data-tooltip]:hover::after {
Collaborator

I think this needs:

  background: var(--bs-body-bg);
  opacity: 1;

The body-bg is set dynamically in the case of VSCode (to match the vscode theme). The best test is to open a log in vscode and toggle the theme (usually just selecting a dark theme is enough to uncover issues).

Collaborator

Beyond the tooltip, I think we need to set up the overall theme style (the attached image shows what I'm talking about). Currently we 'forward' vscode colors into CSS vars as needed in App.css here:

body[class^="vscode-"] {
  --bs-border-radius: 0;
  --bs-border-radius-lg: 0;
  --bs-body-bg: var(--vscode-editor-background);
  --bs-card-bg: var(--vscode-editor-background);
  --bs-table-bg: var(--vscode-editor-background);
  --bs-light-bg-subtle: var(--vscode-sideBar-background);
  --bs-light-border-subtle: var(--vscode-sideBarSectionHeader-border);
  --bs-body-color: var(--vscode-editor-foreground);
  --bs-table-color: var(--vscode-editor-foreground);
  --bs-accordion-btn-color: var(--vscode-editor-foreground);
  --bs-emphasis-color: var(--vscode-editor-foreground);
  --bs-navbar-brand-color: var(--vscode-editor-foreground);
  --bs-navbar-brand-hover-color: var(--vscode-editor-foreground);
  --bs-code-color: var(--vscode-editorInfo-foreground);
  --bs-light: var(--vscode-sideBar-background);
  --bs-btn-bg: var(--vscode-peekViewTitle-background);
  --bs-primary: var(--vscode-banner-iconForeground);
  --bs-nav-pills-link-active-bg: var(--vscode-banner-iconForeground);
  --bs-secondary: var(--vscode-breadcrumb-foreground);
  --bs-secondary-bg: var(--vscode-list-inactiveSelectionBackground);
  --bs-border-color: var(--vscode-editorGroup-border);
  --bs-card-border-color: var(--vscode-editorGroup-border);
  --bs-warning-bg-subtle: var(--vscode-inputValidation-warningBackground);
  --bs-warning-text-emphasis: var(--vscode-input-foreground);
  --inspect-find-background: var(--vscode-editorWidget-background);
  --inspect-find-foreground: var(--vscode-editorWidget-foreground);
  --inspect-input-background: var(--vscode-input-background);
  --inspect-input-foreground: var(--vscode-input-foreground);
  --inspect-input-border: var(--vscode-input-border);
  --inspect-diff-add-color: var(--vscode-diffEditor-insertedTextBackground);
  --inspect-diff-remove-color: var(--vscode-diffEditor-removedTextBackground);
}

The process of discovering vscode colors is more art than science, as they are not well documented. I usually just use 'Open Webview Developer Tools' within vscode, browse the variables available to the viewer, and try to select colors that seem correct and have names likely to be semantically related to what they'll represent in Inspect.
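A small hypothetical helper can speed up that browsing: paste it into the webview developer tools console and narrow the `--vscode-*` names by keyword. It assumes the theme variables are exposed as custom properties on the root element's inline style, so that something like `Array.from(document.documentElement.style)` lists them; that is worth confirming in your vscode version. `findThemeVars` is an illustrative name:

```javascript
// Hypothetical helper: keep only "--vscode-*" custom property names whose
// name contains every keyword (case-insensitive), sorted for easy scanning.
// In the webview console (assumption): findThemeVars(Array.from(document.documentElement.style), "input")
function findThemeVars(propertyNames, ...keywords) {
  return propertyNames
    .filter((name) => name.startsWith("--vscode-"))
    .filter((name) => keywords.every((k) => name.toLowerCase().includes(k.toLowerCase())))
    .sort();
}
```

Reading a candidate's current value is then a matter of `getComputedStyle(document.documentElement).getPropertyValue(name)`, which makes it easier to compare light and dark themes.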

Happy to chat more real time if useful...

Collaborator

Sorry meant to include this screenshot as well...

[Screenshot 2025-01-13 at 1 31 46 PM]

Contributor Author

Yes, good point! I replaced hard-coded values with CSS variables and fixed the dark theme. At least I hope I did.
Unfortunately I could not find variables for the built-in selection colors or for the BS focus outline color (--inspect-focus-border-color), so I had to specify those manually. But pretty much everything else is taken from the theme now.

{ tag: tags.number, color: "#164" },
{ tag: tags.keyword, color: "#708" },
{ tag: tags.function(tags.variableName), color: "#00c" },
]);
Collaborator

We should be using CSS vars (if we can) to allow vscode overrides to work properly. Alternatively, we could select middle-ground hues for the colors (e.g. the red is currently pretty good in both light and dark, but the others are dark enough that they are hard to read in dark themes).

Contributor Author

Used built-in token colors. They already have dark theme variations, which is nice. I slightly dislike the fact that there is no highlighting for strings in the light theme, but this is a very minor issue. We can always come back to this later and tweak some more if we want.

Labels: None yet
Projects: None yet
2 participants