feat(weave): Small-batch, on-demand LLM Judge #2902

Merged
merged 147 commits into from
Nov 19, 2024
Changes from 145 commits
Commits
e41ce26
Initial Migration
tssweeney Nov 4, 2024
3b6cef9
Interface and basic validation
tssweeney Nov 4, 2024
3ae1ca2
Added tests and Assertions
tssweeney Nov 4, 2024
1dfcd7a
Modify scorers and uptake changes - make initial test changes
tssweeney Nov 4, 2024
7d03c21
Implemented initial query-side improvements
tssweeney Nov 4, 2024
906a448
Implemented initial feedback query tests (failing)
tssweeney Nov 4, 2024
c0dc641
Implemented initial feedback query tests (failing)
tssweeney Nov 4, 2024
5efcf9c
Merge branch 'master' into tim/enhanced_feedback_data_model
tssweeney Nov 4, 2024
00fd587
Initial sort implementation
tssweeney Nov 5, 2024
29ef40b
Other Sort Tests
tssweeney Nov 5, 2024
2af8f15
Initial filter tests
tssweeney Nov 5, 2024
643ea0e
Finished filter tests
tssweeney Nov 5, 2024
79e83fe
Test fix
tssweeney Nov 5, 2024
b4963d5
Fixed sqlite tests
tssweeney Nov 5, 2024
22e2982
Fixed sqlite tests 2
tssweeney Nov 5, 2024
2ac2557
added one more test
tssweeney Nov 5, 2024
3aca1eb
Merge branch 'master' into tim/enhanced_feedback_data_model
tssweeney Nov 5, 2024
16fa9b1
Merge branch 'master' into tim/enhanced_feedback_data_model
tssweeney Nov 5, 2024
63a37ce
First incoming changes from Online Evals
tssweeney Nov 5, 2024
f3b9e66
Initial Cleanup
tssweeney Nov 5, 2024
5463698
Initial Action changes
tssweeney Nov 5, 2024
001b9d5
Initial Refactor
tssweeney Nov 5, 2024
acea359
A bunch of name changes and adjustments
tssweeney Nov 5, 2024
f21c6b4
Move actions
tssweeney Nov 5, 2024
05c525d
Move actions again
tssweeney Nov 5, 2024
94ee62e
beginning to refactor tests
tssweeney Nov 5, 2024
1a58795
Beginning to fix the tests themselves
tssweeney Nov 5, 2024
554f82f
Added remaining tests
tssweeney Nov 5, 2024
6435506
Initial Master Merge
tssweeney Nov 6, 2024
2b0cf0c
Lint merge
tssweeney Nov 6, 2024
95d463b
Initial Comments
tssweeney Nov 6, 2024
a6dc197
Moved Feedback Symbols
tssweeney Nov 6, 2024
bd89bcb
Merged in
tssweeney Nov 6, 2024
33ad7d1
Merged and linted
tssweeney Nov 6, 2024
0c4fc7a
Implemented initial action dispatching
tssweeney Nov 6, 2024
0c5be92
Implemented contains words
tssweeney Nov 6, 2024
b4af5c6
Correct location
tssweeney Nov 6, 2024
712cbb7
contains words implemented
tssweeney Nov 6, 2024
212f3b4
Implemented lmm judge
tssweeney Nov 6, 2024
d6655e7
Finished basic implementation of LLM judge
tssweeney Nov 6, 2024
7649ef4
All basic functionality working - still some todos, but the base is t…
tssweeney Nov 6, 2024
2765b7d
Merge branch 'master' into tim/basic_batch_actions
tssweeney Nov 6, 2024
406333a
More structured tests
tssweeney Nov 6, 2024
dd08ab7
change name to response_schema
tssweeney Nov 6, 2024
3bb0db5
change name to response_schema
tssweeney Nov 6, 2024
8493689
Fixed structured test
tssweeney Nov 6, 2024
73514c7
Added wb user id to the batch executor
tssweeney Nov 6, 2024
07ee02a
Fixed the tests correctly
tssweeney Nov 6, 2024
f71a0e1
Init Changes
tssweeney Nov 6, 2024
abfa4b9
Moved actions worker
tssweeney Nov 6, 2024
18d7480
Merge branch 'tim/basic_batch_actions' into tim/basic_batch_actions_UI
tssweeney Nov 6, 2024
9e1ee02
Remove window flags
tssweeney Nov 6, 2024
856ba0b
Restore accidental changes
tssweeney Nov 6, 2024
9999c13
Fixed merge
tssweeney Nov 6, 2024
8c51b93
Remove online scorers page
tssweeney Nov 6, 2024
ad73416
Changed name from metrics to scorers
tssweeney Nov 6, 2024
10541ad
Fixed interface
tssweeney Nov 6, 2024
0a68915
Changed ConfiguredAction to AxctionDefinition
tssweeney Nov 6, 2024
0b2152d
Changed ConfiguredAction to AxctionDefinition
tssweeney Nov 6, 2024
7514cd8
Lint
tssweeney Nov 6, 2024
6279d44
Fixed zod
tssweeney Nov 6, 2024
9aec32a
Fixed discriminator
tssweeney Nov 6, 2024
18821f2
Merge branch 'tim/basic_batch_actions' into tim/basic_batch_actions_UI
tssweeney Nov 6, 2024
a45026b
Migrated to new data model
tssweeney Nov 6, 2024
6919cf2
Fixed a few things
tssweeney Nov 6, 2024
2e83235
Fix action executor
tssweeney Nov 6, 2024
546e96a
Call Action Viewer complete
tssweeney Nov 6, 2024
3b85286
Call Action Viewer complete
tssweeney Nov 6, 2024
fb91c21
Call Action Viewer complete
tssweeney Nov 6, 2024
68352df
Fixed up names
tssweeney Nov 6, 2024
901a788
Fixed
tssweeney Nov 6, 2024
20cec3e
Fixed
tssweeney Nov 6, 2024
16a96a1
Added empty states
tssweeney Nov 6, 2024
dfb120e
Fix typing
tssweeney Nov 6, 2024
80aca35
lint
tssweeney Nov 6, 2024
73f3c78
Added more
tssweeney Nov 7, 2024
63c8877
Fixed columns
tssweeney Nov 7, 2024
db6dd19
comments
tssweeney Nov 7, 2024
052c74b
comments
tssweeney Nov 7, 2024
cf81ef1
Merge branch 'master' into tim/basic_batch_actions
tssweeney Nov 7, 2024
608b423
Merge branch 'tim/basic_batch_actions' into tim/basic_batch_actions_UI
tssweeney Nov 7, 2024
8e30861
some final fixes
tssweeney Nov 7, 2024
3e8aa1e
lint fix
tssweeney Nov 7, 2024
8553cce
Merge branch 'tim/basic_batch_actions' into tim/basic_batch_actions_UI
tssweeney Nov 7, 2024
ab52cad
Merge branch 'master' into tim/basic_batch_actions_UI
tssweeney Nov 7, 2024
840010f
fixed name requirement
tssweeney Nov 7, 2024
bf99f82
merged
tssweeney Nov 7, 2024
9bc191c
Merge branch 'master' into tim/basic_batch_actions_UI
tssweeney Nov 7, 2024
2c8391d
merged in master
tssweeney Nov 7, 2024
e49112d
small fix
tssweeney Nov 8, 2024
62a5d1d
merged
tssweeney Nov 8, 2024
57c1e3f
pickup new change
tssweeney Nov 8, 2024
39d425f
merged
tssweeney Nov 13, 2024
2cd872b
lint
tssweeney Nov 13, 2024
8daa7ca
lint
tssweeney Nov 13, 2024
3ffac60
done
tssweeney Nov 13, 2024
3a3b70c
done
tssweeney Nov 13, 2024
bb5697e
remove dead code
tssweeney Nov 13, 2024
eeb783c
remove dead code
tssweeney Nov 13, 2024
d694d52
removed line change
tssweeney Nov 13, 2024
24b66e0
merged in master
tssweeney Nov 13, 2024
4ab58a0
revert small ref
tssweeney Nov 13, 2024
1a125a7
Pre-refactor
tssweeney Nov 13, 2024
55f6924
Pre-refactor
tssweeney Nov 13, 2024
9449bb3
Adopting fields
tssweeney Nov 13, 2024
0b802cd
Done with the MVP
tssweeney Nov 13, 2024
a70aed7
Send it
tssweeney Nov 14, 2024
b491943
Send it 2
tssweeney Nov 14, 2024
c082a8b
Merge branch 'master' into tim/basic_batch_actions_UI
tssweeney Nov 14, 2024
bcc9c7a
merged
tssweeney Nov 14, 2024
6501d8d
revert change
tssweeney Nov 14, 2024
a14d757
simplify action page
tssweeney Nov 14, 2024
fb9125e
a bunch of wokr
tssweeney Nov 14, 2024
af97b1d
another refactor
tssweeney Nov 14, 2024
58cc1a2
another refactor
tssweeney Nov 14, 2024
9d8c1c6
merged in master
tssweeney Nov 14, 2024
7a2e181
pulled over changes
tssweeney Nov 14, 2024
f27fc75
small fix
tssweeney Nov 14, 2024
d402062
fixed
tssweeney Nov 14, 2024
815a31c
added
tssweeney Nov 14, 2024
f23a570
Sucked up changes
tssweeney Nov 14, 2024
84ad3f2
moved file
tssweeney Nov 14, 2024
675055c
moved file
tssweeney Nov 14, 2024
4078ffa
little fixes
tssweeney Nov 14, 2024
2a5b48c
Added template selector
tssweeney Nov 14, 2024
ba7eba9
revised
tssweeney Nov 14, 2024
847f7fb
ok, setting up the creator
tssweeney Nov 15, 2024
8cee403
RC
tssweeney Nov 15, 2024
ca3b96e
fixes
tssweeney Nov 15, 2024
c2fc343
ok, done
tssweeney Nov 15, 2024
b7d17cd
cleaning house
tssweeney Nov 15, 2024
f486ea9
merged
tssweeney Nov 15, 2024
d8e90ec
merged in
tssweeney Nov 15, 2024
7e8e0a4
merged in
tssweeney Nov 15, 2024
9de62bd
merged in
tssweeney Nov 15, 2024
b6cb84c
init
tssweeney Nov 15, 2024
a0c1307
init
tssweeney Nov 15, 2024
7f34935
RC
tssweeney Nov 16, 2024
baecf5f
RC
tssweeney Nov 16, 2024
e894f86
RC
tssweeney Nov 16, 2024
402232f
Merge branch 'master' into tim/basic_batch_actions_UI
tssweeney Nov 18, 2024
8d07a9d
Small fix to overflow
tssweeney Nov 18, 2024
c77b8e7
some final adjustments
tssweeney Nov 18, 2024
e9e07fd
Done
tssweeney Nov 18, 2024
99c6095
Done
tssweeney Nov 18, 2024
1f0d461
Merged in
tssweeney Nov 19, 2024
b37c1f0
Merged in
tssweeney Nov 19, 2024
1 change: 1 addition & 0 deletions wb_schema.gql
@@ -169,6 +169,7 @@ type User implements Node {
   photoUrl: String
   deletedAt: DateTime
   teams(before: String, after: String, first: Int, last: Int): EntityConnection
+  admin: Boolean
 }

 type UserConnection {
3 changes: 3 additions & 0 deletions weave-js/src/common/hooks/useViewerInfo.ts
@@ -11,6 +11,7 @@ const VIEWER_QUERY = gql`
     viewer {
       id
       username
+      admin
       teams {
         edges {
           node {
@@ -28,6 +29,7 @@ type UserInfo = {
   id: string;
   username: string;
   teams: string[];
+  admin: boolean;
 };
 type UserInfoResponseLoading = {
   loading: true;
@@ -71,6 +73,7 @@ export const useViewerInfo = (): UserInfoResponse => {
           id,
           username,
           teams,
+          admin: userInfo.admin,
         },
       });
     });
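The `admin` flag added here is only meaningful once the viewer query has resolved, so a caller has to narrow on `loading` before reading it. A minimal sketch of that narrowing, assuming a response union modeled on the `UserInfo` and `UserInfoResponseLoading` types in this diff. The `userInfo: null` branches, the `UserInfoResponseSuccess` name, and the `isAdmin` helper are assumptions for illustration and are not part of the PR:

```typescript
// Shapes modeled on the diff; any field not shown in the hunks is an assumption.
type UserInfo = {
  id: string;
  username: string;
  teams: string[];
  admin: boolean;
};
type UserInfoResponseLoading = {loading: true; userInfo: null};
type UserInfoResponseSuccess = {loading: false; userInfo: UserInfo | null};
type UserInfoResponse = UserInfoResponseLoading | UserInfoResponseSuccess;

// Hypothetical helper: true only when the query has loaded and the viewer is an admin.
function isAdmin(resp: UserInfoResponse): boolean {
  if (resp.loading) {
    // Still loading: treat the viewer as a non-admin.
    return false;
  }
  // Loaded: a missing userInfo also counts as non-admin.
  return resp.userInfo?.admin ?? false;
}
```

Coalescing with `?? false` keeps the return type a plain `boolean` even when `userInfo` is absent.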
@@ -1,12 +1,13 @@
 import {Box} from '@mui/material';
 import _ from 'lodash';
-import React, {useEffect} from 'react';
+import React, {useEffect, useMemo} from 'react';

 import {useViewerInfo} from '../../../../../common/hooks/useViewerInfo';
 import {TargetBlank} from '../../../../../common/util/links';
 import {Alert} from '../../../../Alert';
 import {Loading} from '../../../../Loading';
 import {Tailwind} from '../../../../Tailwind';
+import {RUNNABLE_FEEDBACK_TYPE_PREFIX} from '../pages/CallPage/CallScoresViewer';
 import {Empty} from '../pages/common/Empty';
 import {useWFHooks} from '../pages/wfReactInterface/context';
 import {useGetTraceServerClientContext} from '../pages/wfReactInterface/traceServerClientContext';
@@ -40,6 +41,23 @@ export const FeedbackGrid = ({
     // eslint-disable-next-line react-hooks/exhaustive-deps
   }, []);

+  // Exclude runnables as they are presented in a different tab
+  const withoutRunnables = useMemo(
+    () =>
+      (query.result ?? []).filter(
+        f => !f.feedback_type.startsWith(RUNNABLE_FEEDBACK_TYPE_PREFIX)
+      ),
+    [query.result]
+  );
+
+  // Group by feedback on this object vs. descendent objects
+  const grouped = useMemo(
+    () =>
+      _.groupBy(withoutRunnables, f => f.weave_ref.substring(weaveRef.length)),
+    [withoutRunnables, weaveRef]
+  );
+  const paths = useMemo(() => Object.keys(grouped).sort(), [grouped]);
+
   if (query.loading || loadingUserInfo) {
     return (
       <Box
@@ -61,7 +79,7 @@ export const FeedbackGrid = ({
     );
   }

-  if (!query.result || !query.result.length) {
+  if (!withoutRunnables.length) {
     return (
       <Empty
         size="small"
@@ -81,12 +99,6 @@ export const FeedbackGrid = ({
     );
   }

-  // Group by feedback on this object vs. descendent objects
-  const grouped = _.groupBy(query.result, f =>
-    f.weave_ref.substring(weaveRef.length)
-  );
-  const paths = Object.keys(grouped).sort();
-
   const currentViewerId = userInfo ? userInfo.id : null;
   return (
     <Tailwind>
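The reworked grouping logic can be read as two pure steps: drop feedback whose type starts with the runnable prefix, then bucket the rest by the part of `weave_ref` that follows the current object's ref. A dependency-free sketch of those steps, assuming a minimal `Feedback` shape and a placeholder prefix value. The real `RUNNABLE_FEEDBACK_TYPE_PREFIX` lives in `CallScoresViewer` and its value is not shown in this diff. `groupFeedbackByPath` is a hypothetical name, and the plain loop stands in for lodash's `_.groupBy`:

```typescript
type Feedback = {feedback_type: string; weave_ref: string};

// Placeholder for illustration only; the real constant's value is not shown in the diff.
const RUNNABLE_PREFIX = 'example.runnable';

function groupFeedbackByPath(
  feedback: Feedback[] | null | undefined,
  weaveRef: string,
  runnablePrefix: string = RUNNABLE_PREFIX
): {grouped: Record<string, Feedback[]>; paths: string[]} {
  // Step 1: exclude runnable feedback, treating a missing result as empty.
  const withoutRunnables = (feedback ?? []).filter(
    f => !f.feedback_type.startsWith(runnablePrefix)
  );

  // Step 2: group by the ref suffix; '' means feedback on the object itself.
  const grouped: Record<string, Feedback[]> = {};
  for (const f of withoutRunnables) {
    const path = f.weave_ref.substring(weaveRef.length);
    const bucket = grouped[path] ?? [];
    bucket.push(f);
    grouped[path] = bucket;
  }

  return {grouped, paths: Object.keys(grouped).sort()};
}
```

Because the empty-state check in the diff now tests the filtered list, an object whose only feedback is runnable output falls through to the empty state instead of rendering an empty grid.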
@@ -1,4 +1,5 @@
 import Box from '@mui/material/Box';
+import {useViewerInfo} from '@wandb/weave/common/hooks/useViewerInfo';
 import {Loading} from '@wandb/weave/components/Loading';
 import {useViewTraceEvent} from '@wandb/weave/integrations/analytics/useViewEvents';
 import React, {FC, useCallback, useContext, useEffect, useState} from 'react';
@@ -31,6 +32,7 @@ import {CallSchema} from '../wfReactInterface/wfDataModelHooksInterface';
 import {CallChat} from './CallChat';
 import {CallDetails} from './CallDetails';
 import {CallOverview} from './CallOverview';
+import {CallScoresViewer} from './CallScoresViewer';
 import {CallSummary} from './CallSummary';
 import {CallTraceView, useCallFlattenedTraceTree} from './CallTraceView';
 import {PaginationControls} from './PaginationControls';
@@ -56,7 +58,13 @@ export const CallPage: FC<{
   return <CallPageInnerVertical {...props} call={call.result} />;
 };

+export const useShowRunnableUI = () => {
+  const viewerInfo = useViewerInfo();
+  return viewerInfo.loading ? false : viewerInfo.userInfo?.admin;
+};
+
 const useCallTabs = (call: CallSchema) => {
+  const showScores = useShowRunnableUI();
   const codeURI = call.opVersionRef;
   const {entity, project, callId} = call;
   const weaveRef = makeRefCall(entity, project, callId);
@@ -128,6 +136,21 @@ const useCallTabs = (call: CallSchema) => {
         </Tailwind>
       ),
     },
+    // For now, we are only showing this tab for W&B admins since the
+    // feature is in active development. We want to be able to get
+    // feedback without enabling for all users.
+    ...(showScores
+      ? [
+          {
+            label: 'Scores (W&B Admin Preview)',
+            content: (
+              <Tailwind>
+                <CallScoresViewer call={call} />
+              </Tailwind>
+            ),
+          },
+        ]
+      : []),
     {
       label: 'Use',
       content: (
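The Scores tab is added with a conditional array spread, so the tab list keeps its order and simply omits the entry when the flag is falsy. A reduced sketch of that pattern, using string content in place of React nodes. `Tab`, `buildTabs`, and the 'Preceding tab' entry are illustrative assumptions; only the 'Scores (W&B Admin Preview)' and 'Use' labels come from the diff:

```typescript
type Tab = {label: string; content: string};

// Accepts undefined because `viewerInfo.userInfo?.admin` in the diff
// can evaluate to undefined as well as true or false.
function buildTabs(showScores: boolean | undefined): Tab[] {
  return [
    {label: 'Preceding tab', content: 'placeholder'},
    // Spread a one-element array when enabled, an empty array otherwise.
    ...(showScores
      ? [{label: 'Scores (W&B Admin Preview)', content: 'scores viewer'}]
      : []),
    {label: 'Use', content: 'usage snippet'},
  ];
}
```

Spreading an empty array avoids leaving `false` or `undefined` entries in the list, which the consumer would otherwise have to filter out.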