
feat: evaluate api #463

Merged · 13 commits into main · Dec 12, 2024
Conversation

@Icemap (Member) commented Dec 5, 2024

Issue: #376

  1. Added the Create Evaluate Task API, which dispatches the work to Celery as a background job;
  2. Added a new `evaluation` queue in Celery and moved the remaining tasks to the `default` queue (see the routing sketch after this list);
  3. Added the Evaluate Task List/Summary/Item Detail APIs;
  4. Added the Evaluation Dataset List/Create/Delete/Update APIs; one caveat is that a CSV file can only be uploaded through the Create API;
  5. Added the Evaluation Dataset Item List/Create/Delete/Update APIs.
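
For illustration, a minimal sketch of what that queue split might look like, assuming Celery's standard `task_routes` configuration; the broker URL and module paths (`app.tasks...`) are assumptions, not this PR's actual layout:

from celery import Celery

app = Celery("tidb_ai", broker="redis://localhost:6379/0")

# Send long-running evaluation tasks to their own queue so they do not
# starve the default queue used by the rest of the tasks. Module paths
# here are hypothetical.
app.conf.task_routes = {
    "app.tasks.evaluate.*": {"queue": "evaluation"},
    "app.tasks.*": {"queue": "default"},
}
app.conf.task_default_queue = "default"

With a setup like this, a dedicated worker can consume only the evaluation queue (e.g. `celery -A app worker -Q evaluation`) while other workers keep serving `default`.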

vercel bot commented Dec 5, 2024

The latest updates on your projects:

Name                Status    Updated (UTC)
tidb-ai-preview     ✅ Ready   Dec 11, 2024 1:11pm
tidb-ai-storybook   ✅ Ready   Dec 11, 2024 1:11pm

@634750802 added this to the Release v0.3.1 milestone on Dec 5, 2024
Resolved (outdated) review threads:

  • backend/app/api/admin_routes/models.py
  • backend/app/api/admin_routes/evaluation/evaluation_task.py (2 threads)
  • backend/app/tasks/evaluate.py
@Icemap (Member, Author) commented Dec 11, 2024

Updated, PTAL again, thanks. @Mini256

Comment on lines +53 to +54
factual_correctness: Optional[float] = Field(nullable=True)
semantic_similarity: Optional[float] = Field(nullable=True)
@Mini256 (Member) commented Dec 11, 2024

I have some concerns about the extensibility of this data model:

  • How should we handle additional evaluation metrics if we add more?
  • How do we handle user-defined metrics?

It may be a bit early to think about this, but the cost of a table schema migration is relatively high.

Langfuse's data model may be of some help to us:
https://langfuse.com/docs/scores/data-model

@Icemap (Member, Author) replied:

It is too early to consider this. If we need to add another metric, the effort involved will be far more than just adding a field to the table. And before that, we should fix `retrieved_contexts` first. As for user-defined custom metrics, we cannot support them in this version; PTAL at the `ragas.metrics` package.
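
For reference, a hedged sketch of how the two hard-coded metrics map onto the `ragas.metrics` package, assuming a ragas 0.2-style API; the sample row is illustrative only, and the evaluator LLM/embeddings wiring is omitted:

from ragas import EvaluationDataset, evaluate
from ragas.metrics import FactualCorrectness, SemanticSimilarity

# One illustrative sample; real rows would come from the evaluation dataset APIs.
dataset = EvaluationDataset.from_list([
    {
        "user_input": "What is TiDB?",
        "response": "TiDB is a distributed SQL database.",
        "reference": "TiDB is an open-source distributed SQL database.",
    }
])

# Both metrics require an evaluator LLM / embeddings to be configured;
# that setup is omitted here for brevity.
result = evaluate(dataset, metrics=[FactualCorrectness(), SemanticSimilarity()])
print(result)  # per-metric scores keyed as factual_correctness / semantic_similarity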

@Mini256 (Member) replied Dec 11, 2024:

I'd prefer to store the metric values in another table, like:

from typing import Optional
from sqlmodel import Field, SQLModel

class EvaluationTaskItemScore(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    # maybe `factual_correctness`, `semantic_similarity`, `faithfulness`, and more ...
    name: str = Field(max_length=40)
    value: float
    evaluation_task_item_id: int
    evaluation_task_id: int

Just a suggestion and a reminder about extensibility; the hard-coded columns approach is also OK for me.
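
As a hypothetical usage of that design, adding a new metric becomes a row insert rather than a schema migration (the `engine` setup is assumed here, not part of this PR):

from sqlmodel import Session

with Session(engine) as session:
    session.add(
        EvaluationTaskItemScore(
            name="faithfulness",
            value=0.87,
            evaluation_task_item_id=1,
            evaluation_task_id=1,
        )
    )
    session.commit()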

cc: @wd0517 @sykp241095, what do you think?

A Collaborator replied:

Langfuse supports user-defined metrics, so storing them in a separate table is required. However, we currently only support a limited set of hardcoded metrics, making a dedicated column for these acceptable for now.

url=settings.TIDB_AI_CHAT_ENDPOINT,
headers={
    "Content-Type": "application/json",
    "Authorization": f"Bearer {settings.TIDB_AI_API_KEY}",
A Collaborator commented:

`TIDB_AI_API_KEY` is a `SecretStr`; you should use `settings.TIDB_AI_API_KEY.get_secret_value()` here.
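
A minimal repro of the pitfall being pointed out, using pydantic's `SecretStr` directly (the key value is a placeholder):

from pydantic import SecretStr

api_key = SecretStr("sk-real-key")

# Interpolating a SecretStr masks it, so the header would literally contain
# "Bearer **********" instead of the real key.
print(f"Bearer {api_key}")                     # Bearer **********
print(f"Bearer {api_key.get_secret_value()}")  # Bearer sk-real-key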


@wd0517 (Collaborator) commented Dec 11, 2024

Please use rye fmt to format the code.

@Icemap (Member, Author) commented Dec 11, 2024

Sure, but running `rye fmt` ends up modifying a bunch of code files.

@Mini256 (Member) commented Dec 11, 2024

🤣 maybe we need to add a GitHub Action to run rye fmt --check

@Icemap (Member, Author) commented Dec 11, 2024

> 🤣 maybe we need to add a GitHub Action to run rye fmt --check

Or a git hook maybe.

@sykp241095 merged commit 0cca29d into main on Dec 12, 2024 (14 checks passed).
@sykp241095 deleted the feat-eval-api branch on December 12, 2024 at 06:27.