
feat(wren-ai-service): sql pairs #961

Merged
merged 60 commits
Jan 2, 2025

Conversation

cyyeh
Member

@cyyeh cyyeh commented Nov 28, 2024

  • new APIs (example calls sketched below):
    • POST /sql-pairs-preparations
    • DELETE /sql-pairs-preparations
    • GET /sql-pairs-preparations/{sql_pairs_preparation_id}/status
  • remove async_timer and timer
  • rename historical_question to historical_question_retrieval
  • remove user_id, since it is not used in the APIs
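
For illustration, a minimal client-side sketch of the new endpoints, assuming httpx and a locally running service; the payload fields and response key below are assumptions drawn from this PR's review discussion, not the authoritative request schema.

import httpx

BASE_URL = "http://localhost:5556/v1"  # assumed local wren-ai-service address

# Start preparation of one SQL pair (payload shape is illustrative only).
resp = httpx.post(
    f"{BASE_URL}/sql-pairs-preparations",
    json={"sql_pairs": [{"id": "pair-1", "sql": "SELECT COUNT(*) FROM orders"}]},
)
prep_id = resp.json()["sql_pairs_preparation_id"]  # field name assumed from the status route

# Poll the status endpoint until the background task reports finished or failed.
status = httpx.get(f"{BASE_URL}/sql-pairs-preparations/{prep_id}/status").json()
print(status)

# The DELETE endpoint mirrors the POST; the body here is hypothetical.
httpx.request("DELETE", f"{BASE_URL}/sql-pairs-preparations", json={"ids": ["pair-1"]})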

Summary by CodeRabbit

  • New Features

    • Introduced new pipeline stages for SQL pair handling: sql_pairs_preparation, sql_pairs_deletion, and sql_pairs_retrieval.
    • Added endpoints for managing SQL pairs preparation and deletion in the API.
  • Bug Fixes

    • Enhanced error handling and logging in several services to improve clarity and robustness.
  • Documentation

    • Updated README and configuration files to reflect new pipeline entries and modifications.
  • Chores

    • Removed deprecated settings and consolidated imports across multiple files for better organization.

@cyyeh cyyeh added the module/ai-service and ci/ai-service labels Nov 28, 2024
@cyyeh cyyeh force-pushed the feat/ai-service/sql-pairs branch from 0d5aa5d to bf2515b Compare November 29, 2024 05:35
@cyyeh cyyeh marked this pull request as ready for review November 29, 2024 05:39
@cyyeh cyyeh requested a review from paopa November 29, 2024 05:40
@cyyeh cyyeh force-pushed the feat/ai-service/sql-pairs branch from 1f952c1 to 3142348 Compare November 29, 2024 05:46
@cyyeh cyyeh force-pushed the feat/ai-service/sql-pairs branch from 3142348 to 63999d4 Compare December 3, 2024 05:12
Contributor

coderabbitai bot commented Dec 26, 2024

Walkthrough

This pull request introduces significant enhancements to the Wren AI service, focusing on SQL pairs management, pipeline restructuring, and removal of timer-related functionality. The changes include new pipelines for SQL pairs preparation, deletion, and retrieval, a consolidation of module imports, and the removal of the enable_timer setting across multiple components. The modifications streamline the codebase, improve modularity, and introduce more flexible SQL pair handling capabilities.

Changes

  • Configuration Files: added new pipeline entries for SQL pairs preparation, deletion, and retrieval; removed the enable_timer setting; renamed some pipeline components.
  • Pipeline Components: introduced new classes for SQL pairs management; removed the async_timer and timer decorators; restructured imports from the generation and retrieval modules.
  • Web Services: removed user_id from various request models; updated service class names for consistency; removed timing decorators.
  • Testing: updated test fixtures and mocks; added tests for SQL pairs preparation and deletion.

Poem

🐰 Hop, hop, through code so bright,
SQL pairs dance with pure delight!
Decorators shed, timers cease,
Pipelines flow with newfound peace.
A rabbit's refactor, clean and neat! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9a028d and 9f9ec0b.

📒 Files selected for processing (1)
  • wren-ai-service/tests/pytest/services/test_ask.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-ai-service/tests/pytest/services/test_ask.py


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (41)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

9-9: Confirmed import replacement from async_timer to trace_metadata.

Previously, the async_timer import was used for timing. Now you’ve replaced it with trace_metadata, which focuses on tracing rather than capturing performance metrics. This is okay if timing is no longer needed. If performance measurement or timing is still required, consider adding an alternative.

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (6)

1-16: Ensure consistent logging strategy for debugging and production.
While the imports and logging setup appear fine, consider whether logger.setLevel(...) or additional contextual logs are needed to help debug issues in production, especially given the asynchronous nature of this pipeline.


18-34: Enhance documentation for OutputFormatter.
This class is straightforward, but consider adding a docstring or clarifying comment explaining how the output structure is intended to be consumed by downstream processes.


61-82: Ensure filter conditions align with Qdrant indexing.
If id is empty or None, the filters object is null. Confirm that an unfiltered retrieval is appropriate. This could cause performance/memory issues when the index is large.


84-90: Parameterize score thresholds in filtered_documents.
The filter uses score=0.7 as a hard-coded threshold. Consider making it configurable, possibly from environment variables or pipeline settings, to allow more flexible tuning.
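
One possible shape for that, sketched with an environment-variable fallback; the setting name is made up for illustration and is not the service's actual config key.

import os

# Illustrative only: read the retrieval score threshold from the environment,
# keeping the current hard-coded 0.7 as the default.
SQL_PAIRS_SCORE_THRESHOLD = float(os.getenv("SQL_PAIRS_SCORE_THRESHOLD", "0.7"))

def filtered_documents(documents: list, score: float = SQL_PAIRS_SCORE_THRESHOLD) -> list:
    # Keep only documents whose score meets or exceeds the configured threshold.
    return [doc for doc in documents if getattr(doc, "score", 0.0) >= score]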


105-127: Add docstring for the SqlPairsRetrieval class.
This class orchestrates the retrieval pipeline. A brief docstring explaining the main usage, inputs, outputs, and expected environment would help future maintainers.


141-148: Remove or secure the main block in production.
This main block is useful for local testing but may not be intended for production. Consider guarding it with an if DEBUG: check or removing it in deployment contexts.

wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (6)

1-12: Consistently handle exceptions in service modules.
While logging is present, consider a centralized error-handling mechanism or standardized exception classes for consistent error responses across all service modules.


34-37: Validate the necessity of a dedicated response model.
SqlPairsPreparationResponse only has a single field. Consider whether it could be combined with the request model or folded into a more general wrapper.


39-55: Service layering for deletion.
DeleteSqlPairsRequest and DeleteSqlPairsResponse mirror the prepare request/response. Validate usage to avoid duplication. If differences are minimal, a shared model might be more maintainable.


62-69: Assess error details in SqlPairsPreparationStatusResponse.
SqlPairsPreparationError uses a very general code OTHERS. If more error categories exist or are needed, consider enumerating them for clearer telemetry and troubleshooting.


71-81: Ensure TTLCache suits concurrency and memory constraints.
The cache is set to a potentially large maxsize=1_000_000. Confirm that this usage pattern doesn’t cause memory issues under high load.
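
For reference, bounding the cache is a one-line change with cachetools; the numbers below are illustrative, not a sizing recommendation for this service.

from cachetools import TTLCache

# A smaller bounded cache: at most 10,000 status entries, each kept for 2 hours.
statuses: TTLCache = TTLCache(maxsize=10_000, ttl=2 * 60 * 60)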


169-188: Distinct logging for “id not found” in get_prepare_sql_pairs_status.
Currently logs as an exception. Consider logging at a warning or info level if it’s expected that some requests might query missing IDs.

wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (6)

1-21: Add docstrings or references for domain usage.
The imports and usage are not fully explained in the code. For maintainability, clarify in docstrings how these providers and classes fit into the broader SQL pairs preparation architecture.


80-87: Document usage of prompts function.
This function transforms raw SQL pairs into prompt dicts. A short docstring explaining the rationale for the prompt_builder would help future maintainers.


89-99: Parallel tasks with gather: handle partial failures.
If one generation task fails, asyncio.gather() raises, discarding all results. Consider using return_exceptions=True or a fallback mechanism if partial progress is acceptable.
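
A small sketch of the return_exceptions approach mentioned here; the helper name and logger are placeholders, not the pipeline's actual names.

import asyncio
import logging

logger = logging.getLogger(__name__)

async def gather_with_partial_results(tasks: list) -> list:
    # return_exceptions=True keeps the successful results even if some tasks fail;
    # failures come back as exception objects instead of being raised.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    successes = [r for r in results if not isinstance(r, BaseException)]
    for failure in (r for r in results if isinstance(r, BaseException)):
        logger.warning("generation task failed: %r", failure)
    return successes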


126-134: Check embedding context.
If the documents are large, consider chunking or partial embedding strategies. Large documents can cause memory or performance bottlenecks.


157-160: Expand on field usage in SqlIntentionGenerationResult.
Currently, only intention is stored. If more fields are needed later (e.g., potential error code, data lineage), consider extending this model now for future-proofing.


173-217: Centralize pipeline component initialization.
SqlPairsPreparation includes numerous components in _components. If more logic is added, consider splitting into separate factory methods or classes to maintain clarity.

wren-ai-service/src/globals.py (13)

9-9: Optimize import style.
While wildcard imports can be convenient, consider using explicit imports for clarity and to avoid naming conflicts, especially in large codebases.


20-21: Consistent naming for new services.
SqlExplanationService and SqlPairsPreparationService follow a different naming style from older classes. Confirm naming guidelines to keep the codebase consistent.


39-41: Documentation for newly introduced class members.
sql_explanation_service, sql_regeneration_service, and sql_pairs_preparation_service are newly added. Add docstrings describing their purpose.


61-61: Track semantic version changes for generation pipeline.
Modules like generation.SemanticsDescription are newly integrated. If new features are introduced, reflect that in service versioning or changelogs.


84-87: Standardize specialized pipeline references.
intent_classification and data_assistance are grouped here. Confirm these belong in generation pipelines. Possibly rename them to reflect domain usage.


100-102: Assess performance implications of sql_pairs_retrieval.
It’s newly introduced, and retrieving large volumes of pairs can be memory-intensive. Evaluate the need for pagination or partial retrieval.


119-122: Guard or configure chart generation.
Chart generation can be resource-hungry. If runtime constraints exist, consider gating or caching repeated chart generation requests.


130-133: Enforce consistent naming between chart_generation and chart_adjustment.
One pipeline is chart_generation, the other is chart_adjustment. Validate that this difference is intentional and documented.


152-155: Docstring improvement for ask_details_service.
Explain how sql_breakdown and sql_summary pipelines collectively handle the request.


174-180: Consolidate naming for sql_summary.
sql_summary is used in multiple services. If they differ in usage or logic, clarify via better naming or shared references.


182-188: Potential to unify explanation logic.
sql_explanation_service might overlap partially with sql_summary. If so, consider unifying or layering them to reduce duplication.


190-198: Document advanced usage for sql_regeneration_service.
Regeneration can be complex. Provide docstrings or references regarding the logic behind regenerating queries.


221-231: Integration tests for SQL pairs prep.
Before finalizing, ensure integration tests are available that call sql_pairs_preparation_service end to end, verifying typical and edge use cases (e.g., empty requests, large-scale requests).

wren-ai-service/src/pipelines/retrieval/__init__.py (1)

1-5: Import clarity for new retrieval classes.
You have introduced five new imports here. Ensure each class is truly needed by external modules and that no cyclical dependencies occur.

wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (1)

59-67: Dry run invocation
Providing a main block for quick debugging or demonstration is helpful. Any advanced dry-run logic (e.g., fetching test data) could be added if needed.

wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1)

89-89: [Parameterization suggestion] The threshold value 0.9 might be preferable as a configurable setting.
Hardcoding score=0.9 may hamper flexibility if different retrieval contexts demand varying thresholds. Consider passing this as a parameter or configuring it externally for better adaptability.

wren-ai-service/src/pipelines/generation/sql_generation.py (1)

58-62: Include sanitization or validation for sample data.

These lines introduce a template section for SQL samples. Consider validating or sanitizing the sample.summary and sample.sql fields to avoid potential injection or formatting issues in downstream usage.

wren-ai-service/src/web/v1/services/ask.py (1)

270-276: New sql_pairs_retrieval pipeline usage.
The introduction of sql_pairs_retrieval is coherent with the PR’s objective to manage SQL pairs. Ensure you handle empty results or errors consistently in the pipeline.

docker/config.example.yaml (1)

113-113: Fix trailing whitespace.

Remove the trailing space after openai_embedder.text-embedding-3-large.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 113-113: trailing spaces

(trailing-spaces)

wren-ai-service/src/pipelines/indexing/__init__.py (1)

90-110: Consider handling the return value from delete_documents.

The implementation looks good, but the return value from delete_documents might contain useful information about the number of documents deleted. Consider either:

  1. Removing the explicit return statement since it's not needed, or
  2. Returning the result to the caller for potential logging or verification.
-        return await self._sql_pairs_store.delete_documents(filters)
+        result = await self._sql_pairs_store.delete_documents(filters)
+        logger.info(f"Deleted {result} SQL pair documents")
+        return result
wren-ai-service/tools/config/config.example.yaml (1)

131-131: Fix trailing whitespace.

Remove the trailing space after openai_embedder.text-embedding-3-large.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.full.yaml (1)

150-150: Remove trailing whitespace

There is a trailing space after openai_embedder.text-embedding-3-large.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 150-150: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd66a8f and 1afc694.

📒 Files selected for processing (65)
  • deployment/kustomizations/base/cm.yaml (2 hunks)
  • docker/config.example.yaml (1 hunks)
  • wren-ai-service/README.md (0 hunks)
  • wren-ai-service/docs/configuration.md (0 hunks)
  • wren-ai-service/eval/evaluation.py (1 hunks)
  • wren-ai-service/eval/pipelines.py (2 hunks)
  • wren-ai-service/src/config.py (0 hunks)
  • wren-ai-service/src/globals.py (8 hunks)
  • wren-ai-service/src/pipelines/generation/__init__.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/data_assistance.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/followup_sql_generation.py (6 hunks)
  • wren-ai-service/src/pipelines/generation/intent_classification.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_answer.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_breakdown.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_correction.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_expansion.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_explanation.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_generation.py (4 hunks)
  • wren-ai-service/src/pipelines/generation/sql_regeneration.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_summary.py (0 hunks)
  • wren-ai-service/src/pipelines/indexing/__init__.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/__init__.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (5 hunks)
  • wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py (0 hunks)
  • wren-ai-service/src/pipelines/retrieval/retrieval.py (0 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_executor.py (0 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1 hunks)
  • wren-ai-service/src/utils.py (0 hunks)
  • wren-ai-service/src/web/development.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/__init__.py (2 hunks)
  • wren-ai-service/src/web/v1/routers/ask.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/ask_details.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/semantics_preparation.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_answers.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_expansions.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_explanations.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/src/web/v1/routers/sql_regenerations.py (0 hunks)
  • wren-ai-service/src/web/v1/services/ask.py (5 hunks)
  • wren-ai-service/src/web/v1/services/ask_details.py (1 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (1 hunks)
  • wren-ai-service/src/web/v1/services/chart_adjustment.py (1 hunks)
  • wren-ai-service/src/web/v1/services/semantics_preparation.py (0 hunks)
  • wren-ai-service/src/web/v1/services/sql_answer.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_expansion.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_explanation.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_regeneration.py (1 hunks)
  • wren-ai-service/tests/data/config.test.yaml (1 hunks)
  • wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py (1 hunks)
  • wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/tests/pytest/services/mocks.py (4 hunks)
  • wren-ai-service/tests/pytest/services/test_ask.py (3 hunks)
  • wren-ai-service/tests/pytest/services/test_ask_details.py (2 hunks)
  • wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/tests/pytest/test_config.py (0 hunks)
  • wren-ai-service/tests/pytest/test_main.py (1 hunks)
  • wren-ai-service/tests/pytest/test_utils.py (1 hunks)
  • wren-ai-service/tools/config/config.example.yaml (1 hunks)
  • wren-ai-service/tools/config/config.full.yaml (1 hunks)
  • wren-ui/README.md (0 hunks)
💤 Files with no reviewable changes (28)
  • wren-ai-service/src/config.py
  • wren-ui/README.md
  • wren-ai-service/tests/pytest/test_config.py
  • wren-ai-service/src/web/v1/routers/sql_explanations.py
  • wren-ai-service/src/web/v1/routers/ask_details.py
  • wren-ai-service/src/web/v1/services/semantics_preparation.py
  • wren-ai-service/README.md
  • wren-ai-service/docs/configuration.md
  • wren-ai-service/src/web/v1/routers/sql_answers.py
  • wren-ai-service/src/web/development.py
  • wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py
  • wren-ai-service/src/web/v1/routers/sql_regenerations.py
  • wren-ai-service/src/pipelines/generation/data_assistance.py
  • wren-ai-service/src/pipelines/retrieval/sql_executor.py
  • wren-ai-service/src/pipelines/generation/sql_breakdown.py
  • wren-ai-service/src/pipelines/generation/sql_answer.py
  • wren-ai-service/src/web/v1/routers/sql_expansions.py
  • wren-ai-service/src/utils.py
  • wren-ai-service/src/web/v1/routers/ask.py
  • wren-ai-service/src/pipelines/generation/sql_expansion.py
  • wren-ai-service/src/web/v1/routers/semantics_preparation.py
  • wren-ai-service/src/pipelines/generation/sql_correction.py
  • wren-ai-service/src/pipelines/generation/intent_classification.py
  • wren-ai-service/src/pipelines/generation/sql_summary.py
  • wren-ai-service/src/pipelines/generation/sql_explanation.py
  • wren-ai-service/src/pipelines/generation/sql_regeneration.py
  • wren-ai-service/src/pipelines/generation/chart_generation.py
  • wren-ai-service/src/pipelines/retrieval/retrieval.py
🧰 Additional context used
🪛 yamllint (1.35.1)
deployment/kustomizations/base/cm.yaml

[error] 132-132: syntax error: could not find expected ':'

(syntax)

wren-ai-service/tools/config/config.full.yaml

[error] 150-150: trailing spaces

(trailing-spaces)

docker/config.example.yaml

[error] 113-113: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.example.yaml

[error] 131-131: trailing spaces

(trailing-spaces)

🔇 Additional comments (90)
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (6)

3-3: Import statement looks good.

The newly imported Dict is now utilized for the sql_samples parameter. This aligns well with the code's usage of typed dictionaries in the subsequent changes.


107-116: Check for missing or invalid data in template rendering.

You’re conditionally injecting {{sample.summary}} and {{sample.sql}} within the prompt. It might be worth adding validation or at least a fallback if summary or sql fields are missing or None. This ensures the rendered prompt doesn't break if partial data is passed.


133-133: Optional parameter addition is consistent with Python 3.10+.

Declaring sql_samples: List[Dict] | None = None requires Python 3.10 or above (for the union operator “|”). Ensure the runtime environment is updated accordingly if it isn’t already.
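
For what it's worth, the same annotation can be kept on older interpreters with a future import (as long as nothing evaluates the annotations at runtime) or spelled with Optional; a quick sketch with hypothetical function names:

from __future__ import annotations  # postpones evaluation, so "|" unions parse pre-3.10

from typing import Dict, List, Optional

def prompt(sql_samples: List[Dict] | None = None) -> dict:
    ...

# Equivalent spelling that also works without the future import:
def prompt_legacy(sql_samples: Optional[List[Dict]] = None) -> dict:
    ...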


145-145: Parameter forwarding to prompt_builder.run() is valid.

Forwarding sql_samples here is consistent with the prompt function’s updated signature and ensures that sql_samples are utilized in the Jinja template.


220-220: Extended method signature for run().

Adding the sql_samples parameter broadens the function’s scope to incorporate SQL examples. Verify that all upstream invocations provide or handle this parameter properly.


232-232: Ensure consistent usage of sql_samples in downstream pipelines.

The dictionary key sql_samples is correctly added. Verify that any follow-up or third-party pipeline steps expecting additional data still function as intended.

wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

Line range hint 42-65: Double-check for potential gaps left by removing the @async_timer decorator.

The removal of the @async_timer decorator might impact any monitoring, performance analysis, or logging processes that previously relied on it. If there's still a need for execution time tracking, ensure that you have another suitable mechanism in place or confirm that timing is no longer a project requirement.

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (4)

36-51: Validate filters logic in count_documents.
When id is None, filters are not applied. Ensure that this is intended and that it won’t return documents from unintended projects. If project isolation is crucial, you may want to handle the None case differently.


53-59: Check for empty embeddings usage.
If count_documents is zero, the function returns {}. Validate that an empty embedding dict does not lead to unexpected behavior in later steps, e.g., silent failures or bypassed logic.


92-100: Handle potential empty outputs in formatted_output.
When filtered_documents is empty or doesn’t contain the key "documents", the function returns an empty list by default. Verify the calling code properly handles the scenario of zero documents.


129-139: Confirm resilience in the run method.
The pipeline depends on external components (_components). If any are improperly initialized or fail during execution, confirm that the pipeline either retries or raises a clear exception.

wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (3)

20-34: Ensure each SqlPairsPreparationRequest has a unique query_id.
The _query_id is optional but is used in status tracking. An accidental reuse of query_id may overwrite a previous request’s status.


57-60: Consider input validation for sql_pairs_preparation_id.
No validation is performed on sql_pairs_preparation_id. Add checks for format or allowed characters, especially if used as a key in caches or logs.


83-125: Add concurrency or partial-completion checks in prepare_sql_pairs.
The method updates the cache status on success/failure. If multiple calls with the same query_id occur concurrently, the status might fluctuate unexpectedly. Consider locking or unique IDs.
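
A minimal sketch of the locking idea using a per-id asyncio.Lock; names are illustrative, and enforcing unique query ids would be an equally valid alternative.

import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict

# One lock per query_id, so concurrent preparations with the same id serialize
# their status updates instead of interleaving them.
_locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def run_exclusively(query_id: str, work: Callable[[], Awaitable[None]]) -> None:
    async with _locks[query_id]:
        await work()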

wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (5)

51-78: Consider adding validation or a fallback for incomplete fields in the run method (SqlPairsDescriptionConverter).
If sql_pair data is missing keys, .get() returns None. Confirm that id or sql is always present to avoid empty documents or indexing errors.


117-124: Input validation for convert_sql_pairs_to_documents.
Similar to the SqlPairsDescriptionConverter, ensure fields are present and typed correctly.


136-147: Confirm the rationale for calling delete_sql_pairs within the same pipeline.
delete_sql_pairs references sql_pairs_cleaner. If these pairs are also needed for embedding or further processing, ensure the cleaning logic is indeed desired here.


162-171: Monitor JSON schema validity.
Ensure the defined JSON schema in SQL_INTENTION_GENERATION_MODEL_KWARGS remains in sync if additional fields are added to SqlIntentionGenerationResult.


219-226: Validate the final pipeline run for empty input.
If sql_pairs=[], the pipeline might skip certain steps. Confirm that this path is tested and returns a meaningful response to the caller.

wren-ai-service/src/globals.py (5)

96-99: Check historical_question_retrieval usage.
Renaming historical_question to historical_question_retrieval might require changes in tests or data references. Verify consistent usage across the codebase.


105-108: Check completeness of sql_generation and sql_correction logic.
These pipelines can be interdependent. If one fails, confirm the fallback or error route is properly handled.


141-144: Potential concurrency checks in preprocess_sql_data.
If preprocess_sql_data fetches or transforms large data sets, confirm concurrency usage is safe.


168-171: Remove or confirm necessity of sql_correction in expansion.
sql_expansion sometimes duplicates logic found in sql_correction. Confirm there’s no overlap or repeated overhead.


Line range hint 206-215: Verify that retrieval pipelines are correctly reused.
question_recommendation reuses database retrieval logic. Confirm that the pipeline handles different search contexts without collisions.

wren-ai-service/src/pipelines/retrieval/__init__.py (1)

7-13: Use the __all__ list consistently.
Explicitly listing public exports in __all__ is a good practice. Periodically check if new classes need to be added or removed as the module evolves.

wren-ai-service/src/web/v1/routers/__init__.py (4)

11-11: Use of singular naming improves clarity
Renaming semantics_preparations to semantics_preparation provides a clearer singular concept. This is consistent with best practices for naming modules and routes.


15-15: New router inclusion for SQL Pairs Preparation
Great addition of sql_pairs_preparation. This aligns with the PR objective of expanding SQL pairs functionality.


25-25: Confirm references to updated router name
Ensure references to the old module or plural naming (semantics_preparations) are removed throughout the codebase to avoid confusion or import errors.


32-32: SQL pairs preparation route
This newly added route effectively wires up the SQL Pairs Preparation service. Looks good for handling the new functionality.

wren-ai-service/src/pipelines/generation/__init__.py (2)

1-16: Comprehensive imports
All newly imported classes are succinctly grouped. This enhances discoverability and modular usage in downstream code. Ensure these imports do not introduce circular dependencies.


18-35: Explicit public API
Defining __all__ clarifies the intended exports and prevents accidental usage of internal classes. Good practice!

wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py (4)

10-11: Asynchronous test function
Using @pytest.mark.asyncio is correct for testing async code. This allows for better concurrency handling while verifying pipeline behaviors.


21-24: Multiple SQL pairs insertion
The SqlPair instantiation ensures each pair has a unique identifier. This tests that deletion operations can handle multiple entries, which is a solid coverage scenario.


37-38: Validation after first deletion
The assertion that the document count is still 2 provides a partial deletion scenario verification. This is good for ensuring partial or selective deletions operate as expected.


39-42: Full deletion scenario
Asserting 0 documents remain ensures complete deletion functionality is verified. This thoroughly validates the end-to-end flow of the pipeline.

wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (3)

16-23: Leverage SqlPairsCleaner effectively
The delete_sql_pairs function simplifies the pipeline approach by delegating deletion to sql_pairs_cleaner. This encapsulation is clean and maintainable.


29-40: Pipeline constructor is coherent
Passing in a DocumentStoreProvider and building a dictionary of components is a structured approach. Ensure that the dataset_name="sql_pairs" remains consistent with any future expansions (e.g. multi-dataset support).


45-58: Final pipeline execution
Using self._pipe.execute within run is flexible, allowing additional steps to be added if the pipeline grows. The observe decorator aligns well with logging and monitoring.

wren-ai-service/tests/pytest/services/mocks.py (5)

3-3: Consolidated imports look good.

The consolidated import from src.pipelines helps keep the imports cleaner. No issues here.


16-16: Ensure consistent naming usage.

HistoricalQuestionMock now inherits from HistoricalQuestionRetrieval. Verify all references to HistoricalQuestionMock and historical_question.HistoricalQuestion across the codebase are updated or removed as needed.


24-24: Inheritance updated to IntentClassification.

Confirm that the prior import references to intent_classification.IntentClassification have been removed or updated.


34-34: Refactored to use generation.SQLGeneration.

Check all references to sql_generation.SQLGeneration to ensure they are replaced with the new generation.SQLGeneration.


55-55: No issues with the renamed SQLSummary.

Inheriting from generation.SQLSummary is consistent with the new import patterns.

wren-ai-service/tests/pytest/test_utils.py (1)

77-77: user_id parameter is set to None.

This ensures consistency with the broader removal of user_id from various request classes. Verify that any logic relying on user_id has been updated accordingly.

wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_preparation.py (7)

1-6: Imports are modular and clear.

Imports for settings, DocumentStoreProvider, and SqlPairsPreparation are well arranged and explicit.


9-19: Test function naming and structure are descriptive.

The test name test_sql_pairs_preparation_saving_to_document_store accurately conveys the test’s purpose. Setup steps (e.g., store creation) are straightforward.


20-29: SqlPairsPreparation instantiation is well-focused.

Parameters are clearly passed in. Validate that the pipeline components defined in settings.components["sql_pairs_preparation"] exist and are correct.


31-37: Document validations look correct.

All essential assertions (content, meta, sql_pair_id, sql) are present. This ensures robust coverage.


40-50: Second test function effectively covers multiple IDs scenario.

This test replicates the pipeline run with different project IDs, ensuring coverage for varied contexts.


51-69: Repeated run logic is consistent.

Running the pipeline multiple times is a valid approach to confirm isolation by id.


70-79: Filter-based validation is sound.

Filtering documents by project_id checks data partitioning logic and ensures correct categorization per ID.

wren-ai-service/tests/pytest/services/test_ask_details.py (2)

6-6: Broad import of generation is consistent with ongoing refactoring.

No concerns here; it aligns with other updated references to the generation module.


20-23: Creation of SQLBreakdown and SQLSummary is aligned with the new prefix.

Ensure references to the older module or class names (sql_breakdown / sql_summary) are removed or replaced throughout.

wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (4)

13-13: [Import usage confirmation] ScoreFilter appears to be correctly imported and used.
It’s good to see ScoreFilter continue to be used here. Ensure no references to any removed or renamed classes remain.


107-107: [Rename alignment] The new class name aligns well with its retrieval purpose.
Renaming HistoricalQuestion to HistoricalQuestionRetrieval clarifies the pipeline’s intent. Confirm that all references to the old name have been removed throughout the codebase.


132-132: [Log statement consistency] The logger statement is appropriate to the renamed class.
Good job updating the logger message to match the new HistoricalQuestionRetrieval pipeline name.


147-147: [Consistency check] The pipeline’s dry run reference is properly updated to the new class name.
This usage in dry_run_pipeline looks correct.

wren-ai-service/src/web/v1/services/sql_explanation.py (1)

55-55: [Naming practice] Changing the class name from SQLExplanationService to SqlExplanationService improves consistency.
Ensure references to the previous name are fully removed, and confirm adherence to project naming conventions.

wren-ai-service/src/web/v1/services/sql_answer.py (1)

10-10: [Trace metadata addition] The trace_metadata decorator could provide valuable debugging insight.
No apparent issues with the addition. Confirm that the relevant logs or tracing dashboards can consume this metadata effectively.

wren-ai-service/src/web/v1/services/sql_regeneration.py (1)

78-78: [Service rename] Transitioning from SQLRegenerationService to SqlRegenerationService is consistent with the style used elsewhere.
Verify that all imports or references reflect this updated name.

wren-ai-service/src/pipelines/generation/sql_generation.py (4)

85-85: Parameter rename looks consistent.

Changing samples to sql_samples clarifies usage. Ensure any external references to this parameter have also been updated to avoid runtime errors.


93-93: Good usage of the newly introduced parameter.

Forwarding sql_samples to the prompt builder is correctly aligned with the updated parameter name.


168-168: Addition of the sql_samples parameter aligns with the prompt signature.

The run() method signature now correctly matches the updated function parameters, improving readability.


178-178: Properly passing sql_samples.

Adding sql_samples to the inputs ensures consistent data flow through the pipeline.

wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py (3)

68-88: POST /sql-pairs endpoint creation looks solid.

The endpoint’s asynchronous flow using BackgroundTasks is a neat solution for offloading long-running tasks. Make sure to handle potential exceptions within the background task to keep the status consistent.
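
A hedged sketch of that pattern: the background work is wrapped so a raised exception still lands in the status cache. The route path, status values, and helper names below are assumptions for illustration, not the PR's actual code.

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
STATUS: dict[str, str] = {}  # stand-in for the service's status cache

async def do_preparation(preparation_id: str) -> None:
    """Placeholder for the real long-running preparation call."""
    ...

async def prepare_with_status(preparation_id: str) -> None:
    # Record failures explicitly so the status never silently stays "indexing".
    try:
        await do_preparation(preparation_id)
        STATUS[preparation_id] = "finished"
    except Exception as exc:
        STATUS[preparation_id] = f"failed: {exc}"

@app.post("/sql-pairs-preparations")
async def prepare(background_tasks: BackgroundTasks) -> dict:
    preparation_id = "demo-id"  # the real router generates a unique query id
    STATUS[preparation_id] = "indexing"
    background_tasks.add_task(prepare_with_status, preparation_id)
    return {"sql_pairs_preparation_id": preparation_id}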


91-111: DELETE /sql-pairs endpoint is well-structured.

The logic for generating a unique query ID and tracking deletion status is clearly implemented. Confirm that any exceptions in the deletion process also appropriately update the status.


114-123: GET /sql-pairs/{sql_pairs_preparation_id} endpoint.

Retrieving the current status based on the unique ID completes a full asynchronous workflow. Looks good for monitoring and handling user queries about the process.

wren-ai-service/src/web/v1/services/chart.py (1)

9-9: New import for tracing metadata.

The introduction of trace_metadata aligns with the simplified removal of @async_timer. Ensuring each request is traceable without extra timers makes the code cleaner.

wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py (3)

18-32: Fixture for sql_pairs_preparation_service.

The fixture sets up the environment and document store for SQL pairs. This is a clean approach to ensure each test starts with the correct, isolated context.


49-93: Test for SQL pairs preparation.

The loop that checks status until "finished" or "failed" ensures asynchronous correctness. Great job verifying the final state of the document store after preparation.


95-164: Test for SQL pairs deletion.

Reusing the same query_id for deletion is a nice approach that demonstrates a real-world flow (prepare → delete). Checking the empty document store confirms the logic is complete.

wren-ai-service/src/web/v1/services/ask_details.py (2)

10-10: Ensure import consistency and usage clarity.

The added import of trace_metadata appears consistent with the usage in the ask_details method. No issues found.


Line range hint 25-36: Confirm removal of user_id from AskDetailsRequest is fully handled.

We see that user_id is no longer part of the class. Please verify that all references to user_id have been removed or refactored accordingly elsewhere, so that downstream functionality is not impacted.

wren-ai-service/eval/evaluation.py (1)

121-122: Validate fallback path when keys are absent.

Switching to meta.get("session_id") and meta.get("user_id") prevents potential KeyError. Ensure that if these values are absent and become None, downstream logic can safely handle it or provide a suitable default.

wren-ai-service/tests/pytest/services/test_ask.py (2)

37-52: Refactored pipeline references look good.

Using generation and retrieval modules consistently improves readability and aligns naming across components. Good job!


163-163: Renamed mock aligns with real service naming.

Updating historical_question to historical_question_retrieval maintains consistency. Looks good!

wren-ai-service/tests/pytest/test_main.py (1)

26-26: Test name correction is fine—ensure references are updated if needed.

Renaming from test_semantics_preparations to test_semantics_preparation clarifies the intent. Verify that other references, if any, have been updated consistently.

wren-ai-service/src/web/v1/services/sql_expansion.py (1)

9-9: Removal of async_timer looks consistent.
No issues spotted with removing the @async_timer. The remaining @observe and @trace_metadata decorators ensure the method still has observability and trace functionality. This change also aligns with other timer removals throughout the PR.

wren-ai-service/eval/pipelines.py (2)

145-146: Graceful fallback on missing session and user data.
Switching to .get("session_id") and .get("user_id") mitigates potential KeyError exceptions when these fields are missing. This improves resilience.


162-163: Consistent session and user ID handling.
Similar improvement here—replacing direct dictionary key access with .get(...) ensures safer metadata retrieval.

wren-ai-service/src/web/v1/services/ask.py (4)

10-10: Trace metadata import usage is correct.
No further action needed; consistent with how other services incorporate tracing.


245-247: Renamed pipeline key from historical_question to historical_question_retrieval.
This aligns with the updated naming convention in configuration files. No functional issues apparent.


286-286: sql_samples parameter for followup_sql_generation.
Accepting sql_samples adds flexibility, but confirm that both the retrieval and generation pipeline handle the relevant data shape and potential empty arrays.


297-297: sql_samples parameter for sql_generation.
The same suggestion applies here—validate or test thoroughly to ensure the new parameter is consistently consumed and does not introduce unexpected behaviors.

wren-ai-service/tests/data/config.test.yaml (1)

45-45: Consistent pipeline rename in config.
Renaming historical_question to historical_question_retrieval keeps the configuration aligned with code changes. This helps avoid runtime mismatches.

docker/config.example.yaml (1)

107-117: LGTM: SQL pairs pipeline configuration.

The new SQL pairs pipelines are properly configured with the required components (document store, embedder, and LLM where needed).

🧰 Tools
🪛 yamllint (1.35.1)

[error] 113-113: trailing spaces

(trailing-spaces)

wren-ai-service/src/pipelines/indexing/__init__.py (1)

119-125: LGTM: Export list updated correctly.

The __all__ list is properly updated to include the new SQL pairs components.

wren-ai-service/tools/config/config.example.yaml (1)

125-135: LGTM: SQL pairs pipeline configuration.

The new SQL pairs pipelines are properly configured with the required components (document store, embedder, and LLM where needed).

🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.full.yaml (1)

144-147: Consider renaming for consistency with existing patterns

The pipeline name sql_pairs_preparation could be renamed to sql_pairs_indexing to maintain consistency with other indexing operations like db_schema_indexing and historical_question_indexing.

wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)

175-175: LGTM: Clean simplification of post-processor invocation

The change to use direct class invocation instead of the run method simplifies the code while maintaining the same functionality.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
wren-ai-service/src/web/v1/services/ask.py (1)

Pipeline name inconsistency detected - both old and new names are present

The codebase currently contains references to both pipeline names, which needs to be addressed:

  • Old name "historical_question" is still used in:

    • src/globals.py
    • src/web/v1/services/ask.py
    • src/web/v1/services/semantics_preparation.py
    • src/pipelines/indexing/historical_question.py
    • Related test files
  • New name "historical_question_retrieval" is used in:

    • src/globals.py
    • Configuration files (config.example.yaml, config.full.yaml, etc.)
    • src/pipelines/retrieval/historical_question_retrieval.py
    • Related test files

This indicates an incomplete rename that could cause runtime issues. The pipeline name should be consistent across all files.

🔗 Analysis chain

Line range hint 133-141: Verify pipeline name change across configurations.

The pipeline name change from "historical_question" to "historical_question_retrieval" improves clarity. Let's verify this change is consistent across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to the old pipeline name
# and verify the new pipeline name is properly configured.

echo "Checking for old pipeline name references..."
rg "historical_question[^_]" --type yaml --type python

echo "Verifying new pipeline name configuration..."
rg "historical_question_retrieval" --type yaml --type python

Length of output: 377


Script:

#!/bin/bash
# Description: Check for any remaining references to the old pipeline name
# and verify the new pipeline name is properly configured.

echo "Checking for old pipeline name references..."
rg "historical_question[^_]"

echo "Verifying new pipeline name configuration..."
rg "historical_question_retrieval"

Length of output: 2532

🧹 Nitpick comments (1)
wren-ai-service/src/web/v1/services/ask.py (1)

Line range hint 272-299: Consider validating SQL samples before usage.

The SQL samples retrieved from sql_pairs_retrieval are passed directly to the generation pipelines. Consider adding validation to ensure the samples are well-formed and relevant before usage.

Example validation:

 sql_samples = (
     await self._pipelines["sql_pairs_retrieval"].run(
         query=ask_request.query,
         id=ask_request.project_id,
     )
 )["formatted_output"].get("documents", [])
+
+# Validate SQL samples before usage
+valid_samples = [
+    sample for sample in sql_samples
+    if sample and isinstance(sample, dict) and "sql" in sample
+]
+if not valid_samples:
+    logger.warning("No valid SQL samples retrieved for query: %s", ask_request.query)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d96661 and 128ec16.

📒 Files selected for processing (3)
  • wren-ai-service/src/pipelines/generation/intent_classification.py (0 hunks)
  • wren-ai-service/src/web/v1/services/ask.py (4 hunks)
  • wren-ai-service/tests/pytest/services/mocks.py (4 hunks)
💤 Files with no reviewable changes (1)
  • wren-ai-service/src/pipelines/generation/intent_classification.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-ai-service/tests/pytest/services/mocks.py
🔇 Additional comments (2)
wren-ai-service/src/web/v1/services/ask.py (2)

Line range hint 1-67: LGTM! Clean removal of unused components.

The removal of the async_timer import and user_id field aligns with the PR objectives. The class structure remains clean and well-typed.


272-277: LGTM! Clean integration of SQL pairs retrieval.

The new SQL pairs retrieval pipeline is well-integrated into the existing flow.

@paopa paopa self-requested a review December 30, 2024 08:32
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1)

149-155: 🛠️ Refactor suggestion

Disable partial writes if a portion fails
This is similar to a past review comment on partial writes. Consider a transaction or rollback mechanism within write_sql_pairs to ensure atomicity.

🧹 Nitpick comments (5)
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (2)

51-78: Ensure JSON parse resilience when constructing Documents
Although the current logic appears straightforward, consider robust error handling in case sql_pair lacks the "sql" or "intention" keys, or if future changes alter their existence. Returning partial results or raising descriptive exceptions can help diagnose data issues.


136-147: Consider transaction-like approach to prevent partial deletions
If a portion of the sql_pairs_cleaner.run operation fails, you currently still proceed without rollback or follow-up. Consider implementing an all-or-nothing approach or logging partial failures for clarity.

wren-ai-service/tools/config/config.example.yaml (1)

125-135: Remove trailing spaces and ensure pipeline naming consistency
There's trailing whitespace on line 131 flagged by static analysis. Also, confirm that the pipeline names (sql_pairs_preparation, etc.) are consistent with the rest of your service to prevent confusion.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.full.yaml (1)

144-158: Tidy up trailing whitespace and ensure naming alignment
Line 150 shows trailing spaces. Remove them to keep the config unambiguous. Also verify that all newly introduced pipeline names match any references in code or docs.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 150-150: trailing spaces

(trailing-spaces)

deployment/kustomizations/base/cm.yaml (1)

155-165: Clean up trailing spaces to avoid config parse issues
Static analysis flags trailing spaces on line 161. While minor, such whitespace can cause inconsistent YAML parsing in certain tooling. Remove them to maintain a tidy config.

-        embedder: openai_embedder.text-embedding-3-large 
+        embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 161-161: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 128ec16 and d3e4861.

📒 Files selected for processing (4)
  • deployment/kustomizations/base/cm.yaml (1 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/tools/config/config.example.yaml (1 hunks)
  • wren-ai-service/tools/config/config.full.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
deployment/kustomizations/base/cm.yaml

[error] 161-161: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.example.yaml

[error] 131-131: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.full.yaml

[error] 150-150: trailing spaces

(trailing-spaces)

🔇 Additional comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1)

173-217: Pipeline design looks solid; consider error handling for each step
The pipeline orchestrates multiple async tasks. For better reliability, consider adding per-step error handling or a strategy pattern for handling exceptions. This way, subsequent steps can either gracefully skip or fail early with descriptive logs.

Comment on lines +101 to +115
@observe()
def post_process(
    generate_sql_intention: List[dict],
    sql_pairs: List[SqlPair],
) -> List[Dict[str, Any]]:
    intentions = [
        orjson.loads(result["replies"][0])["intention"]
        for result in generate_sql_intention
    ]

    return [
        {"id": sql_pair.id, "sql": sql_pair.sql, "intention": intention}
        for sql_pair, intention in zip(sql_pairs, intentions)
    ]

Contributor


🛠️ Refactor suggestion

Validate orjson decoding to handle malformed or missing data
Decoding replies[0] without checks can raise errors if replies is empty or the structure isn't as expected. Safeguard the code by verifying replies length and using try-except around orjson.loads to improve resilience and clarify any data mismatch issues.
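
A sketch of that safeguard applied to the intention extraction above; the logger and empty-string fallback are assumptions.

import logging

import orjson

logger = logging.getLogger(__name__)

def extract_intention(result: dict) -> str:
    # Tolerate missing replies or malformed JSON instead of letting one bad
    # generation result break the whole post_process batch.
    replies = result.get("replies") or []
    if not replies:
        logger.warning("generation result has no replies: %s", result)
        return ""
    try:
        data = orjson.loads(replies[0])
    except orjson.JSONDecodeError:
        logger.warning("could not decode intention reply: %s", replies[0])
        return ""
    return data.get("intention", "") if isinstance(data, dict) else ""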

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
wren-ai-service/src/pipelines/common.py (2)

1-1: Consider maximizing forward-compatible type hints.
Since Python 3.9 supports built-in type hints like list and dict, gradually migrating away from typing.List and typing.Dict can simplify annotations in the future. However, this is a matter of preference and codebase consistency, so it's merely a suggestion.


49-50: Minor formatting observation.
Multiple consecutive blank lines may be unnecessary. Consider reducing them to one blank line for a cleaner code flow. PEP8 typically suggests a single blank line to separate logical sections in a file.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd8d4f0 and 4c6611a.

📒 Files selected for processing (2)
  • wren-ai-service/src/pipelines/common.py (2 hunks)
  • wren-ai-service/src/pipelines/retrieval/historical_question.py (0 hunks)
💤 Files with no reviewable changes (1)
  • wren-ai-service/src/pipelines/retrieval/historical_question.py
🔇 Additional comments (1)
wren-ai-service/src/pipelines/common.py (1)

51-63: Filter and sorting logic looks good, but consider edge cases.
The run method effectively filters out documents below the score threshold and sorts them in descending order by score. This is correct for typical usage. However, consider the following:

  1. If any documents do not have a .score attribute, the code may raise an AttributeError.
  2. If the user wants to include some margin beyond equality (e.g., document.score > score), a parameter or clarification might help.

Otherwise, the functionality is well organized and easily testable.

Would you like me to write a quick test snippet verifying that documents are filtered and sorted as expected?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
wren-ai-service/src/globals.py (1)

Line range hint 1-231: Consider grouping related services into sub-containers.

As the number of services grows, consider organizing related services (e.g., SQL-related services) into logical sub-containers to improve maintainability and reduce the size of the ServiceContainer class. This could help with:

  • Better organization of related functionality
  • Easier dependency management
  • Improved testability
  • Clearer service boundaries

Example structure:

@dataclass
class SQLServices:
    explanation_service: SqlExplanationService
    regeneration_service: SqlRegenerationService
    pairs_preparation_service: SqlPairsPreparationService
    # ... other SQL-related services

@dataclass
class ServiceContainer:
    sql_services: SQLServices
    # ... other service groups
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c6611a and 7d16dff.

📒 Files selected for processing (1)
  • wren-ai-service/src/globals.py (8 hunks)
🔇 Additional comments (4)
wren-ai-service/src/globals.py (4)

9-9: LGTM! Clean import organization.

The consolidation of pipeline imports and addition of SQL-related service imports improve code organization and maintainability.

Also applies to: 20-22


39-41: LGTM! Well-structured service container updates.

The addition of SQL-related services maintains consistent naming conventions and properly integrates the new SQL pairs functionality described in the PR objectives.


96-101: LGTM! Proper integration of SQL pairs retrieval.

The SQL pairs retrieval pipeline is well-integrated and properly configured with the necessary components.


221-231: Verify SQL pairs preparation configuration.

The SQL pairs preparation service is properly structured with both preparation and deletion pipelines. However, we should verify that the corresponding pipeline components are properly configured in the settings.

✅ Verification successful

SQL pairs preparation configuration is properly configured

The verification shows that the SQL pairs preparation service is correctly configured with all necessary components:

  • Configuration files contain proper settings for both sql_pairs_preparation and sql_pairs_deletion pipelines with required components:
    • Document store (qdrant)
    • Embedder (openai_embedder.text-embedding-3-large)
    • LLM (gpt-4o-mini-2024-07-18)
  • Implementation classes (SqlPairsPreparation and SqlPairsDeletion) are properly defined with necessary dependencies and components
  • Service structure matches the configuration with proper initialization of document stores and required providers
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify SQL pairs pipeline configuration in settings

# Check for required pipeline components in configuration files
rg -l "sql_pairs_preparation|sql_pairs_deletion" --type yaml

# Check for any TODO or FIXME comments related to SQL pairs configuration
rg -i "todo|fixme|xxx|hack|bug" -C 3 "sql_pairs"

Length of output: 322


Script:

#!/bin/bash
# Let's examine the configuration files to verify the SQL pairs pipeline components

# Check the content of example configuration files
echo "=== Content from docker/config.example.yaml ==="
cat docker/config.example.yaml | grep -A 10 "sql_pairs"

echo -e "\n=== Content from wren-ai-service/tools/config/config.example.yaml ==="
cat wren-ai-service/tools/config/config.example.yaml | grep -A 10 "sql_pairs"

# Let's also check the actual implementation of these pipeline components
echo -e "\n=== SQL Pairs Implementation ==="
rg -A 10 "class SqlPairsPreparation|class SqlPairsDeletion" 

Length of output: 7894

Member

@paopa paopa left a comment


LGTM
