Skip to content

Refactor VectorStoreFactory to use registration functionality like StorageFactory #2006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Jul 22, 2025

This PR refactors the VectorStoreFactory class to follow the same registration-based pattern as StorageFactory, making the codebase more consistent and extensible.

Changes Made

🔄 Architectural Refactoring

  • Replaced hardcoded match/case statements with a registry-based approach using _vector_store_registry
  • Added factory functions for each built-in vector store type:
    • create_lancedb_vector_store()
    • create_azure_ai_search_vector_store()
    • create_cosmosdb_vector_store()
  • Unified registration mechanism - built-in vector stores now use the same registration system as custom ones

New Features

  • Utility methods following StorageFactory pattern:
    • get_vector_store_types() - Returns list of registered vector store types
    • is_supported_vector_store_type() - Checks if a vector store type is supported
  • Enhanced registration - Supports both callable factory functions and direct class types

🔒 Backward Compatibility

  • ✅ All existing create_vector_store() calls continue to work unchanged
  • ✅ Original register() method signature still supports direct class registration
  • vector_store_types attribute maintained for legacy code access
  • ✅ Both enum and string-based vector store type parameters supported

📝 Example Usage

from graphrag.vector_stores.factory import VectorStoreFactory, VectorStoreType

# Existing usage continues to work
vector_store = VectorStoreFactory().create_vector_store(
    VectorStoreType.LanceDB, 
    {"collection_name": "test", "db_uri": "/tmp/db"}
)

# New utility methods
available_types = VectorStoreFactory.get_vector_store_types()
is_supported = VectorStoreFactory.is_supported_vector_store_type("lancedb")

# Custom registration (enhanced)
def create_my_vector_store(**kwargs):
    return MyVectorStore(**kwargs)

VectorStoreFactory.register("my_store", create_my_vector_store)

🧪 Testing

Added comprehensive test suite with 10 test cases covering:

  • Built-in vector store creation
  • Custom vector store registration (both function and class-based)
  • Backward compatibility scenarios
  • Error handling for unknown types
  • Utility method functionality

🎯 Benefits

  • Consistency: VectorStoreFactory now follows the same pattern as StorageFactory
  • Extensibility: Easier to add new vector store implementations
  • Maintainability: Cleaner, more modular code structure
  • Type Safety: Improved type annotations and error handling

Fixes #2005.


💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

@Copilot Copilot AI changed the title [WIP] [Feature Request]: Refactor VectorStoreFactory class to use registration functionality Refactor VectorStoreFactory to use registration functionality like StorageFactory Jul 22, 2025
@Copilot Copilot AI requested a review from jgbradley1 July 22, 2025 11:14
Copilot finished work on behalf of jgbradley1 July 22, 2025 11:14
@jgbradley1
Copy link
Collaborator

Remove backward compatibility support from the VectorStoreFactory and the StorageFactory classes.

Copy link
Contributor Author

Copilot AI commented Jul 22, 2025

Remove backward compatibility support from the VectorStoreFactory and the StorageFactory classes.

Done! I've removed all backward compatibility support from both VectorStoreFactory and StorageFactory classes. The changes include:

Removed:

  • vector_store_types and storage_types class variables
  • Complex registration logic that handled both callable factories and direct class types
  • All backward compatibility code paths

Enhanced:

  • Added runtime validation to explicitly reject class registration with clear error messages
  • Simplified register() methods to only accept factory functions
  • Updated tests to verify the new behavior

The factories now have a cleaner, more consistent API without the backward compatibility cruft. All existing functionality continues to work - only the deprecated features have been removed.

Commit: 5aae40b

Copilot finished work on behalf of jgbradley1 July 22, 2025 11:43
@jgbradley1 jgbradley1 marked this pull request as ready for review July 22, 2025 12:35
@jgbradley1 jgbradley1 requested a review from a team as a code owner July 22, 2025 12:35
@jgbradley1
Copy link
Collaborator

Run ruff check —-fix and ruff format to fix any listing and formatting errors. Also create a semversioner file with the appropriate commit message

Copy link
Contributor Author

Copilot AI commented Jul 22, 2025

Run ruff check —-fix and ruff format to fix any listing and formatting errors. Also create a semversioner file with the appropriate commit message

Done! I've run ruff check --fix and ruff format to fix linting and formatting issues, and created a semversioner file for the changes.

The changes include:

  • Formatted test files to comply with code style standards
  • Created semversioner file with patch-level change description
  • Fixed line length and whitespace formatting issues

Commit: 7586c4c

Copilot finished work on behalf of jgbradley1 July 22, 2025 12:47
@jgbradley1
Copy link
Collaborator

The following error occurs in the pytests. Update the pytests if necessary to reflect the changes that have been made in the StorageFactory and VectorStoreFactory classes.

/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:124:13 - error: Method "get" overrides class "PipelineStorage" in an incompatible manner
    Positional parameter count mismatch; base method has 4, but override has 2
    Parameter 3 mismatch: base parameter "as_bytes" is keyword parameter, override parameter is position-only
    Parameter 4 mismatch: base parameter "encoding" is keyword parameter, override parameter is position-only
    Return type mismatch: base method returns type "CoroutineType[Any, Any, Any]", override returns type "bytes | None"
      Type "bytes | None" is not assignable to type "CoroutineType[Any, Any, Any]"
        "bytes" is not assignable to "CoroutineType[Any, Any, Any]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:127:13 - error: Method "set" overrides class "PipelineStorage" in an incompatible manner
    Positional parameter count mismatch; base method has 4, but override has 3
    Parameter 4 mismatch: base parameter "encoding" is keyword parameter, override parameter is position-only
    Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
      "None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:130:13 - error: Method "delete" overrides class "PipelineStorage" in an incompatible manner
    Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
      "None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:133:13 - error: Method "has" overrides class "PipelineStorage" in an incompatible manner
    Return type mismatch: base method returns type "CoroutineType[Any, Any, bool]", override returns type "bool"
      "bool" is not assignable to "CoroutineType[Any, Any, bool]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:139:13 - error: Method "clear" overrides class "PipelineStorage" in an incompatible manner
    Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
      "None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
5 errors, 0 warnings, 0 informations

Copy link
Contributor Author

Copilot AI commented Jul 22, 2025

The following error occurs in the pytests. Update the pytests if necessary to reflect the changes that have been made in the StorageFactory and VectorStoreFactory classes.

...

Fixed the pytest errors by updating the CustomStorage test class to properly implement the PipelineStorage interface. The errors were caused by method signature mismatches - the test was using synchronous methods while PipelineStorage defines async methods with different parameter lists.

Key changes:

  • Updated all storage methods to be async with correct signatures
  • Added missing abstract methods (find, child, keys, get_creation_date)
  • Fixed parameter types and counts to match interface requirements

Also ran ruff check --fix and ruff format as requested, fixing a dataclass annotation issue.

Commit: 12e1744

Copilot finished work on behalf of jgbradley1 July 22, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Refactor VectorStoreFactory class to use registration functionality
2 participants