
Add the use_jsonb parameter to PGEngine for storing metadata using JSONB #222


Open
wants to merge 5 commits into
base: main
from

Conversation

@zhcn000000
Contributor

Add a use_jsonb parameter to PGEngine so that metadata can be stored as JSONB. The default value is False.

@averikitsch
Collaborator

Hi @zhcn000000, thank you for this PR. Can you provide more details on the purpose of this change? Currently, we recommend that any metadata that needs to be indexed and filtered on be declared as dedicated "metadata_columns", which gives even better performance than JSONB. Additionally, the JSON data type has faster insertion performance than JSONB.
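
To make the comparison concrete, here is a minimal sketch of how an equality filter differs between a dedicated metadata column and a field kept inside a JSON or JSONB document. The helper `equality_filter`, the field name, and the `langchain_metadata` column name are assumptions for illustration only; this is not the library's filter implementation.

```python
def equality_filter(field: str, value: str, *, storage: str,
                    json_column: str = "langchain_metadata") -> str:
    """Build a WHERE-clause fragment for one metadata field (hypothetical helper).

    storage is "column" for a dedicated metadata column, or "json"/"jsonb"
    when the field lives inside a single metadata document column.
    """
    # Naive quoting keeps the sketch short; real code should use bound parameters.
    literal = "'" + value.replace("'", "''") + "'"
    if storage == "column":
        # A dedicated column is compared directly and can carry an ordinary index.
        return f"\"{field}\" = {literal}"
    if storage in ("json", "jsonb"):
        # The ->> operator extracts a field as text for both json and jsonb.
        return f"\"{json_column}\"->>'{field}' = {literal}"
    raise ValueError(f"unknown storage: {storage!r}")


print(equality_filter("source", "wiki", storage="column"))
print(equality_filter("source", "wiki", storage="jsonb"))
```

The extraction form is the same for JSON and JSONB; what changes is how PostgreSQL stores and reads the document, which is where the read and write trade-off comes from.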

@zhcn000000
Contributor Author

This may not have an obvious effect, but it gives users an additional option, just like the use_jsonb option in the traditional PGVector store. JSONB is faster to read, while JSON is faster to write.

    def __init__(
        self,
        embeddings: Embeddings,
        *,
        connection: Union[None, DBConnection, Engine, AsyncEngine, str] = None,
        embedding_length: Optional[int] = None,
        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
        collection_metadata: Optional[dict] = None,
        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
        pre_delete_collection: bool = False,
        logger: Optional[logging.Logger] = None,
        relevance_score_fn: Optional[Callable[[float], float]] = None,
        engine_args: Optional[dict[str, Any]] = None,
        use_jsonb: bool = True,
        create_extension: bool = True,
        async_mode: bool = False,
    ) -> None:
        """Initialize the PGVector store.
        For an async version, use `PGVector.acreate()` instead.

        Args:
            connection: Postgres connection string or (async)engine.
            embeddings: Any embedding function implementing
                `langchain.embeddings.base.Embeddings` interface.
            embedding_length: The length of the embedding vector. (default: None)
                NOTE: This is not mandatory. Defining it will prevent vectors of
                any other size to be added to the embeddings table but, without it,
                the embeddings can't be indexed.
            collection_name: The name of the collection to use. (default: langchain)
                NOTE: This is not the name of the table, but the name of the collection.
                The tables will be created when initializing the store (if not exists)
                So, make sure the user has the right permissions to create tables.
            distance_strategy: The distance strategy to use. (default: COSINE)
            pre_delete_collection: If True, will delete the collection if it exists.
                (default: False). Useful for testing.
            engine_args: SQLAlchemy's create engine arguments.
            use_jsonb: Use JSONB instead of JSON for metadata. (default: True)
                Strongly discouraged from using JSON as it's not as efficient
                for querying.
                It's provided here for backwards compatibility with older versions,
                and will be removed in the future.
            create_extension: If True, will create the vector extension if it
                doesn't exist. disabling creation is useful when using ReadOnly
                Databases.
        """

@zhcn000000
Contributor Author

Setting the default value of use_jsonb to False means users keep storing metadata with the original scheme (JSON) unless they explicitly opt in to JSONB.
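
A minimal sketch of the opt-in behavior described here, assuming the flag only switches the type of the metadata column. The helper `metadata_column_ddl` and the column name `langchain_metadata` are hypothetical and are not taken from this PR's diff.

```python
def metadata_column_ddl(column_name: str = "langchain_metadata",
                        *, use_jsonb: bool = False) -> str:
    """Return the DDL fragment for the metadata column (hypothetical helper).

    With the default use_jsonb=False the column stays JSON, so existing
    callers see no change; passing use_jsonb=True opts in to JSONB.
    """
    column_type = "JSONB" if use_jsonb else "JSON"
    return f"\"{column_name}\" {column_type}"


print(metadata_column_ddl())                # default: JSON
print(metadata_column_ddl(use_jsonb=True))  # opt-in: JSONB
```

Note that this default is the opposite of PGVector's quoted signature above, where use_jsonb defaults to True.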

2 participants