Self-query parser crashing for some user prompts #7207

soutot · 2024-11-14T20:51:11Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain.js documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain.js rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

NextJS endpoint

import {HNSWLib} from '@langchain/community/vectorstores/hnswlib'
import {BaseMessage} from '@langchain/core/messages'
import {ChatPromptTemplate} from '@langchain/core/prompts'
import {Runnable} from '@langchain/core/runnables'
import {ChatOpenAI, OpenAIEmbeddings} from '@langchain/openai'
import {StreamingTextResponse, LangChainStream} from 'ai'
import {createStuffDocumentsChain} from 'langchain/chains/combine_documents'
import {createRetrievalChain} from 'langchain/chains/retrieval'
import {Document} from 'langchain/document'
import {FunctionalTranslator, SelfQueryRetriever} from 'langchain/retrievers/self_query'
import {AttributeInfo} from 'langchain/schema/query_constructor'
import {NextResponse} from 'next/server'
import {z} from 'zod'

const QA_PROMPT_TEMPLATE = `You are a good assistant that answers questions. Your knowledge is strictly limited to the following piece of context. Use it to answer the question at the end.
  If the answer can't be found in the context, just say you don't know. *DO NOT* try to make up an answer.
  If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
  Give a response in the same language as the question.
  
  Context: """{context}"""

  Question: """{input}"""
  Helpful answer in markdown:`

type RetrievalChainType = Runnable<
  {
    input: string
    chat_history?: BaseMessage[] | string
  } & {
    [key: string]: unknown
  },
  {
    context: Document[]
    answer: any
  } & {
    [key: string]: unknown
  }
>

const getDocumentsContents = async (chain: RetrievalChainType) => {
  const result = await chain.invoke({input: "Describe what's this content about in one sentence"})
  return result.answer
}

const getSelfQueryDocs = async ({vectorStore, prompt}: {vectorStore: HNSWLib; prompt: string}) => {
  const llm = new ChatOpenAI({
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4o-mini',
  })

  const questionAnswerChain = await createStuffDocumentsChain({
    llm,
    prompt: ChatPromptTemplate.fromTemplate(QA_PROMPT_TEMPLATE),
  })

  const chain = await createRetrievalChain({
    retriever: vectorStore.asRetriever(),
    combineDocsChain: questionAnswerChain,
  })

  const documentContents = await getDocumentsContents(chain)

  if (!documentContents) {
    return []
  }

  const attributeInfo: AttributeInfo[] = [
    {
      name: 'version',
      description: 'The version number of the document, e.g., "v3.1", "4.0"',
      type: 'string',
    },
  ]

  const retriever = await SelfQueryRetriever.fromLLM({
    documentContents,
    vectorStore,
    llm,
    structuredQueryTranslator: new FunctionalTranslator(),
    attributeInfo,
  })

  const selfQueryDocsResult = await retriever.invoke(prompt)

  return selfQueryDocsResult
}

export async function POST(request: Request) {
  const body = await request.json()
  const bodySchema = z.object({
    prompt: z.string(),
  })

  const {prompt} = bodySchema.parse(body)

  try {
    const embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    })

    const vectorStore = await HNSWLib.load('vectorstore/rag-store.index', embeddings)

    const {stream, handlers} = LangChainStream()

    const llm = new ChatOpenAI({
      temperature: 0,
      openAIApiKey: process.env.OPENAI_API_KEY,
      streaming: true,
      modelName: 'gpt-4o-mini',
      callbacks: [handlers],
    })

    const selfQueryDocs = await getSelfQueryDocs({
      vectorStore,
      prompt,
    })

    const selfQueryRetriever = await HNSWLib.fromDocuments(selfQueryDocs, embeddings)

    const questionAnswerChain = await createStuffDocumentsChain({
      llm,
      prompt: ChatPromptTemplate.fromTemplate(QA_PROMPT_TEMPLATE),
    })

    const chain = await createRetrievalChain({
      retriever: selfQueryRetriever.asRetriever(),
      combineDocsChain: questionAnswerChain,
    })

    chain.invoke({input: prompt})

    return new StreamingTextResponse(stream)
  } catch (error) {
    console.log('error', error)
    return new NextResponse(JSON.stringify({error}), {
      status: 500,
      headers: {'content-type': 'application/json'},
    })
  }
}

Error Message and Stack Trace (if applicable)

Error message

Failed to import peggy. Please install peggy (i.e. "npm install peggy" or "yarn add peggy").

Stack trace

error Error: Failed to import peggy. Please install peggy (i.e. "npm install peggy" or "yarn add peggy").
    at ASTParser.importASTParser (webpack-internal:///(rsc)/./node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected][email protected]_/node_modules/langchain/dist/output_parsers/expression_type_handlers/base.js:43:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async ExpressionParser.ensureParser (webpack-internal:///(rsc)/./node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected][email protected]_/node_modules/langchain/dist/output_parsers/expression.js:55:27)
    at async ExpressionParser.parse (webpack-internal:///(rsc)/./node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected][email protected]_/node_modules/langchain/dist/output_parsers/expression.js:66:9)
    at async QueryTransformer.parse (webpack-internal:///(rsc)/./node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected][email protected]_/node_modules/langchain/dist/chains/query_constructor/parser.js:120:25)
    at async StructuredQueryOutputParser.outputProcessor (webpack-internal:///(rsc)/./node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected][email protected]_/node_modules/langchain/dist/chains/query_constructor/index.js:101:34)
    at async StructuredQueryOutputParser._callWithConfig (webpack-internal:///(rsc)/./node_modules/.pnpm/@[email protected][email protected][email protected]_/node_modules/@langchain/core/dist/runnables/base.js:254:22)
    at async RunnableSequence.invoke (webpack-internal:///(rsc)/./node_modules/.pnpm/@[email protected][email protected][email protected]_/node_modules/@langchain/core/dist/runnables/base.js:1288:27)
    at async SelfQueryRetriever._getRelevantDocuments (webpack-internal:///(rsc)/./node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected][email protected]_/node_modules/langchain/dist/retrievers/self_query/index.js:86:42)
    at async SelfQueryRetriever.getRelevantDocuments (webpack-internal:///(rsc)/./node_modules/.pnpm/@[email protected][email protected][email protected]_/node_modules/@langchain/core/dist/retrievers/index.js:125:29)
    at async getSelfQueryDocs (webpack-internal:///(rsc)/./src/app/api/route.ts:93:33)
    at async POST (webpack-internal:///(rsc)/./src/app/api/route.ts:117:31)
    at async /langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/compiled/next-server/app-route.runtime.dev.js:6:63809
    at async eU.execute (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/compiled/next-server/app-route.runtime.dev.js:6:53964)
    at async eU.handle (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/compiled/next-server/app-route.runtime.dev.js:6:65062)
    at async doRender (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:1317:42)
    at async cacheEntry.responseCache.get.routeKind (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:1539:28)
    at async DevServer.renderToResponseWithComponentsImpl (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:1447:28)
    at async DevServer.renderPageComponent (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:1844:24)
    at async DevServer.renderToResponseImpl (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:1882:32)
    at async DevServer.pipeImpl (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:895:25)
    at async NextNodeServer.handleCatchallRenderRequest (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/next-server.js:269:17)
    at async DevServer.handleRequestImpl (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/base-server.js:791:17)
    at async /langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/dev/next-dev-server.js:331:20
    at async Span.traceAsyncFn (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/trace/trace.js:151:20)
    at async DevServer.handleRequest (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/dev/next-dev-server.js:328:24)
    at async invokeRender (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/lib/router-server.js:174:21)
    at async handleRequest (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/lib/router-server.js:353:24)
    at async requestHandlerImpl (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/lib/router-server.js:377:13)
    at async Server.requestListener (/langchain-selfquery/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected]/node_modules/next/dist/server/lib/start-server.js:140:13)

Description

I'm trying to use SelfQueryRetriever, but in some very specific cases it throws an error about a missing package:
Error: Failed to import peggy. Please install peggy (i.e. "npm install peggy" or "yarn add peggy").
It's thrown by /node_modules/langchain/dist/output_parsers/expression_type_handlers/base.js

The following prompts are examples that trigger the error:

How do CAN and LIN signals join in automotive networks as of ISO 11898 version 2.0?
Where do Zigbee and Wi-Fi signals coexist in IoT devices as of IEEE 802.15.4?
How do OPC and MQTT protocols select data in IIoT as defined by IEC 62541 version 1.4?
Where do USB and Ethernet standards join in smart homes as per IEEE 802.3 version 1.0?

The trigger seems to be a user prompt that contains SQL-like words (as, where, select, from, join) together with a word matching an AttributeInfo name ('version' in this case), at some specific prompt length. Any prompt that deviates from that pattern works, which is very weird to me.
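
To narrow down which prompts trigger the failure, a small harness along these lines can run each candidate prompt and collect the ones that throw. This is only a sketch: `findFailingPrompts`, `PromptFailure`, and the `run` callback are hypothetical names that are not part of LangChain. In the reproduction repo, `run` would presumably wrap a call such as `retriever.invoke(prompt)`.

```typescript
// Hypothetical helper: runs each prompt through `run` and collects the ones that throw.
// `run` is an assumption; in practice it would wrap something like retriever.invoke(prompt).
type PromptFailure = {prompt: string; message: string}

async function findFailingPrompts(
  prompts: string[],
  run: (prompt: string) => Promise<unknown>,
): Promise<PromptFailure[]> {
  const failures: PromptFailure[] = []
  for (const prompt of prompts) {
    try {
      await run(prompt)
    } catch (error) {
      // Keep only the message so the results are easy to compare across prompts.
      const message = error instanceof Error ? error.message : String(error)
      failures.push({prompt, message})
    }
  }
  return failures
}
```

Running the prompts sequentially keeps the output order stable, which makes it easier to compare variations of the same prompt (for example, with and without the word "version").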

I set up a repo with a reproducible scenario: https://github.com/soutot/langchain-selfquery
You can pull it and follow the instructions to generate your vectorstore, run the code, and reproduce the error.

I guess LangChain should either include the peggy package as a dependency or handle the prompt differently to prevent the crash. This matters because the failure is very hard to catch: we can't know in advance what users' prompts will be.
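
Until that is addressed, one possible application-side guard is to fall back to plain vector-store retrieval when the self-query retriever throws. The sketch below is an assumption, not LangChain's own behavior: `RetrieverLike` and `retrieveWithFallback` are hypothetical names, and the structural type relies only on the `invoke(query)` method that the example code above already calls.

```typescript
// Minimal structural type so the sketch does not depend on any LangChain import.
// It models only the async invoke(query) call used in the example code above.
interface RetrieverLike<T> {
  invoke(query: string): Promise<T[]>
}

// Hypothetical helper: try the primary (self-query) retriever first and, if it throws,
// report the error through `onError` and use the fallback retriever instead.
async function retrieveWithFallback<T>(
  primary: RetrieverLike<T>,
  fallback: RetrieverLike<T>,
  query: string,
  onError: (error: unknown) => void = () => {},
): Promise<T[]> {
  try {
    return await primary.invoke(query)
  } catch (error) {
    onError(error)
    return fallback.invoke(query)
  }
}
```

In `getSelfQueryDocs`, this could replace the direct `retriever.invoke(prompt)` call, for example `retrieveWithFallback(retriever, vectorStore.asRetriever(), prompt, (e) => console.error(e))`. Note that this only masks the failure: results from the fallback are not filtered by `version`.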

System Info

langchain@0.3.5 | MIT | deps: 12 | versions: 300
Typescript bindings for langchain
https://github.com/langchain-ai/langchainjs/tree/main/langchain/

keywords: llm, ai, gpt3, chain, prompt, prompt engineering, chatgpt, machine learning, ml, openai, embeddings, vectorstores

dist
.tarball: https://registry.npmjs.org/langchain/-/langchain-0.3.5.tgz
.shasum: 87b282454bc215b12b920d4dd5e35ed58030bad1
.integrity: sha512-Gq0xC45Sq6nszS8kQG9suCrmBsuXH0INMmiF7D2TwPb6mtG35Jiq4grCk9ykpwPsarTHdty3SzUbII/FqiYSSw==
.unpackedSize: 4.7 MB

dependencies:
@langchain/openai: >=0.1.0 <0.4.0
@langchain/textsplitters: >=0.0.0 <0.2.0
js-tiktoken: ^1.0.12
js-yaml: ^4.1.0
jsonpointer: ^5.0.1
langsmith: ^0.2.0
openapi-types: ^12.1.3
p-retry: 4
uuid: ^10.0.0
yaml: ^2.2.1
zod-to-json-schema: ^3.22.3
zod: ^3.22.4

maintainers:

nfcampos
jacoblee93
andrewnguonly
davidduong
hwchase17
basproul

dist-tags:
latest: 0.3.5
next: 0.3.2-rc.0
tag-for-publishing-older-releases: 0.2.20

published 2 weeks ago by basproul


dosubot · 2024-11-14T20:52:30Z

The error you're encountering with the SelfQueryRetriever in LangChain.js is due to the missing peggy package, which is required as a peer dependency for the self-querying retriever to function properly. This package is necessary for parsing capabilities, especially when processing queries that contain SQL keywords and the word "version" ^[1].

To resolve this issue, you need to install the peggy package in your project. You can do this by running:

npm install peggy

This should resolve the error and allow the SelfQueryRetriever to process user prompts correctly.
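
Because the stack trace shows the import failing inside `ExpressionParser.ensureParser`, that is, when a filter expression is first parsed rather than at startup, it may help to check for the package once when the server boots. The following is a minimal sketch under that assumption; `canLoad` is a hypothetical helper, not a LangChain API.

```typescript
// Hypothetical startup check: resolves to true when the given loader succeeds.
// Taking a loader function (instead of a module name) lets the caller write a literal
// import('peggy'), which bundlers can resolve statically.
async function canLoad(load: () => Promise<unknown>): Promise<boolean> {
  try {
    await load()
    return true
  } catch {
    return false
  }
}
```

A caller could run `await canLoad(() => import('peggy'))` during initialization and log or fail early if it returns false, so a missing dependency surfaces at deploy time instead of only on prompts that produce a filter.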


dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Nov 14, 2024
