ERROR ### Here is the buggy response - easily resolved #22
Comments
I got the same message using Ollama and a long text (the Quixote book as a .txt file).
I dunno if this is useful, but I found a workaround for the whole JSON issue. It requires using a nested batching function and having the AI output its text formatted as JSON, but what you actually keep, in reality, is markdown:

```python
import json
import nltk
import ollama.client as client  # assumes the repo's bundled Ollama client; adjust to however you call your model

nltk.download('punkt')

# Define the nest_sentences function for batching: split a document into
# chunks of whole sentences, each under ~1024 characters
def nest_sentences(document):
    nested = []
    sent = []
    length = 0
    for sentence in nltk.sent_tokenize(document):
        length += len(sentence)
        if length < 1024:
            sent.append(sentence)
        else:
            nested.append(" ".join(sent))
            sent = [sentence]
            length = len(sentence)
    if sent:
        nested.append(" ".join(sent))
    return nested

def extractConcepts(prompt: str, model='mistral:latest'):
    SYS_PROMPT = (
        "Your task is to extract the key entities mentioned in the user's input.\n"
        "Entities may include - event, concept, person, place, object, document, organisation, artifact, misc, etc.\n"
        "Format your output as a list of json with the following structure.\n"
        "[{\n"
        "    \"entity\": The Entity string\n"
        "    \"importance\": How important is the entity given the context on a scale of 1 to 5, 5 being the highest.\n"
        "    \"type\": Type of entity\n"
        "}, { }]"
    )
    response, context = client.generate(model_name=model, system=SYS_PROMPT, prompt=prompt)
    # client.generate returns the raw string; parse it so the format check below can succeed
    try:
        response = json.loads(response)
    except json.JSONDecodeError:
        return ""
    # Initialize markdown_output at the start of the function
    markdown_output = ""
    # Check if response is in the expected list-of-dictionaries format
    if isinstance(response, list) and all(isinstance(item, dict) for item in response):
        for item in response:
            # note: each entity carries 'entity', 'importance' and 'type' per SYS_PROMPT
            markdown_output += (
                f"## {item['entity']} ({item['type']})\n- importance: {item['importance']}\n\n"
            )
    return markdown_output

# Process each page's content in batches and extract entities
# (`pages` is assumed to be a list of loaded document pages exposing .page_content)
all_questions = []
for page in pages:
    batches = nest_sentences(page.page_content)
    for batch in batches:
        batch_questions = extractConcepts(prompt=batch)
        if batch_questions:
            all_questions.append(batch_questions)
            print(batch_questions)
```

This is the output:

```json
[
    {
        "entity": "A",
        "importance": 3,
        "type": "concept"
    },
    {
        "entity": "Ockham",
        "importance": 4,
        "type": "person"
    },
    {
        "entity": "God",
        "importance": 5,
        "type": "deity"
    },
    {
        "entity": "power",
        "importance": 2,
        "type": "concept"
    },
    {
        "entity": "cognition",
        "importance": 4,
        "type": "concept"
    },
    {
        "entity": "proposition",
        "importance": 3,
        "type": "concept"
    }
]
```
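For what it's worth, the "ERROR ### Here is the buggy response" message seems to be printed whenever the model's reply fails to parse as JSON, so another defensive option is to clean an output like the one above before parsing it. This is only a minimal sketch (the `safe_parse_entities` helper and its regexes are mine, not part of the repo) of tolerating a hallucinated trailing `, {...}`:

```python
import json
import re

def safe_parse_entities(response: str):
    """Try to parse the model's entity list, tolerating a hallucinated trailing ', {...}'."""
    cleaned = response.strip()
    # Remove a literal ", {...}" copied from the prompt's example structure
    cleaned = re.sub(r",\s*\{\s*\.{3}\s*\}", "", cleaned)
    # Remove any trailing comma left before the closing bracket
    cleaned = re.sub(r",\s*\]", "]", cleaned)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # caller can log the raw response and skip this batch
```

`extractConcepts` above could call this instead of `json.loads` directly.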
I haven't created a PR for this as I'm not using OpenAI or Ollama and have modified multiple things.

Your SYSTEM PROMPT currently ends with a trailing `, {...}\n`, and that seems to be causing poor JSON outputs every now and again, because some of the results literally contain a trailing `,{...}`. This seems to be due to hallucination and poor understanding by the LLM, and is probably more common with an open-source one. It is easy to resolve by simply removing that trailing `,{...}`. I've changed the wording slightly and am now getting no "ERROR ### Here is the buggy response:" errors.
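For reference, and assuming the prompt ends the same way as the Ollama variant shown above, the fix amounts to closing the example structure without the extra element, roughly like this (the commenter's exact rewording isn't quoted in the thread):

```python
# Illustrative snippet only: closing lines of the system prompt,
# with no trailing ", {...}" for the model to copy into its output
SYS_PROMPT_ENDING = (
    "[{\n"
    "    \"entity\": The Entity string\n"
    "    \"importance\": How important is the entity given the context on a scale of 1 to 5, 5 being the highest.\n"
    "    \"type\": Type of entity\n"
    "}]"
)
```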