
Lots of hallucinations? #2

Open
joshmouch opened this issue Jan 18, 2025 · 5 comments

Comments


joshmouch commented Jan 18, 2025

I know this is a simple demo, but I was expecting fewer hallucinations and something I could hypothetically use in a non-demo app.

Random switching to other languages. Thinking I spelled "Jose" when I said J-o-s-h. "Mr. Butte" when I spell b-u-t-t. ;) Phone numbers incorrectly formatted. And most tool calls never getting used: for example, I asked to be signed up for a promotion because I saw there was a tool for it, but I never got it to call the tool.

Is this maybe holding out for o1 or o3 before it would work in a production scenario? Is it expected that the models should be fine-tuned for a real usage scenario?

I'm wondering if in practice there needs to be some fine-tuning before this is used. But if that's the case, then I think the training instructions and expectations should be included with the demo. Maybe also a way to evaluate how well the agents are working?

@nm-openai (Collaborator)

Hi @joshmouch thanks for this feedback! Going through point by point:

  • I'd be curious to learn more about the situation in which you're seeing random language switching
  • We're working on an improvement for spelling comprehension
  • For phone number formatting, I think I have it specified in some of the tool calls, but you can also constrain the format directly in the tool schema, for example:
  phone_number: {
    type: "string",
    description:
      "User's phone number used for verification. Formatted like '(111) 222-3333'",
    pattern: "^\\(\\d{3}\\) \\d{3}-\\d{4}$",
  },
  • For the offer tool call, I always saw it work if you go through the snowboard authentication flow and accept the offer. Was that not working for you? I'd also update the state machine and the tool-call instructions to make it more explicit when you'd like the tool to be called. Also, if you're using 4o-mini, I'd expect 4o to be better here.

Across the board (esp language switching and spelling), if you could please share session_ids we can use to debug that would be super helpful!

scycs commented Jan 27, 2025

@joshmouch Was it switching to German? The instructions in simulatedHuman.ts include the line "You respond only in German."

lcwyo commented Jan 31, 2025

If I'm running it in development mode (npm run dev) and I make changes to the files, it suddenly starts speaking Spanish. It seems like it starts a new thread, or a new call to the Realtime API, and then it answers in Spanish and English in parallel.

nm-openai commented Feb 1, 2025 via email

@hasani114

Generally it seems less stable than the AVM (Advanced Voice Mode) that is available from OpenAI. I get too many hallucinations and misunderstandings, which is not common in AVM. Is this using a smaller model? Even with gpt-4o (not mini), I've been having issues.

Also, strangely, using push-to-talk doesn't override/truncate the current AI message (neither does sending a text message). I've read online that conversation.item.truncate doesn't work. Is this a known issue that OAI is working on resolving?
