
Lots of hallucinations? #2

Open
joshmouch opened this issue Jan 18, 2025 · 5 comments

Comments


joshmouch commented Jan 18, 2025

I know this is a simple demo, but I was expecting fewer hallucinations and something I could hypothetically use in a non-demo app.

Random switching to other languages. Thinking I spelled "Jose" when I said J-o-s-h. "Mr. Butte" when I spell b-u-t-t. ;) Phone numbers incorrectly formatted. And most tool calls never getting used: for example, I asked to be signed up for a promotion because I saw there was a tool for it, but I never got it to call the tool.

Is this maybe holding out for o1 or o3 before it would work in a production scenario? Is it expected that the models should be fine-tuned for a real usage scenario?

I'm wondering if in practice there needs to be some fine-tuning before this is used. But if that's the case, then I think the training instructions and expectations should be included with the demo. Maybe also a way to evaluate how well the agents are working?

@nm-openai (Collaborator)

Hi @joshmouch thanks for this feedback! Going through point by point:

  • I'd be curious to learn more about the situation in which you're seeing random language switching
  • We're working on an improvement for spelling comprehension
  • For phone number formatting, I think I have it specified in some of the tool calls, but you can also constrain the format directly in the tool schema, for example:
  phone_number: {
    type: "string",
    description:
      "User's phone number used for verification. Formatted like '(111) 222-3333'",
    pattern: "^\\(\\d{3}\\) \\d{3}-\\d{4}$",
  },
  • For the offer tool call, I always saw it work if you go through the snowboard authentication flow and accept the offer. Was that not working for you? I'd also update the state machine and the tool-call instructions to make it more explicit when you'd like the tool to be called. Also, if you're using 4o-mini, I'd expect 4o to be better here.

Across the board (esp language switching and spelling), if you could please share session_ids we can use to debug that would be super helpful!

scycs commented Jan 27, 2025

@joshmouch Was it switching to German? The instructions in simulatedHuman.ts include the line "You respond only in German."

lcwyo commented Jan 31, 2025

If I'm running it in development mode (npm run dev) and I make changes to the files, it suddenly starts speaking Spanish. It seems like it starts a new thread, or a new call to the Realtime API, and then it answers in Spanish and English in parallel.

nm-openai commented Feb 1, 2025 via email

@hasani114

Generally it seems less stable than the AVM (Advanced Voice Mode) that is available from OpenAI. I get too many hallucinations and misunderstandings, which is not common in AVM. Is this using a smaller model? Even with gpt-4o (not mini), I've been having issues.

Also, strangely, using push-to-talk doesn't override/truncate the current AI message (neither does sending a text message). I've read online that conversation.item.truncate doesn't work. Is this a known issue that OAI is working on resolving?
