
Add OpenAI moderation #81

Closed
ccstan99 opened this issue Aug 16, 2023 · 5 comments · Fixed by #85
Assignees
Labels
help wanted Extra attention is needed

Comments

@ccstan99
Collaborator

ccstan99 commented Aug 16, 2023

There's been some potential misuse of the Stampy API key, which puts our account at risk of being banned. It's unclear whether the abuse came from stampy-chat or elsewhere. A few steps:

  1. OpenAI advised us to use their moderation endpoint
  2. Create a separate API key to be used only for the DEPLOYED stampy-chat, to isolate usage.
    We should also set openai.organization so that devs using personal keys are billed to the org accordingly.
  3. Check the logs from before 8/11 (when the notice was received) to see if there are any prompts/queries we can protect against. See Refine prompts #3 for ideas on protecting against prompt injection and somewhat limiting the scope of answers.
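Step 2 might look something like the sketch below. The environment variable names are assumptions (use whatever the deployment actually sets), and the commented application uses the pre-1.0 `openai` module style to match the `openai.ChatCompletion` / `openai.Embedding` calls discussed in this thread:

```python
import os

def load_openai_config() -> dict:
    """Read the deployed-only API key and org id from the environment.

    STAMPY_OPENAI_API_KEY / STAMPY_OPENAI_ORG are hypothetical names,
    not anything the repo currently defines.
    """
    return {
        "api_key": os.environ["STAMPY_OPENAI_API_KEY"],
        # Setting the organization means usage from personal dev keys
        # is billed to the org rather than to individual accounts.
        "organization": os.environ["STAMPY_OPENAI_ORG"],
    }

# Applying it with the pre-1.0 client would look like:
#   import openai
#   cfg = load_openai_config()
#   openai.api_key = cfg["api_key"]
#   openai.organization = cfg["organization"]
```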
@ccstan99 ccstan99 added the help wanted Extra attention is needed label Aug 16, 2023
@ccstan99 ccstan99 changed the title Minimize misuse Add OpenAI moderation Aug 16, 2023
@FraserLee
Collaborator

Is our org key something we need to keep private?

@ccstan99
Collaborator Author

I don't think it's as secret as the API key, since members of the org can see it (you're a member of the org too), but just to be safe I'd avoid posting it publicly. I've DMed you the value in case you need it.

@ccstan99
Collaborator Author

ccstan99 commented Aug 21, 2023

From the email (below), it looks like BEFORE we make any calls to openai.ChatCompletion and openai.Embedding, we should call the moderation endpoint first (on the query alone, and again on the full prompt with context) to ensure we're in compliance.

We are reaching out to you as a user of OpenAI’s API because some of the content in your requests has been flagged by our systems to be in violation of our policies against generating or attempting to generate sexual content.

We request you take action to remediate this activity within one week of receiving this message. If remedial action is not taken, please note your API access may be terminated in accordance with our Terms of Use.

To help you monitor your traffic for content violations, we recommend that you implement the Moderations endpoint, which is available free of charge.

If you're concerned that someone else may have accessed your account without your consent, please rotate your API key on the API Keys page to keep your account secure.
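A minimal sketch of that gating order. `moderate` here stands in for a call to the moderation endpoint (e.g. `openai.Moderation.create` in the pre-1.0 client); the response shape assumed below is the documented one, a top-level `results` list whose entries carry a boolean `flagged`:

```python
from typing import Callable

def is_flagged(text: str, moderate: Callable[[str], dict]) -> bool:
    """True if the moderation endpoint flags `text`.

    `moderate` is expected to return the moderation response shape:
    {"results": [{"flagged": bool, "categories": {...}, ...}]}
    """
    response = moderate(text)
    return any(result["flagged"] for result in response["results"])

def guarded_completion(query: str, full_prompt: str, moderate, complete):
    """Moderate the bare query AND the full prompt (with context) before
    any ChatCompletion/Embedding call; refuse if either is flagged."""
    if is_flagged(query, moderate) or is_flagged(full_prompt, moderate):
        raise ValueError("Input rejected by moderation")
    return complete(full_prompt)
```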

@FraserLee
Collaborator

FraserLee commented Aug 22, 2023

I did some testing, and it doesn't seem like feeding in the whole prompt is always enough to catch issues (the query can get drowned out in the sources), so I'm checking both the entire prompt and just the query. That way we should always catch it.

If moderation seems overly strict going forward (easy to imagine on topics like politics), we can be more selective than just checking "flagged".

@ccstan99
Collaborator Author

That sounds like a good plan. I think we need to catch both when the query is problematic and when the context blocks are. Hopefully there's not a huge lag. Politics doesn't seem to be a category in the moderation API, but yes, we should log and keep an eye on the kinds of things that get flagged. Given the circumstances, it's probably better to start strict and slowly relax as needed.
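One way to be more selective than the bare "flagged" bit while logging what trips, sketched against the per-result `categories` map in the documented moderation response. The blocklist below is an assumption for illustration, not anything OpenAI prescribes:

```python
import logging

# Start strict: refuse on any of these categories; relax the set as needed.
# (Assumed starting list, chosen for this sketch.)
BLOCKED_CATEGORIES = {"sexual", "sexual/minors", "hate", "violence", "self-harm"}

def should_block(result: dict) -> bool:
    """Decide from one moderation result entry whether to refuse.

    `result` follows the moderation response shape:
    {"flagged": bool, "categories": {"sexual": bool, "hate": bool, ...}}
    """
    hits = {name for name, hit in result["categories"].items() if hit}
    if hits:
        # Log everything moderation trips on so we can tune over time.
        logging.warning("moderation flagged categories: %s", sorted(hits))
    return bool(hits & BLOCKED_CATEGORIES)
```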
