
Add OpenAI moderation #81

Closed
ccstan99 opened this issue Aug 16, 2023 · 5 comments · Fixed by #85
Assignees
Labels
help wanted Extra attention is needed

Comments

@ccstan99
Collaborator

ccstan99 commented Aug 16, 2023

There's been some potential misuse of the Stampy API key, which puts our account at risk of being banned. It's unclear whether the abuse came from stampy-chat or elsewhere. A few steps:

  1. OpenAI advised us to use their moderation endpoint
  2. Create a separate API key to be used only for the DEPLOYED stampy-chat, to isolate usage.
    We should also set openai.organization so that devs using personal keys are billed to the org accordingly.
  3. Check the logs from before 8/11 (when the notice was received) to see if there are any prompts/queries we can protect against. See Refine prompts #3 for ideas on protecting against prompt injection and somewhat limiting the scope of answers.
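Step 2 might look something like the sketch below. The environment variable names are assumptions (use whatever the deployment actually sets), and the commented application uses the pre-1.0 `openai` module style to match the `openai.ChatCompletion` / `openai.Embedding` calls discussed in this thread:

```python
import os

def load_openai_config() -> dict:
    """Read the deployed-only API key and org id from the environment.

    STAMPY_OPENAI_API_KEY / STAMPY_OPENAI_ORG are hypothetical names,
    not anything the repo currently defines.
    """
    return {
        "api_key": os.environ["STAMPY_OPENAI_API_KEY"],
        # Setting the organization means usage from personal dev keys
        # is billed to the org rather than to individual accounts.
        "organization": os.environ["STAMPY_OPENAI_ORG"],
    }

# Applying it with the pre-1.0 client would look like:
#   import openai
#   cfg = load_openai_config()
#   openai.api_key = cfg["api_key"]
#   openai.organization = cfg["organization"]
```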
@ccstan99 ccstan99 added the help wanted Extra attention is needed label Aug 16, 2023
@ccstan99 ccstan99 changed the title Minimize misuse Add OpenAI moderation Aug 16, 2023
@FraserLee
Collaborator

Is our org key something we need to keep private?

@ccstan99
Collaborator Author

I don't think it's as secret as the API key, since members of the org can see it (you're a member of the org too), but just to be safe I'd avoid posting it publicly. I've DMed you the value in case you need it.

@ccstan99
Collaborator Author

ccstan99 commented Aug 21, 2023

From the email (below), it looks like BEFORE we make any calls to openai.ChatCompletion and openai.Embedding, we should call the moderation endpoint first (on the query alone, and again on the full prompt with context) to ensure we're in compliance.

We are reaching out to you as a user of OpenAI’s API because some of the content in your requests has been flagged by our systems to be in violation of our policies against generating or attempting to generate sexual content.

We request you take action to remediate this activity within one week of receiving this message. If remedial action is not taken, please note your API access may be terminated in accordance with our Terms of Use.

To help you monitor your traffic for content violations, we recommend that you implement the Moderations endpoint, which is available free of charge.

If you're concerned that someone else may have accessed your account without your consent, please rotate your API key on the API Keys page to keep your account secure.
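A minimal sketch of that gating order. `moderate` here stands in for a call to the moderation endpoint (e.g. `openai.Moderation.create` in the pre-1.0 client); the response shape assumed below is the documented one, a top-level `results` list whose entries carry a boolean `flagged`:

```python
from typing import Callable

def is_flagged(text: str, moderate: Callable[[str], dict]) -> bool:
    """True if the moderation endpoint flags `text`.

    `moderate` is expected to return the moderation response shape:
    {"results": [{"flagged": bool, "categories": {...}, ...}]}
    """
    response = moderate(text)
    return any(result["flagged"] for result in response["results"])

def guarded_completion(query: str, full_prompt: str, moderate, complete):
    """Moderate the bare query AND the full prompt (with context) before
    any ChatCompletion/Embedding call; refuse if either is flagged."""
    if is_flagged(query, moderate) or is_flagged(full_prompt, moderate):
        raise ValueError("Input rejected by moderation")
    return complete(full_prompt)
```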

@FraserLee
Collaborator

FraserLee commented Aug 22, 2023

I did some testing, and it doesn't seem like feeding in the whole prompt is always enough to catch issues (the query can get drowned out in the sources), so I'm checking both the entire prompt and just the query. That way we should always catch it.

If moderation seems overly strict going forward (easy to imagine on topics like politics), we can be more selective than just checking "flagged".

@ccstan99
Collaborator Author

That sounds like a good plan. I think we need to catch both when the query is problematic and when the context blocks are. Hopefully there's not a huge lag. Politics doesn't seem to be a category in the moderation API, but yes, we should log and keep an eye on the kinds of things that get flagged. Given the circumstances, it's probably better to start strict and slowly relax as needed.
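One way to be more selective than the bare "flagged" bit while logging what trips, sketched against the per-result `categories` map in the documented moderation response. The blocklist below is an assumption for illustration, not anything OpenAI prescribes:

```python
import logging

# Start strict: refuse on any of these categories; relax the set as needed.
# (Assumed starting list, chosen for this sketch.)
BLOCKED_CATEGORIES = {"sexual", "sexual/minors", "hate", "violence", "self-harm"}

def should_block(result: dict) -> bool:
    """Decide from one moderation result entry whether to refuse.

    `result` follows the moderation response shape:
    {"flagged": bool, "categories": {"sexual": bool, "hate": bool, ...}}
    """
    hits = {name for name, hit in result["categories"].items() if hit}
    if hits:
        # Log everything moderation trips on so we can tune over time.
        logging.warning("moderation flagged categories: %s", sorted(hits))
    return bool(hits & BLOCKED_CATEGORIES)
```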
