Policy: Auto-Moderation

Background

OpenAI provides a free moderation API and recommends using it to evaluate all user-provided content before that content is passed to OpenAI’s completion or embedding APIs. Failing to moderate content properly can get your account banned by OpenAI. You can call the moderation API through OpenAI’s SDKs, but doing so means updating your codebase to add a pre-processing step before each API call. With tools like LangChain, the actual API call may be abstracted away, which makes it harder to determine where the moderation call belongs.
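To make the manual approach concrete, here is a minimal sketch of the kind of pre-processing wrapper you would otherwise have to place in front of every call. The names `moderated_call` and `ModerationBlocked` are hypothetical, and the two callables are injected so the sketch does not depend on any particular SDK version. It assumes the moderation result follows the shape of OpenAI's moderation response, with a `results` list whose first entry has a `flagged` boolean.

```python
from typing import Any, Callable, Dict


class ModerationBlocked(Exception):
    """Raised when the moderation check flags the input (hypothetical name)."""


def moderated_call(
    text: str,
    moderate: Callable[[str], Dict[str, Any]],
    call_model: Callable[[str], Any],
) -> Any:
    """Run a moderation check on `text`; call the model only if it passes."""
    # Assumption: the result looks like {"results": [{"flagged": bool, ...}]}.
    result = moderate(text)["results"][0]
    if result["flagged"]:
        # Stop here so the flagged text never reaches the model.
        raise ModerationBlocked("input was flagged by the moderation check")
    return call_model(text)
```

With the legacy `openai` Python SDK used elsewhere on this page, `moderate` could be something like `lambda t: openai.Moderation.create(input=t)`, and `call_model` a small wrapper around `openai.Completion.create`. Every call site has to go through a wrapper like this, which is the maintenance burden described above.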

Usage Panda solves this problem by dynamically injecting a call to the moderation API via its proxy. If the moderation call returns a high likelihood of sensitive content, the request is blocked and not sent to the completion or embedding API. This is done entirely within the Usage Panda proxy, with no changes to your codebase.
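The blocking decision described above might look roughly like the sketch below. This is illustrative only and is not Usage Panda's actual proxy code: the function name `moderation_block_response` is hypothetical, the input is assumed to follow the shape of OpenAI's moderation response (`results[0].flagged` and `results[0].categories`), and joining several category names with commas is a guess. The message text, error fields, and 422 status mirror the error shown in "Setting via Headers".

```python
from typing import Any, Dict, Optional


def moderation_block_response(moderation: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return an error payload if the moderation result is flagged, else None.

    Illustrative sketch only; not Usage Panda's real implementation.
    """
    result = moderation["results"][0]
    if not result.get("flagged"):
        return None  # nothing flagged: forward the request upstream

    # Names of the categories marked true, e.g. ["self-harm"].
    reasons = sorted(name for name, hit in result.get("categories", {}).items() if hit)
    # Assumption: multiple reasons are comma-separated.
    message = "Usage Panda: Moderation flagged this request: " + ", ".join(reasons)
    return {
        "status": 422,
        "body": {
            "error": {
                "message": message,
                "type": "invalid_request",
                "param": None,
                "code": None,
            }
        },
    }
```

A return value of `None` means the request can be forwarded; any other value would be sent back to the client in place of the completion or embedding response.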

Enabling the Setting

To enable auto-moderation for an API key:

1. Navigate to the API Keys page
2. Click the gear (settings) icon on the API key you wish to modify
3. Scroll down to the “Auto-Moderate” setting and toggle it on
4. Click “Save”

Setting via Headers

You can optionally override this setting on a per-request basis by passing the x-usagepanda-auto-moderate header, like so:

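import openai

# Assumes the openai client is already configured to send requests through the
# Usage Panda proxy and that USAGE_PANDA_KEY holds your Usage Panda API key.
response = openai.Completion.create(
  model="text-davinci-003",
  prompt="[sensitive content here]",
  headers={
    "x-usagepanda-api-key": USAGE_PANDA_KEY,  # Usage Panda auth
    "x-usagepanda-auto-moderate": "true"  # per-request override
  }
)
output = response.choices[0].text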

The above request will fail if the prompt contains sensitive content, and the error will include a reason (e.g., self-harm):

openai.error.APIError: Usage Panda: Moderation flagged this request: self-harm {"error":{"message":"Usage Panda: Moderation flagged this request: self-harm","type":"invalid_request","param":null,"code":null}} 422 {'error': {'message': 'Usage Panda: Moderation flagged this request: self-harm', 'type': 'invalid_request', 'param': None, 'code': None}} {'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'OPTIONS,POST,GET', 'Content-Type': 'application/json', 'Date': 'Thu, 01 Jun 2023 23:29:08 GMT', 'Connection': 'keep-alive', 'Keep-Alive': 'timeout=5', 'Transfer-Encoding': 'chunked'}
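If your application needs to tell a moderation block apart from other API errors, one option is to inspect the error body. The helper below is a minimal sketch: `moderation_reason` and `MODERATION_PREFIX` are hypothetical names, and the prefix is copied from the error message shown above. It works on the parsed JSON body only, so it can be exercised without the SDK.

```python
from typing import Any, Dict, Optional

# Message prefix taken from the example error above.
MODERATION_PREFIX = "Usage Panda: Moderation flagged this request:"


def moderation_reason(json_body: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the flagged reason from an error body, or None for any other error."""
    message = ((json_body or {}).get("error") or {}).get("message") or ""
    if not message.startswith(MODERATION_PREFIX):
        return None
    # Everything after the prefix is the reason text, e.g. "self-harm".
    return message[len(MODERATION_PREFIX):].strip()
```

With the legacy `openai` Python SDK, you could catch `openai.error.APIError as err` around the call and pass `err.json_body` to this helper; a non-`None` result indicates the request was blocked by auto-moderation rather than failing for some other reason.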

Flagged Requests

Requests that are blocked because of the auto-moderation setting will be flagged in the logs:

[Screenshot: Usage Panda Auto-Moderate]