Policy: Enforce Max Tokens

Background

The OpenAI API supports a max_tokens parameter for completion and chat completion requests. This parameter sets an upper limit on the number of tokens generated in the response. If it is not set, the length of the response can vary, and so can the associated cost. To control these costs, set max_tokens according to your application's use case and cost sensitivity.

Usage Panda can enforce that this parameter is always set by blocking requests that either omit max_tokens or set it to a value higher than a pre-defined threshold. This is a simple but effective cost control mechanism.
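The blocking rule described above can be sketched as a small check. This only illustrates the rule and is not Usage Panda's actual implementation; the function name check_max_tokens and its limit argument are hypothetical.

```python
from typing import Optional, Tuple


def check_max_tokens(request_body: dict, limit: int) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reason) for a request under a max-tokens policy.

    Hypothetical helper: illustrates the rule only, not Usage Panda's code.
    """
    requested = request_body.get("max_tokens")
    if requested is None:
        # Rule 1: the parameter must be present.
        return False, "max_tokens is not set"
    if requested > limit:
        # Rule 2: the parameter must not exceed the configured threshold.
        return False, f"max_tokens {requested} exceeds limit {limit}"
    return True, None


# A request without max_tokens, one over the limit, and one within it.
print(check_max_tokens({"prompt": "Hello there"}, limit=100))
print(check_max_tokens({"prompt": "Hello there", "max_tokens": 50}, limit=10))
print(check_max_tokens({"prompt": "Hello there", "max_tokens": 50}, limit=100))
```

In this sketch a value equal to the limit is allowed, since the policy only blocks values higher than the threshold.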

Enabling the Setting

To enforce the max tokens parameter:

1. Navigate to the API Keys page
2. Click the gear (settings) icon on the API key you wish to modify
3. Scroll down to the “Enforce Max Tokens” setting and enter a value
4. Click “Save”

Setting via Headers

You can optionally override this setting on a per-request basis by passing the x-usagepanda-max-tokens header, like so:

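import openai

# USAGE_PANDA_KEY is your Usage Panda API key, defined elsewhere.
response = openai.Completion.create(
  model="text-davinci-003",
  prompt="Hello there",
  # max_tokens is not set in this request
  headers={
    "x-usagepanda-api-key": USAGE_PANDA_KEY,  # Usage Panda auth
    "x-usagepanda-max-tokens": "100"  # per-request override of the max tokens limit
  }
)
output = response.choices[0].text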

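The above request will fail because the max_tokens parameter is not set. Blocked requests raise an error like the following, which comes from a request for 50 tokens against a configured limit of 10: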

openai.error.APIError: Usage Panda: Config set to max tokens of: 10; request was: 50 {"error":{"message":"Usage Panda: Config set to max tokens of: 10; request was: 50","type":"invalid_request","param":null,"code":null}} 422 {'error': {'message': 'Usage Panda: Config set to max tokens of: 10; request was: 50', 'type': 'invalid_request', 'param': None, 'code': None}} {'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'OPTIONS,POST,GET', 'Content-Type': 'application/json', 'Date': 'Thu, 01 Jun 2023 23:11:11 GMT', 'Connection': 'keep-alive', 'Keep-Alive': 'timeout=5', 'Transfer-Encoding': 'chunked'}
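If you want to react to a blocked request in code, one option is to pull the configured limit and the requested value out of the error message. The sketch below is a hypothetical helper (parse_max_tokens_error is not part of Usage Panda or the OpenAI library), and its pattern is inferred only from the single error message shown above, so other messages may not match.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the one example message above; treat it as an assumption.
_LIMIT_ERROR = re.compile(r"Config set to max tokens of: (\d+); request was: (\d+)")


def parse_max_tokens_error(message: str) -> Optional[Tuple[int, int]]:
    """Return (configured_limit, requested_tokens) if the message matches, else None."""
    match = _LIMIT_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


message = "Usage Panda: Config set to max tokens of: 10; request was: 50"
print(parse_max_tokens_error(message))  # (10, 50)
```

A caller could use the returned limit to retry with a smaller max_tokens value, or to log the mismatch.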

Flagged Requests

Requests that are blocked because of the max tokens setting are flagged in the logs.