Basic Information
Public Service Address
https://api.moonshot.ai

Moonshot offers API services based on HTTP, and for most APIs we are compatible with the OpenAI SDK.
Quickstart
Single-turn chat
The official OpenAI SDK supports Python and Node.js. Below are examples of how to interact with the API using the OpenAI SDK and curl:
```python
from openai import OpenAI

client = OpenAI(
    api_key="$MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. You will reject any questions involving terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated."},
        {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"}
    ]
)

print(completion.choices[0].message.content)
```

Replace $MOONSHOT_API_KEY with the API Key you created on the platform.
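The same single-turn request can also be issued with curl against the chat completions endpoint shown later in this document (the system prompt is abbreviated here; substitute your own API key before running):

```shell
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI."},
      {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"}
    ]
  }'
```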
When running the code in the documentation using the OpenAI SDK, ensure that your Python version is at least 3.7.1, your Node.js version is at least 18, and your OpenAI SDK version is no lower than 1.0.0.
```shell
pip install --upgrade 'openai>=1.0'
```

You can easily check the version of your library like this:

```shell
python -c 'import openai; print("version =", openai.__version__)'
# Example output: version = 1.10.0, indicating that the current Python environment is using v1.10.0 of the openai library
```
Multi-turn chat
In the single-turn chat example above, the language model takes a list of user messages as input and returns the generated response as output. Sometimes, we can also use the model's output as part of the input to achieve multi-turn chat. Below is a simple example of implementing multi-turn chat:
```python
from openai import OpenAI

client = OpenAI(
    api_key="$MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",
)

history = [
    {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. You will reject any questions involving terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated."}
]

def chat(query, history):
    history.append({
        "role": "user",
        "content": query
    })
    completion = client.chat.completions.create(
        model="kimi-k2.5",
        messages=history
    )
    result = completion.choices[0].message.content
    history.append({
        "role": "assistant",
        "content": result
    })
    return result

print(chat("What is the rotation period of the Earth?", history))
print(chat("What about the Moon?", history))
```

It is worth noting that as the chat progresses, the number of tokens the model needs to process will increase linearly. When necessary, some optimization strategies should be employed, such as retaining only the most recent few rounds of chat.
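One such strategy can be sketched as a pure helper that keeps the system prompt plus only the last few rounds; the round count and the message-pair policy below are illustrative choices, not part of the API:

```python
def truncate_history(history, max_rounds=3):
    """Keep system messages plus the last `max_rounds` user/assistant rounds.

    A round is one user message and the assistant reply that follows it.
    This is a sketch; a token-budget-based policy is also common.
    """
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    # Each full round contributes two messages (user + assistant).
    return system + rest[-2 * max_rounds:]

# Example: 5 rounds recorded, keep only the 3 most recent.
h = [{"role": "system", "content": "You are Kimi."}]
for i in range(5):
    h.append({"role": "user", "content": f"q{i}"})
    h.append({"role": "assistant", "content": f"a{i}"})
trimmed = truncate_history(h, max_rounds=3)
# trimmed now holds 1 system message plus the 6 most recent messages
```

Passing `trimmed` instead of the full `history` to `client.chat.completions.create` keeps the prompt size bounded as the conversation grows.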
API Documentation
Chat Completion
Request URL
POST https://api.moonshot.ai/v1/chat/completions

Request
Example
```json
{
  "model": "kimi-k2.5",
  "messages": [
    {
      "role": "system",
      "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You aim to provide users with safe, helpful, and accurate responses. You will refuse to answer any questions related to terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated into other languages."
    },
    { "role": "user", "content": "Hello, my name is Li Lei. What is 1+1?" }
  ]
}
```

Request body
| Field | Required | Description | Type | Values |
|---|---|---|---|---|
| messages | required | A list of messages that have been exchanged in the conversation so far | List[Dict] | This is a list of structured elements, each similar to: {"role": "user", "content": "Hello"} The role can only be one of system, user, assistant, and the content must not be empty. See Content Field Description for detailed information about the content field formats |
| model | required | Model ID, which can be obtained through List Models | string | Currently one of kimi-k2.5, kimi-k2-0905-preview, kimi-k2-0711-preview, kimi-k2-turbo-preview, kimi-k2-thinking-turbo, kimi-k2-thinking, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k, moonshot-v1-auto, moonshot-v1-8k-vision-preview, moonshot-v1-32k-vision-preview, moonshot-v1-128k-vision-preview |
| max_tokens | optional | Deprecated, please refer to max_completion_tokens | int | - |
| max_completion_tokens | optional | The maximum number of tokens to generate for the chat completion. If the result reaches the maximum number of tokens without ending, the finish reason will be "length"; otherwise, it will be "stop" | int | It is recommended to provide a reasonable value as needed. If not provided, we will use a good default integer like 1024. Note: This max_completion_tokens refers to the length of the tokens you expect us to return, not the total length of input plus output. For example, for a moonshot-v1-8k model, the maximum total length of input plus output is 8192. When the total length of the input messages is 4096, you can set this to a maximum of 4096; otherwise, our service will return an invalid input parameter (invalid_request_error) and refuse to respond. If you want to know the "exact number of input tokens," you can use the "Token Calculation" API below to get the count using our calculator |
| temperature | optional | The sampling temperature to use, ranging from 0 to 1. A higher value (e.g., 0.7) will make the output more random, while a lower value (e.g., 0.2) will make it more focused and deterministic | float | Default is 0.0 for moonshot-v1 series models, 0.6 for kimi-k2 models and 1.0 for kimi-k2-thinking models. This parameter cannot be modified for the kimi-k2.5 model. |
| top_p | optional | Another sampling method, where the model considers the results of tokens with a cumulative probability mass of top_p. Thus, 0.1 means only considering the top 10% of tokens by probability mass. Generally, we suggest changing either this or the temperature, but not both at the same time | float | Default is 1.0 for moonshot-v1 series and kimi-k2 models, 0.95 for the kimi-k2.5 model. This parameter cannot be modified for the kimi-k2.5 model. |
| n | optional | The number of results to generate for each input message | int | Default is 1 for moonshot-v1 series and kimi-k2 models, and it must not exceed 5. Specifically, when the temperature is very close to 0, we can only return one result. If n is set and > 1 in this case, our service will return an invalid input parameter (invalid_request_error). Default is 1 for kimi-k2.5 model and it cannot be modified. |
| presence_penalty | optional | Presence penalty, a number between -2.0 and 2.0. A positive value will penalize new tokens based on whether they appear in the text, increasing the likelihood of the model discussing new topics | float | Default is 0. This parameter cannot be modified for the kimi-k2.5 model. |
| frequency_penalty | optional | Frequency penalty, a number between -2.0 and 2.0. A positive value will penalize new tokens based on their existing frequency in the text, reducing the likelihood of the model repeating the same phrases verbatim | float | Default is 0. This parameter cannot be modified for the kimi-k2.5 model. |
| response_format | optional | Setting this to {"type": "json_object"} enables JSON mode, ensuring that the generated information is valid JSON. When you set response_format to {"type": "json_object"}, you must explicitly guide the model to output JSON-formatted content in the prompt and specify the exact format of the JSON, otherwise it may result in unexpected outcomes. | object | Default is {"type": "text"} |
| stop | optional | Stop words, which will halt the output when a full match is found. The matched words themselves will not be output. A maximum of 5 strings is allowed, and each string must not exceed 32 bytes | String, List[String] | Default is null |
| thinking | optional | Only available for the kimi-k2.5 model. Controls whether thinking is enabled for this request | object | Defaults to {"type": "enabled"}. The value must be one of {"type": "enabled"} or {"type": "disabled"} |
| stream | optional | Whether to return the response in a streaming fashion | bool | Default is false, and true is an option |
| stream_options.include_usage | optional | If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value. NOTE: If the stream is interrupted, you may not receive the final usage chunk which contains the total token usage for the request | bool | Default is false |
| prompt_cache_key | optional | Used to cache responses for similar requests to optimize cache hit rates | string | Default is null. For Coding Agents, this is typically a session id or task id representing a single session; if the session is exited and later resumed, this value should remain the same. For Kimi Code Plan, this field is required to improve cache hit rates. For other agents involving multi-turn conversations, it is also recommended to implement this field |
| safety_identifier | optional | A stable identifier used to help detect users of your application that may be violating usage policies. The ID should be a string that uniquely identifies each user. It is recommended to hash the username or email address to avoid sending any identifying information | string | Default is null |
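As an illustration of the response_format parameter above, the sketch below builds a JSON-mode request body whose prompt spells out the expected JSON shape; the prompt text and the shape itself are hypothetical examples, not a fixed API contract:

```python
import json

# Sketch of a JSON-mode request body (hypothetical prompt and schema).
payload = {
    "model": "kimi-k2.5",
    "response_format": {"type": "json_object"},  # enables JSON mode
    "messages": [
        # The prompt must explicitly describe the JSON you expect back,
        # otherwise the output may be unexpected.
        {"role": "system",
         "content": 'Answer with a JSON object of the form {"answer": <number>}.'},
        {"role": "user", "content": "What is 1+1?"},
    ],
}

# In JSON mode the returned message content is valid JSON, so it can be
# parsed directly. Shown here with a hypothetical response string:
example_content = '{"answer": 2}'
result = json.loads(example_content)
```

In a real call you would pass the same fields as keyword arguments to `client.chat.completions.create(**payload)` and parse `completion.choices[0].message.content` instead of `example_content`.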
Content Field Description
The content field in the message can have different types of values:
- A plain string, e.g. "Hello".
- A List[Dict], used when you need to pass more complex information. Each dict can have the following fields:
  - type: always required; identifies the type of the content. Its value must be one of text, image_url, or video_url.
  - text: required when type is text. Its value is plain text.
  - image_url: required when type is image_url. Its value is a dict describing the image content, e.g. {"url": "data:image/png;base64,abc123xxxxx=="}
  - video_url: required when type is video_url. Its value is a dict describing the video content, e.g. {"url": "data:video/mp4;base64,def456yyyyy=="}
The following are all valid content field examples:
- "Hello"
- [{"type": "text", "text": "Hello"}]
- [{"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123xxxxx=="}}]
- [{"type": "video_url", "video_url": {"url": "data:video/mp4;base64,def456yyyyy=="}}]
- [{"type": "text", "text": "What is this?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123xxxxx=="}}]
Note that the url field of image_url and video_url can be a base64 data URL or ms://<file_id>.
Please refer to Use the Kimi Vision Model for details.
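For instance, a minimal helper for building a base64 image_url element from raw image bytes can be sketched as follows (the PNG bytes below are placeholder data, not a real image):

```python
import base64

def image_content(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content element (base64 data URL)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

content = [
    image_content(b"\x89PNG\r\n\x1a\n"),  # placeholder bytes, not a valid image
    {"type": "text", "text": "Please describe this image."},
]
```

The resulting `content` list can be used directly as the `content` value of a user message, as in the vision example below.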
Return
For non-streaming responses, the return format is similar to the following:
```json
{
  "id": "cmpl-04ea926191a14749b7f2c7a48a68abc6",
  "object": "chat.completion",
  "created": 1698999496,
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello, Li Lei! 1+1 equals 2. If you have any other questions, feel free to ask!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 21,
    "total_tokens": 40,
    "cached_tokens": 10
  }
}
```

The cached_tokens field is the number of tokens hit by the cache; only models that support automatic caching return this field.

For streaming responses, the return format is similar to the following:
```
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

...

data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{},"finish_reason":"stop","usage":{"prompt_tokens":19,"completion_tokens":13,"total_tokens":32}}]}

data: [DONE]
```

Example Request
For simple calls, refer to the previous example. For streaming calls, you can refer to the following code snippet:
```python
from openai import OpenAI

client = OpenAI(
    api_key="$MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You excel at conversing in Chinese and English. You provide users with safe, helpful, and accurate responses. You refuse to answer any questions related to terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated into other languages.",
        },
        {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"},
    ],
    stream=True,
)

collected_messages = []
for idx, chunk in enumerate(response):
    # print("Chunk received, value: ", chunk)
    chunk_message = chunk.choices[0].delta
    if not chunk_message.content:
        continue
    collected_messages.append(chunk_message)  # save the message
    print(f"#{idx}: {''.join([m.content for m in collected_messages])}")
print(f"Full conversation received: {''.join([m.content for m in collected_messages])}")
```

Vision
Example
```json
{
  "model": "kimi-k2.5",
  "messages": [
    {
      "role": "system",
      "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in both Chinese and English conversations. You aim to provide users with safe, helpful, and accurate answers. You will refuse to answer any questions related to terrorism, racism, pornography, or violence. Moonshot AI is a proper noun and should not be translated into any other language."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAGAAAABhCAYAAAApxKSdAAAACXBIWXMAACE4AAAhOAFFljFgAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAUUSURBVHgB7Z29bhtHFIWPHQN2J7lKqnhYpYvpIukCbJEAKQJEegLReYFIT0DrCSI9QEDqCSIDaQIEIOukiJwyza5SJWlId3FFz+HuGmuSSw6p+dlZ3g84luhdUeI9M3fmziyXgBCUe/DHYY0Wj/tgWmjV42zFcWe4MIBBPNJ6qqW0uvAbXFvQgKzQK62bQhkaCIPc10q1Zi3XH1o/IG9cwUm0RogrgDY1KmLgHYX9DvyiBvDYI77XmiD+oLlQHw7hIDoCMBOt1U9w0BsU9mOAtaUUFk3oQoIfzAQFCf5dNMEdTFCQ4NtQih1NSIGgf3ibxOJt5UrAB1gNK72vIdjiI61HWr+YnNxDXK0rJiULsV65GJeiIescLSTTeobKSutiCuojX8kU3MBx4I3WeNVBBRl4fWiCyoB8v2JAAkk9PmDwT8sH1TEghRjgC27scCx41wO43KAg+ILxTvhNaUACwTc04Z0B30LwzTzm5Rjw3sgseIG1wGMawMBPIOQcqvzrNIMHOg9Q5KK953O90/rFC+BhJRH8PQZ+fu7SjC7HAIV95yu99vjlxfvBJx8nwHd6IfNJAkccOjHg6OgIs9lsra6vr2GTNE03/k7q8HAhyJ/2gM9O65/4kT7/mwEcoZwYsPQiV3BwcABb9Ho9KKU2njccDjGdLlxx+InBBPBAAR86ydRPaIC9SASi3+8bnXd+fr78nw8NJ39uDJjXAVFPP7dp/VmWLR9g6w6Huo/IOTk5MTpvZesn/93AiP/dXCwd9SyILT9Jko3n1bZ+8s8rGPGvoVHbEXcPMM39V1dX9Qd/19PPNxta959D4HUGF0RrAFs/8/8mxuPxXLUwtfx2WX+cxdivZ3DFA0SKldZPuPTAKrikbOlMOX+9zFu/Q2iAQoSY5H7mfeb/tXCT8MdneU9wNNCuQUXZA0ynnrUznyqOcrspUY4BJunHqPU3gOgMsNr6G0B0BpgUXrG0fhKVAaaF1/HxMWIhKgNMcj9Tz82Nk6rVGdav/tJ5eraJ0Wi01XPq1r/xOS8uLkJc6XYnRTMNXdf62eIvLy+jyftVghnQ7Xahe8FW59fBTRYOzosDNI1hJdz0lBQkBflkMBjMU5iL13pXRb8fYAJrB/a2db0oFHthAOEUliaYFHE+aaUBdZsvvFhApyM0idYZwOCvW4JmIWdSzPmidQaYrAGZ7iX4oFUGnJ2dGdUCTRqMozeANQCLsE6nA10JG/0Mx4KmDMbBCjEWR2yxu8LAM98vXelmCA2ovVLCI8EMYODWbpbvCXtTBzQVMSAwYkBgxIDAtNKAXWdGIRADAiMpKDA0IIMQikx6QGDEgMCIAYGRMSAsMgaEhgbcQgjFa+kBYZnIGBCWWzEgLPNBOJ6Fk/aR8Y5ZCvktKwX/PJZ7xoVjfs+4chYU11tK2sE85qUBLyH4Zh5z6QHhGPOf6r2j+TEbcgdFP2RaHX5TrYQlDflj5RXE5Q1cG/lWnhYpReUGKdUewGnRmhvnCJbgmxey8sHiZ8iwF3AsUBBckKHI/SWLq6HsBc8huML4DiK80D6WnBqLzN68UFCmopheYJOVYgcU5FOVbAVfYUcUZGoaLPglCtITdg2+tZUFBTFh2+ArWEYh/7z0WIIQSiM43lt5AWAmWhLHylN4QmkNEXfAbGqEQKsHSfHLYwiSq8AnaAAKeaW3D8VbijwNW5nh3IN9FPI/jnpaPKZi2/SfFuJu4W3x9RqWL+N5C+7ruKpBAgLkAAAAAElFTkSuQmCC"
          }
        },
        {
          "type": "text",
          "text": "Please describe this image."
        }
      ]
    }
  ]
}
```

Image Content Field Description
When using the Vision model, the message.content field will change from str to List[Object[str, any]]. Each element in the List has the following fields:
| Parameter Name | Required | Description | Type |
|---|---|---|---|
| type | required | Supports only text type (text) or image type (image_url) | string |
| image_url | required | Object for transmitting the image | Dict[str, any] |
The fields for the image_url parameter are as follows:
| Parameter Name | Required | Description | Type |
|---|---|---|---|
| url | required | Image content encoded in base64 or identified by file id | string |
Example Request
```python
import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

# Encode the image in base64
with open("your_image_path", "rb") as f:
    img_base = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{img_base}"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image."
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

List Models
Request URL
GET https://api.moonshot.ai/v1/models

Example request
```python
from openai import OpenAI

client = OpenAI(
    api_key="$MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",
)

model_list = client.models.list()
model_data = model_list.data

for i, model in enumerate(model_data):
    print(f"model[{i}]:", model.id)
```

Error Explanation
Here are some examples of error responses:
```json
{
  "error": {
    "type": "content_filter",
    "message": "The request was rejected because it was considered high risk"
  }
}
```

Below are explanations for the main errors:
| HTTP Status Code | error type | error message | Detailed Description |
|---|---|---|---|
| 400 | content_filter | The request was rejected because it was considered high risk | Content review rejection, your input or generated content may contain unsafe or sensitive information. Please avoid prompts that could generate sensitive content. Thank you. |
| 400 | invalid_request_error | Invalid request: {error_details} | Invalid request, usually due to incorrect request format or missing necessary parameters. Please check and retry. |
| 400 | invalid_request_error | Input token length too long | The length of tokens in the request is too long. Do not exceed the model's maximum token limit. |
| 400 | invalid_request_error | Your request exceeded model token limit : {max_model_length} | The sum of the tokens in the request and the set max_tokens exceeds the model's specification length. Please check the request body's specifications or choose a model with an appropriate length. |
| 400 | invalid_request_error | Invalid purpose: only 'file-extract' accepted | The purpose (purpose) in the request is incorrect. Currently, only 'file-extract' is accepted. Please modify and retry. |
| 400 | invalid_request_error | File size is too large, max file size is 100MB, please confirm and re-upload the file | The uploaded file size exceeds the limit. Please re-upload. |
| 400 | invalid_request_error | File size is zero, please confirm and re-upload the file | The uploaded file size is 0. Please re-upload. |
| 400 | invalid_request_error | The number of files you have uploaded exceeded the max file count {max_file_count}, please delete previous uploaded files | The total number of uploaded files exceeds the limit. Please delete unnecessary earlier files and re-upload. |
| 401 | invalid_authentication_error | Invalid Authentication | Authentication failed. Please check if the apikey is correct and retry. |
| 401 | incorrect_api_key_error | Incorrect API key provided | Authentication failed. Please check if the apikey is provided and correct, then retry. |
| 429 | exceeded_current_quota_error | Your account {organization-id}<{ak-id}> is suspended, please check your plan and billing details | Account balance is insufficient. Please check your account balance. |
| 403 | permission_denied_error | The API you are accessing is not open | The API you are trying to access is not currently open. |
| 403 | permission_denied_error | You are not allowed to get other user info | Accessing other users' information is not permitted. Please check. |
| 404 | resource_not_found_error | Not found the model {model-id} or Permission denied | The model does not exist or you do not have permission to access it. Please check and retry. |
| 429 | engine_overloaded_error | The engine is currently overloaded, please try again later | There are currently too many concurrent requests, and the node is rate-limited. Please retry later. It is recommended to upgrade your tier for a smoother experience. |
| 429 | exceeded_current_quota_error | You exceeded your current token quota: <{organization_id}> {token_credit}, please check your account balance | Your account balance is insufficient. Please check your account balance and ensure it can cover the cost of your token consumption before retrying. |
| 429 | rate_limit_reached_error | Your account {organization-id}<{ak-id}> request reached organization max concurrency: {Concurrency}, please try again after {time} seconds | Your request has reached the account's concurrency limit. Please wait for the specified time before retrying. |
| 429 | rate_limit_reached_error | Your account {organization-id}<{ak-id}> request reached organization max RPM: {RPM}, please try again after {time} seconds | Your request has reached the account's RPM rate limit. Please wait for the specified time before retrying. |
| 429 | rate_limit_reached_error | Your account {organization-id}<{ak-id}> request reached organization TPM rate limit, current:{current_tpm}, limit:{max_tpm} | Your request has reached the account's TPM rate limit. Please wait for the specified time before retrying. |
| 429 | rate_limit_reached_error | Your account {organization-id}<{ak-id}> request reached organization TPD rate limit, current:{current_tpd}, limit:{max_tpd} | Your request has reached the account's TPD rate limit. Please wait for the specified time before retrying. |
| 500 | server_error | Failed to extract file: {error} | Failed to parse the file. Please retry. |
| 500 | unexpected_output | invalid state transition | Internal error. Please contact the administrator. |
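For the 429 rate-limit errors above, retrying with exponential backoff is usually sufficient. The sketch below is a generic, stdlib-only helper (not part of the Moonshot API); in real use you would pass the SDK's rate-limit exception type as `retryable` and wrap your `client.chat.completions.create` call:

```python
import random
import time

def with_retries(fn, retryable=(Exception,), max_attempts=3, base_delay=1.0):
    """Call fn(); on a retryable error, back off exponentially and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Exponential backoff with a little jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

# Demo with a fake flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limit reached")
    return "ok"

result = with_retries(flaky, retryable=(RuntimeError,), base_delay=0.01)
```

When the error message includes a "try again after {time} seconds" hint, honoring that wait before the next attempt is preferable to a fixed backoff schedule.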