Chat

Basic Information

Public Service Address

https://api.moonshot.ai

Moonshot offers HTTP-based API services, and for most of them we are compatible with the OpenAI SDK.

Quickstart

Single-turn chat

The official OpenAI SDK supports Python and Node.js. Below is an example of how to interact with the API using the OpenAI SDK:

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
completion = client.chat.completions.create(
    model = "kimi-k2.5",
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. You will reject any questions involving terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated."},
        {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"}
    ]
)
 
print(completion.choices[0].message.content)

Replace $MOONSHOT_API_KEY with the API Key you created on the platform.

When running the code in the documentation using the OpenAI SDK, ensure that your Python version is at least 3.7.1, your Node.js version is at least 18, and your OpenAI SDK version is no lower than 1.0.0.

pip install --upgrade 'openai>=1.0'

You can easily check the version of your library like this:

python -c 'import openai; print("version =",openai.__version__)'
# The output might be "version = 1.10.0", indicating that the current Python environment is using openai v1.10.0

Multi-turn chat

In the single-turn chat example above, the language model takes a list of messages as input and returns the generated reply as output. We can also feed the model's output back as part of the input to achieve multi-turn chat. Below is a simple example of implementing multi-turn chat:

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
history = [
    {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. You will reject any questions involving terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated."}
]
 
def chat(query, history):
    history.append({
        "role": "user", 
        "content": query
    })
    completion = client.chat.completions.create(
        model="kimi-k2.5",
        messages=history
    )
    result = completion.choices[0].message.content
    history.append({
        "role": "assistant",
        "content": result
    })
    return result
 
print(chat("What is the rotation period of the Earth?", history))
print(chat("What about the Moon?", history))

It is worth noting that as the chat progresses, the number of tokens the model needs to process will increase linearly. When necessary, some optimization strategies should be employed, such as retaining only the most recent few rounds of chat.
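One such strategy can be sketched as follows: keep the system prompt and only the most recent rounds before each request. The round count of 3 is an arbitrary choice for illustration, and `truncate_history` is a hypothetical helper, not part of the SDK:

```python
def truncate_history(history, max_rounds=3):
    """Keep the system messages plus the last `max_rounds` user/assistant
    rounds, so the token count stays bounded as the chat grows."""
    system_messages = [m for m in history if m["role"] == "system"]
    other_messages = [m for m in history if m["role"] != "system"]
    # Each round is one user message plus one assistant reply.
    return system_messages + other_messages[-2 * max_rounds:]

# Call it before each request, e.g.:
#   completion = client.chat.completions.create(
#       model="kimi-k2.5", messages=truncate_history(history))
```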

API Documentation

Chat Completion

Request URL

POST https://api.moonshot.ai/v1/chat/completions

Request

Example

{
    "model": "kimi-k2.5",
    "messages": [
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You aim to provide users with safe, helpful, and accurate responses. You will refuse to answer any questions related to terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated into other languages."
        },
        { "role": "user", "content": "Hello, my name is Li Lei. What is 1+1?" }
    ]
}

Request body

  • messages (required, List[Dict]): A list of messages exchanged in the conversation so far. Each element is a structured object such as {"role": "user", "content": "Hello"}. The role must be one of system, user, or assistant, and the content must not be empty. See Content Field Description for details on the content field formats.
  • model (required, string): Model ID, which can be obtained through List Models. Currently one of kimi-k2.5, kimi-k2-0905-preview, kimi-k2-0711-preview, kimi-k2-turbo-preview, kimi-k2-thinking-turbo, kimi-k2-thinking, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k, moonshot-v1-auto, moonshot-v1-8k-vision-preview, moonshot-v1-32k-vision-preview, moonshot-v1-128k-vision-preview.
  • max_tokens (optional, int): Deprecated; please use max_completion_tokens instead.
  • max_completion_tokens (optional, int): The maximum number of tokens to generate for the chat completion. If the result reaches this limit without ending, the finish_reason will be "length"; otherwise it will be "stop". It is recommended to provide a reasonable value as needed; if omitted, a default such as 1024 is used. Note that this refers to the number of tokens to return, not the total length of input plus output. For example, for a moonshot-v1-8k model the maximum total length of input plus output is 8192, so when the input messages total 4096 tokens you can set this to at most 4096; otherwise the service returns an invalid_request_error and refuses to respond. To find the exact number of input tokens, use the Token Calculation API.
  • temperature (optional, float): The sampling temperature, ranging from 0 to 1. A higher value (e.g., 0.7) makes the output more random, while a lower value (e.g., 0.2) makes it more focused and deterministic. Default is 0.0 for moonshot-v1 series models, 0.6 for kimi-k2 models, and 1.0 for kimi-k2-thinking models. This parameter cannot be modified for the kimi-k2.5 model.
  • top_p (optional, float): Another sampling method, where the model considers tokens with a cumulative probability mass of top_p; thus 0.1 means only the top 10% of tokens by probability mass are considered. We generally suggest changing either this or temperature, but not both. Default is 1.0 for moonshot-v1 series and kimi-k2 models, and 0.95 for the kimi-k2.5 model, where it cannot be modified.
  • n (optional, int): The number of results to generate for each input message. Default is 1 for moonshot-v1 series and kimi-k2 models, and it must not exceed 5. When the temperature is very close to 0, only one result can be returned; if n > 1 in that case, the service returns an invalid_request_error. Default is 1 for the kimi-k2.5 model and cannot be modified.
  • presence_penalty (optional, float): Presence penalty, a number between -2.0 and 2.0. A positive value penalizes new tokens based on whether they have already appeared in the text, increasing the likelihood of the model discussing new topics. Default is 0. Cannot be modified for the kimi-k2.5 model.
  • frequency_penalty (optional, float): Frequency penalty, a number between -2.0 and 2.0. A positive value penalizes new tokens based on their existing frequency in the text, reducing the likelihood of the model repeating the same phrases verbatim. Default is 0. Cannot be modified for the kimi-k2.5 model.
  • response_format (optional, object): Setting this to {"type": "json_object"} enables JSON mode, ensuring that the generated output is valid JSON. When you enable JSON mode, you must explicitly guide the model in the prompt to output JSON and specify the exact format, otherwise the result may be unexpected. Default is {"type": "text"}.
  • stop (optional, String or List[String]): Stop words; output halts when a full match is found, and the matched words themselves are not output. A maximum of 5 strings is allowed, each no longer than 32 bytes. Default is null.
  • thinking (optional, object): Only available for the kimi-k2.5 model; controls whether thinking is enabled for this request. Default is {"type": "enabled"}; the value must be either {"type": "enabled"} or {"type": "disabled"}.
  • stream (optional, bool): Whether to return the response in a streaming fashion. Default is false.
  • stream_options.include_usage (optional, bool): If set, an additional chunk is streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and its choices field is always an empty array. All other chunks also include a usage field, but with a null value. Note: if the stream is interrupted, you may not receive this final usage chunk. Default is false.
  • prompt_cache_key (optional, string): Used to cache responses for similar requests to optimize cache hit rates. Default is null. For coding agents, this is typically a session ID or task ID representing a single session; if the session is exited and later resumed, this value should remain the same. For the Kimi Code Plan, this field is required to improve cache hit rates. For other agents involving multi-turn conversations, implementing this field is also recommended.
  • safety_identifier (optional, string): A stable identifier used to help detect users of your application who may be violating usage policies. It should be a string that uniquely identifies each user; hashing the username or email address is recommended to avoid sending identifying information. Default is null.
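As a concrete illustration of the response_format field, a JSON-mode request body might be assembled like this. This is a sketch only; note how the system prompt spells out the expected JSON shape, as JSON mode requires:

```python
# Sketch of a JSON-mode request body. The system prompt explicitly
# describes the JSON shape, which JSON mode requires.
payload = {
    "model": "kimi-k2.5",
    "response_format": {"type": "json_object"},  # enables JSON mode
    "messages": [
        {
            "role": "system",
            "content": 'Reply only with a JSON object of the form {"answer": <number>}.',
        },
        {"role": "user", "content": "What is 1+1?"},
    ],
}
# The payload is then sent exactly as in the earlier examples:
#   completion = client.chat.completions.create(**payload)
```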

Content Field Description

The content field in the message can have different types of values:

  • plain text, just a string
  • List[Dict], when you need to pass more complex information; each dict can have the following fields:
    • type is always required and identifies the type of content. Its value must be one of text, image_url, or video_url.
    • text is required when type is text. Its value is plain text.
    • image_url is required when type is image_url. Its value is a dict describing the image content, e.g. {"url": "data:image/png;base64,abc123xxxxx=="}
    • video_url is required when type is video_url. Its value is a dict describing the video content, e.g. {"url": "data:video/mp4;base64,def456yyyyy=="}

The following are all valid content field examples:

  • "Hello"
  • [{"type": "text", "text": "Hello"}]
  • [{"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123xxxxx=="}}]
  • [{"type": "video_url", "video_url": {"url": "data:video/mp4;base64,def456yyyyy=="}}]
  • [{"type": "text", "text": "这是什么?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123xxxxx=="}}]

Note that the url field of image_url and video_url can be a base64 data URL or ms://<file_id>. Please refer to Use the Kimi Vision Model for details.
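A small helper for building such base64 data URLs from raw file bytes might look like this. This is a sketch (`to_data_url` is a hypothetical helper, not part of the SDK), and the MIME type must match the actual file:

```python
import base64

def to_data_url(raw_bytes, mime_type="image/png"):
    """Encode raw image or video bytes as a data: URL usable in the
    image_url / video_url content fields described above."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Example content entry for an image:
#   {"type": "image_url",
#    "image_url": {"url": to_data_url(open("cat.png", "rb").read())}}
```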

Return

For non-streaming responses, the return format is similar to the following:

{
    "id": "cmpl-04ea926191a14749b7f2c7a48a68abc6",
    "object": "chat.completion",
    "created": 1698999496,
    "model": "kimi-k2.5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello, Li Lei! 1+1 equals 2. If you have any other questions, feel free to ask!"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 21,
        "total_tokens": 40,
        "cached_tokens": 10
    }
}

The cached_tokens field is the number of tokens served from the cache; only models that support automatic caching return this field.
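The finish_reason field distinguishes a natural stop from truncation by max_completion_tokens. The sketch below consumes a plain dict shaped like the example above (hypothetical data, not a live API call):

```python
# Hypothetical response data shaped like the non-streaming example above.
response = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "1+1 equals 2."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 19, "completion_tokens": 21, "total_tokens": 40},
}

choice = response["choices"][0]
if choice["finish_reason"] == "length":
    # Output was cut off; consider raising max_completion_tokens.
    answer = choice["message"]["content"] + " [truncated]"
else:
    # "stop": the model finished naturally or hit a stop word.
    answer = choice["message"]["content"]

print(answer)  # 1+1 equals 2.
```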

For streaming responses, the return format is similar to the following:

data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
 
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
 
...
 
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}
 
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{},"finish_reason":"stop","usage":{"prompt_tokens":19,"completion_tokens":13,"total_tokens":32}}]}
 
data: [DONE]

Example Request

For simple calls, refer to the previous example. For streaming calls, you can refer to the following code snippet:

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You excel at conversing in Chinese and English. You provide users with safe, helpful, and accurate responses. You refuse to answer any questions related to terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated into other languages.",
        },
        {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"},
    ],
    stream=True,
)
 
collected_messages = []
for idx, chunk in enumerate(response):
    # print("Chunk received, value: ", chunk)
    chunk_message = chunk.choices[0].delta
    if not chunk_message.content:
        continue
    collected_messages.append(chunk_message)  # save the message
    print(f"#{idx}: {''.join([m.content for m in collected_messages])}")
print(f"Full conversation received: {''.join([m.content for m in collected_messages])}")
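When stream_options.include_usage is set, the final usage chunk has an empty choices array, so a loop like the one above needs a guard before indexing choices[0]. The sketch below uses plain dicts shaped like the SSE payloads shown earlier (hypothetical data rather than live SDK objects):

```python
# Hypothetical chunks shaped like the streaming payloads above, as returned
# when stream_options={"include_usage": True} is set on the request.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}], "usage": None},
    {"choices": [{"delta": {"content": "Hello"}}], "usage": None},
    {"choices": [{"delta": {"content": "."}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 19, "completion_tokens": 13,
                              "total_tokens": 32}},
]

pieces, usage = [], None
for chunk in chunks:
    if not chunk["choices"]:          # the final usage-only chunk
        usage = chunk["usage"]
        continue
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        pieces.append(delta["content"])

print("".join(pieces))        # Hello.
print(usage["total_tokens"])  # 32
```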

Vision

Example

{
    "model": "kimi-k2.5",
    "messages":
    [
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in both Chinese and English conversations. You aim to provide users with safe, helpful, and accurate answers. You will refuse to answer any questions related to terrorism, racism, pornography, or violence. Moonshot AI is a proper noun and should not be translated into any other language."
        },
        {
            "role": "user",
            "content":
            [
                {
                    "type": "image_url",
                    "image_url":
                    {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAGAAAABhCAYAAAApxKSdAAAACXBIWXMAACE4AAAhOAFFljFgAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAUUSURBVHgB7Z29bhtHFIWPHQN2J7lKqnhYpYvpIukCbJEAKQJEegLReYFIT0DrCSI9QEDqCSIDaQIEIOukiJwyza5SJWlId3FFz+HuGmuSSw6p+dlZ3g84luhdUeI9M3fmziyXgBCUe/DHYY0Wj/tgWmjV42zFcWe4MIBBPNJ6qqW0uvAbXFvQgKzQK62bQhkaCIPc10q1Zi3XH1o/IG9cwUm0RogrgDY1KmLgHYX9DvyiBvDYI77XmiD+oLlQHw7hIDoCMBOt1U9w0BsU9mOAtaUUFk3oQoIfzAQFCf5dNMEdTFCQ4NtQih1NSIGgf3ibxOJt5UrAB1gNK72vIdjiI61HWr+YnNxDXK0rJiULsV65GJeiIescLSTTeobKSutiCuojX8kU3MBx4I3WeNVBBRl4fWiCyoB8v2JAAkk9PmDwT8sH1TEghRjgC27scCx41wO43KAg+ILxTvhNaUACwTc04Z0B30LwzTzm5Rjw3sgseIG1wGMawMBPIOQcqvzrNIMHOg9Q5KK953O90/rFC+BhJRH8PQZ+fu7SjC7HAIV95yu99vjlxfvBJx8nwHd6IfNJAkccOjHg6OgIs9lsra6vr2GTNE03/k7q8HAhyJ/2gM9O65/4kT7/mwEcoZwYsPQiV3BwcABb9Ho9KKU2njccDjGdLlxx+InBBPBAAR86ydRPaIC9SASi3+8bnXd+fr78nw8NJ39uDJjXAVFPP7dp/VmWLR9g6w6Huo/IOTk5MTpvZesn/93AiP/dXCwd9SyILT9Jko3n1bZ+8s8rGPGvoVHbEXcPMM39V1dX9Qd/19PPNxta959D4HUGF0RrAFs/8/8mxuPxXLUwtfx2WX+cxdivZ3DFA0SKldZPuPTAKrikbOlMOX+9zFu/Q2iAQoSY5H7mfeb/tXCT8MdneU9wNNCuQUXZA0ynnrUznyqOcrspUY4BJunHqPU3gOgMsNr6G0B0BpgUXrG0fhKVAaaF1/HxMWIhKgNMcj9Tz82Nk6rVGdav/tJ5eraJ0Wi01XPq1r/xOS8uLkJc6XYnRTMNXdf62eIvLy+jyftVghnQ7Xahe8FW59fBTRYOzosDNI1hJdz0lBQkBflkMBjMU5iL13pXRb8fYAJrB/a2db0oFHthAOEUliaYFHE+aaUBdZsvvFhApyM0idYZwOCvW4JmIWdSzPmidQaYrAGZ7iX4oFUGnJ2dGdUCTRqMozeANQCLsE6nA10JG/0Mx4KmDMbBCjEWR2yxu8LAM98vXelmCA2ovVLCI8EMYODWbpbvCXtTBzQVMSAwYkBgxIDAtNKAXWdGIRADAiMpKDA0IIMQikx6QGDEgMCIAYGRMSAsMgaEhgbcQgjFa+kBYZnIGBCWWzEgLPNBOJ6Fk/aR8Y5ZCvktKwX/PJZ7xoVjfs+4chYU11tK2sE85qUBLyH4Zh5z6QHhGPOf6r2j+TEbcgdFP2RaHX5TrYQlDflj5RXE5Q1cG/lWnhYpReUGKdUewGnRmhvnCJbgmxey8sHiZ8iwF3AsUBBckKHI/SWLq6HsBc8huML4DiK80D6WnBqLzN68UFCmopheYJOVYgcU5FOVbAVfYUcUZGoaLPglCtITdg2+tZUFBTFh2+ArWEYh/7z0WIIQSiM43lt5AWAmWhLHylN4QmkNEXfAbGqEQKsHSfHLYwiSq8AnaAAKeaW3D8VbijwNW5nh3IN9FPI/jnpaPKZi2/SfFuJu4W3x9RqWL+N5C+7ruKpBAgLkAAAAAElFTkSuQmCC"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image."
                }
            ]
        }
    ]
}

Image Content Field Description

When using the Vision model, the message.content field can be a List[Dict[str, any]] instead of a plain str. Each element in the list has the following fields:

  • type (required, string): Supports only the text type (text) or image type (image_url).
  • image_url (required, Dict[str, any]): Object for transmitting the image.

The fields for the image_url parameter are as follows:

  • url (required, string): Image content encoded in base64, or identified by a file ID.

Example Request

import os
import base64
 
from openai import OpenAI
 
client = OpenAI(
    api_key = os.environ.get("MOONSHOT_API_KEY"), 
    base_url = "https://api.moonshot.ai/v1",
)
 
# Encode the image in base64
with open("your_image_path", 'rb') as f:
    img_base = base64.b64encode(f.read()).decode('utf-8')
 
response = client.chat.completions.create(
    model="kimi-k2.5", 
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{img_base}"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image."
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)

List Models

Request URL

GET https://api.moonshot.ai/v1/models

Example request

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
model_list = client.models.list()
model_data = model_list.data
 
for i, model in enumerate(model_data):
    print(f"model[{i}]:", model.id)

Error Explanation

Here are some examples of error responses:

{
    "error": {
        "type": "content_filter",
        "message": "The request was rejected because it was considered high risk"
    }
}

Below are explanations for the main errors:

  • 400 content_filter ("The request was rejected because it was considered high risk"): Content review rejection; your input or generated content may contain unsafe or sensitive information. Please avoid prompts that could generate sensitive content.
  • 400 invalid_request_error ("Invalid request: {error_details}"): Invalid request, usually due to an incorrect request format or missing required parameters. Please check and retry.
  • 400 invalid_request_error ("Input token length too long"): The token length of the request is too long; do not exceed the model's maximum token limit.
  • 400 invalid_request_error ("Your request exceeded model token limit : {max_model_length}"): The sum of the request tokens and the configured max_tokens exceeds the model's context length. Please check the request body or choose a model with an appropriate context length.
  • 400 invalid_request_error ("Invalid purpose: only 'file-extract' accepted"): The purpose in the request is incorrect; currently only 'file-extract' is accepted. Please modify and retry.
  • 400 invalid_request_error ("File size is too large, max file size is 100MB, please confirm and re-upload the file"): The uploaded file exceeds the size limit. Please re-upload.
  • 400 invalid_request_error ("File size is zero, please confirm and re-upload the file"): The uploaded file is empty. Please re-upload.
  • 400 invalid_request_error ("The number of files you have uploaded exceeded the max file count {max_file_count}, please delete previous uploaded files"): The total number of uploaded files exceeds the limit. Please delete unnecessary earlier files and re-upload.
  • 401 invalid_authentication_error ("Invalid Authentication"): Authentication failed. Please check that the API key is correct and retry.
  • 401 incorrect_api_key_error ("Incorrect API key provided"): Authentication failed. Please check that the API key is provided and correct, then retry.
  • 403 permission_denied_error ("The API you are accessing is not open"): The API you are trying to access is not currently open.
  • 403 permission_denied_error ("You are not allowed to get other user info"): Accessing other users' information is not permitted. Please check.
  • 404 resource_not_found_error ("Not found the model {model-id} or Permission denied"): The model does not exist or you do not have permission to access it. Please check and retry.
  • 429 exceeded_current_quota_error ("Your account {organization-id}<{ak-id}> is suspended, please check your plan and billing details"): Account balance is insufficient. Please check your account balance.
  • 429 exceeded_current_quota_error ("You exceeded your current token quota: <{organization_id}> {token_credit}, please check your account balance"): Your account balance is insufficient. Please ensure it can cover the cost of your token consumption before retrying.
  • 429 engine_overloaded_error ("The engine is currently overloaded, please try again later"): There are currently too many concurrent requests and the node is rate-limited. Please retry later; upgrading your tier is recommended for a smoother experience.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization max concurrency: {Concurrency}, please try again after {time} seconds"): Your request has reached the account's concurrency limit. Please wait for the specified time before retrying.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization max RPM: {RPM}, please try again after {time} seconds"): Your request has reached the account's RPM rate limit. Please wait for the specified time before retrying.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization TPM rate limit, current:{current_tpm}, limit:{max_tpm}"): Your request has reached the account's TPM rate limit. Please wait before retrying.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization TPD rate limit, current:{current_tpd}, limit:{max_tpd}"): Your request has reached the account's TPD rate limit. Please wait before retrying.
  • 500 server_error ("Failed to extract file: {error}"): Failed to parse the file. Please retry.
  • 500 unexpected_output ("invalid state transition"): Internal error. Please contact the administrator.
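For the 429 rate-limit errors above, a common client-side pattern is retry with exponential backoff. A minimal sketch follows; `with_backoff` is a hypothetical helper, and RuntimeError stands in for whatever exception your HTTP client or SDK raises on a rate-limit response:

```python
import time

def with_backoff(call, max_retries=3, base_delay=1.0):
    """Retry `call` on rate-limit errors, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError as exc:  # stand-in for the SDK's rate-limit error
            if "rate_limit" not in str(exc) or attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage with the real SDK would wrap the API call, e.g.:
#   with_backoff(lambda: client.chat.completions.create(...))
```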