Chat

Basic Information

Public Service Address

https://api.moonshot.ai

Moonshot offers HTTP-based API services, and for most of them we are compatible with the OpenAI SDK.

Quickstart

Single-turn chat

The official OpenAI SDK supports Python and Node.js. Below is an example of how to interact with the API using the OpenAI SDK:

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
completion = client.chat.completions.create(
    model = "kimi-k2.5",
    messages = [
        {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. You will reject any questions involving terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated."},
        {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"}
    ]
)
 
print(completion.choices[0].message.content)

Replace $MOONSHOT_API_KEY with the API Key you created on the platform.

When running the code in the documentation using the OpenAI SDK, ensure that your Python version is at least 3.7.1, your Node.js version is at least 18, and your OpenAI SDK version is no lower than 1.0.0.

pip install --upgrade 'openai>=1.0'

You can easily check the version of your library like this:

python -c 'import openai; print("version =",openai.__version__)'
# The output might be "version = 1.10.0", indicating that the current Python environment is using openai v1.10.0

Multi-turn chat

In the single-turn chat example above, the language model takes a list of messages as input and returns the generated reply as output. We can also feed the model's output back as part of the input to achieve multi-turn chat. Below is a simple example of implementing multi-turn chat:

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
history = [
    {"role": "system", "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. You will reject any questions involving terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated."}
]
 
def chat(query, history):
    history.append({
        "role": "user", 
        "content": query
    })
    completion = client.chat.completions.create(
        model="kimi-k2.5",
        messages=history
    )
    result = completion.choices[0].message.content
    history.append({
        "role": "assistant",
        "content": result
    })
    return result
 
print(chat("What is the rotation period of the Earth?", history))
print(chat("What about the Moon?", history))

It is worth noting that as the chat progresses, the number of tokens the model needs to process will increase linearly. When necessary, some optimization strategies should be employed, such as retaining only the most recent few rounds of chat.
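One such strategy can be sketched as follows: keep the system prompt and only the most recent rounds before each request. The round count of 3 is an arbitrary choice for illustration, and `truncate_history` is a hypothetical helper, not part of the SDK:

```python
def truncate_history(history, max_rounds=3):
    """Keep the system messages plus the last `max_rounds` user/assistant
    rounds, so the token count stays bounded as the chat grows."""
    system_messages = [m for m in history if m["role"] == "system"]
    other_messages = [m for m in history if m["role"] != "system"]
    # Each round is one user message plus one assistant reply.
    return system_messages + other_messages[-2 * max_rounds:]

# Call it before each request, e.g.:
#   completion = client.chat.completions.create(
#       model="kimi-k2.5", messages=truncate_history(history))
```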

API Documentation

Chat Completion

Request URL

POST https://api.moonshot.ai/v1/chat/completions

Request

Example

{
    "model": "kimi-k2.5",
    "messages": [
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in Chinese and English conversations. You aim to provide users with safe, helpful, and accurate responses. You will refuse to answer any questions related to terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated into other languages."
        },
        { "role": "user", "content": "Hello, my name is Li Lei. What is 1+1?" }
    ]
}

Request body

  • messages (required, List[Dict]): A list of messages exchanged in the conversation so far. Each element is a structured object such as {"role": "user", "content": "Hello"}. The role must be one of system, user, or assistant, and the content must not be empty. See Content Field Description for details on the content field formats.
  • model (required, string): Model ID, which can be obtained through List Models. Currently one of kimi-k2.5, kimi-k2-0905-preview, kimi-k2-0711-preview, kimi-k2-turbo-preview, kimi-k2-thinking-turbo, kimi-k2-thinking, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k, moonshot-v1-auto, moonshot-v1-8k-vision-preview, moonshot-v1-32k-vision-preview, moonshot-v1-128k-vision-preview.
  • max_tokens (optional, int): Deprecated; please use max_completion_tokens instead.
  • max_completion_tokens (optional, int): The maximum number of tokens to generate for the chat completion. If the result reaches this limit without ending, the finish_reason will be "length"; otherwise it will be "stop". It is recommended to provide a reasonable value as needed; if omitted, a default such as 1024 is used. Note that this refers to the number of tokens to return, not the total length of input plus output. For example, for a moonshot-v1-8k model the maximum total length of input plus output is 8192, so when the input messages total 4096 tokens you can set this to at most 4096; otherwise the service returns an invalid_request_error and refuses to respond. To find the exact number of input tokens, use the Token Calculation API.
  • temperature (optional, float): The sampling temperature, ranging from 0 to 1. A higher value (e.g., 0.7) makes the output more random, while a lower value (e.g., 0.2) makes it more focused and deterministic. Default is 0.0 for moonshot-v1 series models, 0.6 for kimi-k2 models, and 1.0 for kimi-k2-thinking models. This parameter cannot be modified for the kimi-k2.5 model.
  • top_p (optional, float): Another sampling method, where the model considers tokens with a cumulative probability mass of top_p; thus 0.1 means only the top 10% of tokens by probability mass are considered. We generally suggest changing either this or temperature, but not both. Default is 1.0 for moonshot-v1 series and kimi-k2 models, and 0.95 for the kimi-k2.5 model, where it cannot be modified.
  • n (optional, int): The number of results to generate for each input message. Default is 1 for moonshot-v1 series and kimi-k2 models, and it must not exceed 5. When the temperature is very close to 0, only one result can be returned; if n > 1 in that case, the service returns an invalid_request_error. Default is 1 for the kimi-k2.5 model and cannot be modified.
  • presence_penalty (optional, float): Presence penalty, a number between -2.0 and 2.0. A positive value penalizes new tokens based on whether they have already appeared in the text, increasing the likelihood of the model discussing new topics. Default is 0. Cannot be modified for the kimi-k2.5 model.
  • frequency_penalty (optional, float): Frequency penalty, a number between -2.0 and 2.0. A positive value penalizes new tokens based on their existing frequency in the text, reducing the likelihood of the model repeating the same phrases verbatim. Default is 0. Cannot be modified for the kimi-k2.5 model.
  • response_format (optional, object): Setting this to {"type": "json_object"} enables JSON mode, ensuring that the generated output is valid JSON. When you enable JSON mode, you must explicitly guide the model in the prompt to output JSON and specify the exact format, otherwise the result may be unexpected. Default is {"type": "text"}.
  • stop (optional, String or List[String]): Stop words; output halts when a full match is found, and the matched words themselves are not output. A maximum of 5 strings is allowed, each no longer than 32 bytes. Default is null.
  • thinking (optional, object): Only available for the kimi-k2.5 model; controls whether thinking is enabled for this request. Default is {"type": "enabled"}; the value must be either {"type": "enabled"} or {"type": "disabled"}.
  • stream (optional, bool): Whether to return the response in a streaming fashion. Default is false.
  • stream_options.include_usage (optional, bool): If set, an additional chunk is streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and its choices field is always an empty array. All other chunks also include a usage field, but with a null value. Note: if the stream is interrupted, you may not receive this final usage chunk. Default is false.
  • prompt_cache_key (optional, string): Used to cache responses for similar requests to optimize cache hit rates. Default is null. For coding agents, this is typically a session ID or task ID representing a single session; if the session is exited and later resumed, this value should remain the same. For the Kimi Code Plan, this field is required to improve cache hit rates. For other agents involving multi-turn conversations, implementing this field is also recommended.
  • safety_identifier (optional, string): A stable identifier used to help detect users of your application who may be violating usage policies. It should be a string that uniquely identifies each user; hashing the username or email address is recommended to avoid sending identifying information. Default is null.
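As a concrete illustration of the response_format field, a JSON-mode request body might be assembled like this. This is a sketch only; note how the system prompt spells out the expected JSON shape, as JSON mode requires:

```python
# Sketch of a JSON-mode request body. The system prompt explicitly
# describes the JSON shape, which JSON mode requires.
payload = {
    "model": "kimi-k2.5",
    "response_format": {"type": "json_object"},  # enables JSON mode
    "messages": [
        {
            "role": "system",
            "content": 'Reply only with a JSON object of the form {"answer": <number>}.',
        },
        {"role": "user", "content": "What is 1+1?"},
    ],
}
# The payload is then sent exactly as in the earlier examples:
#   completion = client.chat.completions.create(**payload)
```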

Content Field Description

The content field in the message can have different types of values:

  • plain text, just a string
  • List[Dict], when you need to pass more complex information; each dict can have the following fields:
    • type is always required and identifies the type of content. Its value must be one of text, image_url, or video_url.
    • text is required when type is text. Its value is plain text.
    • image_url is required when type is image_url. Its value is a dict describing the image content, e.g. {"url": "data:image/png;base64,abc123xxxxx=="}
    • video_url is required when type is video_url. Its value is a dict describing the video content, e.g. {"url": "data:video/mp4;base64,def456yyyyy=="}

The following are all valid content field examples:

  • "Hello"
  • [{"type": "text", "text": "Hello"}]
  • [{"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123xxxxx=="}}]
  • [{"type": "video_url", "video_url": {"url": "data:video/mp4;base64,def456yyyyy=="}}]
  • [{"type": "text", "text": "这是什么?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123xxxxx=="}}]

Note that the url field of image_url and video_url can be a base64 data URL or ms://<file_id>. Please refer to Use the Kimi Vision Model for details.
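A small helper for building such base64 data URLs from raw file bytes might look like this. This is a sketch (`to_data_url` is a hypothetical helper, not part of the SDK), and the MIME type must match the actual file:

```python
import base64

def to_data_url(raw_bytes, mime_type="image/png"):
    """Encode raw image or video bytes as a data: URL usable in the
    image_url / video_url content fields described above."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Example content entry for an image:
#   {"type": "image_url",
#    "image_url": {"url": to_data_url(open("cat.png", "rb").read())}}
```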

Return

For non-streaming responses, the return format is similar to the following:

{
    "id": "cmpl-04ea926191a14749b7f2c7a48a68abc6",
    "object": "chat.completion",
    "created": 1698999496,
    "model": "kimi-k2.5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello, Li Lei! 1+1 equals 2. If you have any other questions, feel free to ask!"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 21,
        "total_tokens": 40,
        "cached_tokens": 10
    }
}

The cached_tokens field is the number of tokens served from the cache; only models that support automatic caching return this field.
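The finish_reason field distinguishes a natural stop from truncation by max_completion_tokens. The sketch below consumes a plain dict shaped like the example above (hypothetical data, not a live API call):

```python
# Hypothetical response data shaped like the non-streaming example above.
response = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "1+1 equals 2."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 19, "completion_tokens": 21, "total_tokens": 40},
}

choice = response["choices"][0]
if choice["finish_reason"] == "length":
    # Output was cut off; consider raising max_completion_tokens.
    answer = choice["message"]["content"] + " [truncated]"
else:
    # "stop": the model finished naturally or hit a stop word.
    answer = choice["message"]["content"]

print(answer)  # 1+1 equals 2.
```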

For streaming responses, the return format is similar to the following:

data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
 
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
 
...
 
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}
 
data: {"id":"cmpl-1305b94c570f447fbde3180560736287","object":"chat.completion.chunk","created":1698999575,"model":"kimi-k2.5","choices":[{"index":0,"delta":{},"finish_reason":"stop","usage":{"prompt_tokens":19,"completion_tokens":13,"total_tokens":32}}]}
 
data: [DONE]

Example Request

For simple calls, refer to the previous example. For streaming calls, you can refer to the following code snippet:

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You excel at conversing in Chinese and English. You provide users with safe, helpful, and accurate responses. You refuse to answer any questions related to terrorism, racism, or explicit content. Moonshot AI is a proper noun and should not be translated into other languages.",
        },
        {"role": "user", "content": "Hello, my name is Li Lei. What is 1+1?"},
    ],
    stream=True,
)
 
collected_messages = []
for idx, chunk in enumerate(response):
    # print("Chunk received, value: ", chunk)
    chunk_message = chunk.choices[0].delta
    if not chunk_message.content:
        continue
    collected_messages.append(chunk_message)  # save the message
    print(f"#{idx}: {''.join([m.content for m in collected_messages])}")
print(f"Full conversation received: {''.join([m.content for m in collected_messages])}")
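When stream_options.include_usage is set, the final usage chunk has an empty choices array, so a loop like the one above needs a guard before indexing choices[0]. The sketch below uses plain dicts shaped like the SSE payloads shown earlier (hypothetical data rather than live SDK objects):

```python
# Hypothetical chunks shaped like the streaming payloads above, as returned
# when stream_options={"include_usage": True} is set on the request.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}], "usage": None},
    {"choices": [{"delta": {"content": "Hello"}}], "usage": None},
    {"choices": [{"delta": {"content": "."}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 19, "completion_tokens": 13,
                              "total_tokens": 32}},
]

pieces, usage = [], None
for chunk in chunks:
    if not chunk["choices"]:          # the final usage-only chunk
        usage = chunk["usage"]
        continue
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        pieces.append(delta["content"])

print("".join(pieces))        # Hello.
print(usage["total_tokens"])  # 32
```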

Vision

Example

{
    "model": "kimi-k2.5",
    "messages":
    [
        {
            "role": "system",
            "content": "You are Kimi, an AI assistant provided by Moonshot AI. You are proficient in both Chinese and English conversations. You aim to provide users with safe, helpful, and accurate answers. You will refuse to answer any questions related to terrorism, racism, pornography, or violence. Moonshot AI is a proper noun and should not be translated into any other language."
        },
        {
            "role": "user",
            "content":
            [
                {
                    "type": "image_url",
                    "image_url":
                    {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAGAAAABhCAYAAAApxKSdAAAACXBIWXMAACE4AAAhOAFFljFgAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAUUSURBVHgB7Z29bhtHFIWPHQN2J7lKqnhYpYvpIukCbJEAKQJEegLReYFIT0DrCSI9QEDqCSIDaQIEIOukiJwyza5SJWlId3FFz+HuGmuSSw6p+dlZ3g84luhdUeI9M3fmziyXgBCUe/DHYY0Wj/tgWmjV42zFcWe4MIBBPNJ6qqW0uvAbXFvQgKzQK62bQhkaCIPc10q1Zi3XH1o/IG9cwUm0RogrgDY1KmLgHYX9DvyiBvDYI77XmiD+oLlQHw7hIDoCMBOt1U9w0BsU9mOAtaUUFk3oQoIfzAQFCf5dNMEdTFCQ4NtQih1NSIGgf3ibxOJt5UrAB1gNK72vIdjiI61HWr+YnNxDXK0rJiULsV65GJeiIescLSTTeobKSutiCuojX8kU3MBx4I3WeNVBBRl4fWiCyoB8v2JAAkk9PmDwT8sH1TEghRjgC27scCx41wO43KAg+ILxTvhNaUACwTc04Z0B30LwzTzm5Rjw3sgseIG1wGMawMBPIOQcqvzrNIMHOg9Q5KK953O90/rFC+BhJRH8PQZ+fu7SjC7HAIV95yu99vjlxfvBJx8nwHd6IfNJAkccOjHg6OgIs9lsra6vr2GTNE03/k7q8HAhyJ/2gM9O65/4kT7/mwEcoZwYsPQiV3BwcABb9Ho9KKU2njccDjGdLlxx+InBBPBAAR86ydRPaIC9SASi3+8bnXd+fr78nw8NJ39uDJjXAVFPP7dp/VmWLR9g6w6Huo/IOTk5MTpvZesn/93AiP/dXCwd9SyILT9Jko3n1bZ+8s8rGPGvoVHbEXcPMM39V1dX9Qd/19PPNxta959D4HUGF0RrAFs/8/8mxuPxXLUwtfx2WX+cxdivZ3DFA0SKldZPuPTAKrikbOlMOX+9zFu/Q2iAQoSY5H7mfeb/tXCT8MdneU9wNNCuQUXZA0ynnrUznyqOcrspUY4BJunHqPU3gOgMsNr6G0B0BpgUXrG0fhKVAaaF1/HxMWIhKgNMcj9Tz82Nk6rVGdav/tJ5eraJ0Wi01XPq1r/xOS8uLkJc6XYnRTMNXdf62eIvLy+jyftVghnQ7Xahe8FW59fBTRYOzosDNI1hJdz0lBQkBflkMBjMU5iL13pXRb8fYAJrB/a2db0oFHthAOEUliaYFHE+aaUBdZsvvFhApyM0idYZwOCvW4JmIWdSzPmidQaYrAGZ7iX4oFUGnJ2dGdUCTRqMozeANQCLsE6nA10JG/0Mx4KmDMbBCjEWR2yxu8LAM98vXelmCA2ovVLCI8EMYODWbpbvCXtTBzQVMSAwYkBgxIDAtNKAXWdGIRADAiMpKDA0IIMQikx6QGDEgMCIAYGRMSAsMgaEhgbcQgjFa+kBYZnIGBCWWzEgLPNBOJ6Fk/aR8Y5ZCvktKwX/PJZ7xoVjfs+4chYU11tK2sE85qUBLyH4Zh5z6QHhGPOf6r2j+TEbcgdFP2RaHX5TrYQlDflj5RXE5Q1cG/lWnhYpReUGKdUewGnRmhvnCJbgmxey8sHiZ8iwF3AsUBBckKHI/SWLq6HsBc8huML4DiK80D6WnBqLzN68UFCmopheYJOVYgcU5FOVbAVfYUcUZGoaLPglCtITdg2+tZUFBTFh2+ArWEYh/7z0WIIQSiM43lt5AWAmWhLHylN4QmkNEXfAbGqEQKsHSfHLYwiSq8AnaAAKeaW3D8VbijwNW5nh3IN9FPI/jnpaPKZi2/SfFuJu4W3x9RqWL+N5C+7ruKpBAgLkAAAAAElFTkSuQmCC"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image."
                }
            ]
        }
    ]
}

Image Content Field Description

When using the Vision model, the message.content field can be a List[Dict[str, any]] instead of a plain str. Each element in the list has the following fields:

  • type (required, string): Supports only the text type (text) or image type (image_url).
  • image_url (required, Dict[str, any]): Object for transmitting the image.

The fields for the image_url parameter are as follows:

  • url (required, string): Image content encoded in base64, or identified by a file ID.

Example Request

import os
import base64
 
from openai import OpenAI
 
client = OpenAI(
    api_key = os.environ.get("MOONSHOT_API_KEY"), 
    base_url = "https://api.moonshot.ai/v1",
)
 
# Encode the image in base64
with open("your_image_path", 'rb') as f:
    img_base = base64.b64encode(f.read()).decode('utf-8')
 
response = client.chat.completions.create(
    model="kimi-k2.5", 
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{img_base}"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this image."
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)

List Models

Request URL

GET https://api.moonshot.ai/v1/models

Example request

from openai import OpenAI
 
client = OpenAI(
    api_key = "$MOONSHOT_API_KEY",
    base_url = "https://api.moonshot.ai/v1",
)
 
model_list = client.models.list()
model_data = model_list.data
 
for i, model in enumerate(model_data):
    print(f"model[{i}]:", model.id)

Error Explanation

Here are some examples of error responses:

{
    "error": {
        "type": "content_filter",
        "message": "The request was rejected because it was considered high risk"
    }
}

Below are explanations for the main errors:

  • 400 content_filter ("The request was rejected because it was considered high risk"): Content review rejection; your input or generated content may contain unsafe or sensitive information. Please avoid prompts that could generate sensitive content.
  • 400 invalid_request_error ("Invalid request: {error_details}"): Invalid request, usually due to an incorrect request format or missing required parameters. Please check and retry.
  • 400 invalid_request_error ("Input token length too long"): The token length of the request is too long; do not exceed the model's maximum token limit.
  • 400 invalid_request_error ("Your request exceeded model token limit : {max_model_length}"): The sum of the request tokens and the configured max_tokens exceeds the model's context length. Please check the request body or choose a model with an appropriate context length.
  • 400 invalid_request_error ("Invalid purpose: only 'file-extract' accepted"): The purpose in the request is incorrect; currently only 'file-extract' is accepted. Please modify and retry.
  • 400 invalid_request_error ("File size is too large, max file size is 100MB, please confirm and re-upload the file"): The uploaded file exceeds the size limit. Please re-upload.
  • 400 invalid_request_error ("File size is zero, please confirm and re-upload the file"): The uploaded file is empty. Please re-upload.
  • 400 invalid_request_error ("The number of files you have uploaded exceeded the max file count {max_file_count}, please delete previous uploaded files"): The total number of uploaded files exceeds the limit. Please delete unnecessary earlier files and re-upload.
  • 401 invalid_authentication_error ("Invalid Authentication"): Authentication failed. Please check that the API key is correct and retry.
  • 401 incorrect_api_key_error ("Incorrect API key provided"): Authentication failed. Please check that the API key is provided and correct, then retry.
  • 403 permission_denied_error ("The API you are accessing is not open"): The API you are trying to access is not currently open.
  • 403 permission_denied_error ("You are not allowed to get other user info"): Accessing other users' information is not permitted. Please check.
  • 404 resource_not_found_error ("Not found the model {model-id} or Permission denied"): The model does not exist or you do not have permission to access it. Please check and retry.
  • 429 exceeded_current_quota_error ("Your account {organization-id}<{ak-id}> is suspended, please check your plan and billing details"): Account balance is insufficient. Please check your account balance.
  • 429 exceeded_current_quota_error ("You exceeded your current token quota: <{organization_id}> {token_credit}, please check your account balance"): Your account balance is insufficient. Please ensure it can cover the cost of your token consumption before retrying.
  • 429 engine_overloaded_error ("The engine is currently overloaded, please try again later"): There are currently too many concurrent requests and the node is rate-limited. Please retry later; upgrading your tier is recommended for a smoother experience.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization max concurrency: {Concurrency}, please try again after {time} seconds"): Your request has reached the account's concurrency limit. Please wait for the specified time before retrying.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization max RPM: {RPM}, please try again after {time} seconds"): Your request has reached the account's RPM rate limit. Please wait for the specified time before retrying.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization TPM rate limit, current:{current_tpm}, limit:{max_tpm}"): Your request has reached the account's TPM rate limit. Please wait before retrying.
  • 429 rate_limit_reached_error ("Your account {organization-id}<{ak-id}> request reached organization TPD rate limit, current:{current_tpd}, limit:{max_tpd}"): Your request has reached the account's TPD rate limit. Please wait before retrying.
  • 500 server_error ("Failed to extract file: {error}"): Failed to parse the file. Please retry.
  • 500 unexpected_output ("invalid state transition"): Internal error. Please contact the administrator.
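For the 429 rate-limit errors above, a common client-side pattern is retry with exponential backoff. A minimal sketch follows; `with_backoff` is a hypothetical helper, and RuntimeError stands in for whatever exception your HTTP client or SDK raises on a rate-limit response:

```python
import time

def with_backoff(call, max_retries=3, base_delay=1.0):
    """Retry `call` on rate-limit errors, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError as exc:  # stand-in for the SDK's rate-limit error
            if "rate_limit" not in str(exc) or attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage with the real SDK would wrap the API call, e.g.:
#   with_backoff(lambda: client.chat.completions.create(...))
```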