MoonPalace - Moonshot AI's Kimi API Debugging Tool

MoonPalace (Moon Palace) is an API debugging tool provided by Moonshot AI. It has the following features:

Cross-platform support:
- Mac
- Windows
- Linux;
Easy to use, just replace base_url with http://localhost:9988 after launching to start debugging;
Captures complete requests, including the "scene of the accident" when network errors occur;
Quickly search and view request information using request_id and chatcmpl_id;
One-click export of BadCase structured reporting data, helping to enhance Kimi's model capabilities;

We recommend using MoonPalace as your API "supplier" during the code writing and debugging phase, so you can quickly identify and locate various issues related to API calls and code writing. For any unexpected outputs from Kimi large language model, you can also export the request details via MoonPalace and submit them to Moonshot AI to improve Kimi large language model.

Installation Methods

Using the `go` Command to Install

If you have the go toolchain installed, you can run the following command to install MoonPalace:

$ go install github.com/MoonshotAI/moonpalace@latest

The above command will install the compiled binary file in your $GOPATH/bin/ directory. Run the moonpalace command to check if it has been installed successfully:

$ moonpalace
MoonPalace is a command-line tool for debugging the Moonshot AI HTTP API.
 
Usage:
  moonpalace [command]
 
Available Commands:
  cleanup     Cleanup Moonshot AI requests.
  completion  Generate the autocompletion script for the specified shell
  export      export a Moonshot AI request.
  help        Help about any command
  inspect     Inspect the specific content of a Moonshot AI request.
  list        Query Moonshot AI requests based on conditions.
  start       Start the MoonPalace proxy server.
 
Flags:
  -h, --help      help for moonpalace
  -v, --version   version for moonpalace
 
Use "moonpalace [command] --help" for more information about a command.

If you still cannot find the moonpalace binary file, try adding the $GOPATH/bin/ directory to your $PATH environment variable.

Downloading from the Releases Page

You can download the precompiled binary (executable) files from the Releases (opens in a new tab) page:

moonpalace-linux
moonpalace-macos-amd64 => for Intel-based Macs
moonpalace-macos-arm64 => for Apple Silicon-based Macs
moonpalace-windows.exe

Download the binary (executable) file that matches your platform and place it in a directory that is included in your $PATH environment variable. Rename it to moonpalace and then grant it executable permissions.

Usage

Starting the Service

Use the following command to start the MoonPalace proxy server:

$ moonpalace start --port <PORT>

MoonPalace will start an HTTP server locally, with the --port parameter specifying the local port that MoonPalace will listen on. The default value is 9988. When MoonPalace starts successfully, it will output:

[MoonPalace] 2024/07/29 17:00:29 MoonPalace Starts {'=>'} change base_url to "http://127.0.0.1:9988/v1"

As instructed, replace base_url with the displayed address. If you are using the default port, set base_url=http://127.0.0.1:9988/v1. If you are using a custom port, replace base_url with the displayed address.

Additionally, if you want to always use a debugging api_key during debugging, you can use the --key parameter when starting MoonPalace to set a default api_key for MoonPalace. This way, you don't have to manually set the api_key in each request. MoonPalace will automatically add the api_key you set with --key when requesting the Kimi API.

If you have correctly set base_url and successfully called the Kimi API, MoonPalace will output the following information:

$ moonpalace start --port <PORT>
[MoonPalace] 2024/07/29 17:00:29 MoonPalace Starts {'=>'} change base_url to "http://127.0.0.1:9988/v1"
[MoonPalace] 2024/07/29 21:30:53 POST   /v1/chat/completions 200 OK
[MoonPalace] 2024/07/29 21:30:53   - Request Headers: 
[MoonPalace] 2024/07/29 21:30:53     - Content-Type:   application/json
[MoonPalace] 2024/07/29 21:30:53   - Response Headers: 
[MoonPalace] 2024/07/29 21:30:53     - Content-Type:   application/json
[MoonPalace] 2024/07/29 21:30:53     - Msh-Request-Id: c34f3421-4dae-11ef-b237-9620e33511ee
[MoonPalace] 2024/07/29 21:30:53     - Server-Timing:  7134
[MoonPalace] 2024/07/29 21:30:53     - Msh-Uid:        cn0psmmcp7fclnphkcpg
[MoonPalace] 2024/07/29 21:30:53     - Msh-Gid:        enterprise-tier-5
[MoonPalace] 2024/07/29 21:30:53   - Response: 
[MoonPalace] 2024/07/29 21:30:53     - id:                cmpl-12be8428ebe74a9e8466a37bee7a9b11
[MoonPalace] 2024/07/29 21:30:53     - prompt_tokens:     1449
[MoonPalace] 2024/07/29 21:30:53     - completion_tokens: 158
[MoonPalace] 2024/07/29 21:30:53     - total_tokens:      1607
[MoonPalace] 2024/07/29 21:30:53   New Row Inserted: last_insert_id=15

MoonPalace will output the details of the request in the form of logs in the command line (if you want to persist the log content, you can redirect stderr to a file).

Note: In the logs, the value of the Msh-Request-Id field in the Response Headers corresponds to the --requestid parameter in the Search Request and Export Request sections below. The id in the Response corresponds to the --chatcmpl parameter, and last_insert_id corresponds to the --id parameter.

[MoonPalace] 2024/08/05 19:06:19   it seems that your max_tokens value is too small, please set a larger value

If the current mode is non-streaming output (stream=False), MoonPalace will suggest an appropriate max_tokens value.

Enabling Repeated Content Output Detection

MoonPalace offers a feature to detect repeated content output from the Kimi large language model. Repeated content output refers to the model continuously outputting a specific word, sentence, or blank character without stopping before reaching the max_tokens limit. This can lead to additional Token costs when using more expensive models like moonshot-v1-128k. Therefore, MoonPalace provides the --detect-repeat option to enable repeated content output detection, as shown below:

$ moonpalace start --port <PORT> --detect-repeat --repeat-threshold 0.3 --repeat-min-length 20

After enabling the --detect-repeat option, MoonPalace will interrupt the output of the Kimi large language model and log the following message when it detects repeated content:

[MoonPalace] 2024/08/05 18:20:37   it appears that there is an issue with content repeating in the current response

Note: The --detect-repeat option only interrupts the output in streaming mode (stream=True). It does not apply to non-streaming output.

You can adjust MoonPalace's blocking behavior using the --repeat-threshold and --repeat-min-length parameters:

The --repeat-threshold parameter sets MoonPalace's tolerance for repeated content. A higher threshold means lower tolerance, and repeated content will be blocked more quickly. The range is 0 <= threshold <= 1.
The --repeat-min-length parameter sets the minimum number of characters before MoonPalace starts detecting repeated content. For example, --repeat-min-length=100 means that repeated content detection will only start when the output exceeds 100 UTF-8 characters.

Enabling Forced Streaming Output

MoonPalace provides the --force-stream option to force all /v1/chat/completions requests to use streaming output mode:

$ moonpalace start --port <PORT> --force-stream

MoonPalace will set the stream field in the request parameters to True. When receiving a response, it will automatically determine the response format based on whether the caller has set stream:

If the caller has set stream=True, the response will be returned in streaming format without any special handling by MoonPalace.
If the caller has not set stream or has set stream=False, MoonPalace will concatenate all the streaming data chunks into a complete completion structure and return it to the caller after receiving all the data chunks.

For the caller (developer), enabling the --force-stream option will not affect the Kimi API response content you receive. You can still use your original code logic to debug and run your program. In other words, enabling the --force-stream option will not change or break anything. You can safely enable this option.

Why provide this option?

We initially hypothesize that common network connection errors and timeouts (Connection Error/Timeout) occur because, in non-streaming request scenarios (stream=False), intermediate gateways or proxy servers may have set read_header_timeout or read_timeout. This can cause the gateway or proxy server to disconnect while the Kimi API server is still assembling the response (since no response, or even the response header, has been received), resulting in Connection Error/Timeout.

We added the --force-stream parameter to MoonPalace. When starting with moonpalace start --force-stream, MoonPalace converts all non-streaming requests (stream=False or unset) to streaming requests. After receiving all data chunks, it assembles them into a complete completion response structure and returns it to the caller.

For the caller, you can still use the non-streaming API as before. However, after MoonPalace's conversion, it can reduce Connection Error/Timeout issues to some extent because MoonPalace has already established a connection with the Kimi API server and started receiving streaming data chunks.

Retrieving Requests

After MoonPalace is started, all requests routed through MoonPalace are recorded in an sqlite database located at $HOME/.moonpalace/moonpalace.sqlite. You can directly connect to the MoonPalace database to query the specific content of the requests, or you can use the MoonPalace command-line tool to query the requests:

$ moonpalace list
+----+--------+-------------------------------------------+--------------------------------------+---------------+---------------------+
| id | status | chatcmpl                                  | request_id                           | server_timing | requested_at        |
+----+--------+-------------------------------------------+--------------------------------------+---------------+---------------------+
| 15 | 200    | cmpl-12be8428ebe74a9e8466a37bee7a9b11     | c34f3421-4dae-11ef-b237-9620e33511ee | 7134          | 2024-07-29 21:30:53 |
| 14 | 200    | cmpl-1bf43a688a2b48eda80042583ff6fe7f     | c13280e0-4dae-11ef-9c01-debcfc72949d | 3479          | 2024-07-29 21:30:46 |
| 13 | 200    | chatcmpl-2e1aa823e2c94ebdad66450a0e6df088 | c07c118e-4dae-11ef-b423-62db244b9277 | 1033          | 2024-07-29 21:30:43 |
| 12 | 200    | cmpl-e7f984b5f80149c3adae46096a6f15c2     | 50d5686c-4d98-11ef-ba65-3613954e2587 | 774           | 2024-07-29 18:50:06 |
| 11 | 200    | chatcmpl-08f7d482b8434a869b001821cf0ee0d9 | 4c20f0a4-4d98-11ef-999a-928b67d58fa8 | 593           | 2024-07-29 18:49:58 |
| 10 | 200    | chatcmpl-6f3cf14db8e044c6bfd19689f6f66eb4 | 49f30295-4d98-11ef-95d0-7a2774525b85 | 738           | 2024-07-29 18:49:55 |
| 9  | 200    | cmpl-2a70a8c9c40e4bcc9564a5296a520431     | 7bd58976-4d8a-11ef-999a-928b67d58fa8 | 40488         | 2024-07-29 17:11:45 |
| 8  | 200    | chatcmpl-59887f868fc247a9a8da13cfbb15d04f | ceb375ea-4d7d-11ef-bd64-3aeb95b9dfac | 867           | 2024-07-29 15:40:21 |
| 7  | 200    | cmpl-36e5e21b1f544a80bf9ce3f8fc1fce57     | cd7f48d6-4d7d-11ef-999a-928b67d58fa8 | 794           | 2024-07-29 15:40:19 |
| 6  | 200    | cmpl-737d27673327465fb4827e3797abb1b3     | cc6613ac-4d7d-11ef-95d0-7a2774525b85 | 670           | 2024-07-29 15:40:17 |
+----+--------+-------------------------------------------+--------------------------------------+---------------+---------------------+

Use the list command to view the content of the most recent requests. By default, it displays fields that are easy to search, such as id/chatcmpl/request_id, as well as status/server_timing/requested_at for checking the request status. If you want to view a specific request, you can use the inspect command to retrieve it:

# The following three commands will retrieve the same request information
$ moonpalace inspect --id 13
$ moonpalace inspect --chatcmpl chatcmpl-2e1aa823e2c94ebdad66450a0e6df088
$ moonpalace inspect --requestid c07c118e-4dae-11ef-b423-62db244b9277
+--------------------------------------------------------------+
| metadata                                                     |
+--------------------------------------------------------------+
| {                                                            |
|     "chatcmpl": "chatcmpl-2e1aa823e2c94ebdad66450a0e6df088", |
|     "content_type": "application/json",                      |
|     "group_id": "enterprise-tier-5",                         |
|     "moonpalace_id": "13",                                   |
|     "request_id": "c07c118e-4dae-11ef-b423-62db244b9277",    |
|     "requested_at": "2024-07-29 21:30:43",                   |
|     "server_timing": "1033",                                 |
|     "status": "200 OK",                                      |
|     "user_id": "cn0psmmcp7fclnphkcpg"                        |
| }                                                            |
+--------------------------------------------------------------+

By default, the inspect command does not print the body of the request and response. If you want to print the body, you can use the following command:

$ moonpalace inspect --chatcmpl chatcmpl-2e1aa823e2c94ebdad66450a0e6df088 --print request_body,response_body
# Since the body information is too lengthy, the detailed content of the body is not shown here
+--------------------------------------------------+--------------------------------------------------+
| request_body                                     | response_body                                    |
+--------------------------------------------------+--------------------------------------------------+
| ...                                              | ...                                              |
+--------------------------------------------------+--------------------------------------------------+

Exporting Requests

If you find that a request does not meet your expectations, or if you want to report a request to Moonshot AI (whether it's a Good Case or a Bad Case, we welcome both), you can use the export command to export a specific request:

# You only need to choose one of the id/chatcmpl/requestid options to retrieve the corresponding request
$ moonpalace export \
    --id 13 \
    --chatcmpl chatcmpl-2e1aa823e2c94ebdad66450a0e6df088 \
    --requestid c07c118e-4dae-11ef-b423-62db244b9277 \
    --good/--bad \
    --tag "code" --tag "python" \
    --directory $HOME/Downloads/

Here, the usage of id/chatcmpl/requestid is the same as in the inspect command, used to retrieve a specific request. The --good/--bad options are used to mark the request as a Good Case or a Bad Case. The --tag option is used to add relevant tags to the request. For example, in the example above, we assume that the request is related to the Python programming language, so we add two tags: code and python. The --directory option specifies the path to the directory where the exported file will be saved.

The content of the successfully exported file is:

$ cat $HOME/Downloads/chatcmpl-2e1aa823e2c94ebdad66450a0e6df088.json
{
    "metadata":
    {
        "chatcmpl": "chatcmpl-2e1aa823e2c94ebdad66450a0e6df088",
        "content_type": "application/json",
        "group_id": "enterprise-tier-5",
        "moonpalace_id": "13",
        "request_id": "c07c118e-4dae-11ef-b423-62db244b9277",
        "requested_at": "2024-07-29 21:30:43",
        "server_timing": "1033",
        "status": "200 OK",
        "user_id": "cn0psmmcp7fclnphkcpg"
    },
    "request":
    {
        "url": "https://api.moonshot.ai/v1/chat/completions",
        "header": "Accept: application/json\r\nAccept-Encoding: gzip\r\nConnection: keep-alive\r\nContent-Length: 2450\r\nContent-Type: application/json\r\nUser-Agent: OpenAI/Python 1.36.1\r\nX-Stainless-Arch: arm64\r\nX-Stainless-Async: false\r\nX-Stainless-Lang: python\r\nX-Stainless-Os: MacOS\r\nX-Stainless-Package-Version: 1.36.1\r\nX-Stainless-Runtime: CPython\r\nX-Stainless-Runtime-Version: 3.11.6\r\n",
        "body":
        {}
    },
    "response":
    {
        "status": "200 OK",
        "header": "Content-Encoding: gzip\r\nContent-Type: application/json; charset=utf-8\r\nDate: Mon, 29 Jul 2024 13:30:43 GMT\r\nMsh-Cache: updated\r\nMsh-Gid: enterprise-tier-5\r\nMsh-Request-Id: c07c118e-4dae-11ef-b423-62db244b9277\r\nMsh-Trace-Mode: on\r\nMsh-Uid: cn0psmmcp7fclnphkcpg\r\nServer: nginx\r\nServer-Timing: inner; dur=1033\r\nStrict-Transport-Security: max-age=15724800; includeSubDomains\r\nVary: Accept-Encoding\r\nVary: Origin\r\n",
        "body":
        {}
    },
    "category": "goodcase",
    "tags":
    [
        "code",
        "python"
    ]
}

We recommend that developers use Github Issues (opens in a new tab) to submit Good Cases or Bad Cases, but if you do not want to make your request information public, you can also submit the Case to us via enterprise WeChat, email, or other means.

You can send the exported file to the following email address:

[email protected]

Switch from OpenAI to Kimi API Conduct Multi-turn Chat with Kimi API