Use Kimi API's Internet Search Functionality
In the previous chapter (Using Kimi API to Complete Tool Calls), we explained in detail how to use the tool_calls feature of the Kimi API to enable the Kimi large language model to perform internet searches. Let's review the process we implemented:
- We defined tools using the JSON Schema format. For internet searches, we defined two tools: `search` and `crawl`.
- We submitted the defined `search` and `crawl` tools to the Kimi large language model via the `tools` parameter.
- The Kimi large language model would choose to call `search` or `crawl` based on the context of the current conversation, generate the relevant parameters, and output them in JSON format.
- We used the parameters output by the Kimi large language model to execute the `search` and `crawl` functions, and submitted the results of these functions back to the Kimi large language model.
- The Kimi large language model would then provide a response to the user based on the results of the tool executions.
In the process of implementing internet searches, we needed to implement the search and crawl functions ourselves, which might include:
- Calling search engine APIs or implementing our own content search.
- Retrieving search results, including URLs and summaries.
- Fetching web page content based on URLs, which might require different reading rules for different websites.
- Cleaning and organizing the fetched web page content into a format that the model can easily recognize, such as Markdown.
- Handling various errors and exceptions, such as no search results or failure to fetch web page content.
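To make the scope of this self-implemented approach concrete, here is a minimal, hypothetical sketch of what such `search` and `crawl` wrappers might look like. The function names, the placeholder URL, and the response shapes are illustrative assumptions, not part of the Kimi API; a real implementation would call an actual search engine and fetch real pages:

```python
from typing import Dict, List


def search(query: str) -> List[Dict[str, str]]:
    """Hypothetical search wrapper: a real implementation would call a search
    engine API and return a list of {url, snippet} results."""
    # Placeholder result standing in for a real search engine response
    return [{"url": "https://example.com/context-caching", "snippet": "Context Caching is ..."}]


def crawl(url: str) -> str:
    """Hypothetical crawl wrapper: a real implementation would fetch the page,
    apply site-specific reading rules, and clean the HTML into Markdown."""
    # Placeholder content standing in for cleaned page text
    return f"# Page at {url}\n\n(cleaned Markdown content would go here)"


def run_search_pipeline(query: str) -> str:
    """Chain search -> crawl -> cleaned Markdown, with basic error handling."""
    results = search(query)
    if not results:
        return "Error: no search results"  # returned as text so the model can recover
    pages = []
    for result in results:
        try:
            pages.append(crawl(result["url"]))
        except Exception as e:  # fetching can fail per-URL; report instead of crashing
            pages.append(f"Error fetching {result['url']}: {e}")
    return "\n\n---\n\n".join(pages)


print(run_search_pipeline("Moonshot AI Context Caching"))
```

Every step above is a stub here, but each one is a real engineering task in production, which is exactly why a built-in alternative is useful.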
Implementing these steps is often considered cumbersome and challenging. Our users have repeatedly requested a simple, ready-to-use "internet search" function. Therefore, based on the original tool_calls usage of the Kimi large language model, we have provided a built-in tool function builtin_function.$web_search to enable internet search functionality.
The basic usage and process of the $web_search function are the same as the usual tool_calls, but there are still some minor differences. We will explain in detail through examples how to call the built-in $web_search function of Kimi to enable internet search functionality and mark the items that need extra attention in the code and explanations.
$web_search Declaration
Unlike ordinary tools, the $web_search function does not require specific parameter descriptions. Only the `type` and `function.name` fields are needed in the `tools` declaration to register the $web_search function:
```python
tools = [
    {
        "type": "builtin_function",  # <-- We use builtin_function to indicate Kimi built-in tools, which also distinguishes them from ordinary function tools
        "function": {
            "name": "$web_search",
        },
    },
]
```

The $web_search function name is prefixed with a dollar sign `$`, which is our agreed convention for indicating Kimi built-in functions (in ordinary `function` definitions, the dollar sign `$` is not allowed). If other Kimi built-in functions are added in the future, they will also be prefixed with a dollar sign `$`.
When using the $web_search function, you must disable the model's thinking ability.
When declaring tools, $web_search can coexist with ordinary function tools: you can add both `builtin_function` entries and ordinary `function` entries to `tools` in the same request.
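As a sketch of such a mixed declaration (the `get_weather` tool and its schema are hypothetical, added only for illustration), note that the agreed `$` prefix also gives you a simple way to tell built-in tools apart from ordinary ones at dispatch time:

```python
# A mixed tools list: Kimi's built-in $web_search plus a hypothetical ordinary function
tools = [
    {
        "type": "builtin_function",
        "function": {"name": "$web_search"},
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical ordinary tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]


def is_builtin(tool_name: str) -> bool:
    """Built-in Kimi tools are distinguished by the agreed $ prefix on the name."""
    return tool_name.startswith("$")


print(is_builtin("$web_search"))  # built-in: its arguments are returned as-is
print(is_builtin("get_weather"))  # ordinary: we execute it ourselves
```

A dispatcher like `is_builtin` is optional; the examples below simply compare the tool name against `"$web_search"` directly.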
Next, let's modify the original tool_calls code to show how the $web_search tool call is executed.
$web_search Execution
Here is the modified tool_calls code:
```python
from typing import *
import os
import json

from openai import OpenAI
from openai.types.chat.chat_completion import Choice

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ.get("MOONSHOT_API_KEY"),
)


# Specific implementation of the search tool; here we only need to return the arguments
def search_impl(arguments: Dict[str, Any]) -> Any:
    """
    When using the search tool provided by Moonshot AI, you only need to return the arguments as-is;
    no additional processing logic is needed.

    But if you want to use other models while retaining the web search functionality, you only need
    to modify the implementation here (e.g., calling a search engine and fetching web content); the
    function signature stays the same, and the rest of the code still works.

    This maximizes compatibility, allowing you to switch between different models without destructive
    changes to the code.
    """
    return arguments


def chat(messages) -> Choice:
    completion = client.chat.completions.create(
        model="kimi-k2.5",
        messages=messages,
        max_tokens=32768,
        extra_body={
            "thinking": {"type": "disabled"}
        },  # Pass an additional request body via extra_body to disable the thinking capability
        tools=[
            {
                "type": "builtin_function",  # <-- Use builtin_function to declare the $web_search function; include the complete tools declaration in every request
                "function": {
                    "name": "$web_search",
                },
            }
        ],
    )
    return completion.choices[0]


def main():
    messages = [
        {"role": "system", "content": "You are Kimi."},
    ]

    # Initial question
    messages.append({
        "role": "user",
        "content": "Please search for Moonshot AI Context Caching technology and tell me what it is."
    })

    finish_reason = None
    while finish_reason is None or finish_reason == "tool_calls":
        choice = chat(messages)
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":  # <-- Check whether the returned content contains tool_calls
            messages.append(choice.message)  # <-- Add the assistant message returned by the Kimi model to the context, so the Kimi model can understand our request in the next request
            for tool_call in choice.message.tool_calls:  # <-- There may be multiple tool_calls, so execute them one by one in a loop
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)  # <-- arguments is a serialized JSON object; deserialize it with json.loads
                if tool_call_name == "$web_search":
                    tool_result = search_impl(tool_call_arguments)
                else:
                    tool_result = f"Error: unable to find tool by name '{tool_call_name}'"

                # Construct a role=tool message from the function execution result to show the model
                # the tool call result; the message must include the tool_call_id and name fields so
                # that the Kimi model can match it to the corresponding tool_call.
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result),  # <-- Tool call results are submitted to the Kimi model as strings, so serialize the result with json.dumps
                })

    print(choice.message.content)  # <-- Return the reply generated by the model to the user


if __name__ == '__main__':
    main()
```

Looking back at the code above, we find, perhaps surprisingly, that the basic process of using the $web_search function is no different from that of a regular function. Developers don't even need to modify the original code for executing tool calls. The part that is different, and particularly noteworthy, is that our `search_impl` function contains no logic for searching, parsing, or fetching web content: we simply return the parameters generated by the Kimi large language model, `tool_call.function.arguments`, as they are to complete the tool call. Why is that?
In fact, as the name builtin_function suggests, $web_search is a built-in function of Kimi large language model. It is defined and executed by Kimi large language model. The process is as follows:
- When the Kimi large language model generates a response with `finish_reason=tool_calls`, it means the model has realized that it needs to execute the `$web_search` function and has already prepared everything for it.
- The Kimi large language model returns the parameters necessary for executing the function in `tool_call.function.arguments`. However, these parameters are not executed by the caller; the caller just needs to submit `tool_call.function.arguments` back to the Kimi large language model as-is, and the model will carry out the corresponding web search process.
- When the caller submits `tool_call.function.arguments` using a `message` with `role=tool`, the Kimi large language model immediately starts the web search process and, based on the search and reading results, generates a readable reply for the user, i.e., a `message` with `finish_reason=stop`.
Compatibility Note
The online search function provided by the Kimi API aims to offer a reliable large language model online search solution without breaking the compatibility of the original API and SDK. It is fully compatible with the original tool call feature of Kimi large language model. This means that: if you want to switch from Kimi's online search function to your own implementation, you can do so in just two simple steps without disrupting the overall structure of your code:
- Modify the `tool` definition of `$web_search` to your own implementation (including `name`, `description`, etc.). You may need to add extra information in `tool.function` to tell the model which specific parameters to generate; you can add any parameters you need in the `parameters` field.
- Change the implementation of the `search_impl` function. When using Kimi's `$web_search`, you just return the input `arguments` as-is; with your own web search service, you may need to fully implement the `search` and `crawl` functions mentioned at the beginning of this article.
After completing the above steps, you will have successfully migrated from Kimi's online search function to your own implementation.
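As a sketch, the two migration steps might look like the following. The parameter schema and the `my_search`/`my_crawl` helpers are hypothetical stand-ins for your own search service; the point is that `search_impl` keeps its signature, so the calling loop from the earlier example does not change:

```python
from typing import Any, Dict

# Step 1: replace the $web_search declaration with your own tool, now carrying an
# explicit parameters schema so the model knows what arguments to generate
tools = [
    {
        "type": "function",  # an ordinary function instead of builtin_function
        "function": {
            "name": "web_search",  # no $ prefix: this is no longer a Kimi built-in
            "description": "Search the web and return cleaned page content.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."},
                },
                "required": ["query"],
            },
        },
    },
]


# Step 2: replace search_impl; the signature stays the same
def search_impl(arguments: Dict[str, Any]) -> Any:
    query = arguments["query"]
    results = my_search(query)              # your own search implementation
    return [my_crawl(r["url"]) for r in results]  # your own crawl implementation


def my_search(query: str):
    return [{"url": "https://example.com/result"}]  # placeholder for a real search call


def my_crawl(url: str) -> str:
    return f"content of {url}"  # placeholder for real fetching and cleaning
```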
About Token Consumption
When using the $web_search function provided by Kimi, the search results are also counted towards the tokens occupied by the prompt (i.e., `prompt_tokens`). Since web search results typically contain a lot of content, the token consumption can be quite high. To avoid unknowingly using up a large number of tokens, we add an extra `usage` field containing `total_tokens` when generating the arguments for the $web_search function. This field tells the caller the total number of tokens occupied by the search content, which will be included in `prompt_tokens` once the entire web search process is completed. The following code demonstrates how to obtain these token counts:
```python
from typing import *
import os
import json

from openai import OpenAI
from openai.types.chat.chat_completion import Choice

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.environ.get("MOONSHOT_API_KEY"),
)


# Specific implementation of the search tool; here we only need to return the arguments
def search_impl(arguments: Dict[str, Any]) -> Any:
    """
    When using the search tool provided by Moonshot AI, you only need to return the arguments as-is;
    no additional processing logic is needed.

    But if you want to use other models while retaining the web search functionality, you only need
    to modify the implementation here (e.g., calling a search engine and fetching web content); the
    function signature stays the same, and the rest of the code still works.

    This maximizes compatibility, allowing you to switch between different models without breaking
    changes to the code.
    """
    return arguments


def chat(messages) -> Choice:
    completion = client.chat.completions.create(
        model="kimi-k2.5",
        messages=messages,
        max_tokens=32768,
        tools=[
            {
                "type": "builtin_function",
                "function": {
                    "name": "$web_search",
                },
            }
        ],
    )
    usage = completion.usage
    choice = completion.choices[0]
    # =========================================================================
    # When finish_reason == "stop", print the tokens consumed by the completed
    # web search process
    if choice.finish_reason == "stop":
        print(f"chat_prompt_tokens: {usage.prompt_tokens}")
        print(f"chat_completion_tokens: {usage.completion_tokens}")
        print(f"chat_total_tokens: {usage.total_tokens}")
    # =========================================================================
    return choice


def main():
    messages = [
        {"role": "system", "content": "You are Kimi."},
    ]

    # Initial question
    messages.append({
        "role": "user",
        "content": "Please search for Moonshot AI Context Caching technology and tell me what it is."
    })

    finish_reason = None
    while finish_reason is None or finish_reason == "tool_calls":
        choice = chat(messages)
        finish_reason = choice.finish_reason
        if finish_reason == "tool_calls":
            # Manually construct the assistant message so that it includes the
            # reasoning_content field when the model returns one
            message_dict = {
                "role": "assistant",
                "content": choice.message.content,
                "tool_calls": choice.message.tool_calls,
                "reasoning_content": getattr(choice.message, "reasoning_content", "I need to call search tool to get information.")
            }
            messages.append(message_dict)
            for tool_call in choice.message.tool_calls:
                tool_call_name = tool_call.function.name
                tool_call_arguments = json.loads(tool_call.function.arguments)
                if tool_call_name == "$web_search":
                    # ===================================================================
                    # Print the number of tokens occupied by the web search results
                    search_content_total_tokens = tool_call_arguments.get("usage", {}).get("total_tokens")
                    print(f"search_content_total_tokens: {search_content_total_tokens}")
                    # ===================================================================
                    tool_result = search_impl(tool_call_arguments)
                else:
                    tool_result = f"Error: unable to find tool by name '{tool_call_name}'"

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call_name,
                    "content": json.dumps(tool_result),
                })

    print(choice.message.content)


if __name__ == '__main__':
    main()
```
Running the above code yields the following output:
```
search_content_total_tokens: 13046  # <-- The number of tokens occupied by the web search results.
chat_prompt_tokens: 13212           # <-- The number of input tokens, including the web search results.
chat_completion_tokens: 295         # <-- The number of tokens generated by the Kimi large language model based on the web search results.
chat_total_tokens: 13507            # <-- The total number of tokens consumed, including the web search process.
# The content generated by the Kimi large language model based on the web search results is omitted here.
```

About Model Size Selection
Another issue that arises is that when the web search function is enabled, the number of tokens can grow significantly, exceeding the context window of the model you originally used. This may trigger an `Input token length too long` error. Therefore, when using the web search function, we recommend using the kimi-k2.5 model to accommodate changes in token counts. We slightly modify the chat function to use the kimi-k2.5 model:
```python
def chat(messages) -> Choice:
    completion = client.chat.completions.create(
        model="kimi-k2.5",
        messages=messages,
        tools=[
            {
                "type": "builtin_function",  # <-- Use builtin_function to declare the $web_search function; include the full tools declaration in every request
                "function": {
                    "name": "$web_search",
                },
            }
        ],
    )
    return completion.choices[0]
```

About Other Tools
The $web_search tool can be used in combination with other regular tools. You can freely mix tools with type=builtin_function and type=function.
About Web Search Billing
In addition to token consumption, we also charge a call fee of $0.005 for each web search. For more details, please refer to Pricing.
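If you want to track this fee alongside token usage, a simple counter is enough. The sketch below only covers the per-search call fee stated above; token charges are deliberately left out, since per-token pricing varies by model and is documented on the Pricing page:

```python
WEB_SEARCH_FEE_USD = 0.005  # call fee per web search, as stated above


def web_search_cost(num_search_calls: int) -> float:
    """Total web-search call fees in USD (excluding token charges)."""
    return num_search_calls * WEB_SEARCH_FEE_USD


# e.g. increment a counter each time a $web_search tool_call is executed in the loop,
# then report the accumulated fee at the end of the conversation
print(f"3 searches cost ${web_search_cost(3):.3f} in call fees")
```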