K2-Vendor-Verifier Now Includes "Kimi K2 Thinking"

Kimi Open Platform · Posted on 2025-11-18 · 2 min read

We saw a clear problem: Kimi K2's performance is inconsistent across API vendors, and there was no easy way to check it.

The Solution: K2 Vendor Verifier

What it is: An open-source benchmark for Kimi K2 API transparency.

Why we built it: To help you choose providers based on performance, not just cost and speed.

What's new: The big news is that we've added the Kimi K2 Thinking model to the benchmark. We've just pushed our latest eval results, which focus on ToolCall.

ToolCall Results

Latest Results

For the full K2-0905 results and a breakdown of what each metric means, head here.
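To give a rough feel for what a ToolCall comparison between vendors can measure, here is a minimal sketch. It assumes that the official API's responses serve as the reference, and it computes two illustrative metrics: an F1 score for whether a vendor triggers a tool call when the reference does, and the share of vendor tool calls whose arguments parse as a JSON object containing the required keys. The `Sample` record, `trigger_f1`, `schema_accuracy`, and both metric definitions are hypothetical simplifications of our own. They are not K2-Vendor-Verifier's actual code or metric definitions, which are documented in the repository.

```python
import json
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Sample:
    """One prompt sent to both the reference API and a vendor (hypothetical record)."""
    official_called: bool               # did the reference (official) API emit a tool call?
    vendor_called: bool                 # did the vendor under test emit a tool call?
    vendor_args: Optional[str] = None   # raw JSON argument string from the vendor's call
    required_keys: Tuple[str, ...] = () # keys the tool schema marks as required (simplified)


def trigger_f1(samples: Sequence[Sample]) -> float:
    """F1 of the vendor's call/no-call decisions, treating the reference as ground truth."""
    tp = sum(s.official_called and s.vendor_called for s in samples)
    fp = sum((not s.official_called) and s.vendor_called for s in samples)
    fn = sum(s.official_called and not s.vendor_called for s in samples)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def schema_accuracy(samples: Sequence[Sample]) -> float:
    """Share of vendor tool calls whose arguments are a JSON object with all required keys."""
    calls = [s for s in samples if s.vendor_called]
    if not calls:
        return 0.0
    valid = 0
    for s in calls:
        try:
            args = json.loads(s.vendor_args or "")
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a schema failure
        if isinstance(args, dict) and all(k in args for k in s.required_keys):
            valid += 1
    return valid / len(calls)


# Usage: four illustrative prompts (made-up data, not benchmark results).
samples = [
    Sample(True, True, '{"city": "Paris"}', ("city",)),  # both call, valid args
    Sample(True, False),                                 # vendor misses a call
    Sample(False, True, '{"city": ', ("city",)),         # spurious call, broken JSON
    Sample(False, False),                                # both correctly skip
]
print(trigger_f1(samples))       # 0.5
print(schema_accuracy(samples))  # 0.5
```

Keeping the trigger decision and the argument validity as separate scores makes it possible to tell a vendor that calls tools at the wrong times apart from one that calls them at the right times but with malformed arguments.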

Independent Evaluation Confirms Performance

We're seeing independent evaluations, such as this recent one from Andon Labs, confirm that the official Kimi API delivers a major performance boost for ToolCall.

Andon Labs Evaluation

Benchmarking K2 Thinking on Other Tasks

We've focused on ToolCall here, but if you're planning to test the K2 Thinking model on general-purpose evals (such as HLE and GPQA-Diamond), we have a guide for that.

Accurate results depend on the correct setup, so we've put together a recommended config and best practices to help you get it right.

Get Started

GitHub Repository: K2-Vendor-Verifier
Official API: platform.moonshot.ai
Benchmarking Guide: Best Practices

Community & Support

Email: [email protected]
Forum: https://forum.moonshot.ai/
Discord: https://discord.com/invite/TYU2fdJykW

This newsletter introduces the K2 Vendor Verifier, an open-source benchmarking tool that now includes Kimi K2 Thinking model evaluations, helping developers make informed decisions based on performance metrics rather than just cost and speed.