Kimi Founder's AMA Recap: First Look at Vision Plans & Roadmap

Kimi Open Platform · 3 min read

We recently hosted an AMA (Ask Me Anything) on Reddit to discuss the technology behind Kimi K2 and our future roadmap. During the session, our founder and research team revealed some key details for the first time—including our plans for Vision capabilities.

In case you missed the discussion, here are a few highlights from behind the scenes.

1. Reasoning Speed vs. Quality

Reddit User: "I hear a lot of thoughts that this would be a great model for complex agents, but nobody has brought up speed and throughput yet that I've heard."

Kimi Team: "The way we trained K2 Thinking favors relatively more thinking tokens to achieve the best results. Our Turbo API should be much faster. Also K2 Thinking is natively INT4, which further speeds up the reasoning process."
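For context, "natively INT4" refers to 4-bit integer weight quantization, which shrinks memory traffic and speeds up inference. The snippet below is a minimal, illustrative sketch of symmetric per-channel INT4 quantization; the function names are ours and this is not a description of Kimi's actual quantization scheme.

```python
import numpy as np

def quantize_int4(w, axis=0):
    """Symmetric per-channel INT4 quantization (illustrative only)."""
    # Use the symmetric range [-7, 7] so one scale covers both signs;
    # values are stored in 4 bits in a real kernel, int8 here for simplicity.
    scale = np.max(np.abs(w), axis=axis, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from INT4 codes and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half the per-channel scale.
print(np.max(np.abs(w - w_hat)))
```

The speedup in practice comes from custom INT4 matmul kernels, not from the rounding itself; this sketch only shows why the representation is lossy but compact.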


2. Multimodal Plans (Vision)

Reddit User: "any plans for a VL in k2?"

Kimi Team: "Yes, we are working on it. Stay tuned!"

3. The Hardest Challenge in K2

Reddit User: "What is the most challenging thing you encountered during the process of making k2 thinking?"

Kimi Team: "One challenge is to support the interleaved 'think - tool - think - tool' mode. This is a relatively new behavior in LLMs and takes a lot of work to get right."
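To make the "think - tool - think - tool" pattern concrete, here is a minimal, hypothetical sketch of an agent loop that alternates reasoning with tool execution. The model and tool interfaces are invented for illustration and are not Kimi's API.

```python
def run_agent(model, tools, prompt, max_steps=8):
    """Alternate between reasoning and tool execution until the model answers."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        step = model(history)          # returns thinking text plus an optional tool call
        history.append({"role": "assistant", "content": step["thinking"]})
        call = step.get("tool_call")
        if call is None:               # no tool requested: the thinking ends in an answer
            return step["thinking"]
        result = tools[call["name"]](**call["args"])
        history.append({"role": "tool", "content": str(result)})
    return None                        # step budget exhausted

# Toy stand-in model: "thinks", calls a calculator once, then answers.
def toy_model(history):
    if not any(m["role"] == "tool" for m in history):
        return {"thinking": "I need to compute 6*7.",
                "tool_call": {"name": "calc", "args": {"expr": "6*7"}}}
    return {"thinking": "The answer is " + history[-1]["content"],
            "tool_call": None}

print(run_agent(toy_model, {"calc": lambda expr: eval(expr)},
                "What is 6*7?"))  # → The answer is 42
```

The training difficulty the team alludes to is that the model must stay coherent across many such interleavings, carrying reasoning state over tool results rather than thinking once up front.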

4. Kimi's Distinct "Personality"

Reddit User: "The distinct creative writing quality of K2-Instruct, was it intentional or was it an emergent behaviour...?"

Kimi Team: "We also enjoy its writing style and it's an important part of our post-training data and eval."

5. 1M Context Window

Reddit User: "Do you have any plans to move to 1M context window? There are many use cases, e.g. Legal AI..."

Kimi Team: "We've done a 1M context window before, but it was too expensive to serve at the time. We will revisit longer context windows in the future."

6. Next-Gen Architecture (K3)

Reddit User: "Will KDA be used in the next-generation flagship model of Kimi? What's its advantage?"

Kimi Team: "KDA hybrids with NoPE MLA perform better than full MLA with RoPE in our apples-to-apples comparison across pretraining and RL. They not only achieve higher benchmark scores, but are also faster and more economical, allowing us to pretrain more quickly, roll out faster during RL, and serve more users. ... It is likely that related ideas will be employed in K3."
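As background on the RoPE/NoPE distinction: RoPE (rotary position embedding) rotates query/key feature pairs by position-dependent angles before attention, while a NoPE layer applies no positional encoding at all and relies on other components (here, the KDA layers) to carry position information. Below is a minimal numpy sketch of the rotary transform; it is our own illustration and says nothing about KDA's internals.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq, dim), dim even."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair rotation frequencies
    angles = np.outer(np.arange(seq), freqs)       # (seq, half): angle grows with position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
y = rope(x)
# Position 0 is unrotated, and rotation preserves each vector's norm;
# a "NoPE" layer would simply use x directly, skipping this step.
```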

Read the Full Thread

For the complete discussion, visit the archived AMA here: AMA with Kimi AI on r/LocalLLaMA

Community & Support

We're here to help and would love your feedback. Reach out anytime:


This newsletter recaps the Kimi founder's AMA, covering K2's development challenges, upcoming vision capabilities, context window plans, and the next-generation K3 architecture.

Thank you for building with Kimi.

The Kimi API Team

© 2025 Moonshot AI