Kimi K2.6 runs agents for days — and exposes the limits of enterprise orchestration
Most orchestration frameworks were built for agents that run for seconds or minutes. Now that agents are running for hours — and in some cases days — those frameworks are starting to crack.
Several model providers, such as Anthropic with Claude Code and OpenAI with Codex, introduced early support for long-horizon agents through multi-session tasks, subagents and background execution. However, these systems sometimes assume agents are still operating within bounded-time workflows even when they run for extended periods.
Open-source model provider Moonshot AI wants to push beyond that with its new model, Kimi K2.6.
Moonshot says the model is designed for continuous execution, with internal use cases including agents that ran for hours and, in one case, five straight days, handling monitoring and incident response autonomously.
But this growing use of this type of agent is exposing a critical gap in orchestration: most orchestration frameworks were not designed for this type of continuous, stateful execution. Open-source models, such as Kimi K2.6, that rely on agent swarms are making the case that their orchestration approach comes close to managing stateful agents.
The difficulties of orchestrating long-running agents
While it is true that some enterprises would rather bring their own orchestration frameworks to their agentic ecosystem, model providers and agent platforms recognize that offering agent management remains a competitive advantage.
Other model providers have begun exploring long-running agents, many through multi-session tasks and background execution. For example, Anthropic’s Claude Code orchestrates agents with a lead agent that directs other agents based on a set of user-instructed definitions. OpenAI’s Codex runs similarly.
Kimi K2.6 approaches orchestration with an improved version of its Agent Swarms, capable of managing up to 300 sub-agents “executing across 4,000 coordinated steps simultaneously,” Moonshot AI wrote in a blog post. Compared to both Claude Code and Codex, K2.6 relies on the model, rather than pre-defined roles, to determine orchestration.
Kimi K2.6 is now available on Hugging Face, through its API, Kimi Code and the Kimi app.
Practitioners experimenting with long-horizon agents say the brittleness runs deeper than prompting can fix.
As one practitioner, Maxim Saplin, put it in a blog post, “That does not mean subagents are useless. It means orchestration is still fragile. Right now, it feels more like a product and training problem than something you can solve by writing a sufficiently stern prompt.”
The problem long-running agents pose is that it’s difficult to maintain their state, especially as their environment continues to change while they're doing their job. The agent would constantly call different tools and APIs or tap into different databases during its runtime. Most current agents, those that may run for one or two executions, do call different tools, but for at most a minute.
Mark Lambert, chief product officer at ArmorCode, which builds an autonomous security platform for enterprises, told VentureBeat in an email that the governance gap is already outpacing deployment.
"These agentic systems can now generate code and system changes faster than most organizations can review, remediate, or govern them. This will require more than just additional scanning. Organizations will need stronger AI governance that provides the context, prioritization, and accountability teams need to manage Kimi and other AI-generated risk before they turn into accumulated exposure," Lambert said.
Long-running agents could also risk failure without a clear rollback. Most importantly, these types of agents often lack a set of well-defined tasks and dynamically adjust their plans as they run.
Kunal Anand, chief product officer at F5, told VentureBeat in an email that long-horizon agents represent a much bigger architectural shift than most companies were prepared for.
“We went from scripts to services to containers to functions, and now to agents as persistent infrastructure. That creates categories we do not yet have good names for: agent runtime, agent gateway, agent identity provider, agent mesh. The API gateway pattern is morphing into something that has to understand goals and workflows, not just endpoints and verbs,” Anand said.
Running for 13 hours and even five days
Understanding how to orchestrate agents becomes important because model capabilities have begun to outpace orchestration innovations, even as enterprises start to look at long-horizon agents.
Moonshot AI says the model is built for tasks that reflect "real-world challenges that typically demand weeks or months of collective human effort." In a separate technical document provided to VentureBeat, Moonshot claims K2.6 built a full SysY compiler from scratch in 10 hours — work it characterized as equivalent to a team of four engineers over two months — and passed all 140 functional tests without human intervention.
The team deployed K2.6 to complex engineering tasks, including overhauling an eight-year-old open source financial matching engine. Moonshot's engineers described a 13-hour execution that “iterated through 12 optimization strategies, initiating over 1,000 tool calls to modify more than 4,000 lines of code precisely.”
Moonshot said one of its teams used K2.6 to build an agent that ran autonomously for five days. That agent managed monitoring, incident response and system operations.
