OpenAI upgrades its Responses API to support agent skills and a complete terminal shell
Until recently, the practice of building AI agents has been a bit like training a long-distance runner with a thirty-second memory.
Yes, you could give your AI models tools and instructions, but after a few dozen interactions — several laps around the track, to extend our running analogy — they would inevitably lose context and start hallucinating.
With OpenAI's latest updates to its Responses API — the application programming interface that allows developers on OpenAI's platform to access multiple agentic tools like web search and file search with a single call — the company is signaling that the era of the limited agent is waning.
The updates announced today include Server-side Compaction, Hosted Shell Containers, and a new "Skills" standard for agents.
With these three major updates, OpenAI is effectively handing agents a permanent desk, a terminal, and a memory that doesn't fade, changes that should help agents evolve further into reliable, long-term digital workers.
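For context, a single Responses API call already bundles model reasoning with hosted tools. Here is a minimal sketch in Python; the model name is assumed for illustration, and the tool type string may vary by API version:

```python
# Minimal Responses API call combining a model with a hosted tool in one request.
# Model and tool type names are illustrative; check OpenAI's docs for current values.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",  # assumed model name for illustration
    input="Find recent coverage of the Responses API updates and summarize it.",
    tools=[{"type": "web_search"}],  # hosted tool invoked within the same call
)
print(response.output_text)
```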
Technology: overcoming 'context amnesia'
The most significant technical hurdle for autonomous agents has always been the "clutter" of long-running tasks. Every time an agent calls a tool or runs a script, the conversation history grows.
Eventually, the model hits its token limit, and the developer is forced to truncate the history—often deleting the very "reasoning" the agent needs to finish the job.
OpenAI’s answer is Server-side Compaction. Unlike simple truncation, which deletes history outright, compaction allows agents to run for hours or even days.
Early data from e-commerce platform Triple Whale suggests this is a breakthrough in stability: its agent, Moby, successfully navigated a session involving 5 million tokens and 150 tool calls without a drop in accuracy.
In practical terms, this means the model can "summarize" its own past actions into a compressed state, keeping the essential context alive while clearing the noise. It transforms the model from a forgetful assistant into a persistent system process.
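The article does not name the compaction feature at the parameter level, so the sketch below approximates the pattern: previous_response_id (an existing Responses API field) chains turns so the server carries state forward, and truncation="auto" (also an existing option) stands in as an assumption for the new compaction switch.

```python
# Sketch of a long-running session on the Responses API. previous_response_id
# chains turns without resending the full transcript; truncation="auto" is an
# existing option used here only as a stand-in for server-side compaction,
# which this article does not name at the parameter level.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",  # assumed model name for illustration
    input="Audit the last 90 days of support tickets and tag recurring issues.",
    truncation="auto",
)

# Each follow-up turn references the prior response; the server carries the
# compacted state forward instead of the client replaying the whole history.
response = client.responses.create(
    model="gpt-5",
    previous_response_id=response.id,
    input="Continue with the next batch of tickets.",
    truncation="auto",
)
print(response.output_text)
```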
Managed cloud sandboxes
The introduction of the Shell Tool moves OpenAI into the realm of managed compute. Developers can now opt for container_auto, which provisions an OpenAI-hosted Debian 12 environment.
This isn't just a code interpreter: it gives each agent its own full terminal environment pre-loaded with:
Native execution environments including Python 3.11, Node.js 22, Java 17, Go 1.23, and Ruby 3.1.
Persistent storage via /mnt/data, allowing agents to generate, save, and download artifacts.
Networking capabilities that allow agents to reach out to the internet to install libraries or interact with third-party APIs.
The Hosted Shell and its persistent /mnt/data storage provide a managed environment where agents can perform complex data transformations using Python or Java without requiring the team to build and maintain custom ETL (Extract, Transform, Load) middleware for every AI project.
By leveraging these hosted containers, data engineers can implement high-performance data processing tasks without the overhead of building and securing their own bespoke sandboxes. OpenAI is essentially saying: “Give us the instructions; we’ll provide the computer.”
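In practice, requesting a hosted container looks roughly like the following. The payload mirrors the existing code_interpreter tool's auto-provisioned container, used here as a stand-in since the exact shape of the new shell tool is not spelled out in the announcement:

```python
# Sketch: requesting an OpenAI-hosted container via the Responses API.
# The tool shape mirrors the documented code_interpreter tool; treat it as
# an approximation of the new shell tool, not its confirmed payload.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",  # assumed model name for illustration
    tools=[{
        "type": "code_interpreter",     # stand-in for the new shell tool
        "container": {"type": "auto"},  # OpenAI provisions the environment
    }],
    input=(
        "Download the CSV at https://example.com/orders.csv, "
        "clean the date columns with pandas, and save the result "
        "to /mnt/data/orders_clean.parquet."
    ),
)
print(response.output_text)
```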
OpenAI's Skills vs. Anthropic's Skills
While OpenAI is racing toward a unified agent orchestration stack, it faces a significant philosophical challenge from Anthropic’s Agent Skills.
Both companies have converged on a remarkably similar file structure — using a SKILL.md (markdown) manifest with YAML frontmatter — but their underlying strategies reveal divergent visions for the future of work.
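To make the convergence concrete, here is an invented example of what such a SKILL.md might contain. The name and description fields in the frontmatter are the documented core of the format; the skill itself, its fields beyond those two, and the referenced files are made up for illustration:

```markdown
---
name: quarterly-report
description: Formats financial data into the company's quarterly report template.
---

# Quarterly Report

1. Read the raw figures the user provides.
2. Validate totals with scripts/check_totals.py, bundled alongside this file.
3. Render the report using templates/report.md and save the result.
```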
OpenAI’s approach prioritizes a "programmable substrate" optimized for developer velocity. By bundling the shell, the memory, and the skills into the Responses API, they offer a "turnkey" experience for building complex agents rapidly.
Already, enterprise AI search startup Glean reported a jump in tool accuracy from 73% to 85% by using OpenAI's Skills framework.
In contrast, Anthropic has launched Agent Skills as an independent open standard (agentskills.io).
While OpenAI's system is tightly integrated into its own cloud infrastructure, Anthropic’s skills are designed for portability. A skill built for Claude can theoretically be moved to VS Code, Cursor, or any other platform that adopts the specification.
Indeed, the hit new open source AI agent OpenClaw adopted this exact SKILL.md manifest and folder-based packaging, allowing it to inherit a wealth of specialized procedural knowledge originally designed for Claude.
This architectural compatibility has fueled a community-driven "skills boom" on platforms like ClawHub, which now hosts over 3,000 community-built extensions ranging from smart home integrations to complex enterprise workflow automations.
This cross-pollination demonstrates that the "Skill" has become a portable, versioned asset rather than a vendor-locked feature. Because OpenClaw supports multiple models — including OpenAI’s GPT-5 series and local Llama instances — developers can now write a skill once and deploy it across a heterogeneous landscape of agents.
For technical decision-makers, this open standard is turning into the industry's preferred way to externalize and share "agentic knowledge," moving past proprietary prompts toward a shared, inspectable, and interoperable infrastructure.
But there is another important distinction between OpenAI's and Anthropic's "Skills."
OpenAI uses Server-side Compaction to manage the active state of a long-running session. Anthropic utilizes Progressive Disclosure, a three-level system where the model is initially only aware of skill names and descriptions.
Full details and auxiliary scripts are only loaded when the task specifically requires them. This allows for massive skill libraries—brand guidelines, legal checklists, and code templates—to exist without overwhelming the model's working memory.
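A rough Python sketch of that progressive-disclosure pattern, with an invented directory layout (this illustrates the idea, not Anthropic's implementation):

```python
# Illustration of progressive disclosure; invented layout, not Anthropic's code.
from pathlib import Path

SKILLS_DIR = Path("skills")  # hypothetical layout: skills/<name>/SKILL.md

def skill_index() -> str:
    """Level 1: only names and descriptions enter the model's context."""
    summaries = []
    for manifest in SKILLS_DIR.glob("*/SKILL.md"):
        frontmatter = manifest.read_text().split("---")[1]  # naive YAML parse
        summaries.append(frontmatter.strip())
    return "\n\n".join(summaries)

def load_skill(name: str) -> str:
    """Level 2: the full instruction body loads only when the task needs it."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()

def skill_script_path(name: str, script: str) -> Path:
    """Level 3: auxiliary scripts are located, and run, only on demand."""
    return SKILLS_DIR / name / "scripts" / script
```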
Implications for enterprise technical decision-makers
For engineers focused on "rapid deployment and fine-tuning," the combination of Server-side Compaction and Skills provides a massive productivity boost.
Instead of building custom state management for every agent run, engineers can leverage built-in compaction to handle multi-hour tasks.
Skills allow for "packaged IP," where specific fine-tuning or specialized procedural knowledge can be modularized and reused across different internal projects.
For those tasked with moving AI from a "chat box" into a production-grade workflow, OpenAI’s announcement marks the end of the "bespoke infrastructure" era.
Historically, orchestrating an agent required significant manual scaffolding: developers had to build custom state-management logic to handle long conversations and secure, ephemeral sandboxes to execute code.
The challenge is no longer "How do I give this agent a terminal?" but "Which skills are authorized for which users?" and "How do we audit the artifacts produced in the hosted filesystem?" OpenAI has provided the engine and the chassis; the orchestrator’s job is now to define the rules of the road.
For security operations (SecOps) managers, giving an AI model a shell and network access is a high-stakes evolution. OpenAI’s use of Domain Secrets and Org Allowlists provides a defense-in-depth strategy, ensuring that agents can call APIs without exposing raw credentials to the model's context.
But as agents become easier to deploy via "Skills," SecOps must be vigilant about "malicious skills" that could introduce prompt injection vulnerabilities or unauthorized data exfiltration paths.
How should enterprises decide?
OpenAI is no longer just selling a "brain" (the model); it is selling the "office" (the container), the "memory" (compaction), and the "training manual" (skills). For enterprise leaders, the choice is becoming clear:
Choose OpenAI if you need an integrated, high-velocity environment for long-running autonomous work.
Choose Anthropic if your organization requires model-agnostic portability and an open ecosystem standard.
Ultimately, the announcements signal that AI is moving out of the chat box and into the system architecture, turning "prompt spaghetti" into maintainable, versioned, and scalable business workflows.
