Let's talk about the accidental leak of @claudeai's Claude Code source. In short: this production-grade Agent OS design hands @openclaw a solid blueprint for systems-engineering capability, and the AI OS field is about to reach new heights:
1) For the past two years, the core problem for agents has never been that the models weren't smart enough, but that the systems weren't reliable enough. Models can write code, but written code isn't guaranteed to run. Models can execute tasks, but over long runs the context decays and hallucination piles on hallucination. Models can remember things, but nothing tells them what to remember or how to verify that a memory is correct.
In Claude Code's source code, these three problems all have clear engineering solutions:
--Context decay: a three-layer memory architecture. The resident context holds only lightweight indexes pointing to external files; historical conversations are never read back in full, only the necessary fragments are grepped out. Writes are strictly disciplined: the index is updated only after a file write succeeds, and failed attempts never enter memory.
--Generation does not equal completion: a dedicated Verification Specialist handles verification. Its system prompt alone runs 2,866 tokens and is kept completely separate from the generation module. On top of that, an assertiveness-counterweight mechanism actively questions its own conclusions, so that "appearing complete" is never mistaken for "truly complete."
--How tasks survive when the user is away: KAIROS, a background daemon that appears more than 150 times in the code, continuously runs autoDream while unsupervised, integrating observations, discarding contradictory inferences, and solidifying vague impressions into clear facts. By the time the user returns, the agent's state has already been proactively maintained.
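To make the first point concrete, here is a minimal sketch of the index-plus-files memory discipline. This is my own illustration, not the leaked implementation; the class and method names (`MemoryIndex`, `write`, `recall`) are invented for the example.

```python
import os

class MemoryIndex:
    """Resident context holds only lightweight pointers; content lives in files."""

    def __init__(self):
        self.index = {}  # topic -> file path, never the content itself

    def write(self, topic: str, content: str, path: str) -> bool:
        # Strict write discipline: persist to disk first, atomically.
        tmp = path + ".tmp"
        try:
            with open(tmp, "w") as f:
                f.write(content)
            os.replace(tmp, path)
        except OSError:
            return False  # failed attempts never enter memory
        # Only a successful file write updates the resident index.
        self.index[topic] = path
        return True

    def recall(self, topic: str, keyword: str) -> list[str]:
        # Never read history back in full: grep out only the needed fragments.
        path = self.index.get(topic)
        if path is None:
            return []
        with open(path) as f:
            return [line.rstrip("\n") for line in f if keyword in line]
```

The point of the sketch: the resident state stays tiny (a dict of paths), and a failed write leaves the index untouched.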
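The second point, separating generation from verification, can be sketched like this. Again an illustration under my own assumptions: the prompt strings, the `Verdict` type, and the idea of modeling the assertiveness counterweight as unanswered self-challenges are all mine, not taken from the leaked code.

```python
from dataclasses import dataclass

GENERATOR_PROMPT = "You write code."          # generation module
VERIFIER_PROMPT = "You only check results."   # kept fully separate from generation

@dataclass
class Verdict:
    complete: bool
    reasons: list

def verify(artifact_runs: bool, tests_pass: bool, challenges: list) -> Verdict:
    """Independent check: 'appearing complete' is not 'truly complete'."""
    reasons = []
    if not artifact_runs:
        reasons.append("artifact does not run")
    if not tests_pass:
        reasons.append("tests failing")
    # Assertiveness counterweight: each unanswered self-challenge blocks completion.
    reasons.extend(f"open challenge: {c}" for c in challenges)
    return Verdict(complete=not reasons, reasons=reasons)
```

The generator never gets to declare its own work done; only an empty `reasons` list from the verifier does.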
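And the third point, the unsupervised consolidation pass, might look roughly like this. Heavily hedged: the shape of an observation, the repetition threshold, and the function body are my guesses at the described behavior, not the actual autoDream code.

```python
def auto_dream(observations: list[dict]) -> dict:
    """Consolidate raw observations into facts; drop contradictory inferences."""
    facts = {}
    contested = set()
    for obs in observations:
        key, value = obs["key"], obs["value"]
        if key in facts and facts[key] != value:
            contested.add(key)  # contradictory inference: mark for removal
        else:
            facts[key] = value
    for key in contested:
        facts.pop(key, None)
    # Vague impressions solidify into facts only when observed repeatedly
    # (threshold of 2 is an arbitrary choice for the example).
    counts = {}
    for obs in observations:
        counts[obs["key"]] = counts.get(obs["key"], 0) + 1
    return {k: v for k, v in facts.items() if counts[k] >= 2}
```

Run on a batch of observations, it keeps only what was seen consistently and more than once, which is the "solidifying vague impressions into clear facts" behavior described above.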
Putting these three solutions together, Anthropic's logical approach becomes clear: decoupling model capabilities from system reliability. The model is responsible for generation, while the system is responsible for verification, memory, recovery, and scheduling.
2) This set of engineering capabilities and optimization logic is precisely OpenClaw's current weakest point.
Despite OpenClaw's meteoric rise to over 340,000 GitHub stars in just five months—a speed rarely matched in open-source history—community criticism has been constant. Issues include: task looping, false completions (the agent reports "complete" as soon as the code is written, without any verification), state loss, rudimentary memory management relying solely on markdown, and widespread plugin crashes after refactoring.
These aren't random bugs; the root cause boils down to one thing: a lack of systematic engineering capabilities for managing global state.
Now, with the leaked Claude Code source as a reference, OpenClaw can simply copy the approach and fill in the gaps, item by item:
--No independent verification sub-agent → borrow the Verification Specialist architecture and force an independent verification layer after execution.
--Memory relies on whole-segment reads and writes → borrow the three-layer memory plus strict write discipline, separating context indexes from actual content.
--Task state is not persistent → borrow the 25+ lifecycle hooks plus the sessionId recovery mechanism, making "no task loss on restart" a standard feature.
--Coarse tool permissions → borrow the 2,500 lines of bash verification logic plus the permission schema, running verification and permission grading before every tool execution.
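The state-persistence item above is the easiest to picture in code. A minimal sketch, assuming a simple file-per-session store; `SessionStore`, `checkpoint`, and `recover` are names I made up, not anything from Claude Code or OpenClaw.

```python
import json
import os

class SessionStore:
    """'No task loss on restart': state is written through under a sessionId
    at every lifecycle-hook boundary, so a crashed run can be resumed."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, session_id: str) -> str:
        return os.path.join(self.root, f"{session_id}.json")

    def checkpoint(self, session_id: str, state: dict) -> None:
        # Write to a temp file, then atomically replace: a crash mid-write
        # leaves the last good checkpoint intact.
        tmp = self._path(session_id) + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self._path(session_id))

    def recover(self, session_id: str):
        try:
            with open(self._path(session_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None  # fresh session, nothing to resume
```

On restart, the agent calls `recover(session_id)` before doing anything else; a non-`None` result means it picks up mid-task instead of starting over.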
That's all.
Honestly, I'm looking forward to the next version of OpenClaw. If this approach is carefully ported in, every problem OpenClaw currently takes criticism for could be solved in one stroke. And once those are fixed, OpenClaw's inherent advantages of local operation + privacy protection + open ecosystem will finally kick in, which is exactly the outcome Anthropic least wants to see. Haha, what a coincidence that a human-caused accident landed right when Anthropic was frantically iterating its product line and OpenClaw's momentum was waning. Now we'll see what the OpenClaw ecosystem does with this opportunity.