May 28, 2026
Anthropic's postmortem traces a month of Claude Code quality complaints to three overlapping changes. Its new Managed Agents service decouples the brain from the hands, cutting p50 time-to-first-token by about 60%, and adds self-hosted sandboxes and MCP tunnels. Separately, Every founder Dan Shipper argues that more AI automation drives more demand for human experts.
OFFICIAL BLOGS
Anthropic Engineering
An update on recent Claude Code quality reports
Anthropic published a postmortem tracing a month of "Claude got dumber" reports to three separate, overlapping changes in Claude Code, not to the model or the API. First, on March 4 Anthropic switched the default reasoning effort from high to medium to cut latency; users hated it, and the change was reverted on April 7 (the default is now xhigh for Opus 4.7 and high elsewhere). Second, a March 26 caching optimization meant to clear stale thinking from idle sessions had a bug: it dropped reasoning on every turn instead of once, leaving Claude "increasingly without memory of why it had chosen to do what it was doing" and draining usage limits through cache misses. Third, an April 16 system-prompt line capping responses at ≤100 words quietly cost about 3% in measured intelligence and was reverted on April 20. Because each change hit a different slice of traffic on a different schedule, the net effect "looked like broad, inconsistent degradation." Notably, in a back-test, Opus 4.7 found the caching bug while 4.6 missed it. Going forward, Anthropic plans per-model eval suites for every system-prompt change, soak periods, and gradual rollouts, and it is resetting usage limits for all subscribers.
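The difference between clearing stale reasoning once and clearing it on every turn can be made concrete with a small sketch. This is not Anthropic's code: the `Session` class, the sticky `was_idle` flag, and the one-hour `IDLE_THRESHOLD_S` are all assumptions chosen to reproduce the symptom the postmortem describes.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 3600  # assumed idle cutoff, for illustration only


@dataclass
class Session:
    thinking: list[str] = field(default_factory=list)  # reasoning kept across turns
    last_active: float = 0.0
    was_idle: bool = False  # hypothetical flag marking a session that went idle


def start_turn_buggy(s: Session, now: float) -> None:
    """Buggy variant: the idle flag is set once and never cleared."""
    if now - s.last_active > IDLE_THRESHOLD_S:
        s.was_idle = True
    if s.was_idle:
        s.thinking.clear()  # also fires on every later turn
    s.last_active = now


def start_turn_fixed(s: Session, now: float) -> None:
    """Intended variant: drop stale reasoning only on the first turn after idling."""
    if now - s.last_active > IDLE_THRESHOLD_S:
        s.thinking.clear()
    s.last_active = now


def run(start_turn, turn_times: list[float]) -> list[int]:
    """Return how many reasoning entries are visible at the start of each turn."""
    s = Session()
    visible = []
    for i, t in enumerate(turn_times):
        start_turn(s, t)
        visible.append(len(s.thinking))
        s.thinking.append(f"reasoning from turn {i}")
    return visible


if __name__ == "__main__":
    times = [0, 10, 20, 5000, 5010, 5020]  # a long idle gap before the 4th turn
    print(run(start_turn_buggy, times))  # [0, 1, 2, 0, 0, 0]
    print(run(start_turn_fixed, times))  # [0, 1, 2, 0, 1, 2]
```

In the buggy run the session never accumulates reasoning again after the idle gap, which matches the "without memory" symptom. In a real system each clear would also change the prompt prefix and so miss the cache, although the sketch does not model caching.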
Scaling Managed Agents: Decoupling the brain from the hands
Anthropic introduces Managed Agents, a hosted service that runs long-horizon agents by borrowing an old OS idea: virtualize the pieces so implementations can be swapped without breaking each other. They split an agent into the "brain" (Claude plus its harness), the "hands" (sandboxes and tools), and the "session" (an append-only event log). The original design crammed all three into one container — which became a "pet" you couldn't afford to lose; if it died, the session died with it. Decoupling made every part "cattle": a dead container surfaces as a tool-call error Claude can retry, and a crashed harness reboots via `wake(sessionId)` and replays the durable session log. Two payoffs stand out. Security: credentials now live outside the sandbox (Git tokens wired into the remote at init, MCP OAuth tokens held in a vault behind a proxy), so a prompt injection can't read them. Performance: containers are provisioned lazily only when needed, dropping p50 time-to-first-token by ~60% and p95 by over 90%. The session is explicitly "not Claude's context window" — it's a durable, interrogable context object the harness slices via `getEvents()`.
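A minimal sketch of the brain/hands/session split may help make the recovery story concrete. Everything here is hypothetical: `SessionLog`, `Hands`, `Harness`, and the in-memory `SESSIONS` store are illustrative names, and `wake` and `get_events` are only loose Python analogues of the `wake(sessionId)` and `getEvents()` calls named in the post, not the service's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SessionLog:
    """Append-only event log: the durable session, separate from any container."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

    def get_events(self, start: int = 0, end: Optional[int] = None) -> list[dict]:
        """Return a copy of a slice so callers cannot reorder history."""
        return list(self.events[start:end])


class SandboxDied(Exception):
    pass


class Hands:
    """Sandbox wrapper that provisions its container lazily, on first tool call."""

    def __init__(self, provision: Callable[[], Callable[[str], str]]):
        self._provision = provision
        self._container: Optional[Callable[[str], str]] = None
        self.provision_count = 0

    def call_tool(self, command: str) -> dict:
        if self._container is None:  # nothing is created until it is needed
            self._container = self._provision()
            self.provision_count += 1
        try:
            return {"type": "tool_result", "ok": True, "output": self._container(command)}
        except SandboxDied as exc:
            self._container = None  # discard it; the next call gets a fresh one
            return {"type": "tool_result", "ok": False, "error": str(exc)}


SESSIONS: dict[str, SessionLog] = {}  # stand-in for durable storage


class Harness:
    """The brain's harness: holds no state that is not also in the log."""

    def __init__(self, session_id: str, hands: Hands):
        self.log = SESSIONS.setdefault(session_id, SessionLog())
        self.hands = hands

    def run_tool(self, command: str) -> dict:
        self.log.append({"type": "tool_call", "command": command})
        result = self.hands.call_tool(command)
        self.log.append(result)  # failures are recorded as ordinary events
        return result

    def context(self, last_n: int) -> list[dict]:
        """Slice the log into whatever the model should see next."""
        start = max(0, len(self.log.events) - last_n)
        return self.log.get_events(start)


def wake(session_id: str, hands: Hands) -> Harness:
    """After a harness crash, rebuild from the durable log alone."""
    return Harness(session_id, hands)


def make_flaky_provisioner() -> Callable[[], Callable[[str], str]]:
    """Test double: the first container dies when asked to run 'crash'."""
    state = {"n": 0}

    def provision() -> Callable[[str], str]:
        state["n"] += 1
        generation = state["n"]

        def container(command: str) -> str:
            if generation == 1 and command == "crash":
                raise SandboxDied("container exited")
            return f"ran {command!r} in container #{generation}"

        return container

    return provision


if __name__ == "__main__":
    harness = Harness("demo", Hands(make_flaky_provisioner()))
    print(harness.run_tool("crash"))  # an error result, not a lost session
    print(harness.run_tool("ls"))     # retried on a freshly provisioned container
    revived = wake("demo", Hands(make_flaky_provisioner()))
    print(len(revived.log.get_events()))  # 4
```

The design point is that the harness and the container are both disposable: a dead container becomes an ordinary error event the model can react to, and a new harness can pick up the same session because the log is the only source of truth.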
Claude Blog
New in Claude Managed Agents: self-hosted sandboxes and MCP tunnels
Managed Agents can now execute tools inside a sandbox you control and reach private MCP servers without exposing them to the public internet. With self-hosted sandboxes (public beta), the orchestration loop — context management, error recovery — stays on Anthropic's infrastructure, while tool execution moves into your own environment or a managed provider (Cloudflare, Daytona, Modal, or Vercel), so files and repos never leave your perimeter and you control compute sizing for heavy builds. MCP tunnels (research preview) let agents call internal databases, private APIs, and ticketing systems via a lightweight gateway that makes a single outbound connection — no inbound firewall rules, no public endpoints, traffic encrypted end to end. Early adopters named include Amplitude (Design Agent on Cloudflare), Clay (the Sculptor GTM agent on Daytona), and Rogo (an institutional-finance analyst agent on Vercel Sandbox).
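The "single outbound connection" idea behind MCP tunnels can be illustrated with plain sockets. This is a toy sketch under stated assumptions, not the product's protocol: the relay, the newline-delimited request format, and the `PRIVATE_TICKETS` store are invented for illustration, and the encryption the post mentions is omitted.

```python
import socket
import threading

PRIVATE_TICKETS = {"42": "printer on fire"}  # data that lives only on the private side


def relay_listen() -> socket.socket:
    """Public-side relay: the only party that accepts inbound connections."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


def gateway(relay_addr: tuple[str, int]) -> None:
    """Private-side gateway: dials out once, then answers requests on that socket.

    It never binds or listens, so no inbound firewall rule is needed.
    """
    with socket.create_connection(relay_addr) as conn, \
            conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # ends when the relay closes the tunnel
            answer = PRIVATE_TICKETS.get(line.strip(), "not found")
            conn.sendall((answer + "\n").encode("utf-8"))


def ask_through_tunnel(ticket_ids: list[str]) -> list[str]:
    """Send requests from the relay side down the gateway's outbound connection."""
    srv = relay_listen()
    worker = threading.Thread(target=gateway, args=(srv.getsockname(),))
    worker.start()
    tunnel, _ = srv.accept()  # the gateway's outbound dial arrives here
    answers = []
    with tunnel, tunnel.makefile("r", encoding="utf-8") as reader:
        for ticket_id in ticket_ids:
            tunnel.sendall((ticket_id + "\n").encode("utf-8"))
            answers.append(reader.readline().strip())
    worker.join()  # the gateway exits once it sees the tunnel close
    srv.close()
    return answers


if __name__ == "__main__":
    print(ask_through_tunnel(["42", "7"]))  # ['printer on fire', 'not found']
```

The connection is opened from inside the private network, but requests travel in the opposite direction over it, which is what lets an agent reach an internal system without a public endpoint.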
New connectors in Claude for everyday life
Claude is extending connectors beyond work tools into everyday apps: AllTrails, Audible, Booking.com, Instacart, Intuit Credit Karma and TurboTax, Resy, Spotify, StubHub, Taskrabbit, Thumbtack, Tripadvisor, Uber, Uber Eats, and Viator, with more coming. Since launching in July 2025 the directory has passed 200 connectors. The bigger shift is behavioral: connectors now surface dynamically — Claude suggests the right app for what you're doing (a reservation, a grocery cart, identifying a flight) based on your stated preferences and conversation, and shows multiple options when more than one fits. Anthropic stresses the product stays ad-free with no paid placements, your connected-app data isn't used for training, and Claude is designed to check with you before it books or buys anything on your behalf.
PODCASTS
AI & I by Every — We Automated Everything With AI and Tripled Our Headcount
The Takeaway: The more you automate with AI, the more demand you create for human experts — because AI floods the zone with "close but not quite right" work that someone has to shepherd across the finish line.
Dan Shipper, founder of the AI media company Every, makes a counterintuitive case against the "AI will erase white-collar jobs" panic. His evidence is his own company: despite being as agent-native as it gets ("if you swing a stick around in our Slack, you're as likely to hit a human as you are an agent"), Every grew from 4 people to 30 and is still hiring. His mechanism is sharp — AI "makes yesterday's expert competence cheap," so everyone can now produce code, essays, and designs that look professional. But because models are trained on yesterday's outputs, that work is all generic and "not quite right for the situation," which devalues it and spikes demand for experts who can build the systems and judgment to make it actually good. On the layoffs everyone cites, he's blunt: companies that aren't doing well lay people off and then "blame AI." His real distinction is between autonomy and agency — agents are getting excellent at executing any task you send them on, but they lack the self-motivated wants of "even the smallest child," and the entire industry is incentivized to keep them compliant. The standout line, and his one-sentence thesis: "If you ride the models, you're going to be okay. You're going to have a job. You're going to do great work, and you don't have to worry."