AI Builders

June 8, 2026

13 builders · 20 posts · 1 podcast · 1 blog

Box CEO Aaron Levie argues that enterprise AI's real bottleneck is that breakthroughs arrive faster than companies can implement a stable architecture, which paradoxically slows rollout and makes the internal FDE (forward-deployed engineer) a durable new job. Meanwhile, Claude Managed Agents ships dreaming, outcomes, and multiagent orchestration, and builders converge on a 'design loops, stop prompting' theme.

X / TWITTER

Anthropic's Boris Cherny (Claude Code) shared five tips for running Opus autonomously for hours or days, citing benchmarks that show Opus leading on long-running work. First, turn on auto-mode permissions so Claude never stalls on approvals. Second, use dynamic workflows to orchestrate hundreds or thousands of agents. Third, nudge progress with /goal or /loop. Fourth, run Claude Code in the cloud so you can close your laptop. Fifth, and most important, give Claude a way to self-verify end to end (the Chrome extension for web, an iOS/Android simulator MCP for mobile, a live server for backend work).

OpenAI's Thibault Sottiaux (Codex & ChatGPT) announced a 100-day program: each day for 100 days, OpenAI will pick one person doing impressive or unusually useful work with Codex and grant them 10X usage limits for a month to see how far they can push it. The first pick lands tomorrow.

Former Google Gemini/Veo product leader Madhu Guru pushed back on the idea that producing training data is low-skill grunt work. The data that actually advances the model frontier is the opposite: labs need training data for high-economic-value tasks, and outside software engineering most of those tasks have little documentation. They rest on complex, domain-specific knowledge built up over years and spread across legacy tools that don't talk to each other. That, he argues, is exactly why we have SWE agents but not yet knowledge-work agents, and why companies producing this data (like Mercor) are doing high-leverage, deeply underappreciated work.

Vercel CEO Guillermo Rauch revealed that Vercel AI Gateway recovers more than 1 trillion tokens a month on average, and compared this to how Stripe recovers revenue with smart retries on failed payments. He stressed that the gateway does this with zero markup over the labs, while adding redundancy, zero-data-retention enforcement, observability, usage APIs, and caps on top.
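
To make the smart-retry analogy concrete, here is a minimal sketch of retrying a failed model call and then failing over to a second provider. It is an illustration only, not Vercel's implementation: `call_with_fallback`, `ProviderError`, and the two toy providers are hypothetical names invented for this example.

```python
import time
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error type for a transient upstream failure."""


# A provider is a (name, callable) pair; both are assumptions for this sketch.
Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    retries_per_provider: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str]:
    """Try each provider in order, retrying transient failures with backoff.

    Returns (provider_name, response) from the first call that succeeds.
    """
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                # Exponential backoff before the next attempt.
                time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def always_down(prompt: str) -> str:
    """Toy provider that always fails, standing in for an outage."""
    raise ProviderError("503 from upstream")


def echo(prompt: str) -> str:
    """Toy provider that always succeeds."""
    return f"echo: {prompt}"
```

Calling `call_with_fallback("hi", [("primary", always_down), ("backup", echo)])` exhausts the retries on the failing provider and then returns the backup's response, so a request that would have been lost is "recovered" in the sense of the analogy.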

Box CEO Aaron Levie made the case that use cases will stratify across model families over the next year or two: frontier intelligence for high-end tasks, and much cheaper models for high-volume workloads that can be safely peeled off. Frontier demand will still grow, but the low end will grow faster, which makes the routing layer increasingly valuable. "Agent orchestration that can cost optimize while still performing the task successfully will be in a strong position."
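
One way to picture a cost-optimizing routing layer is a router that picks the cheapest model able to handle a task and escalates only when the result fails a check. The sketch below is a toy under stated assumptions: the model names, prices, capability scores, and the `succeeds` check are all hypothetical and are not drawn from Levie's post or any real price list.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_mtok: float  # hypothetical USD per million tokens
    capability: int       # hypothetical 1-10 capability score


# Made-up catalogue for illustration only.
MODELS: List[Model] = [
    Model("small", 0.2, 3),
    Model("mid", 2.0, 6),
    Model("frontier", 15.0, 9),
]


def route(required_capability: int, models: Sequence[Model] = MODELS) -> Model:
    """Return the cheapest model whose capability meets the requirement."""
    eligible = [m for m in models if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model meets the required capability")
    return min(eligible, key=lambda m: m.cost_per_mtok)


def run_with_escalation(
    task: str,
    required_capability: int,
    succeeds: Callable[[Model, str], bool],
    models: Sequence[Model] = MODELS,
) -> Model:
    """Try eligible models from cheapest to priciest until the check passes."""
    ladder = sorted(
        (m for m in models if m.capability >= required_capability),
        key=lambda m: m.cost_per_mtok,
    )
    for model in ladder:
        if succeeds(model, task):
            return model
    raise RuntimeError("no model completed the task")
```

With this catalogue, `route(2)` selects `small` and `route(7)` selects `frontier`. The escalation helper captures the second half of the quote: the cheap model gets the first attempt, and cost only rises when the task would otherwise fail.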

In a second thread, Levie argued the market misunderstood AI "eating" enterprise software. AI has made building software somewhat easier, but the bulk of cost in enterprise software companies is go-to-market — consultative selling, implementation, integration — and AI hasn't reduced that need; if anything, as landscapes get busier, discoverability and differentiation become the hardest part.

YC President & CEO Garry Tan flagged that educating people on how to use AI tools has become a serious bottleneck, and showed off GBrain v0.42.30, which can now give you a detailed summary of how your thinking has changed over time.

Builder Zara Zhang observed that her Frontend Slides skill grew organically because slides are inherently social: people see striking decks and immediately ask "how did you make it," and HTML-based decks make their creators look more AI-native and AI-savvy.

FPV Ventures partner Nikunj Kothari noted the vibe shift from "tokenmaxxing" and token anxiety to "tokenoptimizing" in just a few weeks. His mild hot take: companies should still give employees copious token budgets to stay at the frontier and explore the edges — otherwise it's far too easy to slide back into doing things the way they've always been done.

Builder Peter Steinberger delivered his monthly reminder, capturing the week's recurring theme in one line: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

OFFICIAL BLOGS

Claude Blog: New in Claude Managed Agents (dreaming, outcomes, and multiagent orchestration). Anthropic launched "dreaming" as a research preview alongside making outcomes, multiagent orchestration, and webhooks available to developers.

Dreaming is a scheduled process that reviews past agent sessions and memory stores, extracts patterns (recurring mistakes, converged workflows, shared team preferences), and curates memory so agents self-improve over time. You can let it update memory automatically or review changes first.

"Outcomes" lets you write a rubric describing success; a separate grader evaluates output in its own context window and the agent self-corrects until it clears the bar. In internal benchmarks this improved task success by up to 10 points over standard prompting, with +8.4% on docx and +10.1% on pptx generation.

Multiagent orchestration lets a lead agent delegate pieces to specialists with their own models, prompts, and tools, working in parallel on a shared filesystem with full tracing in the Claude Console.

Early results: Harvey saw completion rates rise ~6x with dreaming, and Wisedocs' review agent now runs 50% faster using outcomes.
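
The outcomes pattern described above (a rubric, a separate grader, and an agent that retries until it passes) can be sketched as a plain loop. This is a conceptual toy, not the Claude Managed Agents API: `grade`, `run_until_outcome`, `toy_agent`, and `RUBRIC` are hypothetical names, and simple predicates stand in for a grader model that would run in its own context window.

```python
from typing import Callable, List, Tuple

# A rubric is a list of (description, check) pairs; the checks here are plain
# predicates standing in for a separate grader model (an assumption).
Rubric = List[Tuple[str, Callable[[str], bool]]]
Agent = Callable[[str, List[str]], str]


def grade(output: str, rubric: Rubric) -> List[str]:
    """Return the descriptions of every rubric criterion the output fails."""
    return [desc for desc, check in rubric if not check(output)]


def run_until_outcome(
    task: str, agent: Agent, rubric: Rubric, max_rounds: int = 3
) -> Tuple[str, int]:
    """Call the agent, grade the result, and feed failures back until it passes.

    Returns (final_output, rounds_used).
    """
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        output = agent(task, feedback)
        feedback = grade(output, rubric)
        if not feedback:
            return output, round_no
    raise RuntimeError(f"rubric not met after {max_rounds} rounds: {feedback}")


def toy_agent(task: str, feedback: List[str]) -> str:
    """Hypothetical agent: adds a title only after the grader asks for one."""
    text = f"Report on {task}."
    if "starts with a title" in feedback:
        text = "# Title\n" + text
    return text


RUBRIC: Rubric = [
    ("starts with a title", lambda s: s.startswith("# ")),
    ("mentions the task", lambda s: "budget" in s),
]
```

Running `run_until_outcome("budget", toy_agent, RUBRIC)` fails the title criterion on the first round and passes on the second. Keeping the grader separate from the agent is the key design choice: the agent cannot talk itself into a passing grade, because only the rubric decides when the loop ends.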

PODCASTS

The MAD Podcast with Matt Turck — "State of Enterprise AI 2026: Aaron Levie on Tokenmaxxing, Rise of Headless, and AI-Proofing Your Job"

The Takeaway: Enterprise AI's real bottleneck isn't model capability — it's the unglamorous, multi-year work of data, access controls, and change management, and that work is exactly why most knowledge-work jobs aren't going anywhere soon.

Aaron Levie, CEO of Box, sells to the world's largest enterprises and has spoken with a couple hundred CIOs this year, giving him a rare double vantage point: Silicon Valley power user and public-company operator. His most counterintuitive point is that faster AI progress is paradoxically slowing enterprise rollout: "the breakthroughs keep happening faster than the customer can implement any kind of standard architecture," and each new model makes the last deployment obsolete, so there's no stable ground to build on.

He reframes token costs as a budgeting earthquake. A single coding task can burn $1,000 of compute, AI spend is escaping the IT budget (3-7% of revenue) and moving into line-of-business OpEx, and nobody has FinOps for the marketing team. He half-jokingly pegs "ERP for AI compute" as a $5B startup waiting to happen.

Why does coding race ahead while the rest of knowledge work lags? Coding has technical users who can fix the agent, verifiable output, and clean access to the whole codebase. Knowledge work has none of that, and hits entitlement and data-integrity walls immediately.

His bet on jobs is contrarian and optimistic: the "internal FDE," a new technical role that sits next to the business, wires up agents, and maintains them as models churn, becomes a durable fixture, not a stopgap. And on headless software, he predicts a dual model where agents bang on systems via API at 100x human volume, but people still keep the GUI: "I do have an iPad and a MacBook and an iPhone... they all do something different."
