ai builders

June 16, 2026

12 builders · 25 posts · 1 podcast · 1 blog

OpenAI's Dan Roberts argues reinforcement learning is now 'the cake, not the cherry' and powerful enough for AI to make genuine scientific discoveries, citing the week's Erdős-problem breakthroughs, where a model disproved a conjecture by daring to assume it false. Claude Managed Agents ships dreaming, outcomes, and multiagent orchestration, with Harvey seeing ~6x higher completion rates. Elsewhere: Vercel's Guillermo Rauch ships longer Fluid-compute function runtimes plus v0 default skills; Box's Aaron Levie reframes the AI race around customizable, model-routed intelligence and applied-use regulation; Google's Josh Woodward upgrades Gemini voice to 70+ mixable languages; Replit's Amjad Masad touts one-click domain-specific agents; and Peter Steinberger demos an agent that auto-triages and PRs open-source issues.

X / TWITTER

Google VP Josh Woodward announced a major upgrade to Gemini's voice input on Android and iOS. The mic now handles 70+ languages, lets you mix languages freely within a single session, removes the need to change language settings manually, and still won't interrupt you mid-sentence. He framed it as especially significant for non-English speakers. https://x.com/joshwoodward/status/2066673011554435450

Vercel CEO Guillermo Rauch shipped longer Vercel Function runtimes, which he says is less a tweaked constant than the payoff of a multi-year compute-platform investment: Builds, Sandbox, and now Functions all run on Vercel's homegrown microVM-based Fluid compute. He predicts 2026 is the year serverless and servers "finally converge" with no gotchas, since a sandbox, a function, a server, and a build are all expressions of the same underlying compute. Separately, v0 now ships curated skills by default, aiming to give every prompt the equivalent of a Vercel product engineer, with a marketplace to grab more or add your team's private set. https://x.com/rauchg/status/2066553521978097921 · https://x.com/rauchg/status/2066556235961237826 · https://x.com/rauchg/status/2066567117562868009

Box CEO Aaron Levie argued the most interesting shift in AI isn't one model getting smarter, but intelligence becoming customizable: the winners won't necessarily be those with the biggest models, but those who combine their unique data and workflows with a routing layer that sends each task to whichever model performs it best. On policy, he pushed back on the idea of an "FDA for AI," warning that pre-release approval for every model across every country would create an impossible backlog and dramatically slow progress. Instead, he says, regulate the applied uses of AI, where risk actually shows up. https://x.com/levie/status/2066735879213994434 · https://x.com/levie/status/2066554018953146689
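
To make the "routing layer" idea concrete, here is a minimal sketch in Python. It is not anything Levie or Box described: the Router class, the task types, the model names, and the scores are all hypothetical, and a real router would presumably draw on an organization's own evaluation data rather than hand-entered numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy routing layer: pick the best-scoring model for each task type."""

    # scores[task_type][model_name] = measured quality (hypothetical numbers)
    scores: dict[str, dict[str, float]] = field(default_factory=dict)
    # Fallback when no evaluation data exists for a task type (an assumption)
    default_model: str = "general-model"

    def record(self, task_type: str, model: str, score: float) -> None:
        """Store an evaluation result for one model on one task type."""
        self.scores.setdefault(task_type, {})[model] = score

    def route(self, task_type: str) -> str:
        """Return the model with the highest recorded score for this task."""
        candidates = self.scores.get(task_type)
        if not candidates:
            return self.default_model
        return max(candidates, key=candidates.get)


router = Router()
router.record("contract-review", "model-a", 0.81)
router.record("contract-review", "model-b", 0.92)
router.record("summarization", "model-a", 0.88)

print(router.route("contract-review"))  # model-b
print(router.route("translation"))      # general-model
```

The point of the sketch is that the differentiator sits in the scores table, which comes from a company's own data and workflows, rather than in any single model.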

Replit CEO Amjad Masad highlighted Replit's new domain-specific agents — a growth agent that surfaces SEO issues and a security agent that flags potential vulnerabilities — with a "select all, fix with Agent" workflow he calls his favorite feature. The pitch is specialized agents that both detect problems and remediate them in one click. https://x.com/amasad/status/2066683949129330817

Peter Steinberger showed off "clawsweeper," an agent wired into his open-source projects: whenever someone files an issue, the agent reviews it against the repo's VISION.md and, if it fits, picks it up and opens a self-reviewed PR automatically. It's a concrete example of agents handling open-source triage and contribution end to end. https://x.com/steipete/status/2066457262571360396

FPV Ventures partner Nikunj Kothari noted he now knows 32 VCs — from associates to GPs — who have moved back into operating roles in the past 12 months, and the pace seems to be accelerating. His read: for junior investors especially, operating offers more autonomy, direct customer work, and a faster shot at liquidity than waiting 13 years for carry, and the people making the move seem much happier. https://x.com/nikunj/status/2066701833964531736

Peter Yang, who publishes practical AI tutorials, said OpenAI's Codex browser use is now so good it "almost makes me forget APIs are even needed" — a small but telling vote for agents driving real browsers over wiring up integrations. https://x.com/petergyang/status/2066753125197967653

OFFICIAL BLOGS

Claude Blog

The post "New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration" introduces three updates to Claude Managed Agents. "Dreaming," launching as a research preview, is a scheduled process that reviews past agent sessions and memory stores to extract patterns (recurring mistakes, converged workflows, shared team preferences) and curate memory so agents self-improve between sessions. "Outcomes" lets you write a rubric for what success looks like; a separate grader evaluates the output in its own context window and sends the agent back for another pass until it clears the bar. In internal benchmarks this improved task success by up to 10 points (+8.4% on docx and +10.1% on pptx generation). Multiagent orchestration lets a lead agent break a job into pieces and delegate them to specialists, each with its own model, prompt, and tools, working in parallel on a shared filesystem. As the post puts it: "Memory lets each agent capture what it learns as it works. Dreaming refines that memory between sessions." Early results: Harvey saw completion rates rise ~6x; Spiral by Every runs a Haiku lead agent that delegates drafting to Opus subagents scored against editorial rubrics; and Wisedocs cut review time 50%. https://claude.com/blog/new-in-claude-managed-agents
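
The "Outcomes" loop described above can be sketched generically in Python. This is not the Claude Managed Agents API: run_with_outcome, toy_agent, and toy_grader are hypothetical stand-ins, and the max_passes cap is an added assumption (the post only says the agent is sent back until it clears the bar).

```python
from typing import Callable, Optional


def run_with_outcome(
    agent: Callable[[str, Optional[str]], str],
    grader: Callable[[str, str], tuple[bool, str]],
    task: str,
    rubric: str,
    max_passes: int = 3,  # assumed safety cap, not from the post
) -> tuple[str, bool, int]:
    """Run the agent, grade its output separately, and retry with feedback."""
    feedback: Optional[str] = None
    output = ""
    for attempt in range(1, max_passes + 1):
        output = agent(task, feedback)
        # The grader sees only the output and the rubric, not the agent's
        # working context, mirroring the "own context window" idea.
        passed, feedback = grader(output, rubric)
        if passed:
            return output, True, attempt
    return output, False, max_passes


def toy_agent(task: str, feedback: Optional[str]) -> str:
    """Hypothetical agent: adds a Sources section once it receives feedback."""
    draft = f"Draft for: {task}"
    if feedback:
        draft += "\nSources: (added after grader feedback)"
    return draft


def toy_grader(output: str, rubric: str) -> tuple[bool, str]:
    """Hypothetical grader: passes only if the output has a Sources section."""
    if "Sources:" in output:
        return True, ""
    return False, "Add a Sources section."


output, passed, passes = run_with_outcome(
    toy_agent, toy_grader, "summarize the memo", "Must include a Sources section."
)
print(passed, passes)  # True 2
```

Keeping the grader as a separate callable is the design choice that matters here: the component that judges success does not share state with the component that produced the work.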

PODCASTS

The MAD Podcast with Matt Turck — OpenAI's Dan Roberts: Why AI Can Now Make Discoveries

The Takeaway: AI has crossed from doing the work we assign it to autonomously making genuine scientific discoveries — and the breakthrough came not from raw power, but from a model willing to be contrarian.

Dan Roberts leads the foundations of reinforcement learning team at OpenAI. A former theoretical physicist with an MIT PhD in quantum gravity and quantum information, he co-authored *The Principles of Deep Learning Theory* — and he is unusually clear about why this moment matters. In a single week, OpenAI, DeepMind, and Anthropic each cracked long-unsolved Erdős problems. The approaches diverged: DeepMind formalized proofs in the language Lean, while OpenAI worked in informal, human-readable math. The striking part was the move itself. Faced with a conjecture everyone assumed was true, the model assumed it was *false* and persevered down a long, contrarian calculation path, pulling in tools from algebraic number theory to refute it. As Roberts puts it: "One of the things that ChatGPT was able to do was assume it was false. When you go against the grain and do something contrarian like that, you really have to have strong conviction in what you're doing in order to persevere down a really long calculation path."

Two ideas stand out. First, reinforcement learning is now "the cake, not the cherry": the main course of how raw compute is turned into intelligence, not a thin garnish on top of pretraining. Second, a contrarian take on scaling: when capabilities seem to "emerge" or "grok" at scale, that's not magic; it means you didn't actually understand what you were scaling. The real work is to go back to smaller, simpler models and restore smoothness, the physicist's "spherical cow" move of stripping a system down to the simplest version that still contains the thing you care about. He also resists pure scale-maximalism: good ideas have to guide the scaling, and language is the right grounding for intelligence precisely because "everything goes through language."
