June 7, 2026
Transformer coauthor Łukasz Kaiser gives a hype-free assessment on the Unsupervised Learning podcast. He admits nobody can cleanly pin down what made coding agents click last Christmas, argues that long context is really a grep-and-files hack that happens to work, and notes that a $2-3K RTX 5090 now lets outsiders run wild experiments. Meanwhile, model routing dominates X: ex-Google product leader Madhu Guru maps its three-phase enterprise arc, and Box CEO Aaron Levie calls it the next axis of differentiation. Elsewhere, swyx argues non-competes killed research publishing, Zara Zhang bets live interaction beats static content, and Every CEO Dan Shipper reads LLMs through Plato.
X / TWITTER
Cognition's swyx (Shawn Wang, swyx on X) floats a provocative theory for why frontier research labs publish less these days: the alpha in research papers died once researchers realized they could skip the fight with marketing departments, walk out the door, and command nine-figure offers for their legally protected tacit knowledge. His sharper claim is structural: "California non-competes have a bigger impact on knowledge spreading than github, arxiv, and huggingface combined." He frames this as his motivation for building AI Engineer as a product-centric industry conference that complements the paper-centric research ones.
Cognition 的 Swyx(Shawn Wang,X 上的 swyx)抛出一个挑衅性的观点,解释为什么前沿实验室如今越来越少发论文:一旦研究者意识到自己可以不必和市场部门纠缠、直接离职、凭借受法律保护的隐性知识拿到九位数的 offer,论文里的「alpha」也就消失了。他更尖锐的判断是结构性的——「加州的竞业禁止条款对知识扩散的影响,比 github、arxiv 和 huggingface 加起来还大。」他把这当作自己把 AI Engineer 办成一个以产品为核心的行业大会、与以论文为核心的研究会议互补的动力。
Former Google product leader Madhu Guru (realmadhuguru on X), previously on Gemini, Veo, and Nano Banana, argues that routing tasks to the right model is genuinely hard but exactly where the opportunity lies. He maps a three-phase enterprise arc he watched while at Gemini: Phase 1 (2024) — default to the "it" model and use GPT for everything; Phase 2 (early 2025) — over-correct toward the smallest/cheapest model without evals good enough to map tasks to models, burning cycles and shipping slower; Phase 3 — nuanced routing, where the most sophisticated AI-native startups break products into sub-agents and route each task (hardest reasoning to Claude, simplest to Gemini Flash-Lite or open-weight models). His closing observation: enterprises follow the AI-native builders by 6-9 months.
前 Google 产品负责人 Madhu Guru(X 上的 realmadhuguru,曾负责 Gemini、Veo、Nano Banana)认为,把任务路由到合适的模型确实很难,但难点正是机会所在。他梳理了在 Gemini 期间观察到的企业三阶段演进:第一阶段(2024)——默认用「当红」模型,什么都拿 GPT 来做;第二阶段(2025 年初)——矫枉过正,一味追求最小最便宜的模型,却没有足够好的 evals 把任务映射到模型,结果空转、交付更慢;第三阶段——精细化路由,最成熟的 AI-native 创业公司把产品拆成 sub-agents,每个任务各得其所(最难的推理交给 Claude,最简单的交给 Gemini Flash-Lite 或开源权重模型)。他最后点出:企业往往比 AI-native builders 晚 6 到 9 个月才跟上。
Box CEO Aaron Levie (levie on X) says token costs have become one of the hottest topics in every enterprise conversation he has — bullish, because it signals these systems are being used at a scale nobody contemplated before. His thesis: as tokens take on a significant share of any workflow's cost, model routing becomes the next axis of differentiation for the applied-AI layer. Frontier intelligence stays relevant for high-end coding, legal, financial, and healthcare tasks, but individual sub-tasks can be peeled off to cheaper models. The winners, he argues, will be the companies "with the best evals, the best ability to route the workloads, and those that have business models directly aligned to customers' financial goals."
Box CEO Aaron Levie(X 上的 levie)表示,token 成本已经成为他与每一家企业交流时最热门的话题之一——这是利好,因为它说明这些系统正以前所未有的规模被使用。他的核心观点是:当 token 占据任何工作流相当一部分成本时,model routing 就成为应用 AI 层的下一个差异化维度。前沿智能在高端的编程、法律、金融和医疗任务上依然不可替代,但单个子任务可以剥离给更便宜的模型来做。他认为最终的赢家会是那些「拥有最好的 evals、最强的工作负载路由能力,以及商业模式与客户财务目标直接对齐」的公司。
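The routing idea in the two posts above can be made concrete with a small sketch. Everything in it is hypothetical: the tier names, the per-million-token prices, and the difficulty score are illustrative assumptions, not figures from either post. In practice the difficulty score would come from an eval suite rather than being passed in by hand, which is exactly the hard part both authors point to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical label, not a real model ID
    max_difficulty: float          # hardest task (0..1) this tier is trusted with
    usd_per_million_tokens: float  # made-up price, for illustration only


# Ordered cheapest first, so the router can stop at the first adequate tier.
TIERS = [
    ModelTier("small-open-weight", 0.3, 0.10),
    ModelTier("mid-tier", 0.7, 1.00),
    ModelTier("frontier", 1.0, 10.00),
]


def route(difficulty: float, tiers=TIERS) -> ModelTier:
    """Pick the cheapest tier whose difficulty ceiling covers the task."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    for tier in tiers:
        if difficulty <= tier.max_difficulty:
            return tier
    return tiers[-1]  # fall back to the most capable tier


def estimate_cost(subtasks, tiers=TIERS) -> float:
    """Total USD for an iterable of (difficulty, token_count) sub-tasks."""
    return sum(
        route(difficulty, tiers).usd_per_million_tokens * tokens / 1_000_000
        for difficulty, tokens in subtasks
    )
```

With these made-up prices, `estimate_cost([(0.2, 500_000), (0.5, 200_000), (0.9, 100_000)])` comes to about 1.25 USD, whereas sending the same 800,000 tokens to the hypothetical frontier tier alone would cost about 8.00 USD. The gap depends entirely on how well the difficulty scores match reality, which is why both posts tie routing to the quality of the evals.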
Every CEO Dan Shipper (danshipper on X) offers a pair of philosophy-meets-AI takes. He reads Plato's Protagoras, with its discussion of where knowledge comes from and whether virtue can be taught, as presaging LLMs, and adds two human capacities he thinks are rising in value: aidōs (reverence and responsiveness to others) and dikē (the capacity to perceive what is right). His most quotable line is a deliberate koan on the consciousness debate: "LLMs are not conscious. LLMs are not not conscious. Both true."
Every CEO Dan Shipper(X 上的 danshipper)抛出一组「哲学 × AI」的思考。他把柏拉图的《普罗泰戈拉篇》——关于知识从何而来、美德是否可教的讨论——读作对 LLM 的预言,并补充了两种他认为正在升值的人类能力:aidōs(对他人的敬重与回应)和 dikē(感知何为正当的能力)。他最值得引用的一句,是对意识之争刻意留下的禅式悖论:「LLM 没有意识。LLM 也并非没有意识。两者皆为真。」
Zara Zhang (zarazhangrui on X) distills a talk she enjoyed into a crisp thesis on where value is moving: the value of static content is going down while the value of live interaction is going up. People increasingly want to connect with the human being behind a piece of work — whether content or software — and "raw & opinionated" now beats "polished & generic."
Zara Zhang(X 上的 zarazhangrui)把一场她很喜欢的演讲提炼成一句关于价值迁移的判断:静态内容的价值在下降,而实时互动的价值在上升。人们越来越想与一件作品背后那个真实的人产生连接——无论是内容还是软件——「粗糙但有观点」如今胜过「精致但平庸」。
Y Combinator President & CEO Garry Tan (garrytan on X) clarifies the privacy posture of Paxel, his local-first coding tool: the company never claimed it uploads no user data to the cloud — its specific commitment is that code (file contents) stays local and is not uploaded. He frames it as a trajectory: "over time as local models get better, we'll be able to do even more locally."
Y Combinator 总裁兼 CEO Garry Tan(X 上的 garrytan)澄清了他主打本地优先的编程工具 Paxel 的隐私立场:公司从未声称不向云端上传任何用户数据——它明确承诺的是代码(文件内容)只留在本地、不会被上传。他将其描述为一条演进路线:「随着本地模型变得更好,我们将能在本地做得更多。」
fpv Ventures partner Nikunj Kothari (nikunj on X) shares a deep-dive conversation on world models with a founder building Reactor World, walking from what world models actually are, through the path from text-to-3D, to why low latency matters and where world models will grow first.
fpv Ventures 合伙人 Nikunj Kothari(X 上的 nikunj)分享了一场关于 world models 的深度对谈,对象是正在打造 Reactor World 的一位创始人,内容从 world models 究竟是什么,讲到从 text-to-3D 走来的路径,再到为什么低延迟很重要、以及 world models 会最先在哪些领域生长。
PODCASTS
Unsupervised Learning — Ep 89: AI Research Legend's Honest Assessment of Where We Are
The Takeaway: One of the people who built the transformer thinks today's models are genuinely intelligent and yet still missing something fundamental about how humans learn — and nobody can cleanly explain what made coding agents suddenly work.
Łukasz Kaiser co-authored "Attention Is All You Need" and has held senior research roles at both Google and OpenAI, so when he gives a hype-free reading of where AI stands, it's worth slowing down for. His central tension: transformers plus reasoning and tools "can do amazing things," but they learn inefficiently. His memorable framing — riffing on the line that Americans do the right thing after exhausting all other options — is that "LLMs, they will learn the concept... but after exhausting all other options. You need this trillion tokens." Humans grab concepts from far less data, and that gap is why he believes something beyond the transformer might still generalize better.
He's refreshingly candid that the leap in coding agents around last Christmas is hard to attribute: it wasn't one big pretraining run, but some mix of harness changes, light post-training (he singles out Codex's compaction as why he prefers it to Claude Code), and new base models. On productivity, he's concrete: reproducing an old paper that once took three weeks now takes two days, and he has "basically stopped looking at the code" on private projects. He still insists you must hold full control of the machine-learning logic in your head, because agents will silently add an auxiliary loss you never asked for.
His most empowering point is for outsiders: a single ~$2-3K RTX 5090 packs roughly five times the compute of the eight-GPU machines they trained the original transformer on. "You could do all of transformer research on this few thousand dollars GPU under your desk." Long context, meanwhile, is "really" just grep plus files — a hack he'd once have dismissed, except "we don't judge solutions by how they look. We judge them by how they work."
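One way to picture the "grep plus files" framing is the sketch below. It is an assumption about what the phrase means in practice, not a description of how any particular agent implements long context: the material lives in files on disk, a regex search pulls out the few matching lines with some surrounding context, and only those snippets go into the prompt. The function names `grep_files` and `build_prompt` are hypothetical.

```python
import re
from pathlib import Path


def grep_files(root, pattern, context_lines=1, max_hits=20):
    """Return (path, line_number, snippet) for lines matching `pattern`.

    Searches every .txt file under `root`; each snippet includes
    `context_lines` lines before and after the matching line.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                lo = max(0, i - context_lines)
                hi = min(len(lines), i + context_lines + 1)
                hits.append((str(path), i + 1, "\n".join(lines[lo:hi])))
                if len(hits) >= max_hits:
                    return hits
    return hits


def build_prompt(question, hits):
    """Assemble a short prompt from the retrieved snippets only."""
    parts = [f"[{path}:{line_no}]\n{snippet}" for path, line_no, snippet in hits]
    return "\n\n".join(parts + [f"Question: {question}"])
```

Calling `build_prompt(question, grep_files("notes/", r"auxiliary loss"))` would hand the model a few labeled snippets instead of the whole directory. The prompt size then tracks the number of hits rather than the size of the corpus, which is the sense in which the approach looks crude but works.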
Unsupervised Learning — 第 89 期:一位 AI 研究传奇人物对现状的诚实评估
核心要点: 亲手参与发明 transformer 的人之一认为,今天的模型确实具备智能,却仍然缺失了关于人类如何学习的某种根本东西——而且没有人能清楚说明,是什么让编程 agent 突然就好用了。
Łukasz Kaiser 是「Attention Is All You Need」的合著者,曾在 Google 和 OpenAI 担任高级研究职位,所以当他不带炒作地评判 AI 当前所处的位置时,值得我们慢下来听。他的核心张力在于:transformer 加上推理和工具「能做出惊人的事」,但学习效率很低。他那句令人难忘的比喻——化用「美国人总是在试遍所有其他选项后才做对的事」——是:「LLM 会学到那个概念……但要在试遍所有其他选项之后。你需要这一万亿个 token。」人类能从少得多的数据里抓住概念,正是这个差距让他相信,某种超越 transformer 的东西仍可能泛化得更好。
他难得地坦承,去年圣诞前后编程 agent 的那次跃升很难归因:它不是某一次大规模预训练的结果,而是 harness 改动、轻量后训练(他特别点名 Codex 的 compaction 是他更偏爱它而非 Claude Code 的原因)和新基础模型的某种混合。在生产力上他给出具体数字:复现一篇旧论文过去要三周,如今两天搞定;在私人项目里他「基本上已经不看代码了」——但他坚持你必须在脑中牢牢掌控机器学习层面的逻辑,因为 agent 会悄悄给你加上一个你从没要求的 auxiliary loss。
他最赋能的一点是说给圈外人听的:一块约 2000 到 3000 美元的 RTX 5090,算力大约是他们当年训练初代 transformer 所用的八卡机器的五倍。「你完全可以用桌子底下这块几千美元的 GPU 做所有 transformer 研究。」与此同时,所谓长上下文「其实」不过是 grep 加文件——一个他曾经会嗤之以鼻的 hack,但「我们不以方案好不好看来评判它,而是以它管不管用来评判。」