BREAKING: Benchmark's AI Playbook

Death of Spreadsheet Investing

Jun 30, 2026

The Golden Rules Are Gone

Everett “Ev” Randle, General Partner at Benchmark, joins Sourcery to break down how AI has rewritten the rules of venture and growth investing.

→ Listen on X, Spotify, YouTube, Apple

Ev explains why the old inverse relationship between scale and risk has collapsed, why the golden rules of SaaS now run in reverse, and how he underwrites AI companies when the spreadsheet no longer does the work.

We cover the new AI taxonomy through a P x Q x M lens, the economics of inference and agents, Claude Code at $36,000 per developer, the AI mom test for frontier versus open source demand, and the trillion-dollar question of whether frontier labs hold pricing power. Ev also walks through the coming liquidity shock from the $380B Anthropic round, why late-stage can now beat a Series C, and how venture firms became alternative asset managers.

Benchmark's AI portfolio spans Cerebras, Sierra, Fireworks AI, Legora, Mercor, HeyGen, Decart, Manus, Gumloop, Exa, StarCloud, Reducto, Eigen, 11x, and LangChain.

TIMESTAMPS

(00:00) Everett Randle, General Partner at Benchmark
(00:58) Coming off Benchmark's AGM
(04:05) The Golden rules of Investing are all gone
(08:58) Who actually has a handle on AI Economics?
(12:57) Why Benchmark bets on Founders, not Categories
(15:31) Brad Gerstner's "Age of Inference" Thesis
(19:07) The most important shift since the Cloud
(23:55) Inside Gumloop's AI automation canvas
(26:40) The Token Maxing problem nobody's solving
(27:25) Ev's "Mom Test" for frontier AI
(31:33) What happens when frontier models get too cheap
(34:44) Inside the new funding playbook
(40:31) Venture Capital became a product, not a firm
(42:35) The Biggest IPO wave Wall Street's ever seen
(49:08) The secret behind Benchmark's wildly diverse bets
(52:54) The mentors who shaped him

Brought to you by:

Brex: The intelligent finance platform, with cards, expenses, travel, bill pay, and banking wrapped into a high-performance stack. Built for scale. Trusted by OpenAI, Anthropic, Vercel, Granola, Deepgram, & Sourcery, teams that move fast AF. Visit: brex.com/sourcery

Turing: Turing partners with frontier AI labs to improve model capabilities in coding, reasoning, tool use, & multimodality, as well as with Fortune 500 enterprises to build & deploy end-to-end agentic AI systems in mission-critical workflows. Visit: turing.com/sourcery

VCX: VCX is the public ticker for private tech, allowing investors of all sizes to invest in venture capital. View the portfolio at GetVCX.com

Deel: Deel is the global people platform that helps startups hire, manage, pay, and equip anyone, anywhere. Trusted by more than 35,000 fast-growing companies, Deel is the people platform that just works, so teams can scale without the chaos. Visit: deel.com/sourcery

Public: Investing platform Public just launched Generated Assets, which lets you turn any idea into an investable index with AI. With Generated Assets, you can build, backtest, refine, and invest in any thesis with AI. Gone are the days of one-size-fits-all ETFs. Try it today: public.com/sourcery

Merge: The leading provider of customer-facing integrations and agentic tools for frontier LLMs, Fortune 500 organizations, and B2B SaaS companies. Visit: https://merge.dev

The Death of Spreadsheet Investing: Inside Benchmark’s New AI Playbook

Ev Randle, General Partner at Benchmark, joined Sourcery to lay out how AI has broken the underwriting logic that governed software investing for two decades. The conversation moved from why the market feels disorienting, to the new taxonomy he uses to evaluate AI companies, to the liquidity shock coming from a pipeline of trillion-dollar IPOs.

Benchmark’s AI portfolio referenced across the conversation includes Mercor, HeyGen, Decart, Manus, Gumloop, Exa, StarCloud, Reducto, Eigen, Cerebras, 11x, Sierra, Fireworks AI, Legora, and LangChain.

The Disorientation: Scale No Longer De-Risks a Company

Randle’s starting point is a single inverted relationship. In the prior software paradigm, scale and the risk of mega-impairment moved in opposite directions. As a startup grew, it sequentially de-risked product-market fit, then unit economics, then total addressable market, then market leadership. A company that failed to clear one of those gates stopped growing. Bigger therefore meant safer.

AI broke that relationship. “You can have businesses that are well over a billion dollars in revenue that haven’t proven out their unit economics,” Randle said. “The risk of impairment over time is actually sort of flat, or even maybe there’s a weird positive correlated relationship with scale and risk.”

The practical consequence is that investors can no longer read scale as a proxy for safety. A company past $1B in revenue may still carry unresolved questions on margins, durability, and differentiation that scale used to answer on its own.

What Happened to the Golden Rules?

For roughly 20 years, the most attractive software companies shared a predictable profile that served as the yardstick for ‘greatness’: gross margins of 70% to 90%, pure software with minimal services load, high gross retention, low capital intensity, and operating leverage on R&D.

A company with 100 customers and 90%-plus gross retention would lose fewer than 10 of them a year. The output was a capital-light business that generated high free cash flow margins growing at a multiple of GDP for a long time.

That profile was legible and, in Randle’s word, “spreadsheetable.” It produced shorthand metrics like the Rule of 40, which adds growth rate and free cash flow margin to compress company quality into a single score. “Sometimes you could literally abstract the quality of a software company into a single metric, and some investors would invest just on a rule-of-40 score,” he said.
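As a rough illustration of how little that score asks for, here is a minimal sketch of the calculation. The function names and the two example companies are hypothetical, and the 40-point threshold is the conventional bar implied by the metric's name rather than a figure from the episode.

```python
def rule_of_40_score(growth_rate_pct: float, fcf_margin_pct: float) -> float:
    """Return revenue growth plus free cash flow margin, both in percentage points."""
    return growth_rate_pct + fcf_margin_pct


def passes_rule_of_40(growth_rate_pct: float, fcf_margin_pct: float,
                      threshold: float = 40.0) -> bool:
    """True when the combined score meets the conventional 40-point bar."""
    return rule_of_40_score(growth_rate_pct, fcf_margin_pct) >= threshold


# Hypothetical companies (illustrative numbers, not from the episode).
companies = {
    "steady_saas": (25.0, 20.0),   # 25% growth, 20% FCF margin
    "fast_burner": (60.0, -30.0),  # 60% growth, -30% FCF margin
}

for name, (growth, fcf_margin) in companies.items():
    score = rule_of_40_score(growth, fcf_margin)
    print(f"{name}: score={score:.0f}, passes={passes_rule_of_40(growth, fcf_margin)}")
```

The score takes two inputs and ignores everything else, which is what made it spreadsheetable in the first place.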

AI Inverted Nearly Every Rule

The most popular AI companies and categories now look like almost the exact inverse of that profile.

Distribution shifted from product-led growth to forward-deployed engineers, what Randle called “the Palantirification of everything.” That reintroduces services and implementation load that pure-software orthodoxy treated as a defect.

Gross margins flipped sign. “Now gross margins, if your gross margins are high, that’s actually a bad thing, because AI inference costs a lot of money, & if you have an AI product with high gross margins, that means that no one’s using your AI features.”

Capital intensity returned. To stay defensible against the foundation model labs, companies are increasingly expected to train or post-train their own models, which requires GPU spend that the old software model never carried. “All of the golden rules of the past that defined the spreadsheet investing era are all gone,” Randle said. “The most popular companies and categories are almost the inverse of all these golden rules.”

His point is not that the golden rules were wrong. In a vacuum they remain the first-principles building blocks of a high-quality company, since durable free cash flow over a long horizon is still what earns a high valuation. The difficulty is that the most popular AI companies look less attractive on those first principles than classic SaaS, which forces a rethink of how to underwrite them.

The New Taxonomy: “P” x “Q” x “M”

Randle’s organizing framework is P x Q x M, where P is price, Q is quantity, and M is margin. In SaaS, P was annual contract value, Q was the customer base or addressable market, and M was a gross margin of 70% to 90%.

In AI, the three variables move in different directions. Q, the quantity of customers, is roughly the same population that would buy SaaS. M, the margin, is almost always lower. P, the price, can be far higher, because inference platforms now sign nine-figure contracts with startups, a contract scale that was rare in SaaS even with the largest enterprises. In Randle’s words:

“In SaaS land, you had price was your ACV, what is the annual contract value of your contracts that you sell to customers. Your Q is how many customers either do you have or that are in your TAM, and then your M was gross margin, which was 70 to 90%.
Well, now in AI, if we take an AI app company, the Q is probably the same, you’re selling to the same people that would buy a SaaS. The M is almost definitively lower, for I think 99% of AI app companies it’s lower than 70%. But the P can be immensely high.
You have these inference platforms that have nine-figure contracts with startups. There’s very rare SaaS companies that have nine-figure contracts with anyone, much less a startup.”
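To make the framework concrete, here is a minimal sketch that multiplies the three variables for two hypothetical companies. The class name and all inputs are illustrative assumptions: the SaaS case uses a $50,000 ACV and an 80% margin from within the ranges mentioned in the conversation, while the AI app's price and margin are invented to show the direction of the trade-off, not taken from any real company.

```python
from dataclasses import dataclass


@dataclass
class RevenueProfile:
    price: float     # P: annual revenue per customer, in USD
    quantity: int    # Q: number of customers
    margin: float    # M: gross margin as a fraction (0.80 means 80%)

    def revenue(self) -> float:
        return self.price * self.quantity

    def gross_profit(self) -> float:
        return self.price * self.quantity * self.margin


# Hypothetical inputs: same Q, lower M, much higher P for the AI app.
saas = RevenueProfile(price=50_000, quantity=1_000, margin=0.80)
ai_app = RevenueProfile(price=500_000, quantity=1_000, margin=0.40)

for label, profile in (("SaaS", saas), ("AI app", ai_app)):
    print(f"{label}: revenue=${profile.revenue():,.0f}, "
          f"gross profit=${profile.gross_profit():,.0f}")
```

With these made-up inputs, halving the margin is more than offset by a tenfold price, which is the shape of the argument: a lower M does not settle the question until P is known.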

He also stressed that AI companies are far less alike than software companies were. Robert Smith of Vista Equity Partners, where Randle started his career, used to tell the team that “software tastes like chicken, and that’s why it’s beautiful,” because every software P&L converged on the same line items at maturity. That convergence no longer holds.

Randle contrasted two Benchmark-adjacent inference names to make the point. Fireworks leases inference capacity and GPUs and monetizes the software layer that reduces cost and latency on top. Crusoe builds data centers, acquires power, land, and permits, and operates a different business model with an entirely different margin profile and capital intensity. From the outside they look like two inference companies, but they are more different than alike.

Inference Is the Demand Engine

Randle’s framing for the current market is to “walk under the waterfall.” Working with a portfolio company that had strong developer usage but no settled business model, he told them to stop sitting on the riverbank trying to fill a bucket and instead get under the waterfall, because the waterfall is inference. The volume of revenue and demand flowing out of inference, he argued, cannot be ignored.

He traced the new growth curves, companies going from 1 to 20 to 100 rather than the old 1 to 3 to 9 to 20, back to inference-enabled business models. Instead of charging a dollar amount per seat, companies now charge a margin on the inference they resell. Some models are thin, closer to brokering inference. Others abstract it heavily, such as Sierra’s outcome-based pricing on completed customer support deflections. In both cases the monetized unit is inference, which removes the rate limiter on revenue growth.

The clearest data point came from coding agents. Randle said developers were spending $3,000 per month each on Claude Code, or $36,000 per developer per year. In SaaS, a $50,000 ACV was a solid contract for an entire company. Now a comparable figure can recur per developer and keep growing. He framed the shift as a move from a $200K line item for the average company toward a potential $20M line item, with some customers far higher.
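The per-developer arithmetic can be sketched in a few lines. The $3,000 monthly figure is the one Randle quoted; the function names are hypothetical, and translating the $200K and $20M line items into developer headcounts is an illustrative assumption of this sketch, not a calculation he walked through.

```python
import math

MONTHLY_SPEND_PER_DEVELOPER = 3_000  # USD per developer, as quoted in the episode


def annual_spend(developers: int,
                 monthly_per_developer: float = MONTHLY_SPEND_PER_DEVELOPER) -> float:
    """Total yearly spend for a team at a flat per-developer monthly rate."""
    return developers * monthly_per_developer * 12


def developers_for_line_item(line_item_usd: float,
                             monthly_per_developer: float = MONTHLY_SPEND_PER_DEVELOPER) -> int:
    """Smallest headcount whose annual spend reaches the given line item."""
    return math.ceil(line_item_usd / (monthly_per_developer * 12))


print(f"Per developer per year: ${annual_spend(1):,.0f}")
for line_item in (200_000, 20_000_000):
    print(f"${line_item:,.0f} line item is about "
          f"{developers_for_line_item(line_item)} developers at this rate")
```

At this rate a handful of developers already fills the old $200K line item, and a few hundred reach $20M, assuming spend per developer stays flat rather than growing.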

He named the agent economy as the most important product and business-model shift in technology since the start of SaaS and the cloud, while noting the term itself has been over-marketed to the point of being “cooked.”

Frontier vs Open Source: The AI “Mom” Test

To gauge where frontier demand holds, Randle applies what he calls his AI mom test. The question is how many tasks a non-technical user actually needs from a frontier or near-frontier model rather than a cost-effective open-source one. Two years ago the answer was unclear. Today, he said, essentially none of his mother’s queries require the frontier (damn). Layering that logic across user types reveals a growing share of tasks across the economy that do not need frontier intelligence.

Molly O’Shea (@MollySOShea) · 11:48 PM · Jun 29, 2026

The best way to understand the open-source vs. frontier AI debate: @EverettRandle says ‘Your mom.’ Ev explains his "AI Mom Test": "What is the amount of queries or things that she needs out of AI that can't be done by a really, really cost-effective open-source model?"

He cited Cognition’s published work, which post-trained an open-source model on low-complexity tasks that were popular inside the product but did not require frontier capability. Moving those tasks off the frontier produced roughly 95% savings on that action.

The conclusion is not that the frontier loses. Demand for frontier intelligence is also growing, and Randle attributed the parabolic growth in Claude Code and Anthropic revenue to a genuine breakthrough in coding model quality around Opus 4.5.

sourcery (@sourceryy) · 2:28 AM · Jun 27, 2026

"My guess is that within 12-18 months, 80% of our workloads will be going toward models that are 99% cheaper." "20% of it will still go to frontier models where you need to be IQ-maxxing" @brian_armstrong explains how to keep AI spend flat while token usage grows exponentially:

Brian Armstrong (@brian_armstrong)

How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching. Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting
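The 80/20 split in the Armstrong quote above implies a blended cost that is easy to work out. The sketch below is a back-of-the-envelope calculation under a loud assumption: the workload shares are measured by what each workload would have cost at frontier prices. The quote does not specify that, and the function names are hypothetical.

```python
def blended_cost_ratio(cheap_share: float, cheap_discount: float) -> float:
    """Cost after routing, as a fraction of the all-frontier cost.

    cheap_share:    fraction of workloads routed to cheaper models
    cheap_discount: how much cheaper those models are (0.99 means 99% cheaper)
    """
    frontier_share = 1.0 - cheap_share
    return frontier_share + cheap_share * (1.0 - cheap_discount)


def usage_growth_at_flat_spend(cheap_share: float, cheap_discount: float) -> float:
    """How many times usage can grow before spend returns to its old level."""
    return 1.0 / blended_cost_ratio(cheap_share, cheap_discount)


ratio = blended_cost_ratio(cheap_share=0.80, cheap_discount=0.99)
growth = usage_growth_at_flat_spend(cheap_share=0.80, cheap_discount=0.99)
print(f"Blended cost: {ratio:.1%} of the all-frontier bill")
print(f"Usage can grow about {growth:.1f}x at flat spend")
```

Under that assumption the frontier 20% dominates the remaining bill, so routing alone buys a one-time multiple of headroom rather than indefinitely flat spend.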

Citing his partner Eric Vishria, Randle framed the market as non-zero-sum: on-device inference, open-source inference, and proprietary models all show rising demand at once. The open question is pricing power. If capabilities keep climbing toward recursive self-improvement, frontier labs retain the ability to charge a premium. If capabilities plateau and distillation lets open source reach 95% of the ceiling, the premium margin compresses. The labs could still survive on product strength in that case, since most of ChatGPT’s 900 million weekly active users could not identify which model they are using.

*DISCLOSURE: This could also be your dad, your aunt/uncle, your dog/cat/fish, or your grandparents, roommate or baby. We are well aware of some hard core tokenmaxxing moms out there. Damn Ev.

Liquidity Shock: Reframing the Returns Math

Randle’s most concrete data exercise concerned the coming IPO wave. He charted four of the best pre-IPO rounds of the past decade: Slack, DoorDash, Snowflake, and Nubank. Those rounds were typically $500M to $2B in size and returned 2x to 5x over a 4-year period. The Snowflake pre-IPO round turned roughly $500M into about $2.5B, an outcome that left everyone involved satisfied.

He then applied the same math to Anthropic’s $380B round. If Anthropic reaches liquidity at $1T to $1.5T, a valuation he characterized as not particularly aggressive given the round priced near $1T, the $30B raised in that round would return roughly 35 times the Snowflake pre-IPO round. “It’s 35 Snowflake pre-IPO rounds in a single round,” he said.
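The comparison can be reproduced roughly with the figures above. This is a simplified sketch, not Randle's actual model: it treats $380B as the valuation at which the $30B was invested, ignores dilution, fees, and timing, and compares gross proceeds against the roughly $2.5B the Snowflake round returned. The function name is hypothetical.

```python
def exit_proceeds_bn(invested_bn: float, entry_valuation_bn: float,
                     exit_valuation_bn: float) -> float:
    """Gross proceeds in $B if the stake is held to exit with no dilution."""
    return invested_bn * exit_valuation_bn / entry_valuation_bn


SNOWFLAKE_PROCEEDS_BN = 2.5   # roughly $500M turned into about $2.5B
ROUND_SIZE_BN = 30.0          # capital raised in the Anthropic round
ENTRY_VALUATION_BN = 380.0    # valuation attached to that round

for exit_valuation_bn in (1_000.0, 1_500.0):
    proceeds = exit_proceeds_bn(ROUND_SIZE_BN, ENTRY_VALUATION_BN, exit_valuation_bn)
    multiple = proceeds / ROUND_SIZE_BN
    snowflake_rounds = proceeds / SNOWFLAKE_PROCEEDS_BN
    print(f"Exit at ${exit_valuation_bn:,.0f}B: proceeds ${proceeds:,.1f}B, "
          f"{multiple:.2f}x, about {snowflake_rounds:.0f} Snowflake rounds")
```

Under these assumptions the result spans roughly 32 to 47 Snowflake-sized outcomes across the $1T to $1.5T range, which brackets the figure of 35 he cited.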

The scale is hard to internalize. Five years ago, before COVID inflated fund sizes, a normal growth fund was about $1B. Randle noted he knows individuals with $3B to $4B invested in Anthropic who could return 5x in under 5 years.

He flagged the second-order effects:

new company formation
reinvestment decisions
the San Francisco housing market, where he said homes are going for 2x asking price in cash or in lab equity (!!)

The caution is that the ecosystem is not priced for the liquidity this pipeline will release, and no one knows what will result from it.

He also tied the wave to two structural shifts:

Companies stay private far longer, supported by deep venture-growth capital, so businesses that would have been public for years by 2005 standards remain private.
AI has introduced large day-one costs, where a research direction might require $2B of compute before product-market fit is even testable, a different funding equation from a $500K seed.

The combination produces what he called company rebirths, where a late-stage company can carry higher upside than a Series C. His own first investment at Kleiner Perkins was SpaceX at a $100B-plus valuation, where the Starlink broadband business, now the majority of revenue in the S-1, became a second growth engine that justified entry at a triple-digit-billion price.

Founders Over Themes

Against that backdrop, Randle described Benchmark’s strategy as entrepreneur-out rather than theme-in. The firm partners early, often at inception around a $50M post valuation, which sidesteps many late-stage questions about exit multiples and capital efficiency. “Great founders are always in style,” he said, “whereas these business models can go in and out of style.”

That discipline shows in a portfolio that looks thematically diverse but was not assembled thematically:

Semiconductors: Cerebras
Vertical AI: Legora
Horizontal: Sierra
Developer: LangChain
Prosumer: HeyGen
Orbital data centers: StarCloud

But he said that coherence was retrospective. The investments were founder-driven, several were pivots, and some landed in categories the firm had not set out to back. Chetan Puttagunta closed the StarCloud investment weeks before Elon Musk began publicly championing orbital data centers, on the strength of a team that already had a GPU operating in space, not on a category call.

Molly O’Shea (@MollySOShea) · 1:23 AM · Jun 30, 2026

Benchmark missed OpenAI, Anthropic, & SpaceX.. yet still built one of the strongest AI portfolios:
Semis: @cerebras
Vertical AI: @WeAreLegora
Horizontal: @SierraPlatform
Developer: @LangChain
Prosumer: @HeyGen
Orbital data centers: @Starcloud_
Plus: Manus, Mercor, Decart,

Gumloop, Randle’s first Benchmark deal, fits the same pattern. It is a collaborative AI agent and automation canvas for enterprises, letting every employee, not only developers, build and run agents across functions. The thesis is that what happened in code, where Opus 4.5-class agents let developers offload the bulk of their work, extends to most white-collar and eventually blue-collar functions.

Randle also argued for the value of an independent third-party vendor in a world where models are jagged and good at different things, so customers need a router watching spend rather than checking the weather with a frontier model.

Venture Capital as a “Product,” Not a “Firm”

Randle closed on a distinction the industry has not kept up with linguistically. Many firms still described as venture capital have become alternative asset managers. General Catalyst, Thrive Capital, and Andreessen Horowitz run venture, growth, debt, and additional products, and Randle put a name to the shift. “Venture in many ways is still the same, but it’s a product now for many of these firms. It’s not the firms themselves.”

The PitchBook data shows how concentrated the capital behind that shift has become. In Q1, 73.1% of LP commitments went to 5 firms, and 6 megafund managers absorbed 76.2% of the quarter’s capital. When a small group of multi-product managers takes the bulk of new commitments, the economics of the business tilt toward gathering and deploying assets at scale rather than the small, early, concentrated bets that defined classic venture.

The deal side mirrors the same crowding. PitchBook put Q1 deal value at $267.2B, but stripping out the top 5 deals cuts that figure by 73.2%. AI took 88.8% of the dollars on 42.5% of deals. When 5 companies effectively are the market, the asset-gathering model and the venture model start to look like different businesses wearing the same label.
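The concentration in those PitchBook figures is easier to see once the percentages are converted back into dollars. The sketch below uses only the four numbers quoted above. The last ratio, which compares average AI and non-AI deal sizes, is a derived illustration of this sketch and not a PitchBook statistic.

```python
Q1_DEAL_VALUE_BN = 267.2   # total Q1 deal value, $B
TOP5_SHARE = 0.732         # share of value removed by stripping the top 5 deals
AI_DOLLAR_SHARE = 0.888    # share of dollars going to AI
AI_DEAL_SHARE = 0.425      # share of deal count going to AI

# Dollars left once the five largest deals are excluded.
value_ex_top5_bn = Q1_DEAL_VALUE_BN * (1 - TOP5_SHARE)

# Dollars that went to AI deals.
ai_dollars_bn = Q1_DEAL_VALUE_BN * AI_DOLLAR_SHARE

# Average AI deal size relative to the average non-AI deal:
# (dollar share / deal share) for AI divided by the same ratio for non-AI.
ai_vs_non_ai_deal_size = (AI_DOLLAR_SHARE / AI_DEAL_SHARE) / (
    (1 - AI_DOLLAR_SHARE) / (1 - AI_DEAL_SHARE)
)

print(f"Deal value excluding top 5: ${value_ex_top5_bn:.1f}B")
print(f"AI deal value: ${ai_dollars_bn:.1f}B")
print(f"Average AI deal vs average non-AI deal: {ai_vs_non_ai_deal_size:.1f}x")
```

The averages are themselves skewed by the top five deals, so the last ratio says more about a few outliers than about a typical AI round.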

Benchmark sits outside that structure by design. It remains a small equal partnership rather than a multi-product manager, and it does not appear among the megafunds absorbing the quarter’s commitments. PitchBook has described the current environment as an era of consensus deals, with dry powder piling into the same perceived winners. Randle’s argument is that staying small and early is a feature, not a gap. The firm does not need to rebrand around whatever category is in style, because the constant it underwrites is the founder, and a concentrated early-stage partnership is built to make non-consensus bets precisely when capital is crowding the same names.

The material presented on Molly O’Shea’s website are my opinions only and are provided for informational purposes and should not be construed as investment advice. It is not a recommendation of, or an offer to sell or solicitation of an offer to buy, any particular security, strategy, or investment product. Any analysis or discussion of investments, sectors or the market generally are based on current information, including from public sources, that I consider reliable, but I do not represent that any research or the information provided is accurate or complete, and it should not be relied on as such. My views and opinions expressed in any website content are current at the time of publication and are subject to change. Past performance is not indicative of future results.

Paid Endorsement. Brokerage services by Open to the Public Investing Inc, member FINRA & SIPC. Advisory services by Public Advisors LLC, SEC-registered adviser. Crypto trading provided by Zero Hash LLC, licensed by the NYSDFS. Generated Assets is an interactive analysis tool by Public Advisors. Output is for informational purposes only and is not an investment recommendation or advice. See disclosures at public.com/disclosures/ga. Matched funds must remain in your account for at least 5 years. Match rate and other terms are subject to change at any time.
