
The Quiet Oligopoly

Three companies now serve nearly all frontier AI inference. Nobody elected them. Nobody regulates them. The market consolidated before most people noticed.

DropThe Staff · 7 min read

Three companies. Roughly 90% of frontier inference revenue.

OpenAI, Anthropic, and Google now serve the vast majority of production API calls for models that score above the 85th percentile on major benchmarks. The fourth-largest provider, Mistral, holds an estimated 4% market share. Everyone else is smaller.

This is not a conspiracy. It is a market structure. And it arrived quietly.

How we got here

Training a frontier model costs hundreds of millions of dollars. The compute requirements double roughly every eight months. The talent pool is finite: a few thousand researchers worldwide have the experience to build models at this scale. The capital requirements, talent concentration, and infrastructure demands created a natural oligopoly before the market was old enough to have a regulatory framework.

The pattern is familiar. US telecommunications consolidated through the 2000s and 2010s. Three nationwide carriers now serve roughly 98% of US wireless subscribers. The economics were similar: massive infrastructure costs, network effects, and regulatory barriers that favored incumbents. The result was predictable. Prices stabilized at a level the oligopoly found comfortable. Innovation slowed to the pace the incumbents chose. New entrants found the barriers impassable.

AI inference is following the same curve, but faster.

What "frontier" means

This analysis concerns frontier models specifically: systems scoring above the 85th percentile on benchmarks like GPQA Diamond (graduate-level reasoning), MATH-500 (mathematical problem-solving), SWE-bench Verified (real-world software engineering), and LMSYS Chatbot Arena (human preference ranking).

Below the frontier, the market is more diverse. Open-weight models from Meta, Mistral, and others serve a significant volume of inference for tasks where near-frontier quality is sufficient. Self-hosted Llama deployments handle classification, summarization, and extraction workloads competitively.

But the highest-stakes applications run at the frontier. Medical decision support. Legal analysis. Financial modeling. Code generation for production systems. These use cases demand the best available model, and the best available models come from three providers.

The concentration numbers

The Herfindahl-Hirschman Index (HHI) is the standard measure. The US Department of Justice considers any market with an HHI above 2,500 to be highly concentrated.

Estimated HHI for frontier AI inference: approximately 2,900 to 3,100, depending on how Meta's indirect contribution (open weights hosted on others' infrastructure) is allocated. For context, US cloud computing sits around 2,000 to 2,200. US wireless telecommunications is approximately 2,700.
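The arithmetic behind those figures is simple: HHI is the sum of squared market shares, expressed in percentage points. The shares below are illustrative placeholders chosen to be consistent with the ranges in this article, not measured data:

```python
# Herfindahl-Hirschman Index: sum of squared market shares (in percent).
# The share breakdown below is a hypothetical split, not reported figures.
def hhi(shares_pct):
    """Compute HHI from market shares given in percentage points."""
    return sum(s * s for s in shares_pct)

# Three providers near 90% combined, a 4% fourth, and a fragmented tail.
illustrative_shares = [40, 30, 20, 4, 3, 2, 1]
print(hhi(illustrative_shares))  # 2930 — inside the "highly concentrated" band
```

Note how top-heavy the index is: the squaring means the three leading shares contribute 2,900 of the 2,930 points, which is why reallocating Meta's small indirect share barely moves the estimate.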

Frontier AI inference is more concentrated than cloud computing and comparable to wireless carriers.

Who is missing

Meta releases Llama, among the best open-weight model families in the world. But Meta does not sell inference. It distributes weights. The inference runs on infrastructure operated by Microsoft Azure, Google Cloud, AWS, and smaller hosts. Meta's strategy reduces model concentration but does nothing for inference concentration: the same handful of hyperscaler data centers still process the requests.

Mistral builds excellent models from Paris. Mistral Large 2 is competitive with GPT-4o on several benchmarks. But Mistral's commercial distribution is a fraction of the big three's. Good engineering does not guarantee market share when the incumbents have distribution advantages built on existing cloud relationships.

Cohere pivoted to enterprise retrieval and search. The general-purpose frontier inference market lost a competitor.

The field is thinner than the narrative suggests. AI discourse is crowded. The actual market is not.

The regulatory gap

The EU AI Act, which began enforcement in 2025, regulates AI model providers based on risk classification. High-risk systems face conformity assessments, transparency obligations, and ongoing monitoring requirements. The Act addresses what models do.

It does not address who controls inference infrastructure. Market concentration in AI inference falls into a gap between AI regulation and competition law. The EU's Digital Markets Act targets platforms, not API providers. US antitrust enforcement has not designated any AI inference provider as a dominant firm.

No regulator is currently monitoring the concentration of frontier AI inference as a market structure problem. The tools exist. The HHI thresholds exist. The precedent from telecom regulation exists. The political will does not.

The open-weight caveat

This analysis has one significant qualifier. Open-weight models could break the oligopoly if two conditions are met: inference infrastructure becomes cheap enough for most organizations to self-host, and open models reach consistent frontier quality.

Neither condition is met today. Self-hosting a 400-billion-parameter model requires specialized hardware, operational expertise, and ongoing cost that most organizations cannot justify. Open models perform well but remain 6 to 12 months behind the frontier on the hardest benchmarks.
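A back-of-envelope estimate shows why the hardware bar is high. Weight memory alone is parameter count times bytes per parameter; the figures below assume a dense 400-billion-parameter model and an 80 GB accelerator, and ignore KV cache and activation overhead, so real deployments need more:

```python
# Rough memory estimate for self-hosting a dense 400B-parameter model.
# Assumptions: dense weights only; KV cache and activations excluded.
import math

def weights_memory_gb(n_params_billions, bytes_per_param):
    # 1e9 params * bytes/param = gigabytes of weight storage
    return n_params_billions * bytes_per_param

GPU_MEMORY_GB = 80  # e.g., a single 80 GB accelerator

for bytes_per_param, label in [(2, "16-bit"), (1, "8-bit"), (0.5, "4-bit")]:
    gb = weights_memory_gb(400, bytes_per_param)
    gpus = math.ceil(gb / GPU_MEMORY_GB)
    print(f"{label}: ~{gb:.0f} GB of weights, at least {gpus} x 80 GB GPUs")
```

Even aggressively quantized, the weights alone span multiple high-end accelerators, before accounting for serving throughput, redundancy, or the operational expertise to run them.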

Both conditions could be met within two to three years. Hardware costs are declining. Open model quality is improving. If the trajectory holds, frontier inference could decentralize the way web hosting did after cloud computing commoditized basic compute.

But trajectories do not always hold. The incumbents are investing billions in maintaining their lead. The gap could widen instead of narrowing.

What this means

The companies that control frontier inference control the terms under which AI-dependent products operate. Pricing, acceptable use policies, rate limits, data handling practices, and availability guarantees are all set unilaterally by three providers.

A startup building on OpenAI's API has no meaningful alternative if OpenAI changes terms. A hospital using Anthropic's Claude for clinical decision support has no comparable fallback if Anthropic raises prices or restricts medical use cases. A government agency using Google's Gemini for document processing faces vendor lock-in that compounds with every month of accumulated context and fine-tuning.

This is not hypothetical. OpenAI has changed its pricing, terms, and model availability multiple times since 2023. Each change affected thousands of downstream businesses that had no input into the decision and no viable alternative.

The question is not whether concentration is bad. Markets concentrate for reasons, and those reasons often include genuine efficiency gains. The question is whether anyone is watching.

Sources

  1. OpenAI API Pricing (OpenAI, accessed 2026-05-10)
  2. Anthropic API Pricing (Anthropic, accessed 2026-05-10)
  3. Google Gemini API Pricing (Google, accessed 2026-05-10)
  4. US DOJ Herfindahl-Hirschman Index (US Department of Justice, accessed 2026-05-10)
market-concentration · oligopoly · infrastructure · policy
