The Goldman Sachs Token Warning
Goldman Sachs just sounded the alarm. They call it a 24-fold increase. I call it a structural deficit in the making. According to the latest research released on May 24, the transition from conversational AI to autonomous agentic systems is about to trigger a vertical spike in token consumption. This is not a linear progression. It is a phase shift. The market is currently pricing in steady growth, but the underlying data suggests a supply-side bottleneck that few are prepared to navigate.
Agentic AI represents a fundamental change in how compute is utilized. Unlike a standard chatbot that responds to a single prompt, an autonomous agent operates in recursive loops. It plans. It executes. It self-corrects. Each of these steps consumes tokens at a rate that dwarfs current human-to-machine interactions. If the Goldman projections hold true, the global compute infrastructure will need to support 24 times today’s throughput, a 2,300 percent increase, by the end of the decade. This is the math of the new economy.
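The arithmetic behind a 24-fold figure is easy to check. The sketch below is illustrative only: it assumes growth is measured from 2024 to 2030 (the range on the chart later in this piece, not a detail confirmed from the Goldman report), and the function names are made up for the example.

```python
def percent_increase(multiple: float) -> float:
    """Convert a growth multiple (e.g. 24x) into a percent increase."""
    return (multiple - 1.0) * 100.0


def implied_annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate needed to reach `multiple` in `years`."""
    return multiple ** (1.0 / years) - 1.0


# Assumption: the 24x multiple spans 2024 to 2030, i.e. six years.
MULTIPLE = 24.0
YEARS = 2030 - 2024

increase = percent_increase(MULTIPLE)
cagr = implied_annual_growth(MULTIPLE, YEARS)

print(f"{MULTIPLE:.0f}x is a {increase:,.0f} percent increase")
print(f"Implied compound growth: {cagr:.1%} per year over {YEARS} years")
```

Under those assumptions, a 24-fold rise works out to roughly 70 percent compound growth every single year, which is why the curve looks vertical rather than linear.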
The Token Tax and the Death of SaaS
The traditional software-as-a-service model is dying. It is being replaced by the Token Tax. In a world of agentic workflows, value is no longer measured by seats or licenses. It is measured by the volume of inference. Per reports from Bloomberg Markets, enterprise spending is shifting away from fixed-cost subscriptions toward variable-cost compute models. This creates a massive arbitrage opportunity for those controlling the hardware. The cost of doing business is now tied directly to the efficiency of the underlying large language model.
Consider the technical overhead of a simple task like supply chain optimization. In 2024, a human might prompt a model to analyze a spreadsheet. In 2026, an agentic system monitors real-time shipping data, predicts weather disruptions, and automatically renegotiates contracts with logistics providers. This requires thousands of internal ‘thinking’ tokens for every one ‘output’ token seen by the user. The hidden consumption is where the 24-fold growth resides. We are moving from a world of ‘Ask and Receive’ to ‘Assign and Automate’.
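To make the hidden consumption concrete, here is a minimal sketch of a token ledger for one agent run. Every number in it is a hypothetical placeholder rather than measured data, and `TokenLedger` is an illustrative name, not part of any real agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Tracks tokens the user never sees versus tokens in the final answer."""
    internal: int = 0
    output: int = 0
    steps: list = field(default_factory=list)

    def record(self, step: str, tokens: int, visible: bool = False) -> None:
        """Log one step and add its tokens to the matching bucket."""
        self.steps.append((step, tokens))
        if visible:
            self.output += tokens
        else:
            self.internal += tokens

    @property
    def total(self) -> int:
        return self.internal + self.output

    @property
    def hidden_ratio(self) -> float:
        """Internal tokens spent per visible output token."""
        return self.internal / self.output if self.output else float("inf")


# Hypothetical per-step token counts for one agent run; not measured data.
ledger = TokenLedger()
for attempt in range(3):  # plan, execute, self-correct, repeated three times
    ledger.record(f"plan-{attempt}", 1_200)
    ledger.record(f"execute-{attempt}", 2_500)
    ledger.record(f"self-correct-{attempt}", 900)
ledger.record("final-answer", 150, visible=True)

print(f"internal={ledger.internal} output={ledger.output} "
      f"ratio={ledger.hidden_ratio:.1f}:1")
```

The point of the ledger is that the user-facing answer is a small slice of the bill. Metering only the visible output would miss almost all of the consumption in a run like this one.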
Visualizing the Exponential Token Demand
[Figure: Projected Global Token Consumption Growth (2024-2030)]
The Infrastructure Wall
Data centers are hitting the wall. The power requirements for 24x token growth are staggering. In Northern Virginia, the world’s largest data center hub, utility providers are already struggling to meet the 2026 demand. The shift to agentic AI requires high-density cooling and massive power draws that legacy facilities cannot provide. We are seeing a divergence in the market between companies that own their power supply and those that rely on the public grid. The former will survive the surge. The latter will face crippling latency and cost spikes.
The technical constraint is not just the GPUs. It is the copper. It is the transformers. It is the physical reality of moving electrons. As Reuters Tech noted in their weekend analysis, the lead time for industrial-grade electrical equipment has stretched to 36 months. Goldman’s 2030 target assumes we can build the physical world fast enough to keep up with the digital one. That is a dangerous assumption. If the grid fails to scale, the token economy will face its first major supply shock, driven not by a lack of demand but by physical scarcity.
Comparative Efficiency Analysis
The following table illustrates the disparity between legacy LLM interactions and the new agentic standard. The token multiplier is the primary driver of the Goldman Sachs forecast.
| Task Category | Legacy LLM (Tokens) | Agentic Workflow (Tokens) | Multiplier |
|---|---|---|---|
| Code Debugging | 800 | 15,000 | 18.8x |
| Market Research | 2,500 | 65,000 | 26.0x |
| Customer Support | 400 | 8,500 | 21.3x |
| Logistics Planning | 1,200 | 42,000 | 35.0x |
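The multipliers can be recomputed directly from the token counts in the table. This is a quick sketch; the unweighted mean at the end is an assumption made for illustration, since nothing here says how Goldman weights task categories.

```python
# Token counts copied from the table above: (legacy, agentic).
TASKS = {
    "Code Debugging": (800, 15_000),
    "Market Research": (2_500, 65_000),
    "Customer Support": (400, 8_500),
    "Logistics Planning": (1_200, 42_000),
}

# Multiplier = agentic tokens divided by legacy tokens for the same task.
multipliers = {
    task: agentic / legacy for task, (legacy, agentic) in TASKS.items()
}

for task, m in multipliers.items():
    print(f"{task:<20} {m:6.2f}x")

# An unweighted mean treats every task category as equally common,
# which is an assumption; real workloads would need usage weights.
mean_multiplier = sum(multipliers.values()) / len(multipliers)
print(f"Unweighted mean: {mean_multiplier:.2f}x")
```

The simple average of the four rows comes out at about 25x, in the same range as the 24-fold headline figure, though a usage-weighted mix could land well above or below that.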
The Governance Gap
Who audits the agents? As token consumption scales, so does the risk of recursive errors. An autonomous agent that enters an infinite loop can burn through thousands of dollars in compute budget in seconds. This is the new ‘Flash Crash’ risk. Financial institutions are already implementing ‘circuit breakers’ for their AI agents to prevent runaway token consumption. The complexity of managing these autonomous entities will create a new sub-sector of the economy: AI Governance and Observability.
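A circuit breaker of this kind can be sketched in a few lines. The version below is a minimal illustration, not a description of what any institution actually runs: the class names, the limits, and the looping `run_agent` stand-in are all hypothetical.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when an agent run exceeds its token or step ceiling."""


class TokenCircuitBreaker:
    """Halts an agent loop once a token or step ceiling is hit."""

    def __init__(self, max_tokens: int, max_steps: int) -> None:
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.tokens_used = 0
        self.steps_taken = 0

    def charge(self, tokens: int) -> None:
        """Record one agent step; trip the breaker if a limit is exceeded."""
        self.tokens_used += tokens
        self.steps_taken += 1
        if self.tokens_used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"token budget exceeded: {self.tokens_used} > {self.max_tokens}"
            )
        if self.steps_taken > self.max_steps:
            raise TokenBudgetExceeded(
                f"step limit exceeded: {self.steps_taken} > {self.max_steps}"
            )


def run_agent(breaker: TokenCircuitBreaker, tokens_per_step: int) -> str:
    """Stand-in for an agent stuck in a loop; each pass costs tokens."""
    try:
        while True:  # simulates an agent that never decides it is done
            breaker.charge(tokens_per_step)
    except TokenBudgetExceeded as exc:
        return f"halted after {breaker.steps_taken} steps: {exc}"


# Hypothetical limits chosen only to make the example trip quickly.
breaker = TokenCircuitBreaker(max_tokens=50_000, max_steps=100)
result = run_agent(breaker, tokens_per_step=4_000)
print(result)
```

This sketch checks the budget after each step, so it can overshoot by at most one step’s worth of tokens. A stricter design would reserve tokens before the model call and refuse the call outright if the reservation fails.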
We are seeing the emergence of ‘Inference Insurance’. Companies are hedging against the volatility of token prices. If a major model provider raises prices by 10 percent, an enterprise whose annual inference bill runs to tens of millions of dollars faces a multimillion-dollar budget hole. The financialization of compute is no longer a theory. It is the current reality of May 2026. This is why the Goldman report is so critical. It provides the baseline for the next four years of capital expenditure and risk management.
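The exposure to a price rise is simple to model. The sketch below uses a hypothetical blended price of 10 dollars per million tokens, a placeholder rather than any provider’s quoted rate, and the function names are illustrative.

```python
def annual_spend(tokens_per_year: float, price_per_million: float) -> float:
    """Annual inference bill in dollars."""
    return tokens_per_year / 1_000_000 * price_per_million


def budget_gap(tokens_per_year: float, price_per_million: float,
               price_increase: float) -> float:
    """Extra dollars owed if the per-token price rises by `price_increase`."""
    return annual_spend(tokens_per_year, price_per_million) * price_increase


# Hypothetical inputs: the blended price is a placeholder, not a quoted rate.
PRICE_PER_MILLION = 10.00  # dollars per million tokens (assumed)
INCREASE = 0.10            # the 10 percent rise discussed above

for tokens in (5e9, 5e11, 5e12):
    gap = budget_gap(tokens, PRICE_PER_MILLION, INCREASE)
    print(f"{tokens:>8.0e} tokens/yr -> extra ${gap:,.0f}")
```

At the assumed price, the gap only reaches the millions once annual volume runs into the trillions of tokens. The size of the hole depends on both volume and the rate actually paid, which is exactly the exposure a hedge would be sized against.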
The next milestone is the release of the Q3 energy consumption reports from the major cloud providers. Watch the ‘Energy-to-Inference’ ratio. This single data point will determine which hyperscalers can actually fulfill the 24-fold growth promise without collapsing their margins or the local power grid.