AI token costs are no longer a rounding error as agentic coding tools and enterprise AI workflows move from simple prompts to complex, multi-step inference tasks. According to Goldman Sachs Research, agentic AI could drive a 24-fold surge in token consumption by 2030, to 120 quadrillion tokens per month, as consumers and businesses adopt AI more widely. The bank's recent analysis suggests this trend could improve the economics of hyperscalers and model providers if inference costs keep falling faster than demand rises.
As AI token costs turn into a significant budget line item, customers are struggling to manage their spending. Agentic tools can repeatedly call models, analyze context, generate code, run checks, and refine their own outputs, turning a single developer request into a chain of token-consuming operations. As a result, token-based billing is now a practical concern for engineering organizations, not just a cloud infrastructure matter.
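To make the fan-out concrete, here is a minimal sketch of how one request can multiply into many token-consuming calls. It assumes a simplified agent loop in which every model call re-sends the original context plus all earlier outputs. The function name and the token counts are hypothetical, chosen only for illustration, and do not describe any specific tool.

```python
def agentic_request_tokens(steps: int, base_context: int, output_per_step: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for one request handled in `steps` model calls.

    Assumption: each call re-sends the base context plus every earlier output,
    so the input grows with each step.
    """
    total_input = 0
    total_output = 0
    context = base_context
    for _ in range(steps):
        total_input += context        # the whole context is billed as input again
        total_output += output_per_step
        context += output_per_step    # the new output joins the context for the next call
    return total_input, total_output


# Hypothetical numbers: a 2,000-token prompt and 500 tokens of output per call.
single_call = agentic_request_tokens(steps=1, base_context=2_000, output_per_step=500)
agent_loop = agentic_request_tokens(steps=10, base_context=2_000, output_per_step=500)

print("single call:", sum(single_call), "tokens")
print("10-step agent loop:", sum(agent_loop), "tokens")
```

Under these assumptions the ten-step loop consumes 47,500 tokens against 2,500 for a single call, a 19-fold increase from the same developer request. Real tools may trim or cache context, so actual multiples will differ.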
Uber is a prominent example of a company grappling with escalating AI costs. The company reportedly exhausted its 2026 AI budget within the first few months of the year. Andrew Macdonald, Uber's President and COO, said the company has yet to establish a clear link between higher token consumption and the delivery of more valuable consumer-facing features. The tools are not considered ineffective, but the cost-benefit case is no longer straightforward.
A similar issue is surfacing in Microsoft's engineering divisions. The company is reportedly phasing out most internal Claude Code licenses in its Experiences + Devices group and steering developers toward GitHub Copilot CLI by the end of June. GitHub has also announced plans to move Copilot subscriptions to usage-based billing from June 1, 2026, with GitHub AI Credits consumed according to token usage across input, output, and cached tokens.
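As a rough illustration of how billing across input, output, and cached tokens adds up, the sketch below prices each token class separately and sums the result. The rates are placeholders invented for this example, not GitHub's or any vendor's published prices, and the `TokenRates` and `usage_cost` names are hypothetical. How GitHub AI Credits actually map to tokens is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical prices in USD per million tokens (placeholders, not real rates)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


def usage_cost(rates: TokenRates, input_tokens: int, output_tokens: int, cached_tokens: int) -> float:
    """Return the cost in USD of one session, pricing each token class separately."""
    total = (
        input_tokens * rates.input_per_m
        + output_tokens * rates.output_per_m
        + cached_tokens * rates.cached_per_m
    )
    return total / 1_000_000


# Assumed rates: output priced well above input, cached input priced well below it.
rates = TokenRates(input_per_m=3.0, output_per_m=15.0, cached_per_m=0.3)
cost = usage_cost(rates, input_tokens=40_000, output_tokens=5_000, cached_tokens=100_000)
print(f"session cost: ${cost:.4f}")
```

With these placeholder rates, the session costs roughly 22.5 cents. Multiplied across hundreds of developers running many sessions a day, small per-session figures like this are what end up as a budget line.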
Goldman Sachs' outlook is not entirely pessimistic. The firm expects semiconductor providers to cut the cost of inference per token by 60% to 70% annually through improvements in chip technology and architecture. At the same time, chip supply is expected to remain constrained for the next 12 to 18 months as production capacity catches up with the rapid adoption of new AI applications.
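The interplay between falling per-token cost and rising demand can be checked with simple compounding arithmetic. The sketch below assumes that aggregate spend equals tokens consumed times cost per token, that the annual decline compounds evenly, and that the 24-fold demand increase plays out over four years. The four-year horizon is an assumption made for illustration, since the baseline year is not stated here, and the function names are hypothetical.

```python
def spend_multiplier(demand_multiple: float, annual_cost_decline: float, years: int) -> float:
    """Return total spend relative to today, given demand growth and a compounding cost decline."""
    cost_per_token_multiple = (1 - annual_cost_decline) ** years
    return demand_multiple * cost_per_token_multiple


def breakeven_annual_decline(demand_multiple: float, years: int) -> float:
    """Return the annual cost decline at which total spend stays flat."""
    return 1 - demand_multiple ** (-1 / years)


YEARS = 4            # assumed horizon; not a figure from the Goldman Sachs analysis
DEMAND_MULTIPLE = 24  # the 24-fold surge in token consumption

for decline in (0.60, 0.70):
    multiple = spend_multiplier(DEMAND_MULTIPLE, decline, YEARS)
    print(f"{decline:.0%} annual decline -> spend at {multiple:.2f}x today's level")

print(f"break-even decline: {breakeven_annual_decline(DEMAND_MULTIPLE, YEARS):.1%} per year")
```

Under these assumptions, aggregate spend falls to roughly 0.61x of today's level at a 60% annual decline and roughly 0.19x at 70%, with break-even near a 55% annual decline. This is an aggregate view only: an individual organization whose own token use grows much faster than 24-fold would still see its bill rise.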
As agentic AI becomes the default interface for coding, customer service, search, and enterprise workflow automation, attention shifts back to datacenter infrastructure: silicon, networking, memory, storage, and power. Agentic AI is already driving new processor strategies for AI datacenters, as eeNews Europe reported when ARM outlined its datacenter CPU plans. AI token costs may fall at the hardware level, but complex agentic workflows can quickly consume those savings before finance teams ever see them.