Stop freaking out about token costs and do this instead

By Michael Domanic, Section Head of AI

We're officially in the Hangover Stage of enterprise AI.

The Claude invoices have arrived. CFOs are asking questions - mostly “How the f*@# did we spend this much on inference?” Inference management is suddenly its own software category.

What's getting lost in the panic: token costs are not the problem. Token costs without business value are the problem. And most companies have done only 1% of the work required to drive business value from AI. They gave everyone access to the tools, said “go figure it out,” and are now shocked that the bill is high and the ROI is unclear.

If that sounds familiar, the answer isn't a knee-jerk budget cut. It's to go back and do the work you skipped.

1. Decide what you're using AI for and tell the organization

This sounds obvious and almost nobody does it. Most companies rolled out AI tools without articulating what they expected people to use them for. The result is that some employees are building genuinely valuable automations, some are using AI to vibe-code broken and pointless tools (many such cases), and some aren't touching it at all - and leadership has no idea which is which.

Get your leadership team in a room and answer these questions: What is the urgent, specific reason for your business to use AI? What key metrics are you trying to move? What does “good” AI use look like at your company?

Communicate that clearly and repeatedly. Every employee should be able to tell you what the company expects them to use AI for - and “everything” is not a good enough answer.

2. Set an AI budget as a percentage of salary

Inference is not a mystery line item. It’s a known cost of doing business. Our recommendation: budget 2-5% of salary for business roles and higher for engineering and technical teams.

For a team of 10 knowledge workers averaging $150,000, that's $30,000 to $75,000 a year, or roughly 20% to 50% of one additional hire's salary. The question then becomes simple: are you getting that fraction of an additional hire's output back from the team? If yes, the math works.
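The budget math above is simple enough to sanity-check in a few lines. This is a sketch. The helper name `annual_ai_budget` is hypothetical, and like the example it uses base salary, not fully loaded cost.

```python
def annual_ai_budget(headcount: int, avg_salary: float, pct_of_salary: float) -> float:
    """Yearly inference budget as a percentage of the team's total salary."""
    return headcount * avg_salary * pct_of_salary


team_size = 10
avg_salary = 150_000

low = annual_ai_budget(team_size, avg_salary, 0.02)   # 2%, low end for business roles
high = annual_ai_budget(team_size, avg_salary, 0.05)  # 5%, high end for business roles

print(f"Budget: ${low:,.0f} to ${high:,.0f} per year")
# Express the budget as a share of one additional hire's salary.
print(f"Equivalent to {low / avg_salary:.0%} to {high / avg_salary:.0%} of one hire")
```

Running it prints a range of $30,000 to $75,000, which works out to 20% to 50% of one hire.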

Set this budget with the explicit understanding that after 12 months, each team leader will need to justify it with evidence of value or lower it. Evidence of value at that point could look like a hiring freeze or a meaningful change in department-level KPIs. That creates a forcing function for measurement without killing experimentation in the meantime.

3. Set token caps

For the love of God, set token caps. This is basic cost hygiene, and a shocking number of companies aren't doing it. You're not trying to restrict usage - you're trying to prevent a misconfigured automation from running every 15 minutes and racking up thousands of dollars in inference on something nobody is using.
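To see why caps matter, here is the arithmetic for the runaway automation described above. The $1.00 cost per run and the $500 monthly cap are hypothetical placeholders, not benchmarks. Substitute your own numbers.

```python
# Hypothetical illustration: what an unattended scheduled automation costs
# per month, and how quickly a monthly spend cap would stop it.
MINUTES_PER_DAY = 24 * 60


def monthly_schedule_cost(interval_minutes: int, cost_per_run: float, days: int = 30) -> float:
    """Total inference cost of a job that runs every `interval_minutes` for `days` days."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return runs_per_day * days * cost_per_run


def runs_until_cap(cap: float, cost_per_run: float) -> int:
    """How many runs fit under the cap before the next one would exceed it."""
    return int(cap // cost_per_run)


cost = monthly_schedule_cost(interval_minutes=15, cost_per_run=1.00)  # hypothetical $1.00/run
print(f"Uncapped: ${cost:,.0f} per month")
print(f"A $500 cap stops it after {runs_until_cap(500, 1.00)} runs")
```

At 96 runs a day, the uncapped job costs $2,880 over a 30-day month. The cap halts it after 500 runs, a little over five days in.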

How you do this depends on how your employees are accessing AI in the first place.

If your organization has an enterprise license to a frontier LLM, this is straightforward. Every major AI platform gives you admin dashboards with per-user and org-level spending controls. Use them.

If your workforce is accessing AI through a home-grown portal or internal harness, you have less out-of-the-box tooling, but the answer isn't to give up on cost control. Implement a gateway layer if you haven't already. An LLM gateway sits between your employees and the underlying models and gives you the ability to set per-user and per-team spend limits, rate limits, and model routing rules.
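The core of a gateway's spend control is a check before each request and a ledger update after it. The sketch below is a minimal, in-memory illustration of per-user and per-team limits plus one routing rule. `SpendGateway`, the limits, and the model names `small-model` and `large-model` are hypothetical, and this is not the API of any particular gateway product. A real deployment would persist the ledger, reset it each billing period, and compute cost from the token counts the provider returns.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class SpendLimitExceeded(Exception):
    """Raised when a request would push a user or team past its limit."""


@dataclass
class SpendGateway:
    user_limits: dict  # user -> monthly limit in dollars
    team_limits: dict  # team -> monthly limit in dollars
    user_team: dict    # user -> team
    user_spend: defaultdict = field(default_factory=lambda: defaultdict(float))
    team_spend: defaultdict = field(default_factory=lambda: defaultdict(float))

    def authorize(self, user: str, estimated_cost: float) -> None:
        """Reject the request up front if it would exceed either limit."""
        team = self.user_team[user]
        if self.user_spend[user] + estimated_cost > self.user_limits[user]:
            raise SpendLimitExceeded(f"user {user} over limit")
        if self.team_spend[team] + estimated_cost > self.team_limits[team]:
            raise SpendLimitExceeded(f"team {team} over limit")

    def record(self, user: str, actual_cost: float) -> None:
        """Add the real cost of a completed request to both ledgers."""
        self.user_spend[user] += actual_cost
        self.team_spend[self.user_team[user]] += actual_cost

    @staticmethod
    def route(task_type: str) -> str:
        """Hypothetical routing rule: send lightweight tasks to a cheaper model."""
        return "small-model" if task_type in {"summarize", "classify"} else "large-model"


gw = SpendGateway(
    user_limits={"ana": 50.0, "ben": 50.0},
    team_limits={"sales": 80.0},
    user_team={"ana": "sales", "ben": "sales"},
)
gw.authorize("ana", 30.0)
gw.record("ana", 30.0)
gw.authorize("ben", 40.0)
gw.record("ben", 40.0)

try:
    gw.authorize("ana", 15.0)  # Ana stays under her $50, but sales would reach $85
except SpendLimitExceeded as err:
    print(err)
```

The last request is blocked by the team limit even though the individual is under hers, which is the point of setting both.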

If your employees are on team-based plans or small-batch subscriptions, your cost control options are genuinely limited, and that's a real problem. Those subscription tiers are designed for individuals and small teams, not enterprise governance. If you're trying to manage AI spend at scale across a department or a company and you're still on fragmented individual licenses, the guidance is simple: consolidate onto an enterprise license. The admin controls alone justify it before you even get to volume pricing.

Set per-user caps, set organizational caps, and review them monthly. If people are consistently hitting their caps, that's a conversation about whether to raise them - not a crisis.
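The monthly review can start as a simple report: who hit, or nearly hit, their cap in several recent months. This is a sketch. The 95% threshold, the two-month rule, and the sample data are hypothetical.

```python
def users_near_cap(
    monthly_spend: dict[str, list[float]],
    caps: dict[str, float],
    threshold: float = 0.95,
    min_months: int = 2,
) -> list[str]:
    """Users whose spend reached `threshold` of their cap in at least `min_months` months."""
    flagged = []
    for user, spends in monthly_spend.items():
        months_at_cap = sum(1 for spend in spends if spend >= threshold * caps[user])
        if months_at_cap >= min_months:
            flagged.append(user)
    return sorted(flagged)


last_quarter = {
    "ana": [48.0, 50.0, 49.0],  # at or near her cap every month
    "ben": [10.0, 12.0, 50.0],  # a single spike
    "cam": [20.0, 22.0, 19.0],  # comfortably under
}
caps = {"ana": 50.0, "ben": 50.0, "cam": 50.0}

print(users_near_cap(last_quarter, caps))  # ['ana']
```

A name on this list starts the conversation about raising a cap. It doesn't raise the cap automatically.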

4. Educate employees on how token costs actually work

Most employees have no idea that their AI usage costs the company money, let alone how much. They don't know that an agentic workflow in Claude Code can burn through orders of magnitude more tokens than a simple chat.

Educate them on the cost implications of chat vs. agents, the benefits and drawbacks of each, and how to build smartly to avoid unnecessary cost. When someone understands that a task running on a schedule consumes real resources, they make better decisions about what to automate and how often to run it.
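One way to explain the chat-versus-agent gap: a chat is one request, while an agentic loop typically re-sends its growing context (instructions, files, tool output) on every turn, so input tokens pile up fast. The sketch below models that under simplifying assumptions. The token counts and the per-million-token prices are hypothetical. It also ignores prompt caching, which many providers offer and which can considerably cut the cost of re-sent context.

```python
def agent_tokens(base_context: int, turns: int, added_per_turn: int,
                 output_per_turn: int) -> tuple[int, int]:
    """(input, output) tokens for an agent loop that re-sends its full context each turn.

    added_per_turn: tokens appended to the context each turn (tool results plus the
    model's own reply).
    """
    # Turn t sends the base context plus everything accumulated in earlier turns.
    input_total = sum(base_context + t * added_per_turn for t in range(turns))
    output_total = turns * output_per_turn
    return input_total, output_total


def cost_usd(input_tokens: int, output_tokens: int, in_per_m: float, out_per_m: float) -> float:
    """Dollar cost given prices per million input and output tokens."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m


IN_PRICE, OUT_PRICE = 3.00, 15.00  # hypothetical $ per million tokens

chat_in, chat_out = 2_000, 500  # one question, one answer
agent_in, agent_out = agent_tokens(10_000, 40, 3_000, 500)  # a 40-turn session

print(f"chat:  {chat_in + chat_out:>10,} tokens  ${cost_usd(chat_in, chat_out, IN_PRICE, OUT_PRICE):.4f}")
print(f"agent: {agent_in + agent_out:>10,} tokens  ${cost_usd(agent_in, agent_out, IN_PRICE, OUT_PRICE):.2f}")
```

With these made-up numbers, the 40-turn session uses 2,760,000 tokens against 2,500 for the chat, roughly 1,100 times as many. Most of that is re-sent input, not new output.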

Educating them on this isn’t punitive - it’s akin to saying, “Please don’t book a flight that costs $1,910 when there’s a $395 flight that gets in an hour later.”

5. Measure before and after on your critical AI projects

You'll probably have to wait 12 months to see AI move your core KPIs. But smaller, contained projects can prove value much more quickly.

Pick 3-5 initiatives in each department that could be meaningfully accelerated with AI. Define the before state in concrete terms: how long did this process take, how many people were involved, and what was the output or success rate? Then measure the after. These operational improvements show up fast and they're hard to argue with.
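Recording the before and after states in the same structure keeps the comparison honest. A minimal sketch follows. It assumes each process is described by hours per cycle, headcount, and success rate. The `ProcessSnapshot` fields and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessSnapshot:
    hours_per_cycle: float  # elapsed time for one run of the process
    people_involved: int
    success_rate: float     # share of runs with an acceptable outcome, 0.0 to 1.0


def pct_change(old: float, new: float) -> float:
    return (new - old) / old * 100


def compare(before: ProcessSnapshot, after: ProcessSnapshot) -> dict[str, float]:
    """Relative change for time and headcount; percentage-point change for success rate."""
    return {
        "hours_per_cycle_pct": round(pct_change(before.hours_per_cycle, after.hours_per_cycle), 1),
        "people_involved_pct": round(pct_change(before.people_involved, after.people_involved), 1),
        "success_rate_pts": round((after.success_rate - before.success_rate) * 100, 1),
    }


before = ProcessSnapshot(hours_per_cycle=10.0, people_involved=4, success_rate=0.60)
after = ProcessSnapshot(hours_per_cycle=6.0, people_involved=3, success_rate=0.72)
print(compare(before, after))
```

Reporting success rate in percentage points instead of a relative change avoids overstating gains on small baselines.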

At Section, we deployed a set of agents around sales discovery call quality. The lagging indicators - close rate and contract value - won't be clear for a while. But the leading indicators are already visible: discovery scores are improving, AEs are capturing more information earlier, and we're disqualifying bad-fit prospects faster. We can trace those improvements to a specific set of agents with a specific inference cost. That's the kind of evidence that satisfies a CFO while the bigger numbers catch up.

6. Track token spend and AI fitness in parallel

Alongside token spend, track how individuals are using AI - not just how much, but how deeply and effectively. At Section, we measure AI fitness for every employee across several dimensions: depth and complexity of conversations, use of advanced capabilities like agents and automations, number of agents and automations built, whether those tools are shared across the organization, etc.
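A composite score is one way to roll those dimensions into a single trackable number. This is not Section's actual scoring method, which the article doesn't describe. The signals, the weights, and the target of five tools built are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class FitnessSignals:
    conversation_depth: float    # 0.0 to 1.0, e.g. normalized turns or complexity per conversation
    advanced_feature_use: float  # 0.0 to 1.0, share of sessions using agents or automations
    tools_built: int             # agents plus automations built this period
    tools_shared: int            # how many of those are used by other people


# Hypothetical weights; they sum to 1.0 so the score lands on a 0-100 scale.
WEIGHTS = {"depth": 0.3, "advanced": 0.3, "built": 0.2, "shared": 0.2}
TOOLS_BUILT_TARGET = 5  # building this many counts as full marks on that dimension


def fitness_score(s: FitnessSignals) -> float:
    built = min(s.tools_built / TOOLS_BUILT_TARGET, 1.0)
    shared = s.tools_shared / s.tools_built if s.tools_built else 0.0
    score = (
        WEIGHTS["depth"] * s.conversation_depth
        + WEIGHTS["advanced"] * s.advanced_feature_use
        + WEIGHTS["built"] * built
        + WEIGHTS["shared"] * shared
    )
    return round(100 * score, 1)


print(fitness_score(FitnessSignals(0.6, 0.5, tools_built=4, tools_shared=2)))
```

Capping the "tools built" dimension keeps someone who spins up dozens of unused agents from outscoring someone who built a few that others rely on.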

When you have this data, you can have a more informed conversation about token spend and the value it’s driving for the business. You can say: “I notice you’ve built 10 agents this month - can you tell me more about what those do?” Or: “I saw your agent is being used across the org - let’s make sure it gets maintained and centralized.” This enables more targeted interventions than, “You’re all spending too much, cut it out.”
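Pairing spend with usage data can surface those conversations automatically. This is a sketch under stated assumptions. The `UsageRecord` fields, the thresholds, and the third rule (high spend with nothing built) are hypothetical illustrations, not a prescribed policy.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class UsageRecord:
    name: str
    monthly_spend: float
    agents_built: int
    top_agent_users: int  # distinct colleagues using this person's most-used agent


def conversation_starters(records: list[UsageRecord], build_threshold: int = 10,
                          org_use_threshold: int = 25,
                          spend_multiple: float = 3.0) -> list[tuple[str, str]]:
    """Return (person, suggested opener) pairs for targeted follow-ups."""
    typical_spend = median(r.monthly_spend for r in records)
    starters = []
    for r in records:
        if r.agents_built >= build_threshold:
            starters.append((r.name, f"You've built {r.agents_built} agents this month - what do they do?"))
        if r.top_agent_users >= org_use_threshold:
            starters.append((r.name, "Your agent is used across the org - let's get it maintained and centralized."))
        if r.monthly_spend > spend_multiple * typical_spend and r.agents_built == 0:
            starters.append((r.name, "Your spend is well above the team's - what's driving it?"))
    return starters


team = [
    UsageRecord("ana", 120.0, agents_built=10, top_agent_users=3),
    UsageRecord("ben", 40.0, agents_built=1, top_agent_users=30),
    UsageRecord("cam", 300.0, agents_built=0, top_agent_users=0),
    UsageRecord("dee", 50.0, agents_built=2, top_agent_users=0),
]
for person, opener in conversation_starters(team):
    print(f"{person}: {opener}")
```

Comparing against the team median, not a fixed dollar figure, keeps the spend rule meaningful as overall usage grows.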

7. Accept that this is a 12-month learning curve

The honest truth is that nobody has fully figured this out yet. We're all in the experimental phase, and the companies that build measurement systems alongside their AI investments will be in a much stronger position than the ones that either panic and cut, or spend blindly and hope.

The worst outcome is cutting budgets before you've built the visibility to know what those costs were producing. Set the strategy, set the budget, set the caps, educate your people, measure your projects, track your outliers - and give yourself a year to see what the data actually says before making any major calls.

See you next week,

Michael

Your fellow Head of AI
