I was scrolling through a professional forum last week when I found a confession that made me laugh out loud, then immediately stop laughing.
An Amazon engineer wrote: "Whenever a project manager says something stupid, I launch ten AI agents to deeply research and analyze him. I paste our entire Slack history into the system and let it run wild. It's an excellent use of compute resources."
At first I thought it was just workplace toxicity dressed up as a joke. But the comments revealed something worse: it wasn't a joke. It was revenge by KPI.
Amazon had recently deployed an internal AI coding assistant called MeshClaw. Management, in their wisdom, set a hard target: 80% of developers must use it weekly. But they didn't stop there. They built a real-time leaderboard tracking the exact number of AI tokens each employee consumed. The more tokens you burned, the higher your rank.
The result was instant and entirely predictable. Engineers started feeding massive, irrelevant documents into the AI (old meeting transcripts, random Wikipedia pages, their grocery lists) just to watch their token consumption skyrocket. They even gave it a name: "Tokenmaxxing."
Most executives read this story and blame the employees. "They're lazy! They're gaming the system!"
They're wrong. The employees aren't the problem. The architecture of the management system is the problem. The moment you introduce a leaderboard for a process metric, you trigger one of the oldest traps in human organization. And in the AI era, that trap is spinning faster than ever.
Goodhart's Law and the Useless Nails
There's a formal name for this trap: Goodhart's Law, named after the British economist Charles Goodhart. In its popular phrasing, it states: "When a measure becomes a target, it ceases to be a good measure."
If you want the visceral version, look at the Soviet nail factory. Under the planned economy, a factory manager was given a quota based on the weight of nails produced. So the factory churned out a small number of enormous nails: useless to builders, but glorious on the scale. Management caught the error and changed the quota to the number of nails. The factory immediately pivoted to producing millions of tiny pins. Also useless. But hey, the count was through the roof.
When you manage by arbitrary metrics, you get exactly what you asked for. And you completely destroy the actual product in the process.
Amazon's token leaderboard is just the Soviet nail factory with better UI.
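For readers who think in code, here is a minimal sketch of the nail factory as an optimization problem. The three production plans and all of their numbers are hypothetical, invented purely for illustration. The point is only that whichever single number you reward decides which plan wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionPlan:
    name: str
    nail_count: int     # what a "number of nails" quota rewards
    total_kg: float     # what a "weight of nails" quota rewards
    usable_nails: int   # what builders actually need


# Hypothetical figures, chosen only to illustrate the parable.
PLANS = [
    ProductionPlan("giant spikes", nail_count=500, total_kg=1000.0, usable_nails=0),
    ProductionPlan("standard nails", nail_count=150_000, total_kg=750.0, usable_nails=150_000),
    ProductionPlan("tiny pins", nail_count=5_000_000, total_kg=500.0, usable_nails=0),
]


def best_plan(plans, metric):
    """Return the plan a rational factory manager picks under a given quota."""
    return max(plans, key=metric)


by_weight = best_plan(PLANS, lambda p: p.total_kg)
by_count = best_plan(PLANS, lambda p: p.nail_count)
by_usable = best_plan(PLANS, lambda p: p.usable_nails)

print("Quota on weight ->", by_weight.name)
print("Quota on count  ->", by_count.name)
print("Judged on use   ->", by_usable.name)
```

Under these made-up numbers, both quotas select a plan that produces zero usable nails. Only the measure tied to the actual outcome picks the plan builders can use.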
The $1.4 Million Hallucination
Amazon wasn't alone. Meta had arguably the most absurd implementation.
An internal, unofficial leaderboard called "Claudeonomics" tracked token consumption across 85,000 employees. The top users were crowned "Token Legends." The #1 employee burned through 281 billion tokens in 30 days—roughly $1.4 million worth of API calls.
Management initially celebrated this as "AI adoption." Then a deeper audit revealed the truth: employees were running meaningless, loop-driven AI tasks purely to inflate their numbers. Worse, several live production outages were traced directly back to engineers rushing to deploy low-quality, AI-generated code just to hit their quotas. The leaderboard was quietly dismantled, but the cultural damage was done.
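To get a feel for the scale of that top "Token Legend," it helps to run the arithmetic on the two figures quoted above. This is a rough sketch that assumes both numbers cover the same 30-day window and that a single blended price applies to every token. Neither assumption is confirmed by the leaderboard story itself.

```python
# Figures quoted in the text above.
TOKENS_USED = 281_000_000_000   # 281 billion tokens
REPORTED_COST_USD = 1_400_000   # roughly $1.4 million
WINDOW_DAYS = 30

# Assumption: one blended price for all tokens (input and output alike).
usd_per_million_tokens = REPORTED_COST_USD / (TOKENS_USED / 1_000_000)

# Assumption: usage spread evenly across every second of the window.
seconds_in_window = WINDOW_DAYS * 24 * 60 * 60
tokens_per_second = TOKENS_USED / seconds_in_window

print(f"Implied price: ${usd_per_million_tokens:.2f} per million tokens")
print(f"Sustained rate: {tokens_per_second:,.0f} tokens per second")
```

Under those assumptions the implied price is about $4.98 per million tokens, and the sustained rate works out to roughly 108,000 tokens every second, around the clock, for a month. No individual reads or reviews output at that pace, which is a strong hint that the number measured machine loops, not work.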
Salesforce did something similar—installing a widget on employee screens that refreshed every 15 minutes, showing their "AI Spend" and demanding they hit a "minimum consumption target." Developers who could have done a two-minute manual search instead forced the AI to read a 50-page technical manual, burning thousands of tokens, just to satisfy the dashboard.
The General Shouldn't Be Counting Heads
To understand why this keeps happening, we need to go back more than 2,300 years.
In the state of Qin, during China's Warring States period, a reformer named Shang Yang created a military reward system called "Merit by Decapitation." Foot soldiers were rewarded with land and titles based strictly on the number of enemy heads they severed. Brutal, but highly effective for infantry. It turned the Qin army into a devastating force.
But Shang Yang was smarter than most modern CEOs. He explicitly stated that decapitation metrics only applied to frontline soldiers. Generals were absolutely forbidden from being evaluated by headcounts.
Why? Because a general's job isn't to kill individuals. It's to orchestrate the battlefield, manage logistics, and win the war. If you evaluate a general by how many heads he personally chops off, he'll abandon his strategic post, grab a sword, and start fighting in the mud. He'll win his personal KPI. And he'll lose the war.
Shang Yang understood that the metric must match the responsibility. The measure is a means to an end, not the end itself.
Modern management has forgotten this. We've replaced strategic judgment with dashboard worship. We track what's easy to count—tokens, hours, prompts—instead of what's hard to evaluate: judgment, quality, strategic impact.
Why We Killed the Dashboard at Mercury
At Mercury, we made a decision that sounds radical but is actually just sane: we strictly forbid tracking "Token Consumption," "Prompts Generated," or "Hours Saved by AI" as performance metrics.
I recently read a report about a pharmaceutical company that required every employee to fill out a weekly "AI Results Form," detailing exactly how many hours AI had saved them.
The result was heartbreaking. The engineers were working with highly classified R&D data that couldn't legally be uploaded to external LLMs. So they did the work manually: eight hours of actual coding. Then they spent an extra thirty minutes generating a fake, non-functional AI version of the code, just so they could write "AI saved me 3 hours" on their report. The form recorded three hours saved. The real effect was thirty minutes lost.
One of the employees interviewed said something that stuck with me: "I wasn't actually opposed to using AI before this."
The management system didn't just fail to increase productivity. It actively destroyed the employee's genuine curiosity and goodwill toward the technology. It turned a potentially useful tool into a bureaucratic chore, and it turned honest engineers into liars.
What We Actually Look At
So how do you know if someone is effectively using AI?
You can't look at a dashboard. You have to look at the actual work.
Did the product manager ship a higher-quality competitive analysis in three days instead of five? Is the code deploying with fewer bugs? Are we closing deals faster? Is the client happier?
These outcomes cannot be tracked on a 15-minute token leaderboard. They require managers to actually engage with the work and evaluate qualitative results. Which is harder than reading a number. Which is why most organizations don't do it.
The Real Soviet Nail
Every era has its version of the useless nail. In the industrial era, it was tonnage quotas producing unwieldy steel. In the knowledge era, it was page-view journalism producing clickbait garbage. In the AI era, it's a massive, useless pile of API tokens burned purely to satisfy a blind executive who thinks consumption equals productivity.
The engineers tokenmaxxing at Amazon and Meta aren't stupid. They're rational actors in an irrational system. They've learned that the path of least resistance is to give the algorithm what it wants—a big number—while quietly preserving their actual sanity.
If you're running a team right now, and you're thinking about setting an "AI adoption target," or tracking "monthly token spend per employee," or building a leaderboard to gamify usage—stop. You're not measuring productivity. You're manufacturing nails that nobody can use.
Stop managing the tokens. Start managing the business.
— James, Mercury Technology Solutions, Hong Kong, May 2026


