El País ran a feature this week on the small subset of AI workers in Spain who actually train models. The number that traveled was 151,700 professionals in the broader AI category, of which only about 6,700 build the models themselves. The rest of us — the bulk — deploy, integrate, fine-tune, prompt, evaluate, deploy again. The piece, by Luis Enrique Velasco, profiles a handful of us, including Manuel Romero at Maisa, Rebeca Villalba, Carlos Puerto, and me.
My paragraph in the article was short. The framing it lived inside deserves more room than I had there, because the headline number is interesting in the boring way — there are fewer model trainers than data scientists — and underplays what's actually shifting. Let me unpack what I'd have said with more time.
The ML engineer curve is bending because we use AI to explain AI
Three years ago, when a model emitted a match score, the engineer's job was to reconstruct the reasoning by hand. SHAP values for the feature attributions, an Excel sheet with the historical comparisons, a deck explaining to a stakeholder why the model said what it said. The interpretation work was a human job because the model couldn't explain itself, and "couldn't" wasn't a limitation of effort — it was a limitation of the architecture.
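For concreteness, here is a minimal sketch of that manual loop, using scikit-learn and SHAP on synthetic data. The feature names and the toy model are my illustration, not anyone's production pipeline; the point is where the automation used to stop.

```python
# A toy version of the "interpretation by hand" step: fit a ranker-like model
# on synthetic match data, then pull SHAP attributions for one candidate.
# The feature names and data are illustrative, not a real pipeline.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["stack_overlap", "availability_weeks", "rate_band_fit", "comparable_hires"]
X = rng.random((500, len(features)))
y = 0.5 * X[:, 0] + 0.3 * X[:, 2] + 0.2 * rng.random(500)  # synthetic "match score"

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X[:1])[0]  # attributions for one candidate

# Three years ago the automation stopped here: a human turned these numbers
# into the Excel sheet and the deck for the stakeholder.
for name, value in sorted(zip(features, contributions), key=lambda t: -abs(t[1])):
    print(f"{name}: {value:+.3f}")
```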
Today a generative layer sits on top of the match. The ranker produces a candidate score; a second model pass translates the score into natural language tied to the underlying features — stack, availability, rate band, comparable hires. The explanation is not a post-hoc rationalization of an opaque ranker. It is wired into the same feature pipeline the ranker uses, has access to the same intermediate signals, and is constrained to produce text that's consistent with the audit trail. When it cannot produce a clean explanation, that itself is a signal worth investigating.
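A rough sketch of the shape of that layer, in Python. Everything here is hypothetical (rank_candidate, the feature names, the stubbed generate_explanation standing in for the real model pass), but the structure is the point: one feature pipeline feeds both the score and the narrative, and an inconsistent narrative comes back as a signal rather than as text.

```python
# Sketch of a ranker plus a generative explanation pass sharing one feature
# pipeline. All names are illustrative; the "second model pass" is a stub.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MatchResult:
    score: float
    features: dict               # the same intermediate signals the ranker used
    explanation: Optional[str]   # None when no consistent explanation exists

def rank_candidate(features: dict) -> float:
    # Placeholder ranker: a weighted sum over the shared feature pipeline.
    weights = {"stack_overlap": 0.5, "availability_weeks": 0.1,
               "rate_band_fit": 0.3, "comparable_hires": 0.1}
    return sum(weights[k] * features[k] for k in weights)

def generate_explanation(score: float, features: dict) -> str:
    # Stub for the second model pass. In a real system this would be a
    # constrained generation call prompted with the same features.
    top = max(features, key=features.get)
    return (f"Scored {score:.2f}; the strongest signal was {top} "
            f"at {features[top]:.2f}.")

def explain_match(features: dict) -> MatchResult:
    score = rank_candidate(features)
    text = generate_explanation(score, features)
    # Crude consistency check against the audit trail: the narrative must
    # reference at least one signal that actually fed the ranker. A real
    # system would check far more; failing the check is itself a signal.
    consistent = any(name in text for name in features)
    return MatchResult(score, features, text if consistent else None)

result = explain_match({"stack_overlap": 0.9, "availability_weeks": 0.4,
                        "rate_band_fit": 0.7, "comparable_hires": 0.2})
print(result.explanation or "No consistent explanation; investigate.")
```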
This sounds like a technical detail. It is the entire role transformation.
What changed under the surface
Three things shifted in the last twenty-four months. Any one of them would have justified the change. The combination forced it.
The first shift is the headcount one. Amazon laid off 14,000 engineers earlier this year. Block, 4,000. The pattern across tech is consistent: technical headcount is shrinking while the number of models in production is multiplying. The arithmetic doesn't work without leverage. An ML engineer who was responsible for five models in production two years ago is now responsible for fifty, and the only way that scales is if the interpretation work is largely automated.
The generative explanation layer is the leverage. Without it, every drift incident requires a human to manually reconstruct what the model was doing yesterday. With it, the engineer gets a natural-language diff between yesterday's behavior and today's, prioritized by impact, and can spend their time deciding what to do rather than recovering context.
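As a hedged sketch of what that diff can look like, not a description of any specific product: compute a drift measure per signal between yesterday's and today's traffic, rank by magnitude, and let the ranked summary be the first thing the engineer (or the generative layer) reads. The data, signal names, and the 0.1 threshold below are illustrative.

```python
# Toy "behavior diff": compare yesterday's and today's distributions per
# signal with a population stability index, then rank by impact so the
# biggest shifts get read first. Data and thresholds are synthetic.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two samples of one signal."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
yesterday = {"match_score":   rng.normal(0.6, 0.1, 5000),
             "rate_band_fit": rng.normal(0.5, 0.2, 5000)}
today     = {"match_score":   rng.normal(0.5, 0.1, 5000),   # drifted overnight
             "rate_band_fit": rng.normal(0.5, 0.2, 5000)}   # stable

report = sorted(((name, psi(yesterday[name], today[name])) for name in yesterday),
                key=lambda t: -t[1])
for name, value in report:
    flag = "investigate" if value > 0.1 else "ok"
    print(f"{name}: psi={value:.3f} ({flag})")
```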
The second shift is regulatory, even though the calendar moved. The AI Act's enforcement timeline has slipped, with several high-risk articles pushed back. That doesn't change the destination. Every high-risk system — and the matchmaking system Shakers runs falls squarely under Annex III as a personnel selection tool — will eventually need to explain its decisions to stakeholders in comprehensible terms. Not engineers. Stakeholders. Recruiters, candidates, the regulator.
If you're building toward that finish line, the generative explanation layer goes from being a nice-to-have to being legal infrastructure. The teams that have it already are not improvising under deadline. The teams that don't are betting the deadline will keep slipping, and history suggests that's a losing trade.
The third shift is the profile change. The El País piece points at something it didn't have space to develop: the ML engineer role is splitting into two. On one end, the hyperspecialized researcher who works on the math and the architectures. On the other, the sector hybrid who understands both the modeling and the domain it serves. The middle — the engineer who built generic models for a generic company — is the part being compressed.
The generative explanation layer is the bridge between those two ends. The researcher can ship work into a system the sector hybrid can operate, because the explanations are produced in the language the domain expert speaks. Without that bridge, the work either stays in research or gets handed across a wall and degrades on the way.
Autopilot vs co-pilot
A model without an explanation layer is an autopilot. It works until the day it doesn't, and on that day, nobody can read the cockpit instruments. We have all seen the screenshots of recruiters arguing with internal scoring systems — "but why did this candidate score lower?" — and the answer comes back as a number with no narrative. That is the kind of failure mode a regulatory crisis gets built on.
A model with an explanation layer is a co-pilot. The decision stays with the human. The reasoning is shared. When the model is wrong, the human can identify which part was wrong and correct it; when the model is right, the human can defend the decision when the stakeholder asks. That is the operating mode high-risk AI is supposed to converge to, and the layer that gets it there is the generative explanation pass on top of the ranker — the same one most teams still treat as a nice-to-have.
The ML engineer who survives
Reading my own paragraph in the El País piece, I'd add this: the engineer who survives the next five years is not the engineer who knows more math. It's the engineer who knows how to read traces.
Traces in the literal sense — the structured logs of what the agent did, which tools it called, with what inputs, in what order — but also traces in the broader sense. The patterns that show up across thousands of model invocations. The systematic biases that only become visible at scale. The drift that shows up not in any single output but in the distribution over a quarter.
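A small example of the literal kind, assuming a simplified trace schema (tool name, status, week) that is mine rather than any particular framework's: the individual records look like noise, and the pattern only appears once you aggregate.

```python
# Toy trace reading: the kind of aggregation that makes patterns visible only
# at scale. The schema here is a simplification, not a real agent framework's
# log format.
from collections import Counter, defaultdict

traces = [
    # one record per tool call; in practice, thousands of these per day
    {"run_id": 1, "week": "2025-W40", "tool": "search_candidates", "status": "ok"},
    {"run_id": 1, "week": "2025-W40", "tool": "fetch_rate_band",   "status": "ok"},
    {"run_id": 2, "week": "2025-W47", "tool": "search_candidates", "status": "ok"},
    {"run_id": 2, "week": "2025-W47", "tool": "fetch_rate_band",   "status": "error"},
    {"run_id": 3, "week": "2025-W47", "tool": "fetch_rate_band",   "status": "error"},
]

# Error rate per tool per week: each failure looks like noise on its own;
# the weekly aggregate is where the drift shows up.
calls = defaultdict(Counter)
for t in traces:
    calls[(t["week"], t["tool"])][t["status"]] += 1

for (week, tool), counts in sorted(calls.items()):
    total = sum(counts.values())
    print(f"{week} {tool}: {counts['error'] / total:.0%} errors over {total} calls")
```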
Reading traces is a skill, and it's a different skill from training models. It overlaps with statistics, sits next to debugging, and feels closer to anomaly detection than to research. It's the skill the next ML engineer needs, and the one that's still mostly taught informally.
What the headline missed
The 6,700 number — model builders — is the one that's headline-friendly. The 145,000 number — everybody else in AI — is the one that's operationally important.
Spain doesn't lose the AI race because we have too few researchers. We have respected researchers. We lose it, if we lose it, because the 145,000 people doing the deployment, integration, and evaluation work aren't connected to a system that produces clear specifications, runs against shared evals, and ships explanations the legal team can read. The bottleneck is industrial discipline, not talent density.
The good news is that this is a fixable problem, and the kind of work it requires — eval infrastructure, observability tooling, sector-specific evaluation datasets, regulatory documentation — is exactly the work the AI Act will force into existence over the next twenty-four months. Companies that invest now will be operating with a stack the law is converging toward. Companies that wait will be retrofitting.
The 6,700 builders matter. The 145,000 operators matter more for what gets shipped this decade. I'd like the next El País piece to count them too.