Independent or Current

Europe built the most ambitious AI enforcer in the world. It still has to ask the labs how to grade them.

Jun 18, 2026

In 2024, the European Commission went looking for someone to evaluate the world’s most powerful AI models. The post was the lead scientific adviser to the AI Office, the person who would sit across from OpenAI, Anthropic, and Google and judge whether their frontier systems were fit for placement on the European market. The application window opened, then closed in December. Months later, the chair was still empty.[1]

The pay explains some of it. The Office’s technical roles top out near $120,000, and even its senior posts sit on fixed civil-service scales the labs beat several times over for the same skills, sometimes with seven-figure packages. The European pool for this work is thin enough that the strongest candidates already sit in San Francisco or London. The Office has since hired a small, capable safety team, with people from Oxford, Google, and the UK’s AI Security Institute.[2] But the seat reserved for the scientist who would lead the judging of a frontier model stayed empty as the start of the Office’s enforcement powers approached. The authority is real. The question is the capacity to use it.

Step back from the hiring, though, and the opposite is just as true. On paper, the European Union has just built the most serious AI enforcer anywhere. The Digital Omnibus, voted through Parliament on 16 June, consolidated oversight of the largest models and AI across the largest platforms into a single office and gave that office the power to vet the highest-risk products before they ship.[3] A Scientific Panel of 60 independent experts was sworn in on 1 June to give it technical muscle.[4] A 174-member Advisory Forum sits alongside.[5] The obligations for general-purpose models have been in effect since August 2025.[6] By any measure of ambition, Brussels has done what Washington spent years declining to do: it built a standing regulator with the authority to test the frontier.

That empty chair is a small sign of a large problem. Europe’s frontier-AI evaluator can be independent or current, but not both, and the same shortage of methods, access, and people produces both limits. To judge the largest models as they exist, the Office has to borrow the labs’ tools, access, and talent, at which point it is grading the labs’ own homework. To build its own capacity instead and stop borrowing, it has to move at the speed of public hiring and public standard-setting, at which point whatever it certifies is a snapshot of a live model it has already outrun. The competence to evaluate a frontier model in real time exists almost entirely inside, or priced by, the companies being evaluated.

The homework problem

Start with what the law asks of a frontier developer, because it is less than most readers assume. A provider of a general-purpose model with systemic risk must run its own state-of-the-art evaluations, assess and reduce the risks it identifies, document all of this in a Safety and Security Model Report, and submit that report to the AI Office before the model reaches the market.[7]

The lab runs the assessment, writes it up, and sends the file; the Office reads it.

External evaluation exists, but its shape tells you who is in charge. The Code of Practice that fills in the details requires a model developer to give independent, external evaluators access to its most advanced versions and to publish the standards by which it selects them. The provider grants the access, and the provider sets the bar for who qualifies. The requirement was nearly cut from the Code during drafting and survived only as mandatory “in most circumstances,” with an exemption allowing a model developer to claim its model is as safe as one already cleared.[8] The outside check runs through a door the developer holds open. And as of mid-2026, the Office had not settled what qualifies someone to serve as an outside evaluator: it called a workshop for 15 July, weeks before its enforcement powers switched on, to take expert input on the question.[9]

This is not a flaw the Office can out-hire, because the people who would do the hiring face the wall the empty adviser’s chair revealed. And it is not solved by leaning on the independent evaluators who have made their names doing this work, because they hit the same door. The respected outside shops — METR, Britain’s AI Security Institute, Apollo, FAR.AI — operate under voluntary access agreements granted by the labs, and the labs can withdraw those agreements. When a lab has shared a model before release, the sharing has been thin: in the clearest recent case, a model developer handed an evaluator a safety-tuned version with no ability to fine-tune it, and one of the best-known evaluators appears to have had no special pre-release access since its early work in 2023.[10] Their independence is real on the org chart and thin where it counts: in what the lab lets them see.

What the outside world mostly gets to see is behavior: the model’s outputs, not its internals.

So, where is Europe’s own capacity? It is real, and it sits one level up from the test. Between late 2024 and mid-2025, the Commission’s Joint Research Center ran a global expert pool to develop methods for sorting models into risk categories, and co-authored a paper with the AI Office, published in Science, on how to keep evaluation proportionate to risk. This is serious work. But it is rubric-writing: how to classify a model, where to set the compute thresholds, how to think about reach. The JRC’s own review of AI benchmarks catalogs how immature the field is, and its categorization work measures capability using the benchmarks that already exist, the ones the research community and the labs have built.[11] Europe can decide which models deserve scrutiny and to what extent. It cannot, on its own, put a frontier model through its paces.

Put the pieces together, and the first half of the trap closes on itself. To bind the frontier, the Office needs to evaluate the largest models close to real time. Real-time evaluation needs methods and access that live within the labs and the lab-adjacent shops at the labs’ gate. The Office cannot build that capacity fast enough, because it cannot pay for the people who have it. So the binding step falls back on the provider’s own evaluations, the provider’s chosen outside reviewers, and the provider’s report. No independent capacity accumulates, so next year the Office is no closer to building its own, and it borrows again. The Panel can raise a formal alert when it suspects a model carries serious risk, and that lever is real.[12] But an alert is a flag raised over a document that the model developer wrote.

The yardstick that never arrives

In principle, there is a way out of the borrowing: building an independent measure of its own. Europe tried. The result is the second half of the trap.

The AI Act’s binding requirements for high-risk systems were meant to rest on harmonized technical standards, the detailed yardsticks against which a system is judged. The Commission asked the European standards bodies to write them in 2023. They missed the 2025 deadline, the work continues, and the Commission has said the delay puts the timetable at risk. The first relevant standard reached public consultation in late 2025, months behind schedule, and standards of this kind typically take 2 to 4 years to complete. The Omnibus that Parliament just passed pushed the high-risk obligations out to the end of 2027 and tied their start to the readiness of those standards, conceding that it cannot set its own clock.[13]

The model layer repeats the problem in another form. The general-purpose rules are written to apply across a model’s life, including after updates, and a model developer is told to set its own trigger points for when fresh evaluation is due. But the binding re-assessment only fires when a change is large enough to count as a new model, and the indicative bar for that is a modification using more than a third of the original training compute, a threshold the Commission expects few to cross.[14]

A model can drift a long way through a stream of smaller updates without ever tripping a formal review, and the triggers that might catch it are the model developer’s to set and judge.
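The arithmetic behind that drift can be sketched in a few lines. This is an illustration under stated assumptions, not the Commission’s test: it assumes the one-third bar is checked against each modification on its own, as the passage above describes it, and the function name, the compute figure, and the update sizes are all hypothetical.

```python
from fractions import Fraction

# Indicative bar from the text: a modification counts as a new model when it
# uses more than one third of the original training compute.
NEW_MODEL_BAR = Fraction(1, 3)


def review_log(original_compute, update_computes, bar=NEW_MODEL_BAR):
    """Return a (cumulative_share, triggered) pair for each update.

    Assumption (hypothetical reading): each update is compared with the bar
    on its own, so earlier updates never count toward later ones.
    """
    log = []
    cumulative = Fraction(0)
    for compute in update_computes:
        share = Fraction(compute, original_compute)
        cumulative += share
        log.append((cumulative, share > bar))
    return log


# Hypothetical model: 1,000 units of training compute, then ten updates of
# 100 units each (10% of the original apiece).
log = review_log(1000, [100] * 10)
total_share, _ = log[-1]
any_triggered = any(triggered for _, triggered in log)

print(f"cumulative modification compute: {float(total_share):.0%} of original")
print(f"formal re-assessment triggered: {any_triggered}")
```

Under these assumptions, the ten updates together consume as much compute as the original training run, yet none of them crosses the bar by itself.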

Now both halves are visible at once, and they lock together. Every move the Office makes toward being current — lifecycle monitoring, live evaluation, judging the model as it is — runs through the developer, because only the developer has the access and the tools to do it at speed. And the one move toward an independent measure of its own — the standards — keeps slipping past its deadline. Currency by borrowing, independence by waiting: the Office cannot have both, because the thing that would let it be current without borrowing, a deep bench of frontier evaluators paid at market rates working from methods it owns, is the thing the empty adviser’s chair says it cannot afford to build.

What the Office can do

The safety unit is staffed with credible people, drawn from places that do this work.[2] The Scientific Panel is no roster of industry placemen: most of its 60 members are academics, a sixth come from the European machine-learning research network, and they sit in their personal capacity under conflict-of-interest rules.[4] These are serious researchers, several of whom would be at home in any frontier lab. The JRC’s methodological work is real and is being read.[11] The alert the Panel can raise is a true lever, and the Office now has the formal authority over the largest models and the platform-integrated systems that, until this year, were scattered across national capitals.[3] The enforcement numbers are not trivial: breaches of the general-purpose obligations carry fines up to the greater of €15 million or 3% of worldwide turnover.[15]
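The fine ceiling is a simple maximum of two quantities, and a short sketch shows where the percentage starts to matter. The function name and the turnover figures below are hypothetical; only the €15 million floor and the 3% rate come from the text, and the result is the statutory ceiling, not a prediction of any actual penalty.

```python
FIXED_CAP_EUR = 15_000_000   # flat ceiling named in the text
TURNOVER_RATE_PERCENT = 3    # share of worldwide turnover named in the text


def max_gpai_fine_eur(worldwide_turnover_eur):
    """Ceiling on a fine: the greater of EUR 15 million or 3% of turnover.

    Integer arithmetic keeps the result exact for whole-euro inputs.
    """
    turnover_cap = worldwide_turnover_eur * TURNOVER_RATE_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_cap)


# Hypothetical turnover figures, in euros.
for turnover in (100_000_000, 500_000_000, 10_000_000_000):
    print(f"turnover {turnover:>14,} -> ceiling {max_gpai_fine_eur(turnover):>12,}")
```

The two ceilings meet at €500 million of turnover. Below that the flat €15 million governs; above it the percentage does, so for a hypothetical developer with €10 billion in turnover the ceiling would be €300 million.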

The law does not limit the Office to reading what it is sent. It can demand access to a model, through an interface or the source code itself, and run its own evaluation, with fines for a model developer who refuses.[16] But look at when that power applies: to check compliance only once the developer’s own documentation is found wanting, or to investigate a serious risk after the Panel raises a flag. It is the escalation, not the routine, and the default stays the model developer’s report. The rules for how such an evaluation would run have not yet been written. And a right to demand access is worth only as much as the capacity to use it: the methods, the compute, and the people the Office has already shown it lacks. A right of entry that the regulator cannot staff is a right on paper.

So the Office can decide which models matter. It can demand documentation. It can read a model developer’s report with expert eyes, push back, and escalate. It can fine a model developer who lies or hides. What it cannot do is the thing the public imagines a safety regulator does: take the live model, probe it deeply and repeatedly on its own terms, and certify the result against a yardstick it built and controls. Everything it does sits on top of artifacts that the model developer generated and access that the developer granted.

The fork the labs just got

There is one more actor, and it changed the board three weeks ago.

On 2 June, President Trump signed an order on advanced AI that does almost the opposite of what Brussels did. It establishes a framework for model developers to grant the federal government up to 30 days of access to their most powerful models before release, on a voluntary basis, with qualifying models selected by a classified benchmark, and with no mandatory licensing, no pre-clearance, and no right for anyone to sue to enforce it.[19] Collaboration, the order says, not command. The administration had pulled an earlier draft in May for fear it would slow American competitiveness, and signed this softer version instead.[20] The same administration has stood up a task force to challenge state AI laws it considers too heavy.[21] The contrast with Europe could not be sharper: a voluntary American framework on one side, and on the other, binding European obligations carrying statutory fines that take effect this August.[22]

This matters to Europe’s evaluator because it hands the labs something they did not have before: a venue they prefer. A developer cannot lawfully step outside the AI Act for a model it places on the European market; the obligations attach on sale, wherever the model was built.[23] But “comply” and “comply fully, first, and openly” are different things.

A lab can ship its strongest model in the United States first under the friendly arrangement, stage or delay the European release, send a capped or filtered version into the European market, and meet the Office’s deeper requests for access with the least it can defend, while pointing to the American process as its real oversight.

This is not hypothetical: in 2024, one large model developer withheld its multimodal models from the European Union over what it called regulatory unpredictability, and a major device maker delayed its flagship AI features in the bloc on similar grounds.[24] The incentive to route cooperation toward the venue that creates no liability only sharpened the week of the order: the day before it, Anthropic filed a confidential draft registration with the SEC at a valuation near $965 billion, the kind of public-market stake that turns a European enforcement action into a disclosed risk on the prospectus.[25]

None of this loosens the bind; it pulls it tighter. The evaluator was already leaning on the access the labs chose to grant, and now the labs have a venue they prefer and a reason to give Brussels less of it. The access that was thin becomes contested ground, and the jurisdiction that wins it is the one the developers like better. Europe gets the companies willing to be examined, on their own terms; Washington offers those same companies an examiner who asks nothing it can enforce.

Europe has run this play before

If this reads like a forecast, it is not. Europe ran the experiment one regulatory generation back, on the platforms, using the same institutions it is now reaching out to. The AI Office and its Panel are built on the template of the European Center for Algorithmic Transparency: set up in 2023, housed in the same Joint Research Center that now serves the Office, and tasked with giving the Commission the in-house expertise to police the largest online platforms under the Digital Services Act.[26] Swap “platforms” for “models,” and the shape repeats: exclusive Commission supervision of the biggest players, backed by a technical body meant to do the evaluating. Its record is the closest thing we have to a forward look, and it is thin.

Two and a half years in, the platform rules have produced a single fine: €120 million against X in December 2025, in part for barring researchers from its data, with 60 to 90 working days to fix rather than pay.[27] The open cases against Meta and TikTok turn on the same failure: researchers shut out of platform data.[28] The rules grant researchers a right to that data, and the platforms have spent years making it slow, conditional, or barred outright.

Slow enforcement and a permanent fight to see inside the thing it polices: that is the template now aimed at frontier models.

The pattern predates the platforms, and the AI lineage is direct. Europe’s chemicals agency checks only a fraction of the dossiers industry files on its own substances; the legal minimum rose from 5% to 20% in 2019. When the agency does check, most fail: 61% of one early cohort fell short of what the law required.[29] Even a mature agency with real capacity can only spot-check a self-reported base, and it finds much of that base wanting. The body that wrote Europe’s first AI principles fits the same shape: the 2018 High-Level Expert Group, heavy with industry seats and with only four ethicists among more than 50 members, produced ethics guidelines and a self-assessment checklist that one of its own called ethics-washing.[30] The Scientific Panel is its more independent, more technical heir, and still a body reading what developers choose to show it.

What would have to break

Europe escapes the trap if the Office can retain frontier evaluators at something near market pay, build a battery of tests it designs rather than borrows, win a right of deep access to live models instead of the access a developer grants, and track models as they change instead of certifying a version and moving on. Each of those is conceivable. None is close on the current path: the pay bands are public-sector, the standards keep slipping, the access is voluntary, and the talent the Office needs is being bid away by the firms it would evaluate and is now being courted by a friendlier government across the Atlantic. The honest probability, on today’s trajectory, is low.

None of this argues for no regulator; a borrowed evaluation still beats none. The point is narrower: borrowing carries a cost the borrower keeps paying.

Which leaves a verdict sharper than “the regulator is underpowered.” For anyone running diligence on an AI vendor, the practical reading is blunt: treat “AI Office-supervised” or “evaluated under the AI Act” as a provider-generated artifact, not an independent clearance, and price the difference. Europe has built a real enforcer that can read the labs’ homework, raise a flag over it, and fine a lab that hides the truth. What it cannot do is grade the frontier itself. And whichever way it leans, the only models it truly examines are the ones whose makers agree to sit the exam.

The systems that should worry you most are built to skip it: a model fine-tuned to strip its safety training and re-released by someone who never files with Brussels, a capable model served from a jurisdiction that ignores the Act (I’m looking at you, China), a model trained on smuggled compute by an actor with no European address to fine.

Once again, the cop that Europe built is aimed at the population that was already willing to be policed.

Notes

[1] The AI Office’s lead scientific adviser post — application deadline December 2024 — remained unfilled in mid-2026; the head-of-safety-unit role, vacant since the Office was set up, was filled in December 2025 (Matthieu Delescluse). Transformer News; MLex.

[2] Reported AI Office salaries: technical and contract-agent roles roughly $55,000–$120,000, the lead scientific-adviser post at grade AD13 (about €13,500–15,000 per month); EU staff also receive allowances and favourable tax treatment. The figures still fall well below frontier-lab compensation, which can run to seven figures. Transformer News.

[3] Digital Omnibus on AI, consolidated text (Council doc ST 9247/2026 INIT); European Parliament plenary vote 16 June 2026. The AI Office gains centralized supervision over systems built on a same-provider general-purpose model and over AI integrated into very large online platforms, with Commission pre-market assessment for such high-risk systems; carve-outs leave certain products and uses to national authorities. European Commission, Regulatory framework on AI.

[4] AI Act Scientific Panel of 60 independent experts, established 1 June 2026 under Article 68 and Commission Implementing Regulation (EU) 2025/454 (7 March 2025); members serve in a personal capacity under confidentiality and conflict-of-interest declarations; most from academia, roughly a sixth from the ELLIS network. Implementing Regulation (EU) 2025/454; European Commission, AI Scientific Panel.

[5] AI Act Advisory Forum of 174 members under Article 67. European Commission, AI Advisory Forum.

[6] General-purpose AI model obligations have applied since 2 August 2025 under Regulation (EU) 2024/1689; the Office’s power to impose penalties for breaches applies from 2 August 2026. DLA Piper.

[7] Article 55 and the GPAI Code of Practice require providers of general-purpose models with systemic risk to conduct state-of-the-art model evaluations, assess and mitigate systemic risks, and submit a Safety and Security Model Report to the AI Office before placing the model on the market. Article 55.

[8] Under the GPAI Code of Practice (Safety and Security chapter), signatories must give independent external evaluators access to their most advanced models before deployment and publish their evaluator-selection criteria; the requirement applies “in most circumstances,” with proportionality and exemptions, for example where a model is no more capable than an existing open-weight one. Analysis of the Code.

[9] European Commission, European AI Office: “Call for participants: Workshop — qualification requirements for external evaluators of GPAI models with systemic risk,” 15 July 2026. European AI Office.

[10] Independent evaluators (UK AISI, METR, Apollo, FAR.AI) work under voluntary 2024–2025 access agreements the labs grant and can withdraw; pre-deployment sharing has been thin (in one case a safety-tuned model with no fine-tuning access; a leading evaluator with no special pre-release access since around 2023), and the evidence base is overwhelmingly behavioural. Seth & Sankarapu, arXiv:2605.15164 (Lexsi Labs, May 2026); AI Lab Watch; access-level taxonomy in arXiv:2601.11916 (Jan 2026), finding external evaluators are typically restricted to black-box access and cannot examine model internals.

[11] A paper co-authored by the AI Office and the Joint Research Centre, “The science and practice of proportionality in AI risk evaluations,” appeared in Science (vol. 391, 6 March 2026), translating the legal proportionality test into criteria for how demanding a model evaluation must be; the JRC also runs an expert pool on GPAI risk categorisation and has catalogued the immaturity of current AI safety benchmarks. Knowledge4Policy; Science; JRC, AI safety benchmarks report.

[12] The Scientific Panel may issue a qualified alert to the AI Office where it suspects a general-purpose model presents systemic risk; the alert is a trigger the Office may act on, not an automatic investigation. Article 90.

[13] Harmonised standards under standardisation request M/593 were requested of CEN-CENELEC in 2023, missed the 2025 deadline and remain in development; the first reached public enquiry in late 2025, and such standards typically take two to four years, with acceleration measures adopted in October 2025 targeting completion by late 2026. The Digital Omnibus ties the start of the high-risk obligations to their availability. TechPolicy.Press.

[14] General-purpose obligations apply across the model lifecycle, including post-market modifications, with provider-set evaluation triggers; a downstream modification is treated as a new model chiefly where it exceeds roughly one-third of the original training compute (indicative), a bar the Commission expects few to cross. European Commission, GPAI Q&A.

[15] Breaches of the general-purpose obligations carry fines up to the greater of €15 million or 3% of worldwide annual turnover; high-risk obligations are deferred under the Omnibus to 2 December 2027. Article 101.

[16] Article 92 empowers the AI Office, after consulting the Board, to conduct evaluations of a general-purpose model — to check compliance where information requested under Article 91 is insufficient, or to investigate systemic risk, in particular after a scientific-panel alert — and to appoint independent experts, including from the panel. It may request access via APIs or other means, including source code, with Article 101 fines for refusal; detailed arrangements await implementing acts not yet adopted. Article 92; European Commission, GPAI Q&A.

[19] Executive Order, “Promoting Advanced Artificial Intelligence Innovation and Security,” 2 June 2026: directs a framework for developers to voluntarily grant the federal government up to 30 days of pre-deployment access to “covered frontier models,” with covered models set by a classified NSA benchmark, no mandatory licensing or pre-clearance, and no enforceable private right. The White House; Crowell & Moring; Morrison & Foerster.

[20] The administration pulled an earlier draft of the order in May 2026 over concerns it would hinder US competitiveness, signing a softer version on 2 June. Crowell & Moring.

[21] The order establishes a Department of Justice task force to challenge state AI laws. Paul Hastings.

[22] US/EU divergence: a voluntary US framework versus binding EU general-purpose obligations with statutory penalties effective 2 August 2026. ComplianceHub.

[23] AI Act obligations attach when a model or system is placed on the EU market, regardless of where it was developed. Regulation (EU) 2024/1689.

[24] In 2024, Meta declined to release its multimodal Llama models in the EU, citing “the unpredictable nature of the European regulatory environment” (a text-only Llama shipped), and Apple delayed several Apple Intelligence features in the EU, citing Digital Markets Act uncertainty. Axios; 9to5Mac.

[25] On 1 June 2026, the day before the executive order, Anthropic confidentially submitted a draft S-1 to the SEC at a valuation near $965 billion (following a $65 billion round the prior week). Anthropic; Fortune; CNBC.

[26] The European Centre for Algorithmic Transparency (ECAT), established 2023 within the Joint Research Centre, gives the Commission in-house technical expertise to support its exclusive supervision of very large online platforms under the Digital Services Act. ECAT.

[27] First DSA non-compliance fine: €120 million on X, 5 December 2025, partly for barring researchers from effective access to its public data, with deadlines of 60 to 90 working days to remedy rather than pay. IAPP; Euronews.

[28] The Commission preliminarily found Meta and TikTok in breach of their obligation to give researchers access to public data (24 October 2025); across the DSA cases the recurring breach is denial of access, even though the law grants researchers a right to platform data. European Commission, preliminary findings; Science.

[29] Under REACH, the European Chemicals Agency checks only a fraction of industry-submitted registration dossiers: the legal minimum rose from 5% to 20% in 2019, and ECHA examined about 21% of full registrations (≈15,000) between 2009 and 2023; of 928 evaluations concluded in 2013, 61% were non-compliant with one or more information requirements, and in 2024, 313 compliance checks produced 208 data requests. ECHA.

[30] The High-Level Expert Group on AI (convened 2018) produced the Ethics Guidelines for Trustworthy AI (April 2019) and the ALTAI self-assessment checklist (July 2020); a member, Thomas Metzinger, publicly called the exercise “ethics-washing,” noting roughly four ethicists among more than 50 members and the absence of red lines. European Commission, High-Level Expert Group on AI.

The AI Realist
