Measuring Axiom 2 Across Professional Domains (A Snapshot)
by Tyler M | March 2026 | Theory of Recursive Displacement, AI Labor, Empirical Validation
Research compiled from Anthropic Economic Index, Stripe Engineering, METR, Stanford Digital Economy Lab, EA Forum, IAPP, BLS, Coupé and Wu meta-analysis, Dell'Acqua et al., Brynjolfsson et al., Noy and Zhang, and primary corporate disclosures
Axiom 2 — the Recursive Substitution Loop — predicts that the lag between new task creation and new task automation is collapsing, making the reinstatement effect ephemeral rather than durable. The prediction, tested systematically across professional domains, does not hold universally. It holds powerfully in a bounded attack surface. And it may operate on a level the occupation-based catalog does not capture.
The evidence reveals not universal compression but a sharp bifurcation. Of ten AI-native task categories tracked with enough data to measure, two show clear compression (prompt engineering, routine software engineering tasks), while eight show sustained expansion after three to five or more years with no compression signals. The structural features predicting this split are identifiable and consistent: tasks with formal output, objective quality metrics, and single-agent execution compress; domains marked by adversarial dynamics, liability exposure, regulatory moats, and judgment under uncertainty resist. But the 2:8 ratio is asymmetric in scale — the compressing domains represent millions of workers; the expanding categories represent tens of thousands — and most of the expanding categories have no functioning junior pipeline.
Within the compressing domains, the data splits again along organizational lines. Studies of average productivity show mixed results. Frontier organizations that have redesigned around autonomous agents — Stripe producing 1,300+ merged pull requests per week with no human-written code — are operating at a cost structure the copilot-paradigm firms cannot match. The aggregate average masks this bimodal distribution. Meanwhile, four independent studies using administrative data converge on the pipeline signal: an approximately 13–20% employment decline for ages 22–25 in AI-exposed occupations (synthesized across different methodologies and datasets), driven by hiring freezes rather than layoffs — the Dissipation Veil (essay soon) operating at the cohort level.
The most important finding is not in the catalog. It is in the Stanford data showing that the junior employment decline concentrates entirely in automation-prone occupations. Where AI augments rather than automates, junior employment is stable. The recursive substitution loop is not a technological inevitability — it is contingent on deployment choices. Some firms are choosing automation of junior tasks (Stripe Minions). Others are choosing augmentation that preserves and redesigns the entry pathway (OpenAI's "super junior" model, IBM's tripled entry-level hiring). The firms choosing augmentation appear to be doing so because they can see Competence Insolvency forming — recognizing that without entry-level investment, the senior talent pipeline dries up within five years. This is institutional redirect operating through competitive self-interest rather than government mandate. Whether it scales is the genuinely open question.
Confidence calibration: 60–65% that the occupation-level bifurcation persists for at least five more years. 50–55% that the pipeline contraction in automation-prone roles is partially attributable to AI (versus post-pandemic corrections, rate hikes, and R&D tax changes). 55–60% that Competence Insolvency concerns drive broader adoption of augmentation-over-automation strategies within three years. 40–50% that cost structure differentials between agent-paradigm and copilot-paradigm firms produce observable competitive displacement within five years.
Part I: The Compression Catalog
Ten AI-created task categories have enough data to track with measured or estimated automation timelines. Their trajectories diverge dramatically.
The Compressed
Prompt engineering is the paradigm case of rapid compression. The standalone role emerged in late 2022, with Indeed search volume peaking around April 2023. Salaries at frontier labs reached $335K–$375K (Anthropic's posted range for the title). By mid-2024, the standalone title was fading. By early 2025, Indeed's VP of AI confirmed job postings were minimal. ZipRecruiter data shows the average salary falling to approximately $70K by February 2026 — roughly half the peak average. Microsoft's 2025 Work Trend Index survey ranked prompt engineer second-to-last among new roles companies planned to add. [Estimated — aggregated from Indeed, ZipRecruiter, Microsoft survey data with different methodologies]
The compression lag: approximately 18 months from emergence to measurable decline. But a critical nuance often missed in the headline: prompt engineering did not vanish. The standalone title compressed; the underlying skill migrated upward into AI Engineer, ML Engineer, AI Solutions Architect, and LLM Engineer roles. The Recursive Substitution Loop's first observable cycle operated exactly as Axiom 2 predicts — the new task was not eliminated but absorbed, its distinct occupational identity collapsing into a broader role where it functions as one skill among many rather than a standalone career.
AI-assisted code review and generation shows active compression in progress, but the compression pattern is far more complex than the prompt engineering case. The data splits along organizational lines rather than occupational ones.
The Expanding
AI safety research emerged in its current form around 2020–2021 and has reached the five-year mark with zero compression signals. Per an EA Forum field analysis, technical AI safety full-time equivalents grew from roughly 300 in 2022 to 620 in 2025 — about 27% compounded annually — spread across 68 active technical organizations. Non-technical AI safety FTEs grew from approximately 100 to 489 in the same period. National AI Safety Institutes have been established in the US, UK, EU, and Japan. The work is inherently adversarial — finding novel failure modes in systems that have never existed before — and requires creative reasoning about unprecedented problems.
The pipeline question: AI safety has almost no entry-level pathway. 80,000 Hours notes that "almost everyone is doing or has completed a PhD." MATS — the field's most prominent training fellowship — has graduated roughly 500 fellows over 3.5 years with a 4–7% acceptance rate. The field explicitly describes itself as talent-constrained, and its primary feeder pipeline is ML engineering — the occupation showing the sharpest entry-level hiring decline. The category is expanding while the pipeline feeding it is contracting. [Estimated — EA Forum census methodology, self-reported organizational data; pipeline assessment from 80,000 Hours and MATS program data]
AI governance and policy emerged around 2022–2023 and is expanding rapidly. The IAPP's 2024 survey found 77% of organizations building AI governance programs while only 1.5% reported satisfaction with current headcount — a massive demand-supply gap. The EU AI Act's phased implementation through 2027 creates sustained regulatory demand that is, by definition, not automatable: the regulations require human judgment about compliance, human accountability for enforcement, and human interpretation of ambiguous requirements. The World Economic Forum's Future of Jobs Report 2025 identified governance, risk, and oversight roles among the fastest-growing job categories globally. A TechJackSolutions analysis of 20 governance role types found only 4 of 20 accessible at entry level — the rest require prior policy, legal, or technical experience. [Estimated — IAPP survey, WEF report, EU regulatory timeline, job market analysis]
AI ethics consulting occupies similar territory at the three-year mark, increasingly folded into governance roles. No compression evidence exists. The domain's output is inherently judgment-intensive: determining what constitutes responsible AI deployment requires contextual reasoning about values, stakeholders, and consequences that resist formalization. [Assessment]
Human-AI interaction design approaches the four-year mark. MIT's NANDA study found that the "last mile" of making AI tools usable in organizational workflows is consistently the primary failure point — more than 90% of employees use personal AI tools while companies fail to integrate them officially. This integration gap is expanding, not shrinking, because each new AI capability creates new design challenges. The demand for designers who understand both AI capabilities and human cognitive limitations is growing faster than the tools to automate that understanding. [Estimated]
AI training data curation represents the most nuanced case. At five to seven years old, the global data labeling market reached $3.7 billion in 2024 with projections of $17 billion by 2030. Synthetic data is partially automating volume annotation, with some providers claiming 70% reduction in manual labeling requirements. But the role is transforming rather than vanishing — shifting from annotator to curator and auditor. The judgment required to evaluate whether training data is representative, unbiased, and appropriate for a given application is increasing as AI systems become more capable and the consequences of training data quality become more visible. Total spending and employment continue growing. [Estimated — market research aggregates with varying methodologies]
MLOps and AI infrastructure engineering has survived five or more years without compression. LinkedIn data shows 9.8x growth over five years. Glassdoor lists thousands of active MLOps roles in the US alone. The emergence of LLMOps as a subspecialty adds another layer of complexity. The pattern is unambiguous: more models deployed means more infrastructure to manage. Each layer of automation generates demand for managing the next layer. [Estimated]
AI model evaluation and red-teaming reached $1.43 billion in market size in 2024, projected to grow at 26.1% CAGR to $11.6 billion by 2033. US Executive Orders and the EU AI Act formally mandate red-teaming and evaluation for high-risk systems, creating regulatory demand that is structurally resistant to automation — you cannot automate the process of finding novel failure modes in AI systems using the same AI systems being evaluated without circular dependency. [Estimated — market research projections]
AI agent development is the newest category and the fastest-growing. Job listings mentioning agentic AI jumped 986% between 2023 and 2024. The global AI agents market was valued at $3.86 billion in 2023 with a projected 45.1% CAGR through 2030. This category is too young for compression analysis but represents the frontier of AI-native work — and its explosive growth is itself evidence of expanding task creation. [Estimated]
The Scorecard
Of ten tracked categories, two show compression (one complete, one in progress). Eight show expansion with no compression signals after three to five or more years. The ratio — 2:8 — is the opposite of what universal compression would predict. If this ratio holds, Axiom 2 describes a narrow phenomenon rather than a general one.
Two critical qualifiers prevent this ratio from being dispositive. First, scale asymmetry: the eight expanding categories are small in absolute employment — AI safety is 1,100 FTEs, MLOps runs in the thousands, governance is growing from a small base. Collectively they may represent 50,000–100,000 jobs. The compressing domains — software engineering, customer service, content creation, data entry — represent millions. The ratio is 2:8 in categories but potentially inverted in affected headcount. Second, pipeline depth: an Axial Search analysis of 10,133 AI/ML engineering positions found that 78% target professionals with 5 or more years of experience. Most of the expanding categories are hiring from a finite pool of senior talent without building the junior pathways that replenish it. A category can expand for years while drawing down a non-renewable resource. Whether these categories can sustain expansion without a functioning entry-level pipeline is a separate question from whether they are expanding now.
Part II: The Software Engineering Case — Three Paradigms, Not One
Software engineering is the strongest compression case and the most empirically rich. It also demonstrates why averaging across organizational paradigms produces misleading results. The data reveals three distinct regimes operating simultaneously.
Paradigm 1: The Copilot Model
The majority of the productivity literature measures individual developers using AI-assisted coding tools in a copilot configuration — the human writes code with AI suggestions, completions, and chat-based assistance. The results are decidedly mixed.
The METR randomized controlled trial (July 2025; 16 experienced open-source developers, 246 real tasks on familiar codebases) found that developers using AI tools took 19% longer to complete tasks — while believing they were 24% faster. The perception-reality gap was 43 percentage points. [Measured — RCT with small but carefully controlled sample]
This finding is not anomalous. The Anthropic learning study (January 2026; 52 mostly junior engineers) found AI-assisted learners scored 17% lower on comprehension assessments, with the largest gaps on debugging questions. The researchers identified a critical distinction between "AI delegation" (outsourcing thinking) and "conceptual inquiry" (using AI to deepen understanding). The former degraded performance; the latter improved it. [Measured — RCT]
The Vaccaro et al. meta-analysis (106 studies, 370 effect sizes) found that human-AI combinations performed significantly worse than the best of either humans or AI alone on average. [Measured — meta-analysis]
These results do not mean AI coding tools are useless. They mean the copilot paradigm — human writes code, AI assists — produces inconsistent and often negative productivity effects for experienced developers on complex tasks. The gains are concentrated among less-experienced developers working on routine tasks. Brynjolfsson et al. (2023) found the 14% productivity improvement in customer service was driven almost entirely by lower-skilled workers, with top performers showing near-zero benefit. Dell'Acqua et al. (2023; BCG consultants, N=758) found 40% quality improvement on tasks within the AI frontier but 19 percentage points worse performance on tasks outside it. Noy and Zhang (2023; professional writing, N=453) found 40% time reduction concentrated among lower-ability writers. [Measured — individual RCTs]
The meta-analytic average across 45 studies is a 17% productivity gain (Coupé and Wu, 2025). [Measured] But this average is dominated by studies measuring routine tasks, lower-skilled workers, and simple outcome metrics. The average is real. It is also misleading as a predictor of compression, because it describes the modal case while the transformative case operates at a completely different scale.
Paradigm 2: The Agent Model
Stripe's Minions system, publicly documented in February 2026, operates in an entirely different paradigm. Minions are not copilots. They are fully autonomous, unattended coding agents that produce complete pull requests from Slack messages with no human interaction during execution.
The architecture is precise and worth understanding in detail, because the design choices explain why Stripe's results diverge so dramatically from the copilot literature.
A Minion run begins when an engineer tags a Slack bot with a task description. Before the LLM is invoked, a deterministic orchestrator prefetches context — scanning the thread for links, pulling Jira tickets, retrieving documentation, searching code via Sourcegraph through MCP (Model Context Protocol). The agent then operates on an isolated "devbox" — a pre-warmed AWS EC2 instance containing Stripe's source code, identical to what human engineers use, that spins up in 10 seconds. The devbox is isolated from production and the internet, enabling full autonomy without human permission checks.
The core innovation is what Stripe calls "blueprints" — hybrid workflows that interleave deterministic code nodes with agentic nodes. Some steps are hardcoded: git operations, linting, CI submission. These always execute identically. Other steps — "Implement task," "Fix CI failures" — invoke the LLM with latitude to make decisions. Stripe's own description: "putting LLMs into contained boxes compounds into system-wide reliability upside." The system runs the model. Not the reverse.
Minions connect to Stripe's internal MCP server "Toolshed," which hosts nearly 500 tools spanning internal systems and external platforms. Agents receive curated subsets of these tools — smaller boxes for higher reliability. Feedback loops operate in three tiers: local linting in under five seconds, selective CI from Stripe's battery of over three million tests, and a maximum of two CI rounds before the task returns to a human. [Measured — Stripe Engineering Blog, February 2026]
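The blueprint pattern is easier to see in code. A minimal sketch, in which every function name and stub is hypothetical (this illustrates the pattern Stripe describes, not Stripe's implementation):

```python
# Minimal sketch of a "blueprint": deterministic nodes that always execute
# identically, agentic nodes with bounded latitude, and a hard cap of two CI
# rounds before the task returns to a human. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class StepResult:
    ok: bool
    output: str = ""

# Deterministic nodes: hardcoded, identical every run.
def git_checkout(task: str) -> StepResult:
    return StepResult(True, f"branch minion/{task}")

def run_linter() -> StepResult:  # stands in for the <5s local feedback tier
    return StepResult(True)

def submit_selective_ci(attempt: int) -> StepResult:
    # Stub: fail the first round to exercise the retry path.
    return StepResult(attempt > 0, "3 failing tests" if attempt == 0 else "")

def open_pull_request() -> StepResult:
    return StepResult(True, "PR opened for human review")

# Agentic node: the only place the LLM gets latitude, with a curated tool subset.
def agent_step(prompt: str, tools: list[str]) -> StepResult:
    return StepResult(True, f"agent acted with {tools} on: {prompt[:40]}")

def run_blueprint(task: str) -> StepResult:
    if not git_checkout(task).ok:                                      # deterministic
        return StepResult(False, "checkout failed")
    agent_step(f"Implement task: {task}", ["code_search", "editor"])   # agentic
    run_linter()                                                       # deterministic
    for attempt in range(2):                                           # max two CI rounds
        ci = submit_selective_ci(attempt)                              # deterministic
        if ci.ok:
            return open_pull_request()                                 # deterministic
        agent_step(f"Fix CI failures: {ci.output}", ["editor"])        # agentic
    return StepResult(False, "escalated to a human after two CI rounds")

print(run_blueprint("fix-lint-warnings").output)  # PR opened for human review
```

The design point the sketch makes explicit: the control flow (checkout, lint, CI, the two-round cap) lives in ordinary code, and the model is invoked only inside bounded steps.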
The numbers: Part 1 (February 9, 2026) reported over 1,000 merged pull requests per week completely minion-produced, human-reviewed, containing no human-written code. Part 2 (February 19, 2026) reported over 1,300. That is a 30% increase in ten days — a system still accelerating on its internal adoption curve. [Measured — Stripe corporate disclosure]
The critical distinction from the copilot paradigm: the unit of analysis is not "developer productivity" but "organizational output." An individual Stripe engineer can spin up multiple Minions in parallel, each on its own devbox, each producing a complete PR. The human becomes the orchestrator and reviewer, not the producer. The marginal cost of additional output is compute, not salary.
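The arithmetic behind that claim can be sketched directly. Every number below is a hypothetical assumption chosen for illustration, not a figure from Stripe or any study:

```python
# Back-of-envelope marginal cost per routine PR under the two paradigms.
# All values are hypothetical assumptions for illustration.
TOKENS_PER_RUN = 2_000_000     # assumed total tokens consumed by one agent run
PRICE_PER_MTOK = 5.00          # assumed blended $ per million tokens
RUNS_PER_MERGED_PR = 3         # assume some runs are rejected in human review

LOADED_SALARY = 180_000        # assumed fully loaded junior engineer cost per year
ROUTINE_PRS_PER_YEAR = 300     # assumed junior throughput on well-scoped tasks

agent_cost_per_pr = TOKENS_PER_RUN / 1e6 * PRICE_PER_MTOK * RUNS_PER_MERGED_PR
human_cost_per_pr = LOADED_SALARY / ROUTINE_PRS_PER_YEAR

print(f"agent: ${agent_cost_per_pr:.2f}/PR vs human: ${human_cost_per_pr:.2f}/PR")
# -> agent: $30.00/PR vs human: $600.00/PR, under these assumptions
```

Note that the sketch deliberately omits the cost of the human review step; that omission is exactly the dependency the next paragraph addresses.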
A structural dependency: every Minion-produced PR is human-reviewed before merge. Stripe is explicit about this. The agent paradigm produces output; the human curation layer — the senior engineer who reviews, rejects, or redirects — determines whether that output ships. The Enshittification Engine (essay soon) documents what happens when organizations eliminate this review layer under cost pressure: unfiltered agent output accumulates structural damage that manifests as product degradation in every domain requiring judgment. Stripe's productivity gains are contingent on the curation layer surviving. The Ratchet's budget pressure creates incentive to remove it. Whether agent-paradigm firms preserve the review step or optimize it away is a leading indicator of whether Regime 2 compression produces sustainable productivity or the enshittification spiral.
Paradigm 3: The Competitive Dynamic
The observed and the projected diverge here.
The observable: Stripe's engineering cost structure is diverging from organizations still operating in the copilot paradigm. Tasks that would previously require hiring and onboarding a junior engineer — well-scoped bug fixes, routine feature work, on-call issue resolution, lint fixes, test updates — are now handled by agents whose marginal cost is measured in compute tokens rather than annual compensation. [Measured — inferred from Stripe's public description of Minion use cases]
The projected: if this cost structure advantage compounds — if Stripe ships faster at lower marginal cost per feature than competitors who haven't made this transition — competitive pressure would drive adoption or drive out firms that can't match the cost structure. This is the entity substitution mechanism operating at the organizational level. [Projected — theoretical extrapolation from cost structure differential]
What has not happened: Stripe is hiring. Their competitors are hiring. No firm has entered bankruptcy or restructuring because it failed to adopt autonomous coding agents. The competitive displacement that entity substitution predicts has not occurred in software engineering. [Projected — theoretical extrapolation from cost structure differential, no confirming cases as of March 2026]
In banking, the lateral dynamic is further along. JPMorgan Chase topped the Evident AI Index for three consecutive years, with $534,191 revenue per employee — highest among large global banks — an $18–20 billion annual technology budget, and 450+ AI use cases in production. The Evident AI Index shows top-10 banks increasing AI scores at 2.3x the rate of peers — a widening lateral gap. JPMorgan analysts now predict AI spending requirements may force smaller banks into mergers. This is entity substitution operating laterally — not lean AI-native entrant versus burdened incumbent, but burdened incumbent with better infrastructure outcompeting peer incumbents with the same regulatory burdens but weaker AI deployment. The incumbent's existing data infrastructure and compliance architecture becomes a moat rather than a vulnerability. [Measured — Evident AI Index, JPMorgan financial data, analyst reports]
The lateral pattern does not generalize everywhere. Among consulting firms — McKinsey, BCG, and Bain — all invested aggressively in AI, all built internal tools, and no clear competitive separation emerged. In domains where AI tools commoditize rapidly across competitors, the lateral advantage dissipates. The structural features that predict lateral entity substitution appear similar to those predicting task compression: data-rich, infrastructure-deep, compliance-heavy industries where proprietary systems compound advantages. [Estimated — industry analysis]
A faster channel may run through retention rather than cost competition. The Enshittification Engine (essay soon) documents how firms that eliminate the curation layer — the senior talent that exercises priority judgment — generate their own competitors through voluntary departure. The documented cases are dramatic: Anthropic's founding team left OpenAI voluntarily and built a $380 billion competitor. Perplexity's founder left Google Brain, DeepMind, and OpenAI. Mistral's three founders departed DeepMind and Meta. CB Insights tracked 14 AI startups led by former Google employees that collectively reached valuations exceeding $70 billion. [Measured/Estimated] The boomerang runs through retention, not termination — senior talent that can see structural damage accumulating leaves before the consequences arrive, carrying institutional knowledge of exactly where the legacy firm's vulnerabilities lie. In lateral entity substitution, the relevant question may not be "which incumbent deploys AI better?" but "which incumbent retains the curators who make AI deployment work?"
The Jevons paradox is the strongest counterargument. If Minions make engineering output 5x cheaper at the margin, Stripe may respond by building 5x more product — pursuing projects that weren't cost-justified before, expanding into adjacent markets, increasing feature velocity. Demand for software appears highly elastic. The radiologist parallel is instructive: Geoff Hinton predicted in 2016 that AI would make radiologists obsolete within five years. Instead, CT and MRI scan volumes nearly doubled in US emergency departments as AI-driven efficiency coincided with expanded utilization. If cognitive labor gets cheaper, organizations may consume more of it rather than less. [Framework — Original, supported by historical analogy]
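The Jevons logic reduces to one line of algebra. Under a constant-elasticity demand model, a k-fold price drop multiplies total spending by k^(e-1), so everything turns on whether the elasticity e exceeds 1. A minimal sketch, with the elasticity values as pure assumptions:

```python
# Constant-elasticity demand: q = A * p**(-e). Spend = p * q = A * p**(1 - e),
# so a k-fold price drop multiplies total spend by k**(e - 1).
# The elasticity values below are assumptions for illustration, not estimates.
def spend_multiplier(price_drop: float, elasticity: float) -> float:
    return price_drop ** (elasticity - 1)

for e in (0.5, 1.0, 1.5):
    print(f"elasticity {e}: 5x cheaper -> total spend x{spend_multiplier(5, e):.2f}")
# 0.5 -> x0.45 (spend shrinks), 1.0 -> x1.00 (flat), 1.5 -> x2.24 (Jevons: spend grows)
```

Whether software demand sits above or below unit elasticity is the empirical crux; the radiology precedent suggests above.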
Why the Average Misleads
The METR finding (19% slower) and the Stripe finding (1,300+ autonomous PRs per week) are not in tension. They are measuring different phenomena. The METR study measured the copilot paradigm: individual developers using AI tools within their existing workflow. Stripe built the agent paradigm: organizational infrastructure that enables AI to execute entire task pipelines autonomously.
The aggregate productivity literature — the 17% average gain — blends measurements from both paradigms plus everything in between. This average is the Dissipation Veil operating within the measurement itself. It allows observers to look at the data and conclude "AI provides moderate productivity improvements." Meanwhile, the distribution is bimodal: copilot-paradigm organizations seeing marginal or negative effects on experienced developer productivity, and agent-paradigm organizations seeing effectively unlimited parallelization of well-scoped tasks.
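A toy decomposition shows how easily a blended mean hides this structure. The subgroup shares and effect sizes below are constructed to reproduce a roughly 17% headline average; they are illustrative assumptions, not measurements:

```python
# How one blended average can mask a bimodal distribution. Shares and effects
# are constructed for illustration, not measured values from the literature.
subgroups = {
    # name: (share of measurements, productivity effect)
    "copilot, complex tasks, experienced devs": (0.35, -0.19),  # METR direction
    "routine tasks, less-experienced workers":  (0.63, +0.28),
    "agent paradigm, organizational output":    (0.02, +3.00),
}
blended = sum(share * effect for share, effect in subgroups.values())
print(f"blended mean: {blended:+.1%}")  # -> +17.0%, despite wildly different regimes
```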
The relevant unit of analysis for compression is the organization, not the individual. Task compression does not advance worker by worker. It advances organization by organization, as firms cross the threshold from copilot paradigm to agent paradigm. The high enterprise AI failure rate — multiple surveys place it above 80%, with some estimates as high as 95% — is not evidence against compression. It is a measure of how few organizations have crossed the threshold. The question for the Recursive Substitution Loop is not "does the average improve?" but "how fast does the frontier diffuse?"
Stripe's own infrastructure moat provides a partial answer. Their system works because of years of investment in developer tooling that predates LLM agents: devboxes, comprehensive test suites, linting infrastructure, MCP integration, code search via Sourcegraph. Stripe explicitly states that tools built for human developer productivity became the scaffolding that made agents work. Organizations without this foundation — the median enterprise running legacy systems with thin test coverage and manual deployment pipelines — cannot replicate Stripe's results by purchasing an off-the-shelf agent. The diffusion is gated by infrastructure maturity, not model capability.
This has a counterintuitive implication for the compression timeline. The organizations most likely to achieve the agent paradigm are those that already invested heavily in engineering infrastructure — which are also the organizations least likely to experience competitive pressure from AI-native entrants, because they are already the market leaders. The competitive dynamic that drives entity substitution may therefore operate more slowly than the technology's raw capability would suggest.
Part III: Structural Features That Predict Compression Rate
The structural features that resist compression share a common thread: they are all manifestations of what the Orchestration Class essay identifies as the curation function — priority judgment about what is worth doing, exercised through the organizational act of saying no. AI safety research is curation of failure modes. AI governance is curation of deployment decisions. Red-teaming is curation of vulnerabilities. Interaction design is curation of the boundary between AI capability and human cognition. The Enshittification Engine (essay soon) documents what happens when organizations eliminate this function: product degradation in every domain requiring judgment, accelerating voluntary departure of the senior talent who carried it. The expanding categories in the compression catalog resist compression because they are the curation function — and AI systems cannot curate their own output without circular dependency.
Three structural hypotheses find strong empirical support; two additional features show meaningful but more ambiguous resistance.
Formal versus informal output is the strongest predictor. Prompt engineering produces structured text that AI models can generate natively — it was the skill of speaking to AI, which AI naturally learned to do itself. Software engineering produces code with deterministic verification: tests pass or fail, builds succeed or break, linters flag or clear. Customer service has clear resolution metrics. These domains all show measurable compression. By contrast, AI safety research, ethics consulting, governance policy, and interaction design produce judgment, narrative, and strategy — outputs where quality assessment is inherently subjective and context-dependent. None show compression. The Anthropic Economic Index confirms this alignment: 68% of observed Claude usage falls on tasks rated fully feasible for LLMs alone, and computer programming leads at 75% observed coverage precisely because code is formal, testable output. [Framework — Original, supported by Anthropic Economic Index data]
Adversarial dynamics create structural cost inflation rather than compression. A February 2026 Lawfare paper on AI in litigation argued that the adversarial legal system prevents AI from lowering the cost of achieving legal outcomes even as it reduces cost per legal task: when both sides become more productive via AI, the competitive equilibrium shifts upward. Historical precedent supports this — e-discovery was supposed to reduce litigation costs but instead enabled more extensive discovery demands, leaving total spending high. Cybersecurity demonstrates the same pattern: global spending reached $213 billion in 2025, with 87% of organizations experiencing an AI-driven cyberattack in the past year. AI lowered the attacker skill barrier, expanding the threat surface faster than defensive AI could contract it. [Framework — Original, supported by historical pattern matching and measured spending data]
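The mechanism in miniature, with both numbers hypothetical: if AI halves the cost of one discovery request but cheaper requests lead each side to file three times as many, total spending rises.

```python
# Adversarial cost inflation: per-task cost falls, task volume rises faster.
# Both numbers are hypothetical assumptions for illustration.
cost_per_task = 0.5   # assume AI halves the cost of one discovery request
task_volume   = 3.0   # assume cheaper requests triple how many each side files

print(f"total spend: x{cost_per_task * task_volume:.1f}")  # -> x1.5: up, not down
```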
Liability as compression brake operates powerfully in medicine, law, engineering, and finance. Over 1,250 AI-enabled medical devices have received FDA authorization, with approximately 76% in radiology, yet almost all are classified as decision-support tools rather than autonomous diagnosticians. The liability gap is structural: physicians bear malpractice liability for following AI recommendations, while AI developers are largely shielded because software is classified as a service rather than a product. In law, unauthorized practice statutes in every state prohibit AI from providing legal advice directly. A standard-of-care paradox is emerging: failure to use AI may become malpractice while reliance on erroneous AI is also malpractice, creating a zone of irreducible human responsibility that no technical improvement can eliminate. [Framework — Original, supported by FDA data, liability case law, and regulatory analysis]
Physical embodiment establishes a compression floor for roughly 60–65% of the US workforce that cannot fully telework. Despite progress, humanoid robotics remains far from matching human dexterity in unstructured environments. Boston Dynamics' CEO acknowledged in 2026 that building reliable machines requires sustained engineering iteration. Tesla's Optimus Gen 3 encountered delays from overheating, hand-load capacity, and battery life issues. The first commercially deployed humanoid (Agility Robotics' Digit) operates only in structured warehouse environments. The timeline for general-purpose physical capability matching human dexterity in unstructured settings is realistically 10–15 or more years. [Assessment]
Regulatory moats are strong in the near term but face erosion pressure. Approximately 22% of US workers hold a state professional license. Healthcare shows 72.6% licensure rates; law approximately 84%. No AI system can hold a medical license, PE stamp, bar admission, or CPA credential. However, erosion signals exist: the Healthy Technology Act of 2025 would allow AI systems to serve as drug prescribers under FDA authorization; Utah operates a regulatory sandbox for legal services innovation; a Trump administration executive order created an AI Litigation Task Force to challenge state AI regulation. The moats interlock with liability — erosion requires simultaneous resolution of both licensing and liability questions, making them more durable in combination than either alone. [Assessment — regulatory analysis with explicit uncertainty about erosion timeline]
Part IV: The Five-Year Test
The Theory's falsification conditions state that if new task categories created by AI remain inaccessible to AI automation for more than five years, this demonstrates durable human comparative advantage in newly created work. Which categories are approaching this threshold?
Three categories have clearly reached or passed it. AI safety research at five years shows roughly 27% annual FTE growth and zero compression. MLOps at five-plus years shows 9.8x growth with no compression. AI training data curation at five to seven years shows a $3.7 billion market growing toward $17 billion, with role transformation but continued employment expansion. Two additional categories — AI governance and AI ethics consulting — are at the three-year mark with strong growth trajectories and no compression indicators. Human-AI interaction design approaches four years with expanding demand. [Assessment]
The historical base rate provides additional context. Across six prior technology-created job categories — web development, mobile app development, social media management, data science, cloud engineering/DevOps, and SEO — none have experienced occupation-level compression despite automation tools arriving within 2–11 years of the occupation's emergence. All six expanded as automation handled lower-level tasks and the occupation shifted to higher-complexity work. If AI-created categories follow the same pattern, they may never compress at the occupation level. Individual tasks will be automated, but the occupation evolves upward. [Estimated — historical pattern analysis]
Against this, AI capability benchmarks are improving at a pace with no historical precedent. SWE-bench Verified went from 4.4% to over 70% in roughly one year. Medical licensing exam (USMLE) AI performance went from approximately 50% to 100% in three years. MMLU, GPQA, and other reasoning benchmarks are being saturated faster than researchers can create new ones. The cost of achieving GPT-3.5-level performance fell 280x in 18 months. This acceleration complicates any argument from base rates — the prior technology waves did not improve at this pace, and the historical pattern may break as capabilities approach broader competence thresholds. [Measured — benchmark tracking data]
The most important question the five-year test raises is whether the categories that survive represent structural resistance or merely delayed compression. The framework cannot currently distinguish between these possibilities with confidence. The structural features documented in Part III — adversarial dynamics, liability, embodiment, regulation, judgment under uncertainty — provide a theoretical basis for structural resistance. But theoretical bases have failed before. Three to five years of expansion is meaningful evidence against universal compression, but not yet conclusive evidence of permanent resistance. The data continues to accumulate.
Part V: What This Means for Axiom 2
The empirical record does not validate Axiom 2 in its strongest form — the claim that compression is universally accelerating. It reveals three distinct compression regimes, each with different implications for the Recursive Substitution Loop.
Regime 1: Rapid compression (18–36 months). Prompt engineering is the exemplar. The task involves mediating between humans and AI. The output is formal and testable. The task is single-agent. Quality metrics are objective. As AI models improve at understanding context and intent, the mediation function becomes unnecessary. This regime validates Axiom 2 at full strength — the reinstatement effect is genuinely ephemeral.
Regime 2: Organizational compression (2–5 years, paradigm-dependent). Software engineering illustrates this. The compression is real but operates at the organizational level, not the individual level. The copilot paradigm shows marginal effects. The agent paradigm — demonstrated by Stripe — shows dramatic compression of well-scoped tasks. The observable outcome today is productivity improvement and pipeline exclusion: Stripe can produce more output per engineer while the marginal cost of routine tasks drops to compute.
Four independent studies using different datasets converge on the entry-level channel. Stanford's "Canaries in the Coal Mine" (Brynjolfsson, Chandar, Chen; ADP payroll data covering millions of workers) found software developer employment for ages 22–25 declined nearly 20% from the late 2022 peak, while employment for ages 30+ grew 6–13%. A Harvard/Revelio study (62 million résumés, 285,000 firms) found junior headcount at AI-adopting firms fell 7.7–10% relative to non-adopters within six quarters — driven by hiring freezes, not layoffs. The Giné and Azar study (IESE; 138 million workers) found junior position wages dropped 6.3% post-ChatGPT while senior wages held stable. The Anthropic Economic Index detected a 14% hiring reduction for ages 22–25 in AI-exposed occupations. [Measured — four independent administrative-data studies, attribution uncertain]
The "hiring freezes, not layoffs" finding is the Dissipation Veil (essay soon) operating at the cohort level. Nobody is fired. Positions stop being created. The displacement channel is non-hiring rather than termination — invisible in unemployment statistics because people who were never hired do not appear as unemployed. Anthropic's own paper notes that affected young workers may be "exiting the labor force rather than appearing as unemployed." The signal routes around every standard measurement instrument designed to detect labor market distress.
An important caution on attribution: the 13–20% junior employment decline coincides with post-pandemic overhiring corrections, Federal Reserve rate hikes beginning Q1 2023, and Section 174 R&D tax changes that increased the cost of hiring. Brynjolfsson himself declines to claim the findings are "fully driven by AI." The Stanford paper notes that much of the downturn aligns with monetary policy tightening. AI is a meaningful contributor to an age-stratified employment shift that also reflects macroeconomic factors — not the sole cause of a structural transformation. [Measured — attribution uncertain, multi-causal]
But the Jevons paradox counterargument is not just theoretical here. Stripe ends both blog posts with "we're hiring." The company is not reducing headcount. It is increasing output per engineer and pursuing more ambitious projects. More broadly, some analyses of US labor data — including reports from large asset managers and cited in Fortune — find that occupations most exposed to AI automation actually outperform the rest of the job market in employment growth, suggesting demand expansion may currently dominate displacement at the aggregate level. Whether this is a transitional pattern (hire while expanding, then reduce once the expansion stabilizes) or a durable equilibrium (cheaper cognitive labor means more demand for cognitive labor, permanently) is an empirical question that the current data cannot resolve. [Estimated — synthesized from multiple employment analyses; Framework — Original on Jevons interpretation]
Regime 3: Expansion without compression (3–5+ years and counting). AI safety, governance, MLOps, red-teaming, and interaction design all show this pattern. Their structural features are not merely slowing compression but actively generating new work as AI capabilities advance. More capable AI creates more safety concerns, more governance requirements, more infrastructure to manage, more integration challenges. This regime challenges Axiom 2 fundamentally: these categories operate on a positive feedback loop where AI advancement drives demand expansion rather than task substitution.
But "expanding" and "sustainable" are different claims. These categories are growing while drawing from a senior talent pool that is not being replenished at the rate it is being consumed. AI safety recruits from ML engineering. AI governance recruits from policy and legal backgrounds. MLOps recruits from software engineering and DevOps. The traditional feeder pipelines for all three are experiencing the entry-level contraction documented in Regime 2. A category can expand for years by consuming existing senior talent. The question is whether it can sustain that expansion once the reservoir thins — and the pipeline data suggests the reservoir is already thinning. IBM's CHRO stated plainly in 2026: "If we don't continue to invest in entry-level hires, what happens in 3-5 years? There's no pipeline; the well simply dries up." [Assessment — pipeline concern supported by hiring data, timeline uncertain]
The framework's central prediction — that the lag between task creation and task automation is collapsing — holds for Regime 1 and partially for Regime 2, representing perhaps 30–40% of AI-adjacent work (the formally-structured, objectively-measured, single-agent portion). It fails for Regime 3 at the occupation level — the majority of tracked AI-native categories are expanding, not compressing. But "occupation-level expansion" and "pipeline-level contraction" can coexist, and the data suggests they do. The eight expanding categories may represent a lagging indicator consuming the output of pipelines that are already contracting, rather than a durable reinstatement effect. Which interpretation is correct depends on a variable the catalog cannot measure: whether the expanding categories build sustainable junior pathways or exhaust the senior talent reservoir they inherited from the pre-AI pipeline.
The most significant macro-level finding comes from Denmark. Humlum and Vestergaard (2025), using administrative labor records through December 2024, found essentially zero effects on earnings and recorded hours at both worker and workplace levels — null results holding even for intensive AI users, early adopters, and workers reporting large productivity gains. [Measured] This result has two possible interpretations. The first: the Dissipation Veil is operating — the deployment gap (multiple surveys place regular enterprise AI adoption well below 25%, with some estimates near 10%) obscures the compression that frontier firms are already experiencing, and the macro effects will appear once deployment diffuses. The second, simpler interpretation: the macro effects genuinely might be smaller than the micro signals suggest, at least on the current timeline. Denmark's economy is small, open, and heavily unionized — conditions that may not generalize. But administrative records covering an entire national economy are harder to dismiss than survey data. The null finding is a serious empirical challenge to any framework predicting near-term labor market disruption, including this one. [Assessment — both interpretations remain live]
Part VI: What Would Prove This Wrong
Four defeat conditions test whether the compression thesis holds.
Defeat Condition 1: Three or more AI-created task categories survive past five years without measurable automation compression, collectively representing more than 15% of AI-adjacent employment.
Verdict: Condition met at the occupation level. MLOps (5+ years, 9.8x growth), AI safety research (5 years, ~27% annual growth), and AI training data curation (5–7 years, market growing from $3.7B to projected $17B) all survive past five years with continued expansion. This is evidence against universal compression at the occupation level. Whether these categories can sustain expansion is a separate question — AI safety's primary feeder pipeline (ML engineering) is the occupation showing the sharpest entry-level decline, and the field describes itself as talent-constrained even at current scale. The categories survived. Whether they can continue to grow while drawing from a contracting talent pool is untested. [Assessment — moderate confidence on occupation survival, high uncertainty on pipeline sustainability]
Defeat Condition 2: Structural features predicting compression resistance account for more than 50% of current professional employment.
Verdict: Likely met. Licensed professions account for 22% of US workers. Physical presence requirements affect approximately 60–65% (only 22.9% teleworked in Q1 2024). Adversarial dynamics, liability exposure, and judgment-intensive functions cover substantial additional employment in healthcare (18M workers), education (12M), legal (1.8M), and financial services. A conservative estimate is that 50–65% of US employment has at least one structural resistance feature. [Estimated — moderate-to-high confidence]
Defeat Condition 3: The historical base rate holds — AI-created categories compress on the same 5–15 year timeline as prior technology waves.
Verdict: Ambiguous. Current AI-created categories track similar timelines to prior waves — none of six historical comparators experienced occupation-level compression even after 15–25 years. But AI capability improvement rates have no historical precedent. The base rate may hold for current categories but break for future ones as capabilities approach broader thresholds. [Assessment — high uncertainty]
Defeat Condition 4: Task-level complementarity stabilizes — workers retain 30% or more of tasks with no compression trend over three or more years of measurement.
Verdict: Contingent on deployment type. The Anthropic Economic Index shows Computer and Math occupations at only 33% observed coverage versus 94% theoretical feasibility — a massive retained-task fraction. But the Stanford "Canaries" data reveals a critical moderating variable: the junior employment decline concentrates entirely in automation-prone occupations. In augmentation-prone roles — where AI assists rather than replaces human judgment — junior employment is stable. This means task-level complementarity does not stabilize uniformly. It stabilizes where firms choose augmentation and erodes where firms choose automation. The outcome is contingent on deployment choices that firms are making right now, not on a technological inevitability. [Measured — Stanford administrative data, Anthropic Economic Index; Assessment — moderate confidence, deployment-type distinction is the key moderating variable]
Part VII: The Fork
The strongest counter-thesis to Axiom 2 is that the domains showing task compression share specific structural features — formal output, objective quality metrics, low liability — that make them unrepresentative of professional work in general. Most professional domains have structural features that resist compression indefinitely, making the recursive substitution loop a narrow phenomenon rather than a general one.
The current data substantially supports this counter-thesis. The compression catalog shows 2 of 10 categories compressing. The structural resistance features cover an estimated 50–65% of employment. The historical base rate shows no prior technology-created occupation experiencing occupation-level compression. The Jevons paradox is not merely theoretical — some analyses of US employment data find AI-exposed occupations outperforming the broader job market in employment growth right now. [Estimated — synthesized from multiple employment analyses]
But the occupation-level scorecard may be measuring the wrong thing. The pipeline data — four independent studies converging on 13–20% junior employment decline in AI-exposed fields — suggests the recursive substitution loop operates on a level the catalog does not capture. The occupations expand. The pathways into them contract. Both statements are true simultaneously. Which one determines the long-run outcome depends on a single variable that the Stanford data identifies with unusual precision.
The automation-versus-augmentation deployment choice is the fork. Stanford's administrative data shows the junior employment decline concentrating entirely in occupations where AI automates tasks. In occupations where AI augments human judgment, junior employment is stable. The recursive substitution loop is not a technological inevitability. It is contingent on deployment choices that firms are making right now.
Some firms are choosing automation. Stripe's Minions automate the well-scoped tasks that previously justified hiring junior engineers. The budget channel documents how other firms are cutting junior headcount to fund AI experiments — displacement through non-hiring rather than termination, invisible in every measurement instrument designed to detect labor market distress. This is the Dissipation Veil (essay soon) operating at the pipeline level.
Other firms are choosing augmentation — and some appear to be doing so because they can see the Competence Insolvency forming. OpenAI is experimenting with "super juniors" — entry-level engineers with 0–3 years of experience who possess native AI fluency rather than traditional coding backgrounds, paired with very senior orchestrators. IBM's CHRO explicitly warned that without entry-level investment, "the well simply dries up," and the company tripled US entry-level hiring in 2026. These are firms acting on Competence Insolvency before it arrives — recognizing that if nobody trains juniors on fundamentals today, there are no seniors to hire in 2031.
This is what institutional redirect looks like when it works: not government regulation from above, but competitive self-interest from within. Firms that see the pipeline thinning and choose augmentation to preserve their future talent supply are performing institutional redirect at the firm level — driven by the same mechanism the framework describes in negative terms, producing the correction the framework assigns only 20–35% probability.
The Wage Signal Collapse essay specified a falsification condition: "New high-premium expertise categories emerge that absorb redirected human capital investment... stable career ladders with steep experience-earnings curves that attract and retain entrants over multi-year timescales." OpenAI's super-junior model is a variant — not new categories, but redesigned entry pathways into existing categories. If it scales, if IBM's entry-level reinvestment propagates across the industry, if AI-native degree programs produce graduates who can enter the expanding categories without the traditional CS-to-ML-to-safety pipeline — then the Competence Insolvency is averted and the framework requires revision. These signals are early. They are small relative to the contraction. They are also real, and the anti-confirmation protocol requires giving them their full weight.
The current data cannot resolve which side of the fork dominates. The automation path leads toward Competence Insolvency — expanding categories consuming a finite talent reservoir while the traditional pipeline atrophies. The augmentation path leads toward a restructured but functional labor market — new entry pathways, redesigned junior roles, institutional redirect through competitive self-interest. Both paths are empirically active. Both have measurable signals. The outcome is contested, not determined. And the fact that some firms are already choosing augmentation because they can see the insolvency forming demonstrates something the framework did not predict: its own mechanisms are visible enough to trigger a corrective response.
Evidence Classification Summary
| Claim | Classification |
|---|---|
| Prompt engineering compression lag ~18 months | [Estimated — aggregated from Indeed, ZipRecruiter, Microsoft, LinkedIn] |
| Stripe Minions 1,300+ PRs/week, no human-written code | [Measured — Stripe Engineering Blog, Feb 2026] |
| METR: experienced developers 19% slower with AI tools | [Measured — RCT, N=16, 246 tasks] |
| Meta-analytic average ~17% productivity gain | [Measured — Coupé and Wu 2025, 45 studies; mid-to-high-teens average consistent with secondary summaries] |
| AI safety FTEs grew ~27% annually 2022–2025 | [Estimated — EA Forum census, self-reported] |
| IAPP: 1.5% governance headcount satisfaction | [Estimated — industry survey] |
| ~13–20% employment decline ages 22–25, AI-exposed occupations | [Measured — approximate range synthesized across Stanford/Harvard/IESE/Anthropic; individual studies vary in methodology and precise estimates; attribution uncertain, multi-causal] |
| Junior decline concentrates in automation-prone, not augmentation-prone roles | [Measured — Stanford "Canaries" ADP administrative data] |
| Pipeline disruption via hiring freezes, not layoffs | [Measured — Harvard/Revelio résumé data, 62M workers] |
| 78% of AI/ML roles require 5+ years experience | [Estimated — Axial Search, 10,133 job postings] |
| 50–65% employment has structural resistance features | [Estimated — aggregated from BLS, NCSL, telework data] |
| Denmark: zero macro effects on earnings and hours | [Measured — Humlum and Vestergaard, admin records] |
| AI-exposed occupations outperform job market in employment growth | [Estimated — synthesized from multiple employment analyses including Fortune-cited reports] |
| JPMorgan revenue/employee $534K, top Evident AI Index 3 years running | [Measured — financial data, Evident AI Index] |
| Top-10 banks AI scores growing 2.3x rate of peers | [Measured — Evident AI Index] |
| SWE-bench Verified: 4.4% to 70%+ in one year | [Measured — benchmark tracking] |
| Enterprise AI pilot failure rate (80–95% range across surveys) | [Estimated — synthesized from MIT NANDA interviews, McKinsey, Gartner surveys; exact figure varies by methodology] |
| OpenAI "super junior" model as pipeline redesign | [Estimated — Pragmatic Engineer reporting, OpenAI careers data] |
| Agent paradigm driving entity substitution | [Projected — theoretical extrapolation] |
| Boomerang: voluntary departures generating competitors (Anthropic, Perplexity, Mistral) | [Measured — valuations and founding histories] |
| Curation-layer removal producing quality degradation (Sonos, CrowdStrike, Klarna) | [Case Studies — Illustrative, multi-causal] |
| Compression bifurcation persists 5+ years | [Assessment — 60–65% confidence] |
Where This Connects
The Theory of Recursive Displacement — Axiom 2. The finding: Axiom 2 holds in its attack surface (formal output, objective metrics, single-agent tasks) but the attack surface is narrower than the discontinuity claim requires at the occupation level. At the pipeline level, the loop may be wider — operating through the automation of junior tasks that built the expertise the expanding categories now consume.
The Dissipation Veil (essay soon) — two layers. First, the aggregated productivity data masks a bimodal distribution between copilot-paradigm and agent-paradigm organizations — the average is the Veil operating within the measurement itself. Second, the pipeline disruption — hiring freezes rather than layoffs, non-creation of positions rather than termination of workers — is the Veil's specific prediction about how displacement presents: invisible in unemployment statistics because people who were never hired do not appear as unemployed. The Stanford/Harvard/IESE convergence on "hiring freezes not layoffs" is independent empirical confirmation of the budget channel the Veil essay documents.
The Adversarial Equilibrium Trap — adversarial dynamics as a structural resistance feature confirmed across litigation, cybersecurity, and competitive strategy.
Structural Exclusion — the 13–20% employment decline for ages 22–25 aligns with what the agent paradigm automates: well-scoped tasks that previously justified entry-level hiring. The four-study convergence on this age-stratified pattern is the strongest empirical anchor for the Structural Exclusion thesis.
The Entity Substitution Problem — lateral entity substitution between peer incumbents (JPMorgan vs. smaller banks, Stripe vs. legacy payment processors) is an empirically supported extension of the original vertical model. The infrastructure moat inverts the expected direction: burdened incumbents with mature tooling deploy agents more effectively than lean entrants or weaker peers. The boomerang mechanism documented in the Enshittification Engine adds a faster channel: voluntary departure of curators generating the AI-native competitors that entity-substitute the firm that failed to retain them.
The Enshittification Engine (essay soon) — the curation function identified as the unifying structural resistance feature in Part III is the same function the Enshittification Engine documents being eliminated from production organizations. The expanding categories in the compression catalog resist compression because they are curation functions. Stripe's agent paradigm works because the curation layer (human review) is preserved. The Enshittification Engine predicts what happens when that layer is removed: quality degradation, senior talent departure, and self-generated competitive displacement. The compression catalog and the enshittification spiral are measuring the same boundary from opposite sides — one tracks what resists automation, the other tracks what breaks when the resistant function is eliminated.
Competence Insolvency — the pipeline finding is Competence Insolvency in formation. The expanding AI-native categories draw from a senior talent pool whose feeder pipeline is contracting. OpenAI and IBM are acting on this — building new junior pathways because they can see the insolvency forming. Whether enough firms follow is the live test of Competence Insolvency's falsification conditions.
The Orchestration Class — Stripe engineers operating Minions are the orchestration class in action: humans who review, direct, and decide rather than produce. Whether this role represents a durable chokepoint or a transitional waypoint remains open.
The Wage Signal Collapse — the automation-versus-augmentation fork determines whether the wage signal collapse is reversible. If firms choose augmentation and build new junior pathways (OpenAI, IBM), the career ladder survives in redesigned form and Falsification Condition 5 activates. If firms choose automation and let the pipeline atrophy, the wage signal collapses as predicted and the competence pipeline degrades on a 5–10 year delay.
tylermaddox.info | Theory of Recursive Displacement — Empirical Validation Series | March 2026