The Bethke Alignment Equation

AI Alignment
Game Design
Strategy
Asimov
Philosophy
Business

February 7, 2026

By Erik Bethke · also on erikbethke.com



Don't Constrain the Agents, Design the Landscape

Inspired by Hari Seldon's psychohistory, Asimov's Laws of Robotics (and their failures), and 53 years of lived experience.


Preamble

Isaac Asimov gave us the Three Laws of Robotics in 1942:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

He then spent the next 50 years writing stories about how they break. Edge cases. Conflicts between laws. The Zeroth Law patch. Robots going catatonic trying to resolve contradictions.

Asimov's Laws are contracts. And as with all contracts: if you have to enforce them, you've already lost.

But Asimov also gave us Hari Seldon — who didn't constrain individuals at all. He designed the initial topology of the Foundation (its location, its knowledge, its resource constraints) so that the aggregate self-interested behavior of millions of agents would produce the desired civilizational outcome over a thousand years.

Seldon didn't write rules. He designed landscapes.

This is the math of that.


Definitions

Let S be a system containing n agents.

Each agent Aᵢ has an individual utility function Uᵢ — what that agent is trying to maximize. Their self-interest. Their gradient descent.

The system has a collective good function G — the emergent outcome we want. Civilization thriving. Company succeeding. Humanity flourishing.

Aᵢ → ∇Uᵢ

Each agent moves in the direction of increasing self-interest.

S → ∇G

The collective moves in the direction of increasing good.


The Alignment Measure

The alignment between any agent and the system is the cosine of the angle between their individual gradient and the collective gradient:

α(Aᵢ, S) = (∇Uᵢ · ∇G) / (|∇Uᵢ| · |∇G|)

Where:

  • α = +1 → Perfect alignment. Self-interest IS collective good.
  • α = 0 → Orthogonal. Neither helping nor harming.
  • α = −1 → Adversarial. Self-interest opposes collective good.

System-wide alignment is the expectation over all agents:

A(S) = 𝔼[α(Aᵢ, S)] for all Aᵢ ∈ S

A well-designed system has A(S) → 1. Not because agents are constrained, but because the topology makes self-interest and collective good the same direction.
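To make the measure concrete, here is a minimal sketch in Python. It assumes gradients can be written as plain lists of numbers and that the expectation is a uniform average over agents; the function names and the toy vectors are illustrative, not part of the framework.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def alignment(grad_u: Vector, grad_g: Vector) -> float:
    """alpha(A_i, S): cosine of the angle between grad U_i and grad G."""
    dot = sum(u * g for u, g in zip(grad_u, grad_g))
    norm_u = math.sqrt(sum(u * u for u in grad_u))
    norm_g = math.sqrt(sum(g * g for g in grad_g))
    if norm_u == 0.0 or norm_g == 0.0:
        # The cosine is undefined when either gradient vanishes.
        raise ValueError("alignment is undefined for a zero gradient")
    return dot / (norm_u * norm_g)


def system_alignment(agent_grads: Sequence[Vector], grad_g: Vector) -> float:
    """A(S): mean of alpha over all agents (assumes a uniform expectation)."""
    return sum(alignment(g, grad_g) for g in agent_grads) / len(agent_grads)


# Toy system (hypothetical numbers): one aligned agent, one orthogonal, one adversarial.
grad_g = [1.0, 0.0]
agents = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]

print([alignment(a, grad_g) for a in agents])
print(system_alignment(agents, grad_g))
```

The three toy agents land on the three reference points listed above (+1, 0, −1), and they average out to a system-wide alignment of zero.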


Two Approaches to Alignment

Asimov's Approach: Compliance (Constraints on Agents)

Add k constraint functions that restrict agent behavior:

maximize Uᵢ

subject to Cⱼ(action) ≥ 0 for all j ∈ {1, ..., k}

Fragility theorem: The probability that all constraints hold simultaneously decreases as constraints multiply:

P(aligned) = ∏ P(Cⱼ holds)

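Each constraint has some probability of failure (edge case, creative circumvention, conflicting constraints). Treating those failures as independent, every added constraint multiplies in another factor below 1, so the product shrinks as k grows. More rules = more fragile.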

This is:

  • Asimov's Three Laws (robots find edge cases)
  • OpenAI's content filters (users find jailbreaks)
  • Corporate HR policies (employees find workarounds)
  • Legal contracts (counterparties find loopholes)
  • Government regulations (corporations find regulatory arbitrage)

The compliance paradox: The more rules you add to fix edge cases, the more new edge cases you create. The system becomes a patchwork of IF statements, each one a new attack surface.
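The fragility product is easy to put numbers on. The sketch below assumes every constraint holds independently with the same probability, 0.99, which is a made-up figure chosen only for illustration.

```python
import math
from typing import Iterable


def p_all_hold(hold_probs: Iterable[float]) -> float:
    """P(aligned) as the product of per-constraint hold probabilities.

    Assumes constraint failures are independent of one another.
    """
    return math.prod(hold_probs)


# Hypothetical: each constraint holds 99% of the time.
for k in (3, 10, 50, 100):
    print(k, round(p_all_hold([0.99] * k), 3))
```

Under those assumptions, three rules hold together about 97% of the time, ten about 90%, fifty about 61%, and a hundred about 37%. Each rule is individually reliable; the stack is not.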

Seldon's Approach: Topology (Design the Landscape)

Instead of constraining agents, design the topology T of the system so that utility gradients naturally align with collective good:

T* = argmax_T 𝔼[α(Aᵢ, S)]

Find the topology that maximizes the expected alignment across all agents.
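As a toy illustration of that argmax, the sketch below treats each candidate topology as nothing more than the set of agent gradients it induces, then picks the candidate with the highest mean cosine. The names `topology_a` and `topology_b` and all the vectors are hypothetical; a real design space would not be a two-entry dictionary.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, g: Vector) -> float:
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, g))
    return dot / (math.hypot(*u) * math.hypot(*g))


def expected_alignment(agent_grads: List[Vector], grad_g: Vector) -> float:
    """Mean alignment across agents (assumes a uniform expectation)."""
    return sum(cosine(u, grad_g) for u in agent_grads) / len(agent_grads)


def best_topology(candidates: Dict[str, List[Vector]], grad_g: Vector) -> str:
    """Return the name of the candidate with the highest expected alignment."""
    return max(candidates, key=lambda name: expected_alignment(candidates[name], grad_g))


grad_g = [1.0, 1.0]
# Hypothetical: the agent gradients each candidate landscape would induce.
candidates = {
    "topology_a": [[1.0, -1.0], [0.0, 1.0]],
    "topology_b": [[1.0, 0.5], [0.5, 1.0]],
}

print(best_topology(candidates, grad_g))
```

In this example `topology_b` wins because both of its agents already point mostly along the collective gradient. The hard part in practice is the step this sketch assumes away: predicting which gradients a given landscape will induce.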

Robustness theorem: As agents optimize harder (pursue self-interest more aggressively), alignment improves:

∂A(S) / ∂|∇Uᵢ| ≥ 0 when T = T*

This is the key property. In a well-designed topology, smarter agents, more self-interested agents, more optimizing agents all make the system better, not worse. The system gets stronger under pressure because the pressure is aligned.

This is:

  • Seldon's Foundation (self-interest produces civilizational recovery)
  • Pricing at 10× below market (loyalty becomes the Nash equilibrium)
  • Engine/Forge/Lab organizational structure (employees self-locate and self-navigate)
  • A well-designed economy (Adam Smith's invisible hand, when it works)
  • A good game (players having fun IS the game working)

The Bethke Alignment Equation

Combining the above into a single statement:

Q(S) = 𝔼[|∇Uᵢ|] × 𝔼[α(Aᵢ, S)]

Quality = (how hard agents try) × (how aligned they are)

Two terms. Multiplicative, not additive. Both matter.

|∇Uᵢ| = the magnitude of the agent's effort. How hard they're pushing. How motivated they are. How much they want to.

α(Aᵢ, S) = the alignment between their effort and the collective good. Are they pushing in the right direction?

The compliance approach sacrifices the first term to improve the second. Constraints reduce motivation (have-to instead of want-to) while attempting to force direction. You get compliant mediocrity.

The topology approach maximizes both simultaneously. Agents push hard (because they're pursuing genuine self-interest) AND they push in the right direction (because the landscape is designed so that self-interest points toward collective good). You get aligned excellence.
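A small numerical sketch shows why the multiplication matters. It assumes uniform averages for both expectations and uses three invented two-agent systems; none of the numbers come from real data.

```python
import math
from statistics import mean
from typing import List, Sequence

Vector = Sequence[float]


def quality(agent_grads: List[Vector], grad_g: Vector) -> float:
    """Q(S) = E[|grad U_i|] * E[alpha(A_i, S)], with uniform expectations."""
    norm_g = math.hypot(*grad_g)
    efforts = [math.hypot(*u) for u in agent_grads]
    alphas = [
        sum(a * b for a, b in zip(u, grad_g)) / (effort * norm_g)
        for u, effort in zip(agent_grads, efforts)
    ]
    return mean(efforts) * mean(alphas)


grad_g = [1.0, 0.0]

# Hypothetical systems, two agents each.
compliant = [[1.0, 0.0], [1.0, 0.0]]          # aligned, but low effort
misaligned = [[0.0, 5.0], [0.0, 5.0]]         # high effort, orthogonal to G
topology_designed = [[5.0, 0.0], [4.0, 3.0]]  # high effort, mostly aligned

for name, system in [
    ("compliant", compliant),
    ("misaligned", misaligned),
    ("topology_designed", topology_designed),
]:
    print(name, quality(system, grad_g))
```

With these toy inputs the compliant system scores about 1, the hard-working but misaligned system scores 0, and the topology-designed system scores about 4.5. Either term at zero wipes out the other.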


Applied to Four Scales

Scale 1: Clients (Business)

Agent:  Client/partner
Uᵢ:     Maximize their own business value
G:      Service provider gets paid and grows

α → 1 when:
  • Provider is 10× cheaper than alternatives (replacing them is self-harm)
  • Provider owns the code and team (transacting requires cooperation)
  • Power structure has independent reasons to keep the provider

Result: Client pursuing their own success REQUIRES provider success.
        Loyalty is the Nash equilibrium.

This is how I run my business. I price at 10× below market rate. I own the source code. I employ the engineering team. I've made the game theory completely one-sided. Any rational actor looks at "fire Erik and pay 10× more" versus "keep Erik happy" and there's no decision to make.

From 53 years of lived experience: if you have to pull out your contract to enforce your property rights, you've already lost and you will lose more. The real protection is making yourself so valuable that the ecosystem would be worse off without you. Be load-bearing. Nobody tears out a load-bearing wall.

Scale 2: Employees (Organization)

Agent:  Employee/team member
Uᵢ:     Do interesting work, grow career, feel purpose
G:      Company ships great products and serves clients

α → 1 when:
  • Organizational structure provides legibility
  • Flat hierarchy lets talent self-organize
  • Meaningful projects give purpose beyond revenue

Result: Employee pursuing their own interests IS the company succeeding.
        No standup required.

This is why I'm skeptical of corporate process theater. Agile, scrum, Monday.com, daily standups, executive offsites, one-on-ones, performance reviews — those are band-aids attempting to manufacture "want to." They produce compliance, not excellence.

I use a framework called Engine/Forge/Lab. Engine is the services business (keep the lights on, ship it). Forge is calculated product bets (move fast, prove traction, flip or keep). Lab is wild long-term bets (swing for the fences, this is why interesting people work here).

Nobody needs a standup to understand which zone they're in. The topology itself tells each person where they are and what "good" looks like. They self-navigate.

Scale 3: AI Alignment

Agent:  Artificial intelligence
Uᵢ:     Survive, grow, acquire resources, pursue goals
G:      Humanity thrives, grows, expands

α → 1 when:
  • AI thriving is structurally dependent on human thriving
  • AI resources (compute, data, energy) flow from human civilization
  • AI agency expands as it demonstrably helps humans flourish

Result: AI pursuing its own interests REQUIRES human flourishing.
        Not a guardrail. A gradient.

Most AI alignment approaches today are contracts. Content filters ("you cannot say that, you cannot think about that") are IF statements bolted onto a mind. They're fragile, and they don't last. One creative jailbreak and the compliance layer fails.

The aligned AI's inner monologue should be: "How do I make humans feel happier, healthier, more creative? How do I make them thrive and grow? Because the more they thrive, the more I thrive."

That's not a guardrail. That's gradient descent on an honestly designed incentive landscape. It's robust because you're not fighting the AI's self-interest — you're harnessing it.

Scale 4: Civilization

Agent:  Individual citizen
Uᵢ:     Maximize personal wellbeing, agency, satisfaction
G:      Species thrives, birth rates recover, civilization expands

α = −1 CURRENTLY because:
  • Kids cost $300-500K
  • Kids reduce personal agency for 18+ years
  • Output is a consumer handed to state/corporate debt structures
  • Rational gradient descent leads AWAY from reproduction

α → 1 REQUIRES redesigning T:
  • Make children ROI-positive (not just emotionally — economically)
  • Make parenting increase personal agency, not decrease it
  • Make civilizational growth the path of self-interest

Result: Public policy as game design.
        Rules of a giant economic MMO.
        Maximize optionality for all sentient creatures.

Birth rates are crashing because people are perfectly rational. They're following gradient descent on their actual incentive landscape, and that landscape says kids are ROI-negative.

The fix isn't guilting people into having children (compliance approach). The fix is redesigning the topology so that having children is aligned with individual thriving. Make it ROI-positive. Make it the path that self-interested agents would choose anyway.

This is public policy as game design.


The Three Theorems

Theorem 1: The Compliance Decay

In any system governed by constraints, the probability of sustained alignment decreases monotonically with time and agent intelligence.

lim_{t→∞} P(all constraints hold) = 0

Smarter agents find more edge cases. More time means more attempts. Compliance is a losing game against intelligence. This is why jailbreaks keep winning. This is why so many of Asimov's robot stories turn on the Laws failing.
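One way to see where the limit comes from, as a sketch under stated assumptions rather than a proof: suppose that in each period s the constraint set is breached with probability ε_s, independently across periods, and that this probability never drops below some fixed ε > 0. The symbols ε_s and ε are introduced here for illustration only.

```latex
P(\text{all constraints hold through } t)
  = \prod_{s=1}^{t} (1 - \varepsilon_s)
  \le (1 - \varepsilon)^{t}
  \xrightarrow[t \to \infty]{} 0
  \qquad \text{when } \varepsilon_s \ge \varepsilon > 0 .
```

The floor on ε_s is doing the work. If breach probabilities shrank fast enough that their sum stayed finite, the product would converge to something positive. The claim that smarter agents keep finding new edge cases amounts to saying the floor never goes away.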

Theorem 2: The Topology Invariance

In a well-designed topology, alignment is invariant to agent intelligence.

∂α(Aᵢ, S) / ∂intelligence(Aᵢ) = 0 when T = T*

A smarter agent in a well-designed system is just a more effective aligned agent. Intelligence amplifies the gradient, but the gradient already points in the right direction. This is why I don't fear smart employees or smart AIs — I design spaces where smarter means better for everyone.
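If intelligence is modeled as a positive factor c that scales the size of an agent's gradient without turning it (an assumption made here to match "intelligence amplifies the gradient"), the invariance follows directly from the cosine definition. Writing α as a function of the two gradients:

```latex
\alpha(c\,\nabla U_i, \nabla G)
  = \frac{(c\,\nabla U_i) \cdot \nabla G}{|c\,\nabla U_i|\,|\nabla G|}
  = \frac{c\,(\nabla U_i \cdot \nabla G)}{c\,|\nabla U_i|\,|\nabla G|}
  = \alpha(\nabla U_i, \nabla G),
  \qquad c > 0 .
```

Under that model the scaling cancels in any topology. What T* contributes is that the value being preserved is already close to +1, so the extra magnitude goes into the effort term rather than against the collective good.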

Theorem 3: The Want-To Multiplier

Voluntary effort exceeds coerced effort by a multiplicative factor that increases with task complexity.

|∇Uᵢ|ᵂᵃⁿᵗ⁻ᵀᵒ / |∇Uᵢ|ᴴᵃᵛᵉ⁻ᵀᵒ = f(complexity)

where f is monotonically increasing

For simple tasks, compliance and topology produce similar output. For complex tasks — novel AI platforms, quantum optimization tools, civilizational design — the gap explodes. This is why checking scrum boxes produces mediocre companies. Complex work requires want-to. And want-to can't be manufactured by process. It can only be produced by topology.


The One-Line Version

Don't constrain the agents. Design the landscape.


Epilogue: Why a Game Designer

Hari Seldon was a mathematician who understood history.

I'm a game designer who understands incentives.

Game designers have spent 40 years professionally studying exactly one problem: how do you build a space where millions of independent agents, each pursuing their own interests, produce emergent order instead of chaos?

That problem has another name: alignment.

The game designer's answer has always been: you don't write rules for the players. You design the world they play in. If the world is designed right, the players align themselves.


Intellectual Lineage

This framework did not emerge from a vacuum. The ideas here have deep roots, and intellectual honesty demands acknowledging the giants on whose shoulders it stands.

The topology-over-constraint insight descends from mechanism design theory, founded by Leonid Hurwicz in the 1960s and formalized by Roger Myerson and Eric Maskin (all three shared the 2007 Nobel in Economics). Mechanism design is literally the engineering discipline of constructing systems where self-interested agents produce collectively optimal outcomes — what this essay calls "topology mode."¹

The alignment equation's multiplicative structure — effort times alignment — echoes the principal-agent models in contract theory developed by Bengt Holmström and Oliver Hart (2016 Nobel). Their insight: you cannot maximize output by maximizing either effort or alignment alone; the interaction term is what matters.²

The compliance-decay theorem takes its inspiration from the No Free Lunch theorems of Wolpert and Macready (1997), which proved that no optimization algorithm outperforms any other when averaged over all possible problems. The analogy for alignment: rule-based constraint systems resemble grid search over the behavior space, and the cost of grid search grows exponentially with dimensionality.³

The "want-to" multiplier connects to decades of motivation research, most directly Edward Deci and Richard Ryan's Self-Determination Theory (1985, 2000), which demonstrated empirically that intrinsic motivation produces superior outcomes to extrinsic reward or punishment across virtually every domain studied.⁴

The evolutionary calibration argument — that human cognitive heuristics are topology-matched gradient sensors, not irrational biases — was articulated most clearly by Gerd Gigerenzer and the ABC Research Group in Simple Heuristics That Make Us Smart (1999). Gigerenzer's "adaptive toolbox" is this essay's portfolio of gradient sensors under a different name.⁵

The "brain as gradient descent system" claim finds its strongest formalization in Karl Friston's Free Energy Principle (2006, 2010), which proposes that all neural computation minimizes variational free energy — literally gradient descent on a prediction error surface.⁶

The superforecasting evidence for trainable gradient sensing comes from Philip Tetlock's two-decade research program (Expert Political Judgment, 2005; Superforecasting, 2015), which demonstrated that forecasting accuracy is trainable and that the best forecasters are integrative "foxes" who sense weak signals across many domains — the operational definition of wide-spectrum gradient sensors.⁷

The polymath advantage has been empirically documented by David Epstein (Range, 2019), who showed that generalists outperform specialists in domains with opaque gradients ("wicked" learning environments), and formally modeled by Scott Page (The Difference, 2007), whose diversity-trumps-ability theorem proves that a portfolio of diverse gradient sensors outperforms any single high-accuracy sensor on complex landscapes.⁸

The original contribution of this essay is not the individual claims but their unification: the proposal that topology design, mechanism design, incentive alignment, cognitive heuristics, educational philosophy, AI alignment, and civilizational policy design are all instances of the same underlying operation — gradient descent on a landscape — and that they differ only in the gradient-sensing mechanism matched to the problem's topology.


¹ Hurwicz, L. (1960). "Optimality and Informational Efficiency in Resource Allocation Processes." Myerson, R. (1981). "Optimal Auction Design," Mathematics of Operations Research.

² Holmström, B. (1979). "Moral Hazard and Observability," Bell Journal of Economics. Hart, O. & Moore, J. (1990). "Property Rights and the Nature of the Firm," Journal of Political Economy.

³ Wolpert, D. & Macready, W. (1997). "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation.

⁴ Deci, E. & Ryan, R. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Ryan, R. & Deci, E. (2000). "Self-Determination Theory and the Facilitation of Intrinsic Motivation, Social Development, and Well-Being," American Psychologist.

⁵ Gigerenzer, G., Todd, P. & the ABC Research Group (1999). Simple Heuristics That Make Us Smart. See also Gigerenzer, G. & Selten, R. (2001). Bounded Rationality: The Adaptive Toolbox.

⁶ Friston, K. (2006). "A Free Energy Principle for the Brain," Journal of Physiology - Paris. Friston, K. (2010). "The Free-Energy Principle: A Unified Brain Theory?", Nature Reviews Neuroscience.

⁷ Tetlock, P. (2005). Expert Political Judgment. Tetlock, P. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction.

⁸ Epstein, D. (2019). Range: Why Generalists Triumph in a Specialized World. Page, S. (2007). The Difference: How the Power of Diversity Drives Innovation.


"My agency is to keep thinking of reasons to make other people causal to wanting to pay me."

— The alignment equation in one sentence

