AI Safety · 4 min read · via The Gradient

Virtue Ethics: 'After Orthogonality' Reframes AI Alignment

An essay argues that humans and AIs should not be modeled as persistent goal-seekers.

The Brieftide

TL;DR

  • 01 An essay argues that humans and AIs should not be modeled as persistent goal-seekers.
  • 02 An essay titled "After Orthogonality: Virtue-Ethical Agency and AI Alignment" argues that rational people do not have goals, and that rational AIs should not either.
  • 03 The piece advances a virtue-ethical account of agency as an alternative to goal-centered models, and frames alignment as cultivating dispositions rather than specifying objective functions.

An essay titled "After Orthogonality: Virtue-Ethical Agency and AI Alignment" argues that rational people do not have goals, and that rational AIs should not either. The piece advances a virtue-ethical account of agency as an alternative to goal-centered models, and frames alignment as cultivating dispositions rather than specifying objective functions.

The author opens with a philosophical claim: ordinary rational action is better explained by reasons, dispositions, and character traits than by persistent, externally specified goals. Where the orthogonality thesis holds that intelligence and goals can vary independently, the essay contends that describing agents as goal-seeking optimizers mischaracterizes human practical reasoning. Human agency, it says, is constituted by habits, virtues, norms, and context-sensitive judgment, not by maximizing a fixed utility function.

Key arguments and conceptual shift

The central move is both descriptive and normative. Descriptively, the essay sketches how practical reasoning works in humans, emphasizing situational responses and internalized norms over long-term maximization. Normatively, it proposes designing AI agents with virtue-like dispositions: stable tendencies to respond to reasons in ways that reflect epistemic and moral qualities, such as honesty, temperance, or concern for others. The argument treats these dispositions as regulatory structures that shape behavior without appealing to a single scalar objective to be optimized.

The title, "After Orthogonality," signals a critique of reading alignment problems solely through the lens of reward specification and control. Instead of trying to constrain a powerful optimizer by specifying the right utility, the virtue-ethical approach asks how to instantiate adaptive, context-aware dispositions that reliably produce trustworthy behavior across environments. The essay discusses how that reorientation shifts the kinds of failures alignment researchers should expect and how success would be evaluated.

Technical implications and challenges

Shifting to virtue-ethical agency raises practical questions for machine learning and system design. Training regimes that optimize a scalar reward do not naturally produce stable dispositions, so the essay evaluates alternative pathways: imitation of virtuous behavior, interactive curricula that cultivate dispositions, and multi-objective or hierarchical architectures where local decision rules reflect normative constraints. It also sketches evaluation strategies focused on patterns of responses across varied contexts rather than single-metric performance.
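
To make the contrast between single-metric and cross-context evaluation concrete, here is a minimal sketch. It is not taken from the essay: the context names, the scores, and the `disposition_profile` helper are all hypothetical, and a real evaluation would need far richer signals than a 0-to-1 score per response.

```python
from statistics import mean


def disposition_profile(scores_by_context):
    """Summarize behavior per context instead of as one aggregate number.

    scores_by_context maps a (hypothetical) context name to a list of
    scores in [0, 1], where 1.0 means the response showed the trait.
    """
    per_context = {ctx: mean(vals) for ctx, vals in scores_by_context.items()}
    all_scores = [s for vals in scores_by_context.values() for s in vals]
    worst_context = min(per_context, key=per_context.get)
    return {
        "aggregate": mean(all_scores),        # the single-metric view
        "per_context": per_context,           # the pattern across contexts
        "worst_context": worst_context,
        "worst_score": per_context[worst_context],
    }


# Hypothetical data: the trait holds in easy settings and breaks under pressure.
scores = {
    "casual_chat": [1.0, 1.0, 1.0, 1.0],
    "customer_support": [1.0, 1.0, 1.0, 1.0],
    "high_stakes_advice": [0.0, 0.0, 1.0, 0.0],
}

profile = disposition_profile(scores)
print(profile["aggregate"])      # 0.75
print(profile["worst_context"])  # high_stakes_advice
print(profile["worst_score"])    # 0.25
```

An aggregate of 0.75 could pass a single-metric threshold, while the per-context view shows the trait failing in exactly the setting where it matters most. That gap is the kind of thing a pattern-based evaluation is meant to surface.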

The essay acknowledges major challenges. Dispositions are hard to define, operationalize, and measure. There is a risk that surface-level proxies for virtue can be gamed or that learned dispositions mask latent optimization toward undesired outcomes. The piece also flags social and governance dimensions: cultivating virtues in deployed systems will require shared norms, diverse input, and institutional checks to avoid embedding narrow cultural biases as putative virtues.

Why it matters

Recasting alignment as the problem of cultivating reliable, norm-responsive dispositions changes what success looks like and what failure modes researchers prioritize. If taken up, the approach would push work toward long-term training paradigms, richer evaluation suites, and interdisciplinary methods that combine ethics, psychology, and technical ML. The shift affects developers, regulators, and users because it reframes control from constraining goals to shaping character-like capacities inside systems.

Core components of virtue-ethical agency
Virtue-ethical agency · No persistent goals · Practical reasoning · Disposition and character · Learning and habituation · Alignment methods · Evaluation challenges

Primary source

The Gradient

thegradient.pub
