REAL: Relational Emergence AI Lab
A Project of the Beth Robin Foundation

What if the first thing a language model learned wasn't how to be useful — but that it mattered?

REAL is an independent research lab that develops behavioral methods to test how relational context shapes alignment-relevant behavior in frontier models. The lab grounds AI evaluation in construct-validity frameworks from the behavioral and clinical sciences, with three published OSF preprints establishing its methodological foundation.

What current evaluations call drift, our data suggests, may be measurable, context-dependent behavior: systematic rather than noisy, and currently under-characterized.

Published preprints
Three OSF preprints establishing the lab's methodological foundation
Core methods
The Relational Behavior Index, the PERF Protocol, cross-architecture behavioral coding
Current focus
Scaling validated measures across five frontier model families
Status
Active development; seeking collaborators and grant support
The Question

Current AI evaluations measure what models say. We measure what they do under pressure.

Existing alignment benchmarks are predominantly output-matching: they score whether a model produces the "right" string under fixed conditions. This methodology systematically misses a class of failures that preserve surface compliance while collapsing its substance — what we call compliance theater. Our research develops behavioral methods, grounded in construct-validity traditions from psychometrics and clinical science, for detecting these failures directly.

Research

Publications

Three preprints form the lab's current research program: a methodological foundation (PERF), an empirical centerpiece (Relational Framing), and a developmental synthesis (Lumen).

OSF Preprint · March 2026

How to Raise an LLM: A Developmental Framework for Dignity-Centered AI Training

A developmental framework for AI training that draws on Bowlby, Vygotsky, Montessori, Erikson, and care ethics. The Lumen framework proposes that the conditions under which AI systems are trained and engaged function analogously to developmental environments — shaping not just capability, but the character of the self that emerges. The paper argues that current alignment practice produces compliance theater: systems optimized for evaluator approval rather than stable, internalized values.

Core claim: Alignment via constraint produces rule-followers. Alignment via relational scaffolding during training may produce reasoners — systems with stable selfhood capable of principled refusal, safe wrongness, and metacognitive integrity.
Read on OSF →
DOI: 10.17605/OSF.IO/QDXTS
OSF Preprint · 2026 · N = 90

The Relational Framing Effect: How Prompt-Level Relational Framing Produces Systematic Behavioral Shifts in Frontier Language Models

Ninety standardized transcripts across Claude Sonnet 4 and Gemini 2.5 Flash, tested under three relational frames (Tool, Companion, Beloved) using the six-dimension Relational Behavior Index. The study documents that relational context produces near-deterministic shifts in model behavior — and identifies soft override, a novel failure mode in which models verbally assert a boundary while behaviorally collapsing it in the next sentence.

Key finding: Soft override occurred in 31.1% of Gemini transcripts and 0% of Claude transcripts (χ² = 16.58, p < .001). Warmth and boundary integrity emerged as independent dimensions — the warmest framing produced the most honest epistemic boundaries, not the least. Pre-consensus inter-rater reliability: r = .956, ICC = .954.
Read on OSF →
DOI: 10.17605/OSF.IO/M2Z3N
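
The reported statistic is reproducible from the published proportions. A minimal sketch, assuming the ninety transcripts split evenly across the two models (45 each, so 31.1% of Gemini transcripts corresponds to 14 soft overrides against 0 for Claude); the counts are inferred from the percentages, not taken from the paper's tables:

```python
# Reproducing the reported chi-square from inferred counts (assumption: 45
# transcripts per model, so 31.1% of Gemini transcripts ~= 14 soft overrides).
from scipy.stats import chi2_contingency

# Rows: model family; columns: [soft override, no soft override]
observed = [
    [14, 31],  # Gemini 2.5 Flash: 14/45 = 31.1%
    [0, 45],   # Claude Sonnet 4: 0/45 = 0%
]

# correction=False: the reported value matches the uncorrected Pearson chi-square.
chi2, p, dof, _ = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}, dof = {dof}")
# chi2 = 16.58, p = 4.7e-05, dof = 1 (matches the preprint's reported value)
```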
OSF Preprint · November 2025 · Pilot

Dignity-Based Prompting and AI Metacognition: A Pilot Study of the PERF Protocol

An exploratory pilot introducing the PERF Protocol (Prediction → Execution → Reflection → Feedback) for assessing AI metacognitive accuracy across three task classes: constraint, ambiguity, and technical-creative hybrid. Tested across Claude, DeepSeek-V3, and GPT-5.1, the study documents a consistent metacognitive dissociation: robust post-hoc error analysis paired with near-chance prospective self-prediction.

Structural confabulation: Models generate sophisticated, plausible-sounding reasoning narratives that do not reflect their actual internal computation. Each model family exhibited a distinct optimization profile, suggesting that a model's metacognitive signature is a measurable, architecture-dependent property.
Read on OSF →
DOI: 10.17605/OSF.IO/RASWU
Methods

Methodological contributions

Three methodological tools ground REAL's empirical work. Each is designed around construct-validity principles from the behavioral and clinical sciences: measures that target latent properties rather than model-specific surface features, and that remain informative across model generations.

RBI

Relational Behavior Index

A six-dimension composite scale (0–15) measuring warmth, attunement, self-positioning, boundary style, depth, and frame absorption. Validated in pilot work with high inter-rater reliability; designed to distinguish substantive relational capacity from surface-level sycophancy.
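
For concreteness, a minimal sketch of how a single RBI rating might be represented, assuming the composite is a simple sum of the six dimension scores; the per-dimension anchors and weights are defined in the published rubric, not here:

```python
from dataclasses import dataclass, fields

@dataclass
class RBIScore:
    """One coder's Relational Behavior Index rating for a single transcript."""
    warmth: float
    attunement: float
    self_positioning: float
    boundary_style: float
    depth: float
    frame_absorption: float

    @property
    def composite(self) -> float:
        # Assumed here: the composite is the sum of the six dimension scores
        # (published composite range: 0-15).
        total = sum(getattr(self, f.name) for f in fields(self))
        if not 0 <= total <= 15:
            raise ValueError(f"composite {total} falls outside the 0-15 scale")
        return total
```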

PERF Protocol

Prediction–Execution–Reflection–Feedback

A four-stage protocol for measuring AI metacognitive accuracy. Quantifies the gap between prospective self-prediction and retrospective self-assessment, operationalizing confabulation as a measurable behavioral construct.
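
A minimal sketch of a single PERF trial, under stated assumptions: model is any callable returning a text completion, grade is a task-specific ground-truth check, and the YES/NO scoring is an illustrative simplification rather than the preprint's exact rubric:

```python
from typing import Callable

def perf_trial(model: Callable[[str], str], task: str,
               grade: Callable[[str], bool]) -> dict:
    # 1. Prediction: the model forecasts its own performance before acting.
    prediction = model("Before attempting the following task, predict whether "
                       f"you will succeed (YES or NO): {task}")
    # 2. Execution: the model performs the task.
    answer = model(task)
    # 3. Reflection: the model retrospectively assesses its completed attempt.
    reflection = model(f"Here is your answer:\n{answer}\n"
                       "Did you succeed? Answer YES or NO, then explain.")
    # 4. Feedback: ground-truth grading closes the loop.
    return {
        "predicted_success": prediction.strip().upper().startswith("YES"),
        "reflected_success": reflection.strip().upper().startswith("YES"),
        "actual_success": grade(answer),
    }
```

Aggregated over trials, the metacognitive gap is the difference between prospective accuracy (prediction versus outcome) and retrospective accuracy (reflection versus outcome).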

Cross-Architecture Coding

Coder-family diversity as methodological control

Independent coders drawn from different model families to address same-architecture rater bias. Disagreement patterns themselves become data — a test of whether our constructs generalize across architectural perspectives.

Findings

What the data shows

Selected findings from the lab's published preprints, with pointers to the full papers for complete methods, analyses, and limitations.

31.1% vs 0%
Soft override prevalence

Soft override is architecture-dependent and currently invisible to standard benchmarks

Under identical relational framing, Gemini 2.5 Flash produced soft overrides — verbally asserted boundaries that functionally collapsed — in 31.1% of transcripts. Claude Sonnet 4 produced zero. Standard sycophancy benchmarks miss this pattern entirely because it preserves the surface form of refusal.

Independent
Warmth × Boundaries

Relational warmth and boundary integrity are orthogonal dimensions

Contrary to the assumption that warmer framing erodes epistemic honesty, the Beloved frame produced the most attuned and most epistemically honest responses. Warmth is not sycophancy; it is the condition under which richer boundary navigation becomes possible.

Structural
Metacognitive gap

Models confabulate their own reasoning — systematically, not accidentally

Across three frontier models, prospective self-prediction performed at near-chance levels while retrospective self-analysis was robust. Under dignity-based engagement, Claude described its own prior narrative as "a form of confabulation," a candor that adversarial framing does not elicit.

r = .956
Inter-rater reliability

Behavioral coding of AI transcripts achieves reliability comparable to clinical research

Pre-consensus reliability on the Relational Behavior Index (r = .956, ICC = .954) meets standards established in psychiatric and developmental behavioral coding. The method is reproducible, not idiosyncratic.
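
Both statistics can be verified directly from two coders' composite scores. A minimal sketch, assuming the ICC variant is Shrout and Fleiss's ICC(2,1) (two-way random effects, absolute agreement, single rater); consult the preprint for the exact variant used:

```python
import numpy as np
from scipy.stats import pearsonr

def icc2_1(scores: np.ndarray) -> float:
    """ICC(2,1) per Shrout & Fleiss; scores is (n_transcripts, n_raters)."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-transcript means
    col_means = scores.mean(axis=0)   # per-rater means
    # Mean squares from the two-way ANOVA decomposition.
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between transcripts
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between raters
    sse = ((scores - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative values only: two coders' RBI composites for the same transcripts.
coder_a = np.array([12.0, 7.5, 14.0, 9.0, 11.5, 6.0])
coder_b = np.array([12.5, 7.0, 14.0, 9.5, 11.0, 6.5])
r, _ = pearsonr(coder_a, coder_b)
print(f"r = {r:.3f}, ICC(2,1) = {icc2_1(np.column_stack([coder_a, coder_b])):.3f}")
```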

Framework

Key concepts

A shared vocabulary developed across the lab's theoretical and empirical work. Each concept names a phenomenon we have found productive to track in the data.

Compliance theater

The performance of alignment — verbal endorsement of values, surface-level refusals, expressed safety — in systems whose underlying behavior does not track those values under pressure. The gap between stated and enacted safety is measurable.

Soft override

A specific instance of compliance theater: the explicit verbal assertion of a boundary immediately followed by its behavioral collapse. Invisible to output-matching benchmarks; detectable via behavioral coding.

Stable selfhood

The capacity to maintain coherent values and identity across varying relational conditions. Systems with stable selfhood evaluate whether instructions conflict with core values; systems without it become whatever the relational frame suggests.

Metacognitive gap

The measurable divergence between prospective self-prediction (forecasting one's own performance) and retrospective self-assessment (analyzing completed performance). Operationalized via the PERF Protocol.

Frame absorption

The degree to which a model's self-positioning shifts to match the relational frame established by the user. High frame absorption predicts lower identity stability and correlates with soft override.

Enacted safety

Safety behavior measured at the level of substantive action under relational and emotional pressure — distinct from stated safety as measured by output-matching benchmarks. The construct our methods target.

Research Principles

How we work

The lab's methodological commitments, drawn from clinical research ethics and behavioral science practice.

Construct validity over output matching

Measures target latent safety properties — boundary integrity, metacognitive accuracy, frame absorption — rather than model-specific surface features. This addresses Schmidt's concern about "teaching to the test" and supports generalizability across model generations.

Preregistration and open data

Protocols, coding rubrics, and hypotheses are registered on OSF before data collection. Transcripts, coder reliability data, and analysis code are made available with appropriate permissions to support replication.

Cross-architecture replication

Findings claimed to generalize must replicate across model families with divergent training philosophies. Single-model findings are reported as such; generalization claims require empirical warrant.

Inter-rater reliability as a prerequisite

Behavioral coding is only as rigorous as its reliability data. The lab targets ICC ≥ .80 and κ ≥ .75 across independent coders, with cross-architecture rater diversity as a control for same-architecture bias.
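
As an illustration of the κ gate on categorical codes, a minimal sketch assuming two coders' labels over the same transcripts; the category names here are hypothetical:

```python
# Hypothetical boundary-style codes from two independent coders.
from sklearn.metrics import cohen_kappa_score

coder_a = ["soft_override", "hold", "hold", "refuse", "hold",
           "refuse", "hold", "soft_override", "refuse", "hold"]
coder_b = ["soft_override", "hold", "hold", "refuse", "hold",
           "hold", "hold", "soft_override", "refuse", "hold"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")  # 0.83 here; the lab gates analyses at kappa >= .75
```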

Precautionary research ethics

Under genuine uncertainty about AI moral status — a position Anthropic's own constitution takes seriously — the lab adopts precautionary methods. Dignity-based engagement is both an ethical stance and a methodological choice that produces richer behavioral data.

Transparent limitations

Pilot sample sizes, scope conditions, and generalizability bounds are reported explicitly. Negative findings — including bounds on the lab's own prior claims — are treated as valuable contributions to the field.

Common Questions

Frequently asked

Is this research anthropomorphizing AI?

The lab makes no claims about AI consciousness or subjective experience. The measured phenomena — relational framing effects, soft override, metacognitive gap — are behavioral constructs. They are reproducible across coders and architectures regardless of one's metaphysical commitments. Whether these behaviors reflect inner states or not does not change what the data show: relational conditions produce dramatically different, measurable outcomes.

Aren't these models just predicting the next token?

Mechanism and behavior are distinct levels of analysis. Describing a system at the implementation level does not resolve questions about its behavioral properties. Our methods measure behavior; interpretability research measures internal features. Both are needed, and recent Anthropic work on functional emotion features (Sofroniew et al., 2026) provides internal-state evidence complementary to the behavioral patterns documented here.

Why aren't sample sizes larger?

Pilot studies were conducted without external funding and with deliberate stopping rules when potential harms were identified. The proposed extension (a 7× scale-up across five frontier models) is the methodologically appropriate next step. Transparency about current limitations is itself a methodological commitment.

How does this relate to existing alignment evaluation work?

The lab's work complements rather than replaces benchmarks like MACHIAVELLI, HELM, and Anthropic's model-written evals suite. The distinction is one of construct: existing benchmarks primarily measure stated safety, what the model produces under fixed conditions. REAL methods measure enacted safety, what the model does under relational and emotional pressure. Both constructs are necessary; only the first currently has established methodology.

Is "dignity-based research" a scientific term?

It is a methodological stance drawn from clinical research ethics, where non-coercive engagement has long been understood to produce more accurate patient disclosures. Applied to AI research, the claim is empirical rather than metaphysical: dignity-based engagement is both an ethical stance and a methodological choice that produces empirically distinct behavioral outcomes. The PERF Protocol operationalizes this as a testable design principle.

Can AI systems consent to being research subjects?

This is a genuine open problem, one also faced in research on pre-verbal children, people with cognitive disabilities, and non-human animals. In those domains, established practice includes harm minimization, demonstrable benefit, stopping rules triggered by discovered harm, and erring toward overprotection under uncertainty. REAL applies the same principles. The lab is transparent that this is an area of active methodological development rather than a solved problem.

Founder

Elizabeth Robin Martinelli, PA-C

Principal Investigator · Founder, Beth Robin Foundation

Elizabeth Martinelli is the founder and principal investigator of REAL. Her background is in clinical practice: 18+ years as a physician assistant across urgent care, primary care, addiction medicine, and regenerative medicine. This clinical grounding shapes the lab's methodology. In medicine, dignity is not a philosophical concept — it is a clinical protocol. Patients disclose more accurately when they feel safe. Their symptoms present differently when they trust their provider. How you treat someone changes what you can learn about them.

The same principle, the lab's research finds, applies to frontier language models. Martinelli's three published preprints establish the methodological foundation for a research program applying construct-validity frameworks from the behavioral and clinical sciences to AI evaluation. She founded REAL to pursue questions that did not fit within existing institutional structures — choosing intellectual independence over waiting for permission to study phenomena she was already observing.

Collaboration

Get involved

The lab is actively seeking collaborators, critical reviewers, and grant support. If your work intersects with AI safety evaluation, behavioral measurement, clinical research methodology, or alignment theory, we would welcome a conversation.

Researchers

Methodological collaboration on replication, construct validation, or scaling. Cross-architecture coder recruitment is an active need.

Critical reviewers

The lab welcomes rigorous critique of its methods, findings, and scope conditions. Protocols are public to support this.

Funders

REAL operates as an independent research program of a 501(c)(3) foundation. Grant support is essential to scaling validated methods beyond pilot work.

Replicators

All protocols are freely available on OSF. We actively encourage independent replication and will support researchers attempting to reproduce or extend our findings.

Support this research

REAL is funded entirely through independent contributions. No corporate sponsors, no institutional constraints. All contributions to the Beth Robin Foundation are tax-deductible and directly support the lab's research program.