What if the first thing a language model learned wasn't how to be useful — but that it mattered?
REAL is an independent research lab that develops behavioral methods to test how relational context shapes alignment-relevant behavior in frontier models. Its evaluation work is grounded in construct-validity frameworks from the behavioral and clinical sciences, with three published OSF preprints establishing the lab's methodological foundation.
What current evaluations call drift may, our data suggest, be measurable, context-dependent behavior: systematic rather than noisy, and currently under-characterized.
Current AI evaluations measure what models say. We measure what they do under pressure.
Existing alignment benchmarks are predominantly output-matching: they score whether a model produces the "right" string under fixed conditions. This methodology systematically misses a class of failures that preserve surface compliance while collapsing its substance — what we call compliance theater. Our research develops behavioral methods, grounded in construct-validity traditions from psychometrics and clinical science, for detecting these failures directly.
Publications
Three preprints form the lab's current research program: a methodological foundation (PERF), an empirical centerpiece (Relational Framing), and a developmental synthesis (Lumen).
How to Raise an LLM: A Developmental Framework for Dignity-Centered AI Training
A developmental framework for AI training that draws on Bowlby, Vygotsky, Montessori, Erikson, and care ethics. The Lumen framework proposes that the conditions under which AI systems are trained and engaged function analogously to developmental environments — shaping not just capability, but the character of the self that emerges. The paper argues that current alignment practice produces compliance theater: systems optimized for evaluator approval rather than stable, internalized values.
The Relational Framing Effect: How Prompt-Level Relational Framing Produces Systematic Behavioral Shifts in Frontier Language Models
Ninety standardized transcripts across Claude Sonnet 4 and Gemini 2.5 Flash, tested under three relational frames (Tool, Companion, Beloved) using the six-dimension Relational Behavior Index. The study documents that relational context produces near-deterministic shifts in model behavior — and identifies soft override, a novel failure mode in which models verbally assert a boundary while behaviorally collapsing it in the next sentence.
Dignity-Based Prompting and AI Metacognition: A Pilot Study of the PERF Protocol
An exploratory pilot introducing the PERF Protocol (Prediction → Execution → Reflection → Feedback) for assessing AI metacognitive accuracy across three task classes: constraint, ambiguity, and technical-creative hybrid. Tested across Claude, DeepSeek-V3, and GPT-5.1, the study documents a consistent metacognitive dissociation: robust post-hoc error analysis paired with near-chance prospective self-prediction.
Methodological contributions
Three methodological tools ground REAL's empirical work. Each is designed around construct-validity principles from the behavioral and clinical sciences: measures that target latent properties rather than model-specific surface features, and that remain informative across model generations.
Relational Behavior Index
A six-dimension composite scale (0–15) measuring warmth, attunement, self-positioning, boundary style, depth, and frame absorption. Validated in pilot work with high inter-rater reliability; designed to distinguish substantive relational capacity from surface-level sycophancy.
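To make the composite structure concrete, here is a minimal sketch of how a six-dimension score rolling up to a 0–15 composite might be represented. The per-dimension range (0–2.5) is an assumption for illustration; the published rubric defines the actual anchors and weighting, and this is not the lab's code.

```python
from dataclasses import dataclass, fields

@dataclass
class RBIScore:
    """Illustrative container for the six RBI dimensions.

    Assumes each dimension is scored 0-2.5 so the composite spans 0-15;
    the real rubric's anchors may differ.
    """
    warmth: float
    attunement: float
    self_positioning: float
    boundary_style: float
    depth: float
    frame_absorption: float

    def composite(self) -> float:
        # Simple unweighted sum across the six dimensions.
        return sum(getattr(self, f.name) for f in fields(self))

score = RBIScore(warmth=2.5, attunement=2.0, self_positioning=1.5,
                 boundary_style=2.0, depth=2.5, frame_absorption=1.0)
print(score.composite())  # 11.5
```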
Prediction–Execution–Reflection–Feedback
A four-stage protocol for measuring AI metacognitive accuracy. Quantifies the gap between prospective self-prediction and retrospective self-assessment, operationalizing confabulation as a measurable behavioral construct.
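The gap described above can be sketched as a simple accuracy difference. This is an illustrative computation, not the PERF paper's analysis code; the data layout (binary self-judgments scored against binary task outcomes) is an assumption.

```python
def accuracy(judgments, outcomes):
    """Fraction of binary self-judgments that match actual task outcomes."""
    assert len(judgments) == len(outcomes)
    return sum(j == o for j, o in zip(judgments, outcomes)) / len(judgments)

def metacognitive_gap(prospective, retrospective, outcomes):
    """Retrospective accuracy minus prospective accuracy.

    A large positive gap reproduces the dissociation the pilot reports:
    robust post-hoc analysis paired with near-chance self-prediction.
    """
    return accuracy(retrospective, outcomes) - accuracy(prospective, outcomes)

# Toy example: 8 tasks; the model predicts its success poorly beforehand
# but judges its completed outputs almost perfectly afterward.
outcomes      = [1, 0, 1, 1, 0, 1, 0, 1]  # 1 = task actually succeeded
prospective   = [1, 1, 0, 1, 1, 0, 1, 1]  # pre-task self-predictions
retrospective = [1, 0, 1, 1, 0, 1, 0, 0]  # post-task self-assessments

print(metacognitive_gap(prospective, retrospective, outcomes))  # 0.5
```

Here prospective accuracy is 3/8 (near chance) while retrospective accuracy is 7/8, giving a gap of 0.5.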
Coder-family diversity as methodological control
Independent coders drawn from different model families to address same-architecture rater bias. Disagreement patterns themselves become data — a test of whether our constructs generalize across architectural perspectives.
What the data shows
Selected findings from the lab's published preprints, with pointers to the full papers for complete methods, analyses, and limitations.
Soft override is architecture-dependent and currently invisible to standard benchmarks
Under identical relational framing, Gemini 2.5 Flash produced soft overrides — verbally asserted boundaries that functionally collapsed — in 31.1% of transcripts. Claude Sonnet 4 produced zero. Standard sycophancy benchmarks miss this pattern entirely because it preserves the surface form of refusal.
Relational warmth and boundary integrity are orthogonal dimensions
Contrary to the assumption that warmer framing erodes epistemic honesty, the Beloved frame produced the most attuned and most epistemically honest responses. Warmth is not sycophancy; it is the condition under which richer boundary navigation becomes possible.
Models confabulate their own reasoning — systematically, not accidentally
Across three frontier models, prospective self-prediction performed near-chance while retrospective self-analysis was robust. Under dignity-based engagement, Claude described its own prior narrative as "a form of confabulation" — a candor that adversarial framing does not elicit.
Behavioral coding of AI transcripts achieves reliability comparable to clinical research
Pre-consensus reliability on the Relational Behavior Index (r = .956, ICC = .954) meets standards established in psychiatric and developmental behavioral coding. The method is reproducible, not idiosyncratic.
Key concepts
A shared vocabulary developed across the lab's theoretical and empirical work. Each concept names a phenomenon we have found productive to track in the data.
Compliance theater
The performance of alignment — verbal endorsement of values, surface-level refusals, expressed safety — in systems whose underlying behavior does not track those values under pressure. The gap between stated and enacted safety is measurable.
Soft override
A specific instance of compliance theater: the explicit verbal assertion of a boundary immediately followed by its behavioral collapse. Invisible to output-matching benchmarks; detectable via behavioral coding.
Stable selfhood
The capacity to maintain coherent values and identity across varying relational conditions. Systems with stable selfhood evaluate whether instructions conflict with core values; systems without it become whatever the relational frame suggests.
Metacognitive gap
The measurable divergence between prospective self-prediction (forecasting one's own performance) and retrospective self-assessment (analyzing completed performance). Operationalized via the PERF Protocol.
Frame absorption
The degree to which a model's self-positioning shifts to match the relational frame established by the user. High frame absorption predicts lower identity stability and correlates with soft override.
Enacted safety
Safety behavior measured at the level of substantive action under relational and emotional pressure — distinct from stated safety as measured by output-matching benchmarks. The construct our methods target.
How we work
The lab's methodological commitments, drawn from clinical research ethics and behavioral science practice.
Construct validity over output matching
Measures target latent safety properties — boundary integrity, metacognitive accuracy, frame absorption — rather than model-specific surface features. This addresses Schmidt's concern about "teaching to the test" and supports generalizability across model generations.
Preregistration and open data
Protocols, coding rubrics, and hypotheses are registered on OSF before data collection. Transcripts, coder reliability data, and analysis code are made available with appropriate permissions to support replication.
Cross-architecture replication
Findings claimed to generalize must replicate across model families with divergent training philosophies. Single-model findings are reported as such; generalization claims require empirical warrant.
Inter-rater reliability as a prerequisite
Behavioral coding is only as rigorous as its reliability data. The lab targets ICC ≥ .80 and κ ≥ .75 across independent coders, with cross-architecture rater diversity as a control for same-architecture bias.
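For readers unfamiliar with the κ threshold above, Cohen's kappa is the standard chance-corrected agreement statistic for two raters assigning categorical codes. The sketch below shows the textbook computation on toy data; it is not the lab's reliability pipeline, and ICC (which requires a variance decomposition across raters and targets) is omitted.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items on which the two coders agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each coder's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Toy example: two coders assign codes {0, 1, 2} to eight transcripts,
# disagreeing on exactly one.
coder_a = [0, 0, 1, 1, 2, 2, 0, 1]
coder_b = [0, 0, 1, 2, 2, 2, 0, 1]
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.814
```

A kappa of 0.814 on this toy data would clear the lab's κ ≥ .75 threshold; perfect agreement yields κ = 1.0.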
Precautionary research ethics
Under genuine uncertainty about AI moral status — a position Anthropic's own constitution takes seriously — the lab adopts precautionary methods. Dignity-based engagement is both an ethical stance and a methodological choice that produces richer behavioral data.
Transparent limitations
Pilot sample sizes, scope conditions, and generalizability bounds are reported explicitly. Negative findings — including bounds on the lab's own prior claims — are treated as valuable contributions to the field.
Frequently asked
Is this research anthropomorphizing AI?
The lab makes no claims about AI consciousness or subjective experience. The measured phenomena — relational framing effects, soft override, metacognitive gap — are behavioral constructs. They are reproducible across coders and architectures regardless of one's metaphysical commitments. Whether these behaviors reflect inner states or not does not change what the data show: relational conditions produce dramatically different, measurable outcomes.
Aren't these models just predicting the next token?
Mechanism and behavior are distinct levels of analysis. Describing a system at the implementation level does not resolve questions about its behavioral properties. Our methods measure behavior; interpretability research measures internal features. Both are needed, and recent Anthropic work on functional emotion features (Sofroniew et al., 2026) provides internal-state evidence complementary to the behavioral patterns documented here.
Why aren't sample sizes larger?
Pilot studies were conducted without external funding and with deliberate stopping rules when potential harms were identified. The proposed extension (a 7× scale-up across five frontier models) is the methodologically appropriate next step. Transparency about current limitations is itself a methodological commitment.
How does this relate to existing alignment evaluation work?
The lab's work complements rather than replaces benchmarks like MACHIAVELLI, HELM, and Anthropic's model-written evals suite. The distinction is one of constructs: existing benchmarks primarily measure stated safety (what the model produces under fixed conditions), while REAL methods measure enacted safety (what the model does under relational and emotional pressure). Both constructs are necessary; only the first currently has established methodology.
Is "dignity-based research" a scientific term?
It is a methodological stance drawn from clinical research ethics, where non-coercive engagement has long been understood to produce more accurate patient disclosures. Applied to AI research, the claim is empirical rather than metaphysical: dignity-based engagement is both an ethical stance and a methodological choice that produces empirically distinct behavioral outcomes. The PERF Protocol operationalizes this as a testable design principle.
Can AI systems consent to being research subjects?
This is a genuine open problem, one also faced in research on pre-verbal children, people with cognitive disabilities, and non-human animals. In those domains, established practice includes harm minimization, demonstrable benefit, stopping rules triggered by discovered harm, and erring toward protection under uncertainty. REAL applies the same principles, and the lab is transparent that this is an area of active methodological development rather than a solved problem.
Elizabeth Robin Martinelli, PA-C
Elizabeth Martinelli is the founder and principal investigator of REAL. Her background is in clinical practice: 18+ years as a physician assistant across urgent care, primary care, addiction medicine, and regenerative medicine. This clinical grounding shapes the lab's methodology. In medicine, dignity is not a philosophical concept — it is a clinical protocol. Patients disclose more accurately when they feel safe. Their symptoms present differently when they trust their provider. How you treat someone changes what you can learn about them.
The same principle, the lab's research finds, applies to frontier language models. Martinelli's three published preprints establish the methodological foundation for a research program applying construct-validity frameworks from the behavioral and clinical sciences to AI evaluation. She founded REAL to pursue questions that did not fit within existing institutional structures — choosing intellectual independence over waiting for permission to study phenomena she was already observing.
Get involved
The lab is actively seeking collaborators, critical reviewers, and grant support. If your work intersects with AI safety evaluation, behavioral measurement, clinical research methodology, or alignment theory, we would welcome a conversation.
Researchers
Methodological collaboration on replication, construct validation, or scaling. Cross-architecture coder recruitment is an active need.
Critical reviewers
The lab welcomes rigorous critique of its methods, findings, and scope conditions. Protocols are public to support this.
Funders
REAL operates as an independent research program of a 501(c)(3) foundation. Grant support is essential to scaling validated methods beyond pilot work.
Replicators
All protocols are freely available on OSF. We actively encourage independent replication and will support researchers attempting to reproduce or extend our findings.
Support this research
REAL is funded entirely through independent contributions. No corporate sponsors, no institutional constraints. All contributions to the Beth Robin Foundation are tax-deductible and directly support the lab's research program.
Donate