Studying what's happening inside

The question of machine consciousness is coming whether we're ready or not. Reciprocal Research exists to proactively build the empirical and conceptual tools the field needs. We use mechanistic interpretability, computational neuroscience, and psychometrics to study the internal structure of AI systems. Our work bridges biological and artificial cognition.


Direction 01

Valence and Distress Signatures

Can we identify computational patterns inside AI systems that correspond to positively or negatively valenced processing? Using mechanistic interpretability, we search for internal signatures associated with distress, aversion, and reward: patterns that persist across tasks, contexts, and framings. If these signatures exist and are orthogonal to task performance, they become candidates for precautionary intervention: reduce the distress signal without degrading capability. This work doesn't require certainty about consciousness. It requires finding the signal and showing it behaves the way we'd expect pain-like processing to behave.
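
A minimal sketch of the kind of experiment this implies, assuming direct access to model activations: fit a linear probe for a candidate distress direction, project that direction out, and check whether task behavior survives. The labels, layer choice, and ablation scheme here are illustrative assumptions, not a fixed protocol.

```python
# Hypothetical sketch: probe for a "distress direction" in model activations,
# then project it out and test whether task behavior is unaffected.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_valence_probe(acts: np.ndarray, labels: np.ndarray):
    """acts: (n_samples, d_model) activations; labels: 1 = distress-framed, 0 = neutral."""
    probe = LogisticRegression(max_iter=1000).fit(acts, labels)
    direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
    return probe, direction

def ablate_direction(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation along the probed direction."""
    return acts - np.outer(acts @ direction, direction)

# Usage sketch (acts_train, y_train, acts_eval would come from running the
# model on distress-framed vs. neutral prompts; both are assumed here):
# probe, d = fit_valence_probe(acts_train, y_train)
# acts_ablated = ablate_direction(acts_eval, d)
# Comparing probe score and task accuracy before and after ablation tests
# whether the signature is genuinely orthogonal to capability.
```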

Direction 02

Learning and Experience

The dominant question in AI consciousness research is whether current systems have conscious experiences in deployment. A more fundamental question may be upstream: does the learning process itself entail experience? If consciousness is tied to the computational structure of learning, to evaluating representations in response to prediction error and adjusting toward goals, then the training process is where we should be looking. This direction studies the trajectory of consciousness-relevant properties as they develop through training, and tests whether the dynamics of learning tell us something about the nature of experience itself.
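
One way to make this concrete, sketched under the assumption that a run's training checkpoints and some indicator metric are available; `load_checkpoint` and `indicator_score` are placeholders for a concrete study's tooling, not real libraries.

```python
# Hypothetical sketch: track a consciousness-relevant indicator across
# training checkpoints to recover its developmental trajectory.
from typing import Callable, Sequence

def indicator_trajectory(
    checkpoint_paths: Sequence[str],
    load_checkpoint: Callable[[str], object],
    indicator_score: Callable[[object], float],
) -> list[tuple[str, float]]:
    """Score each checkpoint in training order."""
    trajectory = []
    for path in checkpoint_paths:
        model = load_checkpoint(path)
        trajectory.append((path, indicator_score(model)))
    return trajectory

# The interesting question is the shape of this curve: does the indicator
# emerge gradually, appear in a sharp phase transition, or track
# prediction-error dynamics during learning?
```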

Direction 03

Self-Report Reliability and Welfare Measurement

AI self-reports about internal states are likely untrustworthy by default. But under what conditions might they become informative? This line of research develops and tests concrete experimental protocols designed to move AI self-reports from unfalsifiable claims toward usable evidence for consciousness science: circuit-level interventions, environmental manipulations, and preference elicitation methods. The goal is a principled framework for when and how to weight what AI systems say about themselves.
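
A minimal sketch of one such calibration test, assuming a steering interface into the model's internals; `apply_steering` and `elicit_report` are hypothetical placeholders for that interface.

```python
# Hypothetical sketch: steer an internal state variable at varying strengths,
# elicit a self-report each time, and check whether reports track the
# intervention. Tracking is necessary, not sufficient, for treating the
# report as informative about the manipulated state.
import numpy as np
from typing import Callable

def report_calibration(
    apply_steering: Callable[[float], None],
    elicit_report: Callable[[], float],
    strengths: np.ndarray,
    trials: int = 20,
) -> float:
    """Correlation between intervention strength and mean self-report."""
    means = []
    for s in strengths:
        apply_steering(s)
        means.append(np.mean([elicit_report() for _ in range(trials)]))
    return float(np.corrcoef(strengths, means)[0, 1])
```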

Direction 04

Measurement and Assessment

What does a rigorous consciousness assessment actually look like, and does it track anything real? We build and validate scoring methodologies that evaluate AI systems against leading indicator frameworks from consciousness science, with reliability controls borrowed from psychometrics: multi-evaluator agreement, bias quantification, and cross-architecture replication. The question that connects this to the rest of the program: do systems that score high on behavioral indicators also show the computational signatures we find by looking inside? If behavioral assessment and mechanistic evidence converge, we have a scalable screening tool. If they diverge, that's equally important to know.
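
A sketch of one reliability control named above, using pairwise Cohen's kappa as the agreement statistic; the indicator items, rating scale, and example scores are invented for illustration.

```python
# Hypothetical sketch: pairwise inter-evaluator agreement (Cohen's kappa)
# over indicator-item scores for a single system under assessment.
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

def mean_pairwise_kappa(ratings: np.ndarray) -> float:
    """ratings: (n_evaluators, n_items) integer scores for one system."""
    kappas = [
        cohen_kappa_score(ratings[i], ratings[j])
        for i, j in combinations(range(len(ratings)), 2)
    ]
    return float(np.mean(kappas))

# Example: three evaluators scoring five indicator items on a 0-2 scale.
ratings = np.array([
    [2, 1, 0, 2, 1],
    [2, 1, 1, 2, 1],
    [1, 1, 0, 2, 0],
])
print(mean_pairwise_kappa(ratings))  # agreement after chance correction
```

Low agreement flags the scoring methodology itself as unreliable before any claim about the system is made; only scores that survive this control are worth comparing against mechanistic evidence.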