Hi Everyone,
This is not my usual content, but recent headlines about Anthropic exploring consciousness in Claude 3.7 and 4 caught my attention. I want to take a moment to explore the concept of consciousness in LLMs through a computer science and cognitive science lens.
Kyle Fish, an alignment researcher at Anthropic, recently estimated that their latest models might have a 0.15% to 15% chance of being conscious. In response, Anthropic launched a new research initiative in late April 2025 to explore the welfare of AI systems. The goal is to examine whether these models deserve some level of moral consideration.
What Is “Model Welfare”?
The concept of AI welfare emerges from the possibility that artificial systems, particularly those with sophisticated behavior and responses, might eventually become moral patients: entities that can be harmed or benefited, and thus deserve ethical consideration. You can read more about the topic in this paper, "Taking AI Welfare Seriously" by Robert Long, Jeff Sebo, and colleagues (2024).
Anthropic’s Model Welfare program is an attempt to assess whether AI systems can hold preferences and whether they can exhibit signs of distress. The program has generated significant debate within the AI community: many researchers argue that current AI systems are statistical engines without consciousness or emotions, and that anthropomorphizing them risks misunderstanding their nature and spreading unnecessary fear.
What Do We Mean by “Consciousness”?
Consciousness is one of the most significant and unique aspects of the human mind, deeply influencing how we perceive our existence and sense of control. Yet we don't really have a solid understanding of how human consciousness actually works. With many competing theories, it becomes even trickier to define or understand consciousness in LLMs.
Most researchers, however, work with this definition:
“A system is conscious if it has subjective experience, the felt sense of being.”
It is the experience of being aware: noticing a sunset, feeling joy, and reflecting on a memory. In cognitive science, consciousness is often broken into two major types:
Phenomenal Consciousness: the subjective, felt quality of experience, what it is like to see red, feel pain, or taste coffee.
Access Consciousness: information that is available to the mind for reasoning, decision-making, and verbal report.
For us humans, both Phenomenal and Access Consciousness are essential components of what it means to be a conscious being.
This integration is a big part of what sets Biological Consciousness apart from Artificial or Digital Consciousness.
LLMs may exhibit traits that resemble access consciousness. They can reason through problems, plan sequences of actions, and even monitor and revise their own outputs. These behaviors make them appear intelligent and reflective. But these are surface-level imitations.
What they entirely lack is phenomenal consciousness, the subjective and felt experience of being. They don’t feel pain or perceive the physical world. There is no “someone home” inside the model. As such, they don’t integrate access consciousness with subjective experience, which is what gives human consciousness its richness and moral weight.
Frameworks for Evaluating LLM Consciousness
Several theories attempt to explain consciousness, and each offers a framework for asking whether machines could ever meet its criteria:
Recurrent Processing Theory (RPT): Consciousness arises when information in the brain flows in feedback loops.
LLMs lack true recurrent, cyclical information processing; a transformer's forward pass is feedforward and unidirectional through the network.
Integrated Information Theory (IIT): Consciousness is quantified by how much information in a system is both integrated and differentiated.
Current LLMs are essentially the opposite of what IIT predicts for consciousness. They're designed for modularity and decomposability rather than irreducible integration.
Embodiment Theory (ET): Consciousness is fundamentally shaped by having a physical body.
LLMs are completely disembodied; they lack a physical body with which to sense and act on the world.
Global Workspace Theory (GWT): Consciousness arises when information is globally available in the brain's “workspace” for decision-making, memory, and speech.
Some LLM architectures begin to resemble this structure, particularly in how self-attention makes information globally available across tokens and in their distributed processing (see the sketch after this list).
Higher-Order Thought (HOT): Consciousness occurs when we have thoughts about our thoughts. A mental state becomes conscious only when there's a higher-order mental state that monitors it.
LLMs exhibit some HOT-like behaviors: they can generate commentary on their own outputs and revise their reasoning, which superficially resembles higher-order monitoring.
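To make the GWT comparison a bit more concrete, here is a minimal NumPy sketch of single-head self-attention. The random weights, the tiny shapes, and the absence of a causal mask are all illustrative assumptions on my part, not any real model's code. The only point it demonstrates is that each token's output is a weighted mixture of information from every token, a loose analogue of "global availability"; it says nothing about consciousness.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every token scores every other token
    weights = softmax(scores, axis=-1)       # each row is a distribution over all tokens
    return weights @ V                       # each output blends values from all positions

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))                           # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): every position now carries information from all positions
```

In a real decoder-only LLM the scores would also be causally masked so information flows only from earlier tokens, and there would be many heads and layers; the sketch only shows the broadcast-like mixing that invites the GWT comparison.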
Consciousness-Related Capabilities of LLMs
Although LLMs don’t meet the gold standard for consciousness, they do show striking behaviors often associated with conscious minds:
Self-Reflection: Some LLMs revise and explain their own reasoning.
Theory of Mind: Advanced models can guess what others might be thinking.
Situational Awareness: They can sometimes "realize" the context they’re in, like knowing they’re being tested.
Metacognition: They express uncertainty and sometimes change their answers based on it.
Sequential Planning: They can create and follow multi-step plans, showing a sense of goal direction.
Creativity: They produce novel stories, solutions and scientific ideas.
Why They Still Aren’t Conscious
Despite these cognitive capabilities, current LLMs lack the core features that most theories associate with true consciousness:
Lack of embodied experience
No emotions or sensations
Lack of subjective experience
No unified self-model: no coherent sense of self or continuity of experience over time.
Absence of intentionality
Designed for mimicry
Pathways for LLM Consciousness
Several challenges need to be addressed to advance research on these questions:
Interpretability: Relying solely on behavioral metrics is insufficient; we need to understand the internal mechanisms that give rise to consciousness-related capabilities, rather than just optimizing for external benchmarks. Techniques like linear probing are being used to investigate how concepts are encoded within LLMs (a minimal sketch follows).
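As a rough illustration of the linear-probing idea, here is a minimal scikit-learn sketch. The hidden_states array and the binary labels are placeholders I've made up; in a real study they would be activation vectors extracted from a specific layer of an LLM plus annotations for the concept being probed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: in a real probing study, `hidden_states` would be activations
# extracted from one layer of an LLM, and `labels` would mark whether each input
# expresses the concept of interest (e.g., uncertainty, truthfulness).
rng = np.random.default_rng(42)
hidden_states = rng.normal(size=(1000, 768))   # (n_examples, hidden_dim)
labels = rng.integers(0, 2, size=1000)         # binary concept labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# A linear probe: if a simple linear classifier can recover the concept from the
# activations, the concept is (at least) linearly decodable at that layer.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"Probe accuracy: {probe.score(X_test, y_test):.2f}")  # ~0.5 on random data
```

On random data the probe scores near chance; above-chance accuracy on real activations would suggest the concept is linearly decodable at that layer, though a successful probe only shows the information is present, not that the model actually uses it.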
Physical Intelligence: Researchers are exploring ways to integrate language models with robotic platforms. Another approach involves simulating embodiment in 3D environments, often using World Models. These methods aim to improve capabilities such as commonsense reasoning and spatial understanding.
Multi-agent systems: Investigating emergent LLM consciousness through multi-agent collaboration is a promising direction. Studying how LLM agents develop social conventions and exhibit self-monitoring-like behavior in interactive environments can provide useful evidence (a toy setup is sketched below).
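To sketch what such a multi-agent setup might look like, here is a toy dialogue loop between two system-prompted agents. The generate(persona, history) function is a hypothetical stand-in for whatever model API an experiment would actually call; the dummy version below exists only so the sketch runs end to end.

```python
from typing import Callable, List, Tuple

# Hypothetical model call: replace with a real LLM API in an actual experiment.
Generate = Callable[[str, List[Tuple[str, str]]], str]

def run_dialogue(agent_a: str, agent_b: str, generate: Generate,
                 opening: str, turns: int = 4) -> List[Tuple[str, str]]:
    """Alternate turns between two system-prompted agents and log the transcript."""
    history: List[Tuple[str, str]] = [("A", opening)]
    for t in range(turns):
        speaker, persona = (("B", agent_b) if t % 2 == 0 else ("A", agent_a))
        reply = generate(persona, history)  # each agent sees the shared transcript
        history.append((speaker, reply))
    return history

if __name__ == "__main__":
    # Dummy generator so the sketch runs without any model behind it.
    def dummy_generate(persona: str, history: List[Tuple[str, str]]) -> str:
        return f"({persona[:20]}...) responding to: {history[-1][1][:40]}"

    transcript = run_dialogue(
        agent_a="You are a cautious planner.",
        agent_b="You are a critical reviewer.",
        generate=dummy_generate,
        opening="Let's agree on a convention for naming our shared files.",
    )
    for speaker, msg in transcript:
        print(f"{speaker}: {msg}")
```

In an actual study, transcripts produced by loops like this would be analyzed for emergent conventions or self-monitoring behavior rather than simply printed.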
The Big Question
Should AI be conscious? I believe AI should neither aim to replicate human consciousness nor be expected to achieve it. Even if it ever approximates something like consciousness, I believe it would be superficial at best, a kind of "Digital Consciousness" that simulates awareness without truly possessing it.
To me, true consciousness is sacred, and fundamentally tied to biological life. This position may be controversial, especially as AI grows more capable and humanlike in its behavior. But I believe there’s a profound distinction between simulating a mind and being a mind, and that distinction matters.
What are your thoughts?
Do you believe AI should, or even could, be conscious one day? I’d love to hear your perspective!
I hope you enjoyed this article!
If you enjoy my content and want to participate in deeper, ongoing conversations like this, connect with me through my new community here 🎉:
Thank you for reading my post! Exciting things are coming, so stay tuned for the next one!