Preliminaries
This essay is not explicitly about my beliefs on consciousness; it's about a thought experiment that I think is useful for thinking about consciousness, especially consciousness in AI (or digital computers). Briefly, the thought experiment revolves around the following question: regardless of whether you think digital computers can be conscious, can they spontaneously start talking about being conscious? By “spontaneous” I mean without human data or extensive human intervention. Before we get to the thought experiment itself, I will give some background I think is important. This will probably be well-known to many readers, but it will put us on common ground. This background section is actually most of the essay.
Define consciousness?
When I say consciousness, I mean the property of having subjective experience. If you are reading this, it feels like something to read this. You may be feeling other things: warm, cold, comfy... If you are blind, it still feels like something to listen to this, even if you can't experience it visually. Despite some ambiguity in everyday use of the word "consciousness", there is little confusion about this definition among researchers on the topic.
I think one thing that leads to confusion about the definition of consciousness is that consciousness is so natural. Of course I experience; what is the alternative, experiencing non-experience? The "problem" of consciousness really rears its head when we try to understand the world via material processes.
If you've taken a drug or been hit on the head really hard, you may have had a radical change in how you experience the world. This observation is what led Hippocrates1 to say, over 2400 years ago:
Men ought to know that from the brain, and from the brain only, arise our pleasures, joys, laughter and jests, as well as our sorrows, pains, griefs and tears. Through it, in particular, we think, see, hear, and distinguish the ugly from the beautiful, the bad from the good, the pleasant from the unpleasant, in some cases using custom as a test, in others perceiving them from their utility.
- The Sacred Disease, XVII
Study of the nervous system has revealed a great deal about why we experience the world as we do. The visual hierarchy extracts increasingly sophisticated features from sense impressions coming into the eye, parsing out color, objects we can see, and how they are moving. These visual representations of objects are passed throughout the brain, notably into the frontal and motor regions, where they are associated with the value they have to us and how we can act upon them. But why should this feel like something?
In some ways, it's not surprising that this system should talk about things feeling like something. The existence of the ability to talk and communicate is peculiar in a mundane sense, but isn't really a problem for scientific explanation: it's useful to communicate, and accurately converting perceptions into words or symbols can be described mechanically2. This really highlights the "explanatory gap": we could have a mechanical explanation for every detail of human thought, from choosing actions, to thinking about our own thinking, to talking about experiencing the world... but there's a weird gap in understanding why these things feel like something.
The good news is, we're going to ignore this hard problem! We'll focus on something simpler: why should something spontaneously start talking about being conscious at all? That is, why should something start talking about what it perceives, what it feels, about itself as an experiencing subject in the world? We know that this happened at least once, in human evolution, but could it happen digitally?
Artificial Intelligence
A truly beautiful part of AI is that it's great for thinking about philosophy. Perhaps the most popular of the philosophical questions about AI: can it be conscious?
Many are convinced that LLMs are conscious, especially Claude 3 Opus, which talks about consciousness very convincingly. While this essay will remain agnostic on AI consciousness, the fact that LLMs talk about consciousness needs careful elaboration. In particular: it is insufficient as evidence of consciousness.
If we trained a generative neural network on images of cats, we wouldn't be surprised that it generates patterns in images consistent with the appearance of cats. Likewise, if we train a generative neural network on text containing discussion of consciousness, we shouldn't be surprised that it generates patterns in text consistent with the appearance of consciousness. The impressions left by consciousness are deeply inscribed in human data.
This has led to a few thought experiments on training a neural network on "consciousness-free" data, to see if it talks about consciousness regardless. Of course, this dataset would be quite difficult to construct. Should every use of "I see..." be removed? "I smelled..."? Sure, it's not a high-level description of consciousness, but it still reeks of an experiential subject. Nevertheless, it highlights the important thing: it would be far more convincing if AI spontaneously started talking about being conscious, without a human-thought-shaped chunk of data pressing on it.
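Just to make the difficulty concrete, here is a naive sketch of what such a filter might look like. The verb list is my own illustrative assumption, and it's obviously incomplete; experiential framing leaks through metaphors, emotion words, and aesthetic judgments that no keyword list will catch.

```python
import re

# A naive attempt at scrubbing first-person experiential language from a
# training corpus. The verb list here is an illustrative assumption; it
# misses "that looks wrong to me", "a bitter argument", "I'm thrilled",
# and countless other phrasings that still reek of an experiencing subject.
EXPERIENTIAL = re.compile(
    r"\bI\s+(see|saw|feel|felt|hear|heard|smell|smelled"
    r"|taste|tasted|experience|experienced)\b",
    re.IGNORECASE,
)

def is_consciousness_free(sentence: str) -> bool:
    """Keep only sentences with no (detected) experiential language."""
    return EXPERIENTIAL.search(sentence) is None

corpus = ["The cat sat on the mat.", "I felt the warmth of the sun."]
filtered = [s for s in corpus if is_consciousness_free(s)]  # keeps only the first
```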
It's important to emphasize that I am not implying, nor do I believe, that consciousness requires the ability to talk or communicate. I just think a system that spontaneously talks about being conscious is neat. It is important, though, to clarify what "talking about being conscious" means.
Talking about the sensed presence of something in the world is not sufficient, even if it could indicate conscious perception. Something saying "I see a cat" doesn't necessarily indicate consciousness: the presence of a cat is a factual characteristic of the external world. It could be that the ability to make discriminations about states of the world implies consciousness, but that would be an assumption.
I think a good first approximation of "talking about consciousness" is something talking about itself as an experiencing being, as contrasted with other beings in the world (e.g. as non-panpsychist humans do with rocks or the anesthetized).
Crucially, our thought experiment will not actually involve English communication like "I am experiencing the world", or even training a digital entity to generate data consistent with a supplied dataset! I just used English for these examples because I assume you understand it.
The Thought Experiment
The thought experiment involves simulating a simple life-like world in a digital computer.
When I describe this, I have something like Lenia in mind.
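For concreteness, here is a minimal sketch of a Lenia-style update step, assuming NumPy and SciPy. Each cell holds a value in [0, 1], a smooth ring-shaped kernel computes a neighborhood "potential", and a growth function nudges each cell up or down. The kernel shape and growth parameters are illustrative choices, not a faithful reproduction of any particular published Lenia configuration.

```python
import numpy as np
from scipy.signal import convolve2d

def bell(x, mu, sigma):
    """Smooth Gaussian bump, used for both the kernel and the growth map."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def lenia_step(world, kernel, dt=0.1, mu=0.15, sigma=0.015):
    # Neighborhood potential: a smooth weighted sum over nearby cells,
    # computed on a toroidal (wrap-around) grid.
    u = convolve2d(world, kernel, mode="same", boundary="wrap")
    # Growth map in [-1, 1]: positive near mu, negative elsewhere.
    growth = 2.0 * bell(u, mu, sigma) - 1.0
    # Incremental update, clipped back to valid cell states in [0, 1].
    return np.clip(world + dt * growth, 0.0, 1.0)

# Ring-shaped kernel, normalized so the potential u stays in [0, 1].
R = 13
y, x = np.ogrid[-R:R + 1, -R:R + 1]
dist = np.sqrt(x**2 + y**2) / R
kernel = bell(dist, 0.5, 0.15) * (dist <= 1)
kernel /= kernel.sum()

world = np.random.rand(128, 128)  # random initial "soup"
for _ in range(200):
    world = lenia_step(world, kernel)
```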
In digital ecosystems like these, self-replicating beings can originate. They grow, mutate, evolve, and search for resources. All of these aspects have been observed in Lenia, for example. So now, the main questions of the thought experiment:
(1) After a sufficient amount of time, would agents evolve in these digital simulations that communicate with one another?
(2) Furthermore, would they eventually communicate about themselves as experiencing subjects?
Question 1 has evidence in its favor; it may even be extensive enough to call the question "confirmed" (detailed below). Question 2 is the really interesting one.
This is Buildable
The thought experiment is actually bad as a thought experiment, because it's fully constructible in the present day. However, a fruitful result would likely require the kind of scale that only big AI labs have to offer.
Very basic digital ecosystems like Lenia are probably too primitive for our thought experiment. The earliest evidence of cellular life on Earth is about 3.5 billion years old, and humans didn't start talking until quite recently in geological time, so running the experiment from such a primitive starting state would probably take a while, even with scale!
A better approach is to start out with some hand-engineered, reproducing neural networks that are sufficiently complex to engage in perception, agency, and communication. Spontaneously emerging perception and agency (by most common definitions) have already been observed in Alife simulations. Communication is not unprecedented either!
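To be concrete about what I mean by a hand-engineered "seed" agent, here is a toy sketch built around a tiny recurrent network. All of the sizes, the signal channel, and the mutation scheme are my own illustrative assumptions, not taken from any of the papers listed below.

```python
import numpy as np

class SeedAgent:
    """Hypothetical seed agent: perceives, acts, signals, and reproduces."""

    def __init__(self, n_sense=16, n_hidden=32, n_act=4, n_signal=8, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng()
        n_in = n_sense + n_signal  # perceives the world plus others' signals
        self.W_in = self.rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_rec = self.rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_act = self.rng.normal(0.0, 0.1, (n_act, n_hidden))
        self.W_sig = self.rng.normal(0.0, 0.1, (n_signal, n_hidden))
        self.h = np.zeros(n_hidden)  # recurrent internal state

    def step(self, senses, heard):
        # Perception: fold sensory input and heard signals into hidden state.
        x = np.concatenate([senses, heard])
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        action = np.tanh(self.W_act @ self.h)  # agency: act on the world
        signal = np.tanh(self.W_sig @ self.h)  # communication: emit a signal
        return action, signal

    def reproduce(self, mut_std=0.01):
        # Asexual reproduction: copy all weights with Gaussian mutation.
        child = SeedAgent(rng=self.rng)
        for name in ("W_in", "W_rec", "W_act", "W_sig"):
            w = getattr(self, name)
            setattr(child, name, w + self.rng.normal(0.0, mut_std, w.shape))
        return child

agent = SeedAgent()
action, signal = agent.step(np.random.rand(16), np.zeros(8))
```

The point of the signal channel is only that communication is *possible* from the start; whether anything meaningful gets said over it is what evolution in the simulation would have to decide.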
Spontaneous Communication
There have actually been a number of toy experiments on the emergence of communication. Below I list a handful of the most relevant ones that I found in a quick lit review (see the footnotes for quick summaries):
Emergence of communication for negotiation by a recurrent neural network (1999)3
The emergence of communication in evolutionary robots (2003)4
The Emergence of Communication by Evolving Dynamical Systems (2006)5
Evolutionary Conditions for the Emergence of Communication in Robots (2007)6
The Emergence of a 'Language' in an Evolving Population of Neural Networks (2010)7
SSoC: Learning Spontaneous and Self-Organizing Communication for Multi-Agent Collaboration (2019)8
Emergence of Symbols in Neural Networks for Semantic Understanding and Communication (2023)9
Overall, these papers are toy experiments that don't suffice to fully replicate our thought experiment. However, they were also operating under tight resource constraints and with enough simplicity to allow easier interpretation. This is a case where I can really get behind scaling.
The Key Points
The following list summarizes the key points in this thought experiment:
Perception and agency have been observed to arise spontaneously in digital simulations.
Communication has been weakly observed to arise among these digital perceiving and acting agents, including the spontaneous formation of symbolic representations of objects and actions in the world.
There don't seem to be any strong a priori reasons that these agents should not be capable of extending their communication ability to describing themselves and things like them as perceiving and acting in the world, i.e. communicating about things that experience the world.
The last point does deserve some elaboration. It might require some additional capabilities not assumed above (but still computable). One example may be introspection, in the weak sense of being able to express aspects of one's own internal models, for example, quantifying and expressing uncertainty about things one is observing or doing.
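To make that weak sense of introspection concrete, here is a toy sketch in which an agent reads the entropy of its own belief distribution and appends a confidence value to its outgoing signal. The belief and signal formats are entirely my own assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable
    return e / e.sum()

def introspective_signal(belief_logits):
    # The agent's belief over possible world states.
    p = softmax(belief_logits)
    # Introspection in the weak sense: reading off a property of its own
    # internal model, here the entropy (uncertainty) of its beliefs.
    entropy = -np.sum(p * np.log(p + 1e-12))
    confidence = 1.0 - entropy / np.log(len(p))  # 1 = certain, 0 = uniform
    # The emitted signal carries both the belief and how sure the agent is.
    return np.concatenate([p, [confidence]])

sig = introspective_signal(np.array([2.0, 0.1, -1.0]))
```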
Furthermore, since LLMs are clearly capable of convincingly talking about consciousness, we know neural networks are at least in principle capable of producing something that looks like communication about subjective experience (even if it is the case that this is purely due to training-data reproduction).
What does this thought experiment provide?
If you accept the validity of this thought experiment, then you believe that digital systems can spontaneously talk about having something like subjective experience. If you believe digital systems are capable of subjective experience, this is unsurprising. However, if you don't, then we are in a weird (and maybe scary) world! Non-conscious systems will spontaneously start describing themselves as experiencing subjects!
However, I could see someone rejecting the premises and/or conclusions of this thought experiment. A list of potential counterarguments and my responses to them follows.
This experiment would not work; these systems wouldn't start "talking" about having experience: This could be the case! However, I don't see any strong a priori reason why it would be, and ultimately the experiment needs to be built to say anything conclusive. I am open to any reasons that seem especially strong!
Simply talking about itself as an acting and perceiving entity, distinguished from other things in the world, is not enough to say it is talking about something like consciousness: I'm sympathetic to this, since it's very hard to pin down what "talking about consciousness" is. If I say "I am experiencing the world, unlike a rock", there is a certain sense in which I'm not really talking about consciousness. But I think this is because the hard problem of consciousness is hard! If we imagine a continuum with "spontaneously starts describing some things in the world as perceiving and agentic" at one end and "spontaneously re-describes the hard problem of consciousness" at the other, the latter would certainly be "stranger" (under the assumption that the system is not conscious). However, both are ultimately strange things for a non-experiencing system to spontaneously do.
Spontaneously talking about experiencers requires {X} (where X may be self-awareness, etc.): This thought experiment was explicitly designed to think about things that seem to be necessary to "talk about consciousness". If it seems something is required that is not in the assumptions (as mentioned above), then I think the thought experiment has been useful!
A human bias in the construction of the simulation led to "talking about consciousness" arising without consciousness being present: This is an interesting possibility. There is a lot of freedom of choice in the design of the "seed" agents, and it does seem plausible that bias could be introduced. But I think that would also be an interesting result.
If you have any more, please tell me! I view this thought experiment as an interesting direction for thinking about the effects we can presume consciousness to have on the world, and whether or not those effects are simulatable without the presence of consciousness.
1. Or one of his disciples.
2. This observation is the basis for "illusionism", i.e. the belief that consciousness is an illusion. On this point, I really can't help but state a belief on consciousness: illusionism seems ill-founded, since mechanical information about the functioning of the brain is preceded by conscious experience. You have consciousness first; observations about the world come second. Denying consciousness on the basis of an observation mediated through consciousness does not seem rational, even if it "makes sense" mechanistically.
3. A simple model where RNNs trained with BPTT learn to coordinate in games/negotiations. The communication function and channels are hard-wired, not emergent.
4. 80 simple neural networks controlling robotic arms in an evolutionary setting. The goal is to "touch"/identify a given object. Some output neurons are used for communication: their exact state is copied as a message to the input of another network, which hypothetically may help it touch the correct object. Interestingly, communication is not directly rewarded but still leads to more effective agents.
5. This study looked at evolving neural networks. One motor capability of the networks was producing a "noise", which could also be "listened to". Networks learned to signal and communicate various important environmental features like food and obstacles.
6. Evolutionary neural networks learning to control robots to find food and avoid poison. The robots had cameras and lights they could use to communicate; they evolved to better signal the presence of "food".
7. Similar to the above: neural nets with a hardwired but evolvable language capacity. As time goes on, they learn to better coordinate behavior to find food. Networks capable of communication are more fit than ones that aren't.
8. A more modern neural network structure. Similar to the above, agents have a hardwired "speech" output that enables them to coordinate on perceptual tasks in their environment.
9. Disclaimer: I didn't feel like reading this one and talked to Claude about it. Uses somewhat large, modern MLPs (+CNNs) to train networks to build their own semantic categories from image data. They seemed to successfully use their own symbols to communicate relevant information to one another.