The following will derive some of the recent results in nonequilibrium thermodynamics and show how they lead up to Jeremy England's thoughts on the relationship between nonequilibrium thermodynamics and life. It will naturally be a bit mathematically dense, but it should be readable if you have some background in probability theory and linear algebra; familiarity with topics like conditional probabilities, expectations, and entropy will be helpful. If the following relationships look familiar, you should be fine:

$$P(x \mid y) = \frac{P(x, y)}{P(y)}, \qquad \langle X \rangle = \sum_x x\, P(x), \qquad S = -\sum_x P(x) \ln P(x)$$

If so, you probably remember enough to follow along! However, expect a bit of handwaving, as this is a distillation of decades of human thought :)
Thermodynamics
Equilibrium and Statistical Mechanics
I'll briefly give some background on thermodynamics and statistical mechanics. The study of thermodynamics arose primarily with the study of heat engines. Heat engines rely on heating a gas to expand it, which can then be used to derive mechanical work, like moving a piston. Early experimentalists were able to precisely quantify how much work they could get when a gas was heated with certain amounts of fuel. They found that these could be quantified with intuitive variables like pressure, volume, temperature, and heat.
In the process, they discovered phenomena like the 2nd Law of Thermodynamics, which led to the introduction of additional variables like entropy. At this point in time, thermodynamics was mostly experimental. Major progress in understanding thermodynamics then came in the form of statistical mechanics. Statistical mechanics aims to understand the macroscopic properties of matter by combining simplified models of the microscopic details of matter with statistical assumptions about how the matter behaves in bulk.
As an example, a gas can be thought of as many poolball-like molecules bouncing around, rarely interacting or colliding. By considering the average force and momentum transferred as these molecules collide with their container, the experimentally observed laws describing gases can be derived from pure theory. Statistical mechanics applies to more than just gases, though: it has been applied across a range of systems, from solids to magnets.
A key idea here is that the "macrostate" of a system - what we observe when we measure a large system with, for example, thermometers, pressure gauges, and calorimeters - can be realized by many different "microstates" of the system. Microstates represent the detailed state of every molecule (or more generally, component) in a system.
The number of microstates of a given system is usually incomprehensibly large. As a concrete example, consider the number of "microstates" consistent with the claim that "the first 5 cards in a given shuffled standard deck of cards are red" (which is a macrostate description!). This is a somewhat rare occurrence! There are 26 red cards in a deck (13 hearts, 13 diamonds), so there are 26 choices for the first card, 25 choices for the second card... yielding 26 × 25 × 24 × 23 × 22 = 7,893,600 possible choices for these first five cards. The number of possible choices for the remaining 47 cards is 47!, which is about 2.6 × 10⁵⁹. So despite the slightly uncommon nature of the first 5 cards being red, there is an incomprehensible number of ways it could happen.
Overall, the chance of a shuffled (which here means "all cards have equal chance") deck of cards starting with 5 red cards is about 2.5%. So despite the incomprehensibly high number of ways it can happen, the first 5 cards of a shuffled deck of cards being red is still somewhat rare.
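The arithmetic above can be checked with a few lines of Python; this is a quick sketch using only the standard library:

```python
from math import factorial, prod

# Number of ways to pick the first five cards from the 26 red cards, in order
red_orderings = prod(range(22, 27))   # 26 * 25 * 24 * 23 * 22
assert red_orderings == 7_893_600

# Number of orderings of the remaining 47 cards, about 2.6e59
remaining = factorial(47)

# Probability that a uniform shuffle starts with five red cards
p = red_orderings * remaining / factorial(52)
print(f"{p:.4f}")  # about 0.0253, i.e. roughly 2.5%
```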
As a bit of foreshadowing: an intelligent being can organize the formless, chaotic possibilities of a deck of cards and make all sorts of arrangements far more likely than shuffling ever could!
Actual physical systems have an even more vastly incomprehensible number of ways of being than a deck of cards. Note, though, that they don't necessarily have an analogue of shuffling. A gas might: the random motions of its molecules might shuffle them around so that any possible microstate of the gas is about equally likely. Building on this, the broader assumption that all microstates of a given physical system have equal probabilities is called the assumption of "equal a priori" probabilities. It's not accurate for most systems.
A slightly more educated view comes from what is called the "canonical ensemble" in statistical mechanics. In the canonical ensemble, the probability of a state is related to the negative exponential of its energy; put in more understandable terms, a state is exponentially less likely the more energetic it is. An intuition for why comes from the common "ball rolling over hills" analogy. If we imagine a hilly energy landscape with a ball rolling on it, the ball is more likely to get trapped in a deep energy valley than to balance on the tops of the hills for long. The explicit equation for the probability of a microstate with energy $E_i$ in a canonical ensemble is:

$$p_i = \frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}}$$
where the sum over j ranges over all possible microstates and β is proportional to 1/T for the temperature T of the system of interest. Those familiar with machine learning may recognize the familiar softmax function here. The derivation of this assumption is not especially difficult (given knowledge of thermodynamics), but I'll skip it here; the interested reader can see here for the derivation. The canonical ensemble will be showing up some more later.
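As a small illustrative sketch (the energies and temperatures below are made up), the canonical probabilities are just a softmax of the negated, scaled energies:

```python
from math import exp

def canonical_probs(energies, beta=1.0):
    """p_i = exp(-beta * E_i) / sum_j exp(-beta * E_j) (a softmax of -beta*E)."""
    weights = [exp(-beta * e) for e in energies]
    z = sum(weights)  # the partition function
    return [w / z for w in weights]

# Lower-energy states are exponentially more likely; a higher temperature
# (smaller beta) flattens the distribution toward uniform.
cold = canonical_probs([0.0, 1.0, 2.0], beta=2.0)
hot = canonical_probs([0.0, 1.0, 2.0], beta=0.1)
print(cold)
print(hot)
```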
There is one big issue in the historical study of thermodynamics however. Much of thermodynamics requires systems to be in equilibrium: they need to have been isolated and mixing for long enough that most differences in temperature, chemical concentration, or energy have disappeared. Our beloved canonical ensemble, mentioned above, usually represents a state of equilibrium as well.
The equilibrium assumption does not apply to most interesting systems. Our solar system is constantly being supplied with energy released by nuclear fusion in the sun and a living thing that goes into equilibrium is a dead one. For this reason, the development of nonequilibrium thermodynamics has been a major research goal in the last century and still continues in this century.
Nonequilibrium Thermodynamics
Early steps in humanity's investigation of nonequilibrium thermodynamics involved what is known as "linear response theory". This involves assuming the system of interest is only slightly perturbed from equilibrium, so that the equations can be modeled as the equilibrium system with a small linear correction. In this treatment, I'm going to skip over this theory as it is not completely relevant to us.
Our journey will start with a derivation of Crooks' Fluctuation Theorem, one of the cornerstones of modern nonequilibrium thermodynamics. What follows is a derivation for discrete systems, in particular Markov chains, where there are only a finite number of states the system can occupy. The continuous case can basically be derived by replacing all sums with integrals.
Crooks' Fluctuation Theorem
Suppose a system can occupy a finite number of states, which can be indexed by natural numbers:

$$i \in \{1, 2, \dots, N\}$$
The system will be assumed to be evolving, so that if it is in a state i at a time t, it will evolve to some state j at t+1 (note j could be the same as i, i.e. it could "transition" to the same state). The transitions will be assumed "Markovian": a state in the system's evolution only depends on the state immediately preceding it, and not explicitly on states before that.
In general we will describe the state of a system at a time t by a variable x(t). The evolution of the systems we are describing is fundamental here, so we will be interested in trajectories of states. For example, if we look at our system from time 0 to time τ, it will assume states

$$x(0), x(1), \dots, x(\tau)$$

We can describe the whole path taken by a trajectory

$$\mathbf{x} = (x(0), x(1), \dots, x(\tau))$$

by taking each state and appending it into a single vector. This describes a system which evolves in an arbitrary way, but we will add a little more structure to how our system evolves.
To describe how the states of the system evolve in a general way, we can use the following representation (if the right-hand side is confusing, don't worry, it's just notation!):

$$M_{ij}(t) = P(x(t+1) = i \mid x(t) = j)$$

where M(t) is a matrix-valued function such that

$$M_{ij}(t) \geq 0 \quad \text{and} \quad \sum_i M_{ij}(t) = 1$$

One way to put this is that each entry of M indicates the probability (hence the P above) of state j transitioning to state i at time t: being greater than or equal to zero ensures there are no negative probabilities, and each column summing to 1 indicates that one of the transitions must happen.
We will now introduce p(t), a vector whose length is equal to the number of states, which specifies the probability of the system being in each state at a given time t. If we know the system state x(t) at time t, then the probability vector p(t) will simply have a 1 at the index of that state and be 0 elsewhere. However, if we are a bit uncertain of the state, we can still understand the evolution of the system without perfect certainty of the exact state. This is because when the matrix M(t) acts on p(t), it describes the expected distribution of the subsequent states of the system:

$$p(t+1) = M(t)\, p(t)$$
In this way, M(t) can represent a "flow" of the probability distribution for the states: our system is one that evolves from a potentially uncertain initial state to another uncertain state, but if we know the chances of each state at a point in time, we can know the chances of the states at a future point in time.
We can imagine that as M(t) acts on a system, the probabilities will increasingly be mixed around. Are there any states exempt from this? For example, is there a maximally mixed state that can't be mixed anymore? Indeed, a system is said to be in balance if its state probability distribution is stationary when acted on by M:

$$M(t)\, \pi(t) = \pi(t)$$
which means that the probability distribution is unchanged at time t+1. The pi (π) symbol will be used to denote distributions that are stationary under the evolution matrix M(t), it's just a special symbol for this special probability distribution. Stationary distributions are comparable to being in thermodynamic equilibrium.
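A quick sketch of this evolution, for a hypothetical 2-state transition matrix, shows the probabilities mixing toward a stationary distribution:

```python
# A hypothetical 2-state transition matrix; each column sums to 1.
M = [[0.9, 0.2],
     [0.1, 0.8]]

def step(M, p):
    """One evolution step: p(t+1)_i = sum_j M[i][j] * p(t)_j."""
    n = len(p)
    return [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]

p = [1.0, 0.0]           # start certain of state 0
for _ in range(200):     # repeated mixing approaches the stationary pi
    p = step(M, p)

print(p)                 # close to the stationary distribution [2/3, 1/3]
# Stationarity: applying M again leaves the distribution unchanged.
assert all(abs(a - b) < 1e-9 for a, b in zip(step(M, p), p))
```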
Energies
Now, we will assume each state has some energy associated with it, which for now is just a number; for a time t and state x it will be denoted by

$$E(x, t)$$
We don't actually need to bring any associations with physical energy to this thing we are calling energy; we can actually get it by "reversing" the canonical ensemble to generate energies from probabilities. If we consider an arbitrary probability distribution

$$p = (p_1, p_2, \dots, p_N)$$

over the N states, we can use the fact that

$$\sum_i p_i = 1$$

and the fact that for real z > 0:

$$z = e^{\ln z}$$

to define the energy of a state in that probability distribution from the probabilities:

$$p(i \mid \beta, E) = e^{-\beta (E_i - F)} \quad \Longleftrightarrow \quad E_i = F - \frac{1}{\beta} \ln p(i \mid \beta, E)$$
where we write the probability as conditionally dependent on the variables β and E, since those can in principle vary. In other words, we can define an "energy" from probabilities by reversing the softmax function (this may be familiar to those who have worked with energy-based models in ML). The term F in the equation above is just a way of absorbing the denominator

$$Z = \sum_j e^{-\beta E_j}$$

into the single exponential and is defined as:

$$F = -\frac{1}{\beta} \ln \sum_j e^{-\beta E_j}$$
where the facts that

$$z = e^{\ln z} \quad \text{and} \quad e^{a} e^{b} = e^{a+b}$$

were used. F is known as the (Helmholtz) free energy; in physical systems (and when the actual physical quantity known as energy is used for the energies), this measures the amount of useful work a system can perform. You may notice that everything in this section up to now has been statistical: while we have used some names from physics, all the quantities were defined in a purely statistical way. We will now tie physics into this.
Connecting with Physics
In classical thermodynamics, heat and work are core concepts. Heat is the energy transferred due to a temperature difference between two systems: energy flows from hotter systems to cooler systems. Work describes how energy can be extracted from a system; for example, a car's engine does work when it converts chemical energy stored in gasoline into the movement of the wheels of the car. An important fact in classical thermodynamics is the first law of thermodynamics: the change in a system's energy is the sum of the heat added to it and the work done on it, so a system that is isolated, and can't gain or lose energy, has constant energy. This is often stated as "energy can't be created or destroyed".
In the more probabilistic formulation we have been developing, the heat Q[x] gained by a trajectory, and the work W[x] done on it or by it over a period of time τ, can be defined as:

$$Q[\mathbf{x}] = \sum_{t=0}^{\tau-1} \left[ E(x(t+1), t) - E(x(t), t) \right]$$

$$W[\mathbf{x}] = \sum_{t=0}^{\tau-1} \left[ E(x(t+1), t+1) - E(x(t+1), t) \right]$$
What is the intuitive interpretation of these variables and how do they connect with thermodynamics in the real world?
The heat change Q on a trajectory over τ time steps is the change in energy due to change of the state x at each timestep, ignoring the change in energy between steps. In some ways, we can conceptualize it as "passive" energy change, it is the way the energy changes with passive shifting of our system from one state to the next, just as transfer of heat in the real world is usually a passive flow from hot to cold as time goes on.
The work is the change in energy associated with a change in the energy levels of the system. In some ways, this is more active, as it reflects an addition of energy from an outside source. The work doesn't look at differences in the state of the system that are just due to random transitions; it looks at how the energy associated with a given state changed. Noting that above we essentially defined energies from probabilities, a change in the energy of a state is a change in its probability. We can therefore imagine a change in the energies as an active change in the likelihood of states in the system; in the literature, this is often described as being brought about by a "driving protocol", which may be something like an externally applied electric field.
Something to note here is that these definitions (which are technically probabilistic!) have an analog of the first law of thermodynamics:

$$Q[\mathbf{x}] + W[\mathbf{x}] = \sum_{t=0}^{\tau-1} \left[ E(x(t+1), t+1) - E(x(t), t) \right] = E(x(\tau), \tau) - E(x(0), 0)$$

where intermediate terms in the sum have canceled each other out. So the sum of the heat and work on a trajectory is equal to the total change of energy for the system (if no energy was added, it's unchanged).
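We can verify this first-law identity numerically. The energy function and trajectory below are made up purely for illustration, using one common convention for which step the heat and work are attributed to:

```python
import random

# A made-up time-dependent energy function E(state, t) on 3 states.
def E(x, t):
    levels = [[0.0, 1.0, 2.0], [0.5, 1.5, 0.2], [2.0, 0.1, 1.0]]
    return levels[t % 3][x]

tau = 10
traj = [random.randrange(3) for _ in range(tau + 1)]  # a random trajectory

# Heat: energy change from the state changing, at fixed energy levels.
Q = sum(E(traj[t + 1], t) - E(traj[t], t) for t in range(tau))
# Work: energy change from the levels changing, at fixed state.
W = sum(E(traj[t + 1], t + 1) - E(traj[t + 1], t) for t in range(tau))

# First law: heat plus work telescopes to the total energy change.
assert abs((Q + W) - (E(traj[tau], tau) - E(traj[0], 0))) < 1e-9
```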
A key point to mention here is that the work W is not generally the same if we reverse the trajectory. This contrasts with classical thermodynamics, where processes (such as compressing a gas to do work on it) are modeled as being frictionless, very slow etc. so that they are reversible - reversing the process returns everything to where it was and no energy is gained or lost. In the non-equilibrium case, we can't assume this reversibility.
We can still define a "reversible work" though. This is the minimum work required to change between two energy distributions. The reversible work Wᵣ is defined to be the change in free energy between the beginning and end states of a trajectory:

$$W_r = F(\tau) - F(0) = \Delta F$$
Note that the reversible work only depends on the initial and final state. The difference between the total work W[x] and the reversible work Wᵣ is called the dissipative work done during the state transitions, i.e.

$$W_d[\mathbf{x}] = W[\mathbf{x}] - W_r$$
Unlike the reversible work, the dissipative and total work depend on the path taken. One way to think about this: given an initial and final state, the minimal amount of work to go between them is simply the difference in their free energies. However, when you go between them in practice, the work it takes will depend on the details of the path you actually take.
A natural question in the development of non-equilibrium thermodynamics was: how is the dissipative work on a trajectory related to the reversed trajectory? Since we don't have reversibility, they will not be the same in general.
Reversing time
Given a trajectory x(t) and energy E(t) we can define the time-reversed path and energy as:

$$\hat{x}(t) = x(\tau - t), \qquad \hat{E}(t) = E(\tau - t)$$

So to reverse a trajectory (from a point in time τ), we start at $x(\tau)$ and go backwards each time step. Note that we denote the reversed quantities with little hats in general.
Reversing the transition matrices M is a little more work. We'll start with the simple case where M doesn't depend on time (the time-homogeneous case). We will say a probability distribution π is invariant under M if

$$M \pi = \pi$$
so that the transition preserves the distribution. Invariance is another name for the fact that π is a stationary distribution. Why is this property important?
In the case of a time-homogeneous transition matrix, the time-reversed transition matrix $\hat{M}$ should also have the same invariant distribution. Furthermore, the probability of a transition from state i → j in the forward trajectory should be equal to that of the transition j → i in the reverse trajectory for the invariant distribution. These requirements lead to the following equation for the entries of the time-reversed transition matrix:

$$\hat{M}_{ij}\, \pi_j = M_{ji}\, \pi_i$$
Let diag(v) denote the m×m matrix formed from an m-dimensional vector v by filling out a matrix diagonal with the values of v; this is just a convenient notational way to treat vectors as matrices. To get the matrix form of the time-reversed transition matrix satisfying the above equation, we multiply by the inverse of π (on the right) and transpose both sides, yielding (with some coercion of the vector values to diagonal matrices):

$$\hat{M} = \mathrm{diag}(\pi)\, M^{\top}\, \mathrm{diag}(\pi)^{-1}$$
This gives us the time-reversed transition matrix (when M does not depend on t). Note that this does implicitly assume π is nowhere equal to zero, which can also be interpreted as meaning that all energies are finite in the stationary distribution.
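Here is a small numerical sketch of this construction, using a made-up cycle-biased chain (whose stationary distribution is uniform by symmetry). It checks that the reversed matrix is a valid transition matrix with the same stationary distribution:

```python
# Time-reversal of a time-homogeneous transition matrix:
#   Mhat[i][j] = pi[i] * M[j][i] / pi[j],  i.e.  Mhat = diag(pi) M^T diag(pi)^{-1}
def reverse(M, pi):
    n = len(pi)
    return [[pi[i] * M[j][i] / pi[j] for j in range(n)] for i in range(n)]

# A cycle-biased 3-state chain (columns sum to 1). Its stationary
# distribution is uniform, but the chain itself is not reversible.
M = [[0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1]]
pi = [1 / 3, 1 / 3, 1 / 3]

Mhat = reverse(M, pi)

# Mhat is still a valid transition matrix (columns sum to 1)...
for j in range(3):
    assert abs(sum(Mhat[i][j] for i in range(3)) - 1.0) < 1e-9
# ...pi is also stationary under Mhat...
for i in range(3):
    assert abs(sum(Mhat[i][j] * pi[j] for j in range(3)) - pi[i]) < 1e-9
# ...and the reversal flips the cycle direction (here Mhat is M transposed).
assert abs(Mhat[0][1] - M[1][0]) < 1e-12
```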
Moving on to the time-varying case, we rely on the fact that the above equation for the time-reversed transition matrix holds at each point in time: if we freeze a time-varying transition matrix at a given instant, it should behave like a time-homogeneous M for that point in time. Then for the time-inhomogeneous (time-varying) transition matrix:

$$\hat{M}(t) = \mathrm{diag}(\pi(\tau - t))\, M(\tau - t)^{\top}\, \mathrm{diag}(\pi(\tau - t))^{-1}$$
Note that in this case, π(t) is the invariant distribution only for the transition matrix M(t) at that point in time t; in other words, generally:

$$M(t')\, \pi(t) \neq \pi(t) \quad \text{for } t' \neq t$$
The time-reversed transition matrix can now be used to advance the time-reversed system, with the following relationship:

$$\hat{p}(t+1) = \hat{M}(t)\, \hat{p}(t)$$
Putting the pieces together
To recap, we have looked at the probability distributions our state transition system can have, defined some analogues to thermodynamic quantities in this evolving system, and set up a framework for investigating the time-reversed system. We'll now combine all of these features, with the goal of computing the ratio of the probabilities of forward and reversed trajectories in our system. Why is this useful? We can imagine the forward trajectory as being the probability something forms, for example, the trajectory oil molecules follow when stirred in water, as they assemble into bubbles of oil. The reverse trajectory then would describe the opposite process, where oil bubbles break down and mix evenly with water (note that this doesn't usually happen unless a force, like stirring, is applied).
In some ways we are interested in describing the arrow-of-time itself: why things in the universe seem to evolve in time in a single "direction" of increasing entropy, despite the apparent reversibility of the laws of physics. By relating the ratio of the probabilities of a forward and reverse trajectory, we can describe why one trajectory is more or less likely to occur.
Continuing, we'll remind the reader that we assume the dynamics of our system are Markovian, meaning that the state at a given point in time only depends on the previous point in time. In terms of probability, this Markov assumption has the following form:

$$P[\mathbf{x}] = p(x(0)) \prod_{t=0}^{\tau-1} P(x(t+1) \mid x(t))$$

What this equation says is that the probability for x(0) to evolve along a trajectory is equal to the product of the probabilities of each individual state transition, i.e. the overall path taken only depends on the transition probabilities for each step. Note that the same applies for the time-reversed trajectory $\hat{\mathbf{x}}$
since it is also Markovian. Now, we will consider the ratio of the probabilities of the forward and reversed trajectories, i.e. their relative likelihood. Then:

$$\frac{P[\mathbf{x} \mid x(0)]}{\hat{P}[\hat{\mathbf{x}} \mid \hat{x}(0)]} = \prod_{t=0}^{\tau-1} \frac{P(x(t+1) \mid x(t))}{\hat{P}(\hat{x}(t+1) \mid \hat{x}(t))} = \prod_{t=0}^{\tau-1} \frac{e^{-\beta E(x(t+1), t)}}{e^{-\beta E(x(t), t)}} = e^{-\beta \sum_{t=0}^{\tau-1} \left[ E(x(t+1), t) - E(x(t), t) \right]}$$

The first equality uses the Markov property to decompose the trajectory into the probabilities of the individual steps. The second makes the dependence on the (inverse) temperature β and the energies explicit, writing the ratio of each forward and reverse step in terms of the canonical distribution. The last uses the properties of the exponential function and the definition of heat on a trajectory. The final result is:

$$\frac{P[\mathbf{x} \mid x(0)]}{\hat{P}[\hat{\mathbf{x}} \mid \hat{x}(0)]} = e^{-\beta Q[\mathbf{x}]}$$
This property is referred to as microscopic reversibility and the equation is also known as Crooks' Equation. If you're interested in further derivations, for example, the continuous case, you can read Gavin Crooks' thesis.
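As a sanity check of the single-step case, here is a sketch with a made-up Metropolis-style chain. Such a chain satisfies detailed balance, so it equals its own time reversal, and the one-step version of the relation can be checked directly:

```python
from math import exp

beta = 1.0
Es = [0.0, 0.7, 1.5]   # made-up energies for three states
n = len(Es)

# A Metropolis-style chain: propose one of the other states uniformly,
# accept with probability min(1, exp(-beta * (E_new - E_old))).
M = [[0.0] * n for _ in range(n)]
for j in range(n):                     # column j holds transitions out of j
    for i in range(n):
        if i != j:
            M[i][j] = min(1.0, exp(-beta * (Es[i] - Es[j]))) / (n - 1)
    M[j][j] = 1.0 - sum(M[i][j] for i in range(n) if i != j)

# One-step microscopic reversibility:
#   P(i -> j) / P(j -> i) = exp(-beta * Q), with Q = E_j - E_i the heat
#   absorbed by the system on the forward step.
for i in range(n):
    for j in range(n):
        if i != j:
            Q = Es[j] - Es[i]
            assert abs(M[j][i] / M[i][j] - exp(-beta * Q)) < 1e-9
```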
Jeremy England's Derivation
We'll now go on to derive the results from Jeremy England's paper on the statistical physics of self-replication.
Suppose that I and II are two macrostates of a non-equilibrium physical system after some time τ. In particular, we might imagine I represents the macrostate of a nutrient-rich, thermostatted petri dish with a single bacterium present and II represents its macrostate after the average bacterial division time, where we may presume the bacterium will have duplicated.
In a slight change of notation from England's paper, let

$$p(i \mid \mathrm{I})$$

be the probability that a system with macrostate I is in microstate i (with similar notation for microstates of state II). The probability of reaching macrostate II is the sum over all the possible paths from microstates i of macrostate I to microstates j of macrostate II, i.e.:

$$P(\mathrm{I} \rightarrow \mathrm{II}) = \sum_{i \in \mathrm{I}} \sum_{j \in \mathrm{II}} p(i \mid \mathrm{I})\, p(j \mid i)$$
where p(j|i) represents the probability of being in microstate j after time τ given that the system started in microstate i. We omit subscripts on p(j|i) because it potentially represents intermediate macrostates outside of I or II.
Suppose we are interested in the reversed trajectory; in particular, suppose we want to know the probability of being in state I given that you started in state II (where the same time period τ is assumed). This can be thought of as the probability of two bacteria breaking down into one plus the nutrients composing the second. This is the sum over all paths from II to I, weighted by the probabilities of these paths and of their initial states, i.e.:

$$P(\mathrm{II} \rightarrow \mathrm{I}) = \sum_{i \in \mathrm{I}} \sum_{j \in \mathrm{II}} p(j \mid \mathrm{II})\, \hat{p}(i \mid j) = \sum_{i \in \mathrm{I}} \sum_{j \in \mathrm{II}} p(i \mid \mathrm{I})\, \hat{p}(i \mid j)\, \frac{p(j \mid \mathrm{II})}{p(i \mid \mathrm{I})}$$

where we expanded the conditional probabilities and multiplied by

$$\frac{p(i \mid \mathrm{I})}{p(i \mid \mathrm{I})} = 1$$
Crooks' equation can be written in this situation as (compare above):

$$\hat{p}(i \mid j) = p(j \mid i)\, e^{\beta Q_{i \rightarrow j}}$$

where $Q_{i \rightarrow j}$ is the heat absorbed by the system on the forward path, which we can insert into the above equation as:

$$P(\mathrm{II} \rightarrow \mathrm{I}) = \sum_{i \in \mathrm{I}} \sum_{j \in \mathrm{II}} p(i \mid \mathrm{I})\, p(j \mid i)\, e^{\beta Q_{i \rightarrow j}}\, \frac{p(j \mid \mathrm{II})}{p(i \mid \mathrm{I})}$$
Lastly, we can use the fact that

$$\frac{p(j \mid \mathrm{II})}{p(i \mid \mathrm{I})} = e^{\ln p(j \mid \mathrm{II}) - \ln p(i \mid \mathrm{I})}$$

and the definition of expectations over the forward paths,

$$\langle \cdot \rangle_{\mathrm{I} \rightarrow \mathrm{II}} = \frac{1}{P(\mathrm{I} \rightarrow \mathrm{II})} \sum_{i \in \mathrm{I}} \sum_{j \in \mathrm{II}} p(i \mid \mathrm{I})\, p(j \mid i)\, (\cdot)$$

to get the final form. This sequence of steps has given us:

$$P(\mathrm{II} \rightarrow \mathrm{I}) = P(\mathrm{I} \rightarrow \mathrm{II}) \left\langle e^{\beta Q_{i \rightarrow j} + \ln p(j \mid \mathrm{II}) - \ln p(i \mid \mathrm{I})} \right\rangle_{\mathrm{I} \rightarrow \mathrm{II}}$$
We can divide through by the forward transition probability, yielding:

$$\frac{P(\mathrm{II} \rightarrow \mathrm{I})}{P(\mathrm{I} \rightarrow \mathrm{II})} = \left\langle e^{\beta Q_{i \rightarrow j} + \ln p(j \mid \mathrm{II}) - \ln p(i \mid \mathrm{I})} \right\rangle_{\mathrm{I} \rightarrow \mathrm{II}}$$
Jensen's inequality can be used to show that

$$\left\langle e^{z} \right\rangle \geq e^{\langle z \rangle}$$

so:

$$\frac{P(\mathrm{II} \rightarrow \mathrm{I})}{P(\mathrm{I} \rightarrow \mathrm{II})} \geq e^{\beta \langle Q \rangle_{\mathrm{I} \rightarrow \mathrm{II}} + \langle \ln p(j \mid \mathrm{II}) \rangle - \langle \ln p(i \mid \mathrm{I}) \rangle}$$
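Jensen's inequality for the exponential can be checked numerically on any sample; the sample below is arbitrary:

```python
from math import exp
import random

random.seed(0)
zs = [random.uniform(-2.0, 2.0) for _ in range(100_000)]

mean_of_exp = sum(exp(z) for z in zs) / len(zs)
exp_of_mean = exp(sum(zs) / len(zs))

# Jensen's inequality for the convex function exp: <e^z> >= e^<z>.
assert mean_of_exp >= exp_of_mean
print(mean_of_exp, exp_of_mean)
```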
Then taking the (monotonic) natural logarithm of both sides:

$$\ln \frac{P(\mathrm{II} \rightarrow \mathrm{I})}{P(\mathrm{I} \rightarrow \mathrm{II})} \geq \beta \langle Q \rangle_{\mathrm{I} \rightarrow \mathrm{II}} + \langle \ln p(j \mid \mathrm{II}) \rangle - \langle \ln p(i \mid \mathrm{I}) \rangle$$

and multiplying the log-probability terms by -1 and rearranging:

$$-\beta \langle Q \rangle_{\mathrm{I} \rightarrow \mathrm{II}} + \ln \frac{P(\mathrm{II} \rightarrow \mathrm{I})}{P(\mathrm{I} \rightarrow \mathrm{II})} + (S_{\mathrm{II}} - S_{\mathrm{I}}) \geq 0$$
where the standard equation

$$S = -\sum_k p_k \ln p_k$$

for Shannon entropy was used to generate the entropies of states I and II. Noting that this equation contains

$$\Delta S = S_{\mathrm{II}} - S_{\mathrm{I}}$$
the difference in entropies between the two states, this above inequality is a statement of the second law. Note that this derivation is general: it made no actual reference to the fact that we are interested in self-replicating systems i.e. that the states I and II represent the undivided and divided state, respectively, of a replicating bacterium.
The final steps of England's paper plug in some experimental estimates of heat production and entropy into the inequality above. I won't actually do these steps; if you've read this far, you are now fully equipped to read the remainder of the paper. The beginning of the paper should feel quite familiar too, as we've just gone through all of its steps (except in a discrete-sum rather than integral formulation). A couple of interesting highlights of the final results:
Bacterial replication seems to be near the maximum possible efficiency replication can have
RNA molecule replication is even closer to the thermodynamic limits of replication efficiency
Conclusion
I hope this presentation of non-equilibrium thermodynamics was helpful and that you've learned something. You should now have reasonable ability to read through some papers, like England's, on this topic.
Note that these thermodynamic relationships don't actually explain how self-replication occurs or how it arises in the first place. For this reason, I personally don't think they capture the essence of life; in fact, I don't think thermodynamics can capture it at all. We already know that life is thermodynamically favored; otherwise it wouldn't exist for very long. To truly understand the origins of life and how to create artificial life, we can't rely on rough macroscopic descriptions. We have to actually understand the informational processes and detailed dynamics.