Introduction

This is part 1/? of a series of posts on the theory of “reaction networks”, a.k.a. “Petri Nets”, a.k.a. “mass-action kinetics”. For the most part this material is assembled from various links on John Baez’s page. Almost nothing here is my own work, but I will be restating it in my own words and partially in my own notation.

My own interests are in:

  • Elementary examples of parts of quantum-mechanics and statistical-mechanics, e.g. Fock Space, and in decomposing these theories into simpler ones.
  • The relationships between descriptions of the same system at different levels of abstraction, with a renormalization flavor.
  • The zoo of equilibrium behaviors of dynamical systems, and how they relate to the underlying parameters.
  • Generating function methods, wherever they appear.
  • The opportunity to develop methods of diagramming high-dimensional spaces.

Reaction Networks

A “reaction network” is a graph whose edges represent “reactions” (or “processes” or “transitions”) between different “complexes” (or “combinations” or “tuples”) of “species” (or “molecules” or “compounds”).

Here’s a generic example:

The network represents a set of “processes” by which a system can evolve. From it we can produce various mathematical models, e.g. a set of coupled differential equations for the “concentration” of each species, where we assert that each process occurs with a corresponding rate constant, at a rate proportional to the product of its inputs’ concentrations. The result is called the “rate equation” of the network:

As no other reactions occur, any state which is not accessible via some sequence of the reactions in the network is unreachable. The set of possible states is further constrained by the rate constants, meaning that some states may turn out to be attractors while others are dynamically unreachable.
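To make the translation concrete, here is a minimal sketch in Python, using a hypothetical two-species network (A + B → 2B and B → A) with made-up rate constants. The mass-action rule gives each reaction a rate equal to its rate constant times the product of its input concentrations.

```python
# A hypothetical network: A + B -> 2B at rate 2.0, B -> A at rate 0.5.
reactions = [
    # (input complex, output complex, rate constant)
    ({"A": 1, "B": 1}, {"B": 2}, 2.0),
    ({"B": 1}, {"A": 1}, 0.5),
]

def rate_equation(conc):
    """Right-hand side of the rate equation under mass action:
    each reaction fires at k * (product of input concentrations),
    changing each species by (output count - input count)."""
    ddt = {s: 0.0 for s in conc}
    for inputs, outputs, k in reactions:
        rate = k
        for species, count in inputs.items():
            rate *= conc[species] ** count
        for species in conc:
            ddt[species] += (outputs.get(species, 0) - inputs.get(species, 0)) * rate
    return ddt
```

Evaluating `rate_equation({"A": 1.0, "B": 2.0})` gives the instantaneous flows; feeding this right-hand side to any ODE integrator produces trajectories.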

Here’s an example of a chemical network:

This system’s behavior is oscillatory. You can sort of see this just by inspecting the formulas: the first reaction is a positive feedback loop fueled by which produces . The second reaction restores the which gates the first, while the third reaction takes off as accumulates from the second reaction, with the effect of consuming and slowing the first. Nothing restores , so the overall process ends when the initial surplus of this reactant is depleted.

Here’s a Lotka-Volterra predator-prey model:

where R is the concentration of rabbits and F is foxes. The first reaction represents rabbits spawning (asexually, in this simple model), the second represents the foxes spawning in turn by eating rabbits, and the third is the foxes dying off. This system, too, exhibits oscillations: first rabbit populations explode, then foxes explode in response while rabbits die off, then the foxes die, and then rabbits dominate again, etc.

The system of equations is:

$$\dot R = k_1 R - k_2 R F, \qquad \dot F = k_2 R F - k_3 F$$
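The oscillation is easy to exhibit numerically. Here is a sketch with made-up rate constants and initial populations; the constant of motion used below is the standard Lotka-Volterra invariant, which forces the orbits to be closed cycles.

```python
import math

# Made-up rate constants: spawn, predation, death.
K1, K2, K3 = 1.0, 0.5, 1.0

def deriv(R, F):
    # dR/dt = K1 R - K2 R F,  dF/dt = K2 R F - K3 F
    return K1 * R - K2 * R * F, K2 * R * F - K3 * F

def rk4_step(R, F, dt):
    k1 = deriv(R, F)
    k2 = deriv(R + 0.5 * dt * k1[0], F + 0.5 * dt * k1[1])
    k3 = deriv(R + 0.5 * dt * k2[0], F + 0.5 * dt * k2[1])
    k4 = deriv(R + dt * k3[0], F + dt * k3[1])
    R += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    F += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return R, F

def invariant(R, F):
    # Conserved along exact trajectories, which makes the orbits closed loops.
    return K2 * R - K3 * math.log(R) + K2 * F - K1 * math.log(F)

R, F = 2.0, 1.0
v0 = invariant(R, F)
for _ in range(20000):          # integrate to t = 20, a few full cycles
    R, F = rk4_step(R, F, 0.001)
```

A good integrator should preserve the invariant to high accuracy, which is a useful check on the simulation.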

And here is a simple model of disease spread:

where H is “healthy”, S is “sick”, and D is “dead”, giving a system of differential equations

$$\dot H = -\alpha H S, \qquad \dot S = \alpha H S - \beta S, \qquad \dot D = \beta S$$

In this (very simple) model, “infections” either die out rapidly (the number of sick goes quickly to zero) or grow into an epidemic, in which the rate of new infections equilibrates with the rate of deaths. Which phenomenon occurs depends on the values of the rate constants.
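A quick numerical sketch of the two regimes, with symbols of my own choosing: H, S, D for the three species and `alpha`, `beta` for the two rate constants (H + S → 2S at rate `alpha`, S → D at rate `beta`).

```python
def simulate(alpha, beta, H=0.99, S=0.01, D=0.0, dt=0.001, steps=50_000):
    """Forward-Euler integration of the epidemic rate equation."""
    for _ in range(steps):
        infect, die = alpha * H * S, beta * S
        H, S, D = H - infect * dt, S + (infect - die) * dt, D + die * dt
    return H, S, D

# High transmission: an epidemic; most of the population eventually dies.
H1, S1, D1 = simulate(alpha=3.0, beta=1.0)
# Low transmission: the infection dies out with few deaths.
H2, S2, D2 = simulate(alpha=0.5, beta=1.0)
```

With these numbers the high-transmission run kills most of the population while the low-transmission run barely spreads; in both cases the total H + S + D is conserved step by step.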

Evidently reaction networks define a large class of “model systems” exhibiting a variety of dynamics, among them the cycles and bifurcations seen in these examples.

Levels of Abstraction

I want to emphasize here at the outset that a “reaction network” describes a set of processes, which can then be “translated” into equations in various ways—in particular, at different “resolutions”, “levels of analysis”, or “levels of abstraction”.

  0. Equilibrium. The highest level of abstraction. Here we characterize a system by its equilibrium or long-time behavior, which might be an attractor state, but might also include more complicated behaviors like cycles.

    Examples: Equilibrium concentrations of chemicals in solution. Thermodynamics as equilibrium stat-mech.

  1. Mean Dynamics. This is the entry point for most analyses of a reaction network: we translate the network into a differential equation relating the total populations or concentrations. This is called the “Rate Equation”, an ODE in the species counts. We can then analyze this ODE to determine its asymptotic behaviors, stability, bifurcations, etc., which can tell us things at level 0.

    Examples: The Lotka-Volterra and epidemic examples above. Quantum mechanics in the semiclassical limit.

  2. Stochastic Dynamics. At this much lower level of abstraction, we translate the network into a time-evolution law for probability distributions over states with specific “occupation numbers” for each species. This is a far more granular description than the mean-dynamics level, and there could be intermediate levels between them. Generally this description of the system is far less tractable, but in some ways it is more amenable to analysis, and as we’ll see it can be related to the Rate Equation at level 1.

    Examples: the Schrödinger equation.

  3. Histories, Timelines, or Sequences of Reactions. Lower still is a characterization of the state as a particular sequence of reactions, with which we can associate some probability-weight or “amplitude”. The weighted sum over all possible timelines should reproduce the higher-level characterizations of the system. But this level of analysis is particularly suited to questions like: what is the probability of a transition between specific starting and ending states?

    Examples: a diagram in a path integral, in QFT. I also think of “programs”, in the sense of a sequence of expressions in an abstract language. A specific sequence of mutations in genetics.

  4. Micro-scale Dynamics. At the lowest level of analysis we have the interactions of individual particles. Here the reaction network doesn’t tell us anything; in fact this is the level at which you might calculate the rate constants used at higher levels of analysis, by computing scattering cross-sections from the underlying dynamics. Or perhaps we can use statistical-mechanics methods to find a density of states or partition function.

    Examples: Hamiltonian mechanics of interacting particles in a gas. Individual vertices in QFT.

Most of the theory here will concentrate on the three highest levels of analysis. I spell all of this out because, first, I think it will be a good way to organize these ideas, but in particular because, for me, the most interesting thing about reaction networks is as a model system for understanding the relationships between the different levels of abstraction.

With this we will begin at “Level 1”: the mean dynamics, as expressed by the Rate Equation.

Level 1: The Rate Equation

Anatomy of a Rate Equation

A reaction network is most simply seen as encoding a differential equation called the Rate Equation of the network.

Here is the example from above again:

, , and are the names of species of this network. We’ll use for the set of all species; here . (Note that in a chemical system species might be multi-atom molecules like —this is still a single species.)

We translate into the following set of coupled diff-eqs:

are the names of the four reactions, while are their rate constants. Note that in this example and represent two different reactions relating the same sets of species, but in reverse directions.

When we want to iterate over reactions, we’ll index by . will stand for the set of reactions. So in this example we have . The rate constant of reaction is a positive real number .

In the differential equation, the variables are the “concentrations” or “populations” of each of the species in . (Chemists might write these concentrations as or .) We use capitals for concentrations because we will later use at the next level of analysis down.

Concentrations are always non-negative real numbers—we think of the rate equation variables as representing the time-dependence of averages or expectations rather than integer numbers of molecules or organisms.

The rate of a reaction depends on the product of the concentrations of its inputs only, e.g. our first reaction occurs with rate . This is called the law of mass action.

The coefficients like represent the change in the concentration of a species when a reaction occurs. So our first reaction removes two units of and adds one unit of , but doesn’t change the number of , since it appears on both sides of the reaction.

The specific combinations of species appearing on either side of each reaction are the complexes of the network. These are, conceptually, the “clusters” which have to come together for the reaction to occur. will denote the set of complexes:

We will index complexes by a lower-case . Each reaction has a source complex and a target complex —think of these as the two ends of an edge of a graph. We can index into these complexes to find the component of species like . Here:

  • , so , , and the full tuple is .
  • , so and the full tuple is .

The difference , in terms of species, we will call . This is the change in the exact species population when reaction occurs once. Here .

We could try to write the Rate Equation as a matrix equation:

…but this won’t be very useful because the matrix isn’t square—instead it has dimension , because the rates depend on the complexes, and there can be a lot of these in general. You could also write it as a diff-eq relating complexes to complexes, but you would have to denote the species-in-the-complexes in some other way. So, neither of these approaches will be useful.

The general formula for a rate equation is a sum over reactions, where are the “target” and “source” of reaction , and additionally index the species.

We will sometimes use the symbol to represent the change in species due to a single reaction. This is just the output complex minus the input complex, with each interpreted as a vector over species. In terms of the rate equation is
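The general formula is easy to mirror in code: species are indexed 0..|S|−1, each reaction is stored as its source complex, target complex, and rate constant, and the right-hand side sums Δ times the mass-action rate over reactions. The network and numbers here are made up.

```python
# A hypothetical three-species network, complexes as coefficient lists.
reactions = [
    # (source complex, target complex, rate constant)
    ([2, 0, 0], [0, 1, 0], 1.0),   # 2 X0 -> X1
    ([0, 1, 1], [1, 0, 0], 0.3),   # X1 + X2 -> X0
]

def rhs(X):
    """General rate equation: dX/dt = sum over reactions of Delta_r * rate_r."""
    dX = [0.0] * len(X)
    for source, target, k in reactions:
        rate = k
        for Xs, m in zip(X, source):
            rate *= Xs ** m                 # law of mass action on the source complex
        for s in range(len(X)):
            dX[s] += (target[s] - source[s]) * rate   # Delta = target - source
    return dX
```

Each reaction contributes its Δ (target minus source) times its mass-action rate, which is exactly the shape of the sum-over-reactions formula above.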

Example: Solving a Rate Equation

For any particular network you could attack the corresponding Rate Equation with the standard methods of differential equations. In general we can’t hope to find exact solutions; the goal of such an analysis is usually to make “level 0” statements characterizing equilibria and long-time behavior. But it will be helpful to see the general shape such an analysis takes with a worked example.

As a demonstration of what it takes to analyze a particular Rate Equation, I’ll look at the epidemic model from above, where, again, H represents the “healthy” population, S the “sick”, and D the “dead”.

This leads to the following system of equations:

$$\dot H = -\alpha H S, \qquad \dot S = \alpha H S - \beta S, \qquad \dot D = \beta S$$

The deltas are (−1, 1, 0) and (0, −1, 1), ordering the species as (H, S, D), and the input complexes are H + S and S.

We can also write this as a matrix equation whose input is a vector of complexes and whose output is a vector of species:

To solve this, it helps to note that both processes conserve the total population. Hence we can define a conserved quantity whose time derivative is the sum of the three diff-eqs above, and is therefore zero. This also implies that the second matrix representation on complexes is a stochastic matrix, with columns summing to 1.

Now to solve it. This system is simple enough that it admits an exact solution by way of the following trick. We observe that the and equations are not independent, because

Then the solutions for and differ only by a constant:

And we can make use of the fact that to solve for exactly:

We now define parameters to make this dimensionless. If we define , we can rearrange:

And now we define constants , , and to write this in a dimensionless form:

We arrive at a fairly simple first-order differential equation, which is most easily analyzed by observing that always has an intercept of , and is shaped like a distorted kind of upward parabola. The minimum of this part of the equation occurs where , or ; hence . For this minimum is to the left of ; for it is to the right:

Then the full system will have a zero (a fixed point) wherever the horizontal line crosses this curve. Note that is always , being the inverse of the initial “healthy” fraction.

Apparently:

  • When , meaning , is a fixed point: a healthy population can’t suddenly get sick.
  • When or , the model develops an “attractor” fixed point —an eventual “death toll”—and while the value of the attractor increases slowly as a function of the initial condition (through , which isn’t really independent of ), it increases very quickly as a function of the “rate of transmission” relative to the rate at which sick patients die off, represented by .

From here you could go on to calculate the location of the fixed point as a function of the parameters and the initial conditions, or the timescale of convergence to the fixed point, or you could modify this comically-simple model to include behaviors like “getting better” and “quarantining”. But it is adequate to demonstrate the typical behavior of reaction networks: fixed points appear and move around the parameter space as a function of the rate constants.
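The fixed-point story is easy to check numerically. In the standard dimensionless reduction of this model (symbols here are my own, not necessarily the constants above: d for the dead fraction D/N, h0 for the initial healthy fraction, and R0 for the transmission rate relative to the death rate), fixed points solve 1 − d − h0·exp(−R0·d) = 0, and bisection finds the death toll:

```python
import math

def death_toll(R0, h0=0.99):
    """Find the fixed point d in (0, 1]: the eventual dead fraction."""
    f = lambda d: 1.0 - d - h0 * math.exp(-R0 * d)
    lo, hi = 0.0, 1.0   # f(0) = 1 - h0 >= 0 and f(1) = -h0 * exp(-R0) < 0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

The toll creeps up slowly for small R0 but jumps toward 1 once R0 is well above 1, matching the bifurcation described above.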

Tuple Notation

All the species indices in the rate equation are not particularly easy to read. Some of the literature cleans this up by omitting the species index entirely, asking us to remember which terms refer to species (which obey special rules) and which do not. I want to take a different approach and represent indexing-over-species as an overbar . For example, in the example above. You can still index into a barred quantity like .

A tuple over species will act like a vector as , but we won’t assume it has any “vectorial” properties besides those we explicitly define here.

Tuples can be added or subtracted, and it will occasionally be useful to take a dot-product:

We’ll also define operations on tuples which do not have vector analogs. Most operations which would not be defined on vectors we implement by applying component-wise, then multiplying all of the components of the result:

Note that the last line is a succinct way to write a multinomial coefficient , if . We could also write this as .

We will need a couple more non-standard operations as well.

First we have the “falling power”, which works for integers or tuples:

The last line indicates that the falling power could be related to a multinomial coefficient with both arguments tupled, but this seems a little too wacky, so we won’t use it.

Note that is a product of terms, all but one of them slightly less than . So it’s reasonable to read it as representing a number “somewhat smaller than ”.

Next we have the scalar product of more-than-2 tuples, which we write as a chain of dot-products:

And lastly we implement inequality between tuples, which can be seen as a shorthand for “logical AND” or as a product of indicator functions. We won’t attempt to find a notation for “OR”.
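Here is the whole tuple toolkit as a sketch in code: addition, subtraction, and dot product act componentwise like vectors, while the “scalar-valued” operations apply componentwise and then multiply the components. The names are my own.

```python
import math

def t_add(a, b): return tuple(x + y for x, y in zip(a, b))
def t_sub(a, b): return tuple(x - y for x, y in zip(a, b))
def t_dot(a, b): return sum(x * y for x, y in zip(a, b))

def t_pow(x, n):
    """Componentwise power, then product: prod_i x_i ** n_i."""
    p = 1.0
    for xi, ni in zip(x, n):
        p *= xi ** ni
    return p

def t_factorial(n):
    """prod_i n_i!  (so N! / t_factorial(n) is a multinomial coefficient)."""
    p = 1
    for ni in n:
        p *= math.factorial(ni)
    return p

def t_falling(n, k):
    """Falling power: prod_i n_i (n_i - 1) ... (n_i - k_i + 1); 0 if any k_i > n_i."""
    p = 1
    for ni, ki in zip(n, k):
        for j in range(ki):
            p *= ni - j
    return p

def t_le(a, b):
    """Tuple inequality: an AND over components (a product of indicators)."""
    return all(x <= y for x, y in zip(a, b))
```

For example, `math.factorial(5) // t_factorial((2, 3))` is the multinomial coefficient 5!/(2!·3!) = 10, and `t_falling((2, 1), (3, 0))` vanishes because there aren’t 3 units to remove from the first slot.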

With all of this notation, and , let’s try to write the general Rate Equation concisely.

The Rate Equation on column vectors looks like this:

We want to write this as a “tuple equation”, just as we do for vectors, representing a system with one equation for each tuple index . Then the r.h.s. expression will have to be evaluated in each equation. For a two-species system the above stands for a system of two equations, each containing a term which has its own internal tuple structure:

We immediately encounter a weakness of the notation, unfortunately—we would like to write this term as , but we will need to indicate somehow that these tuples are not to participate in the system of equations. I will account for this with absolute-value symbols like , using them only when an expression inside a tuple equation might be ambiguous. In general this might collide with actual absolute values, but we won’t encounter any of these; nearly all of our tuples will be positive anyway.

An alternative to all of this would be to use an Einstein-like notation, with the rule that tuple components multiply for any tuple index appearing on only one side of the , while those that appear on both sides of lead to a tuple of equations. We’d have:

I’m going to stick with tuples, though—we’ll get a lot of use out of them.

Our concise Rate Equation is therefore:

But in fact the tuple notation will not be all that useful for rate equations—it will serve us better at the next level of abstraction.

Level 2: The Master Equation

Stochastic Dynamics

In one sense the Rate Equation is a direct translation of the information in a reaction network. But in most cases this should be thought of as an approximation of the underlying physical system, in a few ways:

  1. The rate constants are either estimated from measurements, or, if they’re determined from the underlying physics, are probably approximations which only capture the leading-order interactions.
  2. The law that reactions occur with rate is itself only approximately true.
  3. The initial conditions are only approximate measurements, and therefore represent a peaked distribution of species numbers , which is in actuality discrete rather than continuous, and may be quantum-mechanical.

We could imagine trying to “undo” the abstraction along any of these axes. Here we’ll take the first two as givens—as assumptions of the “reaction network” class of models, essentially—and only broaden our view along the third axis.

Instead of concentrations, we’ll now consider the time-evolution of a probability distribution over the exact particle numbers.

A “state” will now be a distribution over the discrete species numbers, represented by an -tuple . We’ll use for the total number of particles. The equation governing the evolution of such a distribution is called a “Master Equation” of the system.

This is a “stochastic” dynamical system, but it will turn out to closely resemble a quantum-mechanical wave function, with the Master Equation corresponding to the Schrödinger Equation. 1

Following that analogy, we can represent a full state by an expression which looks like a wave function:

Note that in the stochastic case, the coefficients are positive real numbers rather than complex amplitudes, and we normalize this distribution with rather than . We may as well think of the coefficients as probabilities:
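A small sketch of such a state in code: a dictionary from occupation-number tuples to probabilities, normalized by a plain sum of coefficients rather than a sum of squares. The numbers are made up.

```python
# A two-species state: P(n1 of species 1, n2 of species 2).
state = {
    (2, 0): 0.25,
    (1, 1): 0.50,
    (0, 2): 0.25,
}

def normalization(psi):
    """Plain sum of coefficients (contrast |amplitude|^2 in QM)."""
    return sum(psi.values())

def mean_count(psi, species):
    """Expected occupation number of one species."""
    return sum(p * n[species] for n, p in psi.items())
```

The same dictionary will serve as the state that the Master Equation evolves in time.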

Our “Master Equation” will represent the time evolution of this distribution by a Hamiltonian-like operator:

The solution can be found for a given initial condition as a time-evolution operator which is just the matrix exponential of the generator :

Now we’ll build it. First we need to represent our reactions in the occupation-number basis. We’ll reuse the same example from the beginning of this essay:

Each reaction converts the term , where, recall, : it adds to the targets and removes from the sources. Spelled out, this is:

How do we represent the rate of a reaction? Reaction will convert state to state at a factor proportional to:

  1. The prior probability of , which is
  2. The rate constant 2
  3. The number of ways to choose all of the reaction’s inputs out of all the candidates in state , which is the tuple “falling power” introduced above:

We also have a condition that , meaning contains enough of each species in for the reaction to actually occur. But this is already implied by the definition of the falling power: if .

Then the total rate of reaction on state is:

Our time-evolution generator will consist of one such term for each initial state and for each reaction. Each should link the initial state to the final state , which we represent by an additional factor , where the first indicator selects the state which is related to by exactly one occurrence of reaction , and the second removes the same weight from the initial . This erases all of the probability mass from , giving it to , but note that, in almost all cases, another instance of the same reaction will also be adding probability mass back into the state . (Here I wonder what can be said about those states for which the inflow and outflow rates are equal, or whether it would be useful to represent the system as the difference between inflows and outflows for each state.)

This form implies that our generator matrix does not alter the total probability. is an “infinitesimal stochastic matrix”, whose columns each sum to 0; this turns out to imply that the exponential will be a stochastic matrix whose columns sum to 1. In quantum mechanics, this corresponds to the fact that anti-Hermitian operators are generators of unitary transformations. For example, is the generator of time-evolution: .
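Here is a sketch for the smallest possible case, the single reaction A → B at rate k, truncated to at most N particles of A: the generator moves weight from the state with n particles to the one with n − 1 at total rate n·k (the one-input falling power), its columns sum to 0, and its matrix exponential is a stochastic matrix.

```python
import math

N, k, t = 5, 1.0, 0.4

# H[m][n] = flow into state m from state n; columns sum to zero.
H = [[0.0] * (N + 1) for _ in range(N + 1)]
for n in range(1, N + 1):
    H[n - 1][n] += n * k   # reaction fires: n -> n - 1
    H[n][n] -= n * k       # ...removing the same weight from n

def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][j] * B[j][l] for j in range(size)) for l in range(size)]
            for i in range(size)]

def expm(H, t, terms=60):
    """exp(Ht) by its power series (fine for this small, mild matrix)."""
    size = len(H)
    U = [[float(i == j) for j in range(size)] for i in range(size)]
    P = [row[:] for row in U]
    for m in range(1, terms):
        P = mat_mul(P, [[h * t / m for h in row] for row in H])  # accumulate H^m t^m / m!
        U = [[U[i][j] + P[i][j] for j in range(size)] for i in range(size)]
    return U

U = expm(H, t)
# Start with all probability on n = N; evolve and take the mean of n.
p = [U[m][N] for m in range(N + 1)]
mean = sum(m * pm for m, pm in enumerate(p))
```

For this pure-death process each particle survives independently with probability exp(−kt), so the mean count decays as N·exp(−kt), a handy exact check.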

The full Master Equation is:

We can isolate the generator of time-evolution ; it is a matrix representing the contribution of state to state :

The Master Equation on Generating Functions

Now, rather than representing our state by a sum over occupation-number basis elements , we could instead represent it by a “probability generating function” (p.g.f.) over monomials , like this:

You could motivate this in a few ways:

  • Out of a general love for generating functions (like my own)
  • Or, by analogy to the Rate Equation , which involved powers of a species tuple as well, perhaps hoping to derive the Rate from the Master (we’ll try this in a bit)
  • Or, by observing that falling powers , which we used above to count the “number of ways of choosing in the reaction inputs ”, corresponds exactly to the coefficients you get when taking derivatives of the monomial3:

Guided by this observation, we’ll take our quantum-mechanical analogy even further by defining a “creation” operator and an “annihilation” operator for each species . The effect of these operators on a monomial is to increase or decrease the power by 1, multiplying by a factor representing the number of ways of doing so—a falling-power for but just 1 for any power of . (I’m not using the exact QM notation , because these are not “Hermitian conjugates”. I never cared for that anyway.)

The implementation of in the g.f. representation will be , as we have just seen, while the implementation of must be multiplication by :

We’ll write each of these as a tuple over all species :

We can also define a number operator which counts the number of particles of . It’s given by—what else?—

The operators don’t commute; their commutator is4:

Applying multiple s followed by the same number of s returns you to the same power you started with, but with a coefficient of a falling power:

So we can also define the “falling power of the number operator ”:
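A single-species sketch of these operators in code, representing a polynomial in x as a dict from powers to coefficients: applying k annihilations then k creations multiplies a monomial by the falling power, and the commutator acts as the identity.

```python
def annihilate(p):   # a = d/dx : sends x^n to n x^(n-1)
    return {n - 1: n * c for n, c in p.items() if n > 0}

def create(p):       # c = multiply by x : sends x^n to x^(n+1)
    return {n + 1: c for n, c in p.items()}

def sub(p, q):       # polynomial subtraction, dropping zero terms
    out = dict(p)
    for n, c in q.items():
        out[n] = out.get(n, 0) - c
    return {n: c for n, c in out.items() if c != 0}

# The commutator [a, c] = a c - c a acts as the identity on any polynomial:
poly = {5: 2.0, 1: 3.0, 0: 1.0}
comm = sub(annihilate(create(poly)), create(annihilate(poly)))

# Two annihilations then two creations on x^4 give the falling power 4*3 = 12:
mono = {4: 1.0}
out = create(create(annihilate(annihilate(mono))))
```

The multi-species version just carries one such operator pair per variable.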

… and that is everything we need to represent the Master Equation as a time-evolution operator in the generating function representation:

This is a sum across reactions , which we could write as . The term will act on a state of definite species number like:

Read this as:

  1. First, consume the inputs from the input state , which gives the state times the original probability and a coefficient representing the probability of this reaction occurring.
  2. Then, in parallel:
    • move to the new state and add all the probability
    • subtract all of the probability from the original state

For comparison, here is the original Master Equation, with expanded into :

I prefer the operator version. There’s one fewer to think about, and it’s easier to read the operator equation as a “process” linking to . And in some cases we’ll be able to sum the target g.f. series into a nice function of , which will make the derivatives in very easy to use.

Finally, in either representation, we can implement time-evolution by exponentiating this thing and acting on an arbitrary state:

Time Evolution as a Path Integral

Now we’ll briefly dip our toes into the next level down the ladder of abstraction.

One advantage of the operator formulation is that it lends itself to a path-integral-like interpretation, as follows. Observe that is a sum over reactions, like . Then, when we exponentiate this operator we get:

Carrying out the powers leads to terms with sequences of operators like , with some coefficient, which we read as representing the rate at which that sequence of transitions occurs in the time . When everything’s expanded, the action of time-evolution is like taking a sum-over-timelines, or a sum over all possible sequences of transitions. For some initial state , each will be associated, in the final state , with a sum over all sequences which could produce it, each in proportion to:

  • the number of ways to choose candidates for those sequences: a product of terms .
  • the probability that those reactions occur independently, which gives a product of factors for each reaction , representing the underlying hypothesis that all reactions occur at independent rates over time.
  • a correction for each sequence of reactions, which (I think) avoids an overcounting which arises because permuting the sequence of reactions is equivalent to permuting the times at which those reactions occur
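The sum-over-sequences can be checked directly on a tiny hypothetical network: one particle hopping between species 1 and 2 (1 → 2 at rate k1, 2 → 1 at rate k2), whose generator splits as H = H1 + H2. Expanding exp(Ht) order by order as a weighted sum over all words in {H1, H2} reproduces the exact transition probability.

```python
import math
from itertools import product

k1, k2, t = 1.0, 2.0, 0.3
H1 = [[-k1, 0.0], [k1, 0.0]]   # reaction 1 -> 2
H2 = [[0.0, k2], [0.0, -k2]]   # reaction 2 -> 1

def mat_mul(A, B):
    return [[sum(A[i][j] * B[j][l] for j in range(2)) for l in range(2)]
            for i in range(2)]

# exp(Ht) ~ sum over n of (t^n / n!) * (sum over all length-n words in {H1, H2})
U = [[1.0, 0.0], [0.0, 1.0]]
for n in range(1, 13):
    total = [[0.0, 0.0], [0.0, 0.0]]
    for word in product([H1, H2], repeat=n):
        prod_mat = [[1.0, 0.0], [0.0, 1.0]]
        for M in word:
            prod_mat = mat_mul(prod_mat, M)
        for i in range(2):
            for j in range(2):
                total[i][j] += prod_mat[i][j]
    coeff = t ** n / math.factorial(n)
    for i in range(2):
        for j in range(2):
            U[i][j] += coeff * total[i][j]
```

The weight t^n/n! attached to each length-n word is exactly the “permuting the reaction times” factor mentioned above; the exact answer for staying in state 1 is (k2 + k1·exp(−(k1+k2)t))/(k1+k2).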

For example, if our network looks like this:

Then we have two “processes” and . The series expansion will have some term in it like , representing “one reaction followed by one ” (acting on a state to the right). Then if our initial state has 3 “1” particles and 1 “2” particle, which in the g.f. notation is , then this sequence will take us to the state : two “1”s combine to give a “11”; then their product combines with the “2” to give a “21” and a “1”.

For this sequence the combinatorial term would be for the first reaction, there being 6 ways to choose 2 of 3 particles (which treats the particles as distinguishable), and for the second.

We can draw diagrams for three of these combinations; the other 3 choose the same two inputs into but in the opposite order.

Observe that in each case one of the “1” particles passes through untouched, being affected by only the identity term of instead of any of the interactions.

These are something like Feynman diagrams. They’re mostly a conceptual aid at this level, but they could be useful when analyzing small- systems. If you want to determine the probability of, say, beginning in a state of and ending in the same state, you could formulate a combinatorial series of diagrams representing all the sequences of reactions which meet this constraint: (no change) + (one reaction and the reverse) + (higher order terms). The exact series would depend on the available set of reactions. It would be instructive to come up with exact Feynman rules for some model reaction network.

Once again the most interesting result might be the light this can shed on quantum mechanics through the analogy. It has always felt to me that QM is a tangled mess of concepts which, individually, are not all that complicated. Once you have an elementary example of each, the full theory ought to “factor”, with the particulars of quantum physics entering at clearly-delineated points. This could at least be an aid to pedagogy; and of course any improvement to pedagogy also improves its reach. If we dare hope for more, we might aim to better “compress” the body of physics knowledge in the minds even of experts, letting all see further and grasp subtleties more easily than in the current presentation. If!




  1. In quantum mechanics one might arrive at the same thing by first considering some underlying single-particle wave functions; then restricting to only anti-symmetric or symmetric combinations (a Fock Space), and then switching to the occupation basis. We won’t run into those symmetrization questions because we are not trying to say anything about spatial wave functions. (I imagine quantum mechanics would come out cleaner by starting with the occupation numbers. Spatial wave functions are the complicated part. These occupation-number equations will not be difficult to write down, as we’ll see.)

  2. Here we are effectively modeling the dynamics of this stochastic system as though particles encounter each other at some fixed global rate, probably proportional to the overall density, with 2-particle interactions occurring , 3-interactions , etc. The rates of individual reactions then depend on which combinations of particles participate in the interaction. Obviously a realistic chemical model would have some global “temperature” parameter determining the rate at which interactions occur globally, but we will ignore that in the interest of generality; effectively we are rolling it into the definitions of the rate constants . If you want to split it out, you could add a fourth factor to the list above.

  3. Apparently, the coefficient in a derivative was always counting the number of ways to choose which copy of to remove—a fact which I may have learned in calculus class, but have long since forgotten.

  4. Note the quantum-mechanical analogy. For , the uncertainty principle is just this same identity . It almost looks like the in Q.M. doesn’t need to be there—it has the effect of making operate correctly on functions of the form , where it gives . To me, this looks like yet another feature of Q.M. which could be “factored out” of the theory—I need to keep a list.