The Partition Function
Part 1: The Partition Function
Multiplicity
Statistical mechanics begins with the postulate that the macroscopic behavior of an isolated system depends only on its entropy, defined as the logarithm of the “multiplicity” of states:
Measurable thermodynamic quantities like pressures and heat capacities are then found by taking derivatives of the entropy.
Thus the system is completely characterized by the functional form of
This
For two independent systems considered together, the joint count of states will be the product of their individual multiplicities.
So I’ll go on calling
We can visualize the multiplicity of three independent systems together
Maybe you constructed this system by taking products of the smaller systems. Once you have
For completeness, we should also mention that disjunct—mutually exclusive—states will add their multiplicities:
We restate all of this in the language of probabilities. If we postulate that a system of specified
This “uniform distribution of states” is called the “microcanonical ensemble”, but I’ll avoid the term in this essay. In fact I intend to avoid discussing probabilities at all, preferring to allow
For any joint distribution, picking a single value for
The heat reservoir via generating functions
Let’s now look at the standard “heat reservoir” derivation of the partition function. We imagine a system
where
We’ll want to have a few facts about g.f.s on hand:
- The g.f.s above are written as sums over energies, but they would come to the same thing if we summed over individual states instead, since the multiplicity just counts the states sharing each energy.
- The sum can be taken over the actual spectrum of the system, since the coefficient of any term not in the spectrum would just be zero.
- We can use a g.f. to count the total states over all energies by evaluating it at 1. This just reduces the g.f. to a sum of 1 across all states.
- We can convert a g.f. to a “probability generating function” (“p.g.f.”) by normalizing it.
- We can also plug in other values to assign a “cost” to each unit of energy. This is best illustrated with the binomial series, as in the following examples:
  - Plugging in 1 counts the total number of states.
  - Normalizing then gives a distribution, representing the probability of getting a certain number of heads in a run of coin flips.
  - Plugging in any other value will give a different distribution. Any assignment of values that makes the g.f. equal to 1 can be interpreted as a p.g.f.
  - Plugging the same variable into two series marks the two variables as indistinguishable, reducing the product to a single series.
  - Other expressions can usually be given some interpretation in terms of weights, costs, indistinguishability, etc.
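To make these bookkeeping rules concrete, here is a minimal numerical sketch (my own illustration, not from the text above) using the binomial g.f. for a run of coin flips; the names `N`, `coeffs`, and `beta` are mine.

```python
import numpy as np
from math import comb

# Generating function for N coin flips: sum_k C(N,k) x^k = (1 + x)^N.
# The coefficient of x^k is the multiplicity of "energy" k (number of heads).
N = 10
coeffs = np.array([comb(N, k) for k in range(N + 1)])

def gf(x):
    """Evaluate the generating function at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# Evaluating at x = 1 counts the total number of states: 2^N.
assert gf(1) == 2**N

# Normalizing turns the g.f. into a probability generating function:
# coefficient k becomes the probability of k heads in N fair flips.
pgf_coeffs = coeffs / gf(1)
assert np.isclose(pgf_coeffs.sum(), 1.0)

# Plugging in another value assigns a "cost" per unit of energy: x = exp(-beta)
# down-weights high-energy states, exactly like a Boltzmann factor.
beta = 0.5
weights = coeffs * np.exp(-beta * np.arange(N + 1))
print(weights / weights.sum())   # again a normalized distribution
```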
Now, if our system and reservoir do not interact, we can track their energies separately in a two-variable joint g.f., which is just the product
Clearly the series would factor; each term for a given power of
If our system and reservoir can interact, by exchanging energy, then we need to represent this by using the same cost-tracking variable in each series. Now the terms mix, and we can rewrite the product as a sum over total energies:
… and we observe that the product of two g.f.s is the g.f. of the convolution of their series:
Note that each
So this convolution is just a way of “reindexing” the joint series by the combined energies
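Here is a quick numerical check of that convolution statement (my own sketch; the two multiplicity lists are invented toy spectra):

```python
import numpy as np

# Multiplicities of two independent systems, indexed by energy 0, 1, 2, ...
omega_S = np.array([1, 3, 3, 1])      # invented spectrum for the system
omega_R = np.array([1, 4, 6, 4, 1])   # invented spectrum for the reservoir

# Joint multiplicity at each total energy is the convolution of the two lists.
omega_joint = np.convolve(omega_S, omega_R)

# The product of the two generating functions is the g.f. of that convolution.
x = 0.7  # any value of the cost-tracking variable
gf = lambda coeffs: sum(c * x**k for k, c in enumerate(coeffs))
assert np.isclose(gf(omega_S) * gf(omega_R), gf(omega_joint))
print(omega_joint)
```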
Linearizing the reservoir
Now that we have our joint g.f., we’ll add the assumption that the reservoir is much larger than the system.
The typical version of this argument linearizes around
where
We can now rewrite the term
We see that, no matter which
Now we insert the linearization back into the g.f.:
… and there appears the partition function
The effect of our linearization has been to characterize the influence of the external reservoir
(This is a lot like linearizing the gravitational potential at the surface of the earth as $U \approx mgh$.)
as the multiplicity of the combined system
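To see the linearization at work numerically, here is a small sketch (my own; the reservoir with entropy proportional to log E is an invented toy, and the names `a`, `E_tot`, `beta` are mine). For system energies much smaller than the reservoir’s, the ratio of reservoir multiplicities reduces to a Boltzmann factor:

```python
import numpy as np

# Invented toy reservoir: entropy S_R(E) = a * log(E).  Any smooth, concave
# entropy function gives the same conclusion to first order.
a = 1.0e6          # a "large" reservoir
E_tot = 1.0e6      # total energy shared by system + reservoir

def log_omega_R(E):
    return a * np.log(E)

beta = a / E_tot   # the slope dS_R/dE evaluated at E_tot

for E in [0.1, 1.0, 5.0, 20.0]:
    exact = log_omega_R(E_tot - E) - log_omega_R(E_tot)  # log of the ratio
    linear = -beta * E                                   # linearized reservoir
    print(f"E={E:5.1f}  exact={exact:+.6f}  linearized={linear:+.6f}")

# The columns agree up to corrections of order E**2 / E_tot: to first order,
# the reservoir's only effect on the system is the weight exp(-beta * E).
```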
Let’s now try to interpret the partition function
We can read the terms
as the product of multiplicities of three systems. The first is just
We can visualize the joint distribution of the
The rectangles’ widths grow according to
Their joint multiplicity
The “third” system
We can also read a physical interpretation of
For each
In other words:
In still other words:
$Z$ as a Laplace transform
If we write the definition of
That g.f.s are very nearly Laplace transforms isn’t too surprising, since multiplication of g.f.s corresponded to the convolution of underlying functions, which is one of the defining properties of the Laplace transform.
From this perspective the
Thus the linear approximation for the reservoir makes the joint state
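As a small numerical illustration of the Laplace-transform reading (my own sketch; the smooth density of states $g(E) = E^2$ is invented), the sum defining the partition function approaches the continuous Laplace transform of the density of states:

```python
import numpy as np
from scipy.integrate import quad

beta = 1.3
g = lambda E: E**2   # invented smooth density of states

# Continuous version: the Laplace transform of the density of states.
Z_laplace, _ = quad(lambda E: g(E) * np.exp(-beta * E), 0, np.inf)

# Discrete version: the g.f.-style sum over a finely discretized spectrum.
dE = 1e-3
E_grid = np.arange(dE, 60, dE)
Z_sum = np.sum(g(E_grid) * np.exp(-beta * E_grid)) * dE

print(Z_laplace, Z_sum)   # both approximately 2 / beta**3
```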
as an information cost
One more interpretation: we arrived at
We can first observe that any g.f. like
Then we have also set the scaled cost to
We shouldn’t read too much into this: this does not necessarily imply that
But it feels safe to read the assignment
Part 2: The Free Energy
Postulates for the Physical State
We have so far avoided describing the system in terms of probabilities, except for a few interpretative remarks. The only physical content is the form of the function
We have also not discussed equilibrium. The partition function
Three choices of postulate are common:
- (Mode Postulate) The system is simply found in the state of maximal weight in this series, i.e. the mode.
- (Mean Postulate) Or, the system is found to be in the mean-energy state of this series.
- (Distribution Postulate) Or, the system is found in all possible states with uniform probability over the system and reservoir combined. This is:
In terms of the system
Each postulate above answers the same question: how do we make physical predictions of thermodynamic properties from only a description of the multiplicities of states of a system
None of these postulates are inherently correct, except inasmuch as they reproduce experimental results.
We can make some progress on picking among the postulates by observing how they relate to each other. The distribution in postulate 3 will have as its mode and mean the exact values
We can illustrate this with a simple system of
For some temperature
The sharpness of these peaks justifies the “thermodynamic limit”: for large
(This last observation is essentially the Central Limit Theorem: the sum of many identical distributions tends to converge to a Gaussian. We could call this Gaussian approximation “postulate 4”, but we won’t need it.)
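Here is a numerical sketch of that sharpening. I’m assuming the simple system is N two-state spins of unit energy (my assumption; the example in the text may differ), so the multiplicity of energy E is the binomial coefficient C(N, E):

```python
import numpy as np
from scipy.special import gammaln

def log_weights(N, beta):
    """log of each series term Omega(E) * exp(-beta * E) for E = 0..N."""
    E = np.arange(N + 1)
    log_omega = gammaln(N + 1) - gammaln(E + 1) - gammaln(N - E + 1)
    return E, log_omega - beta * E

beta = 1.0
for N in [10, 100, 10000]:
    E, logw = log_weights(N, beta)
    p = np.exp(logw - logw.max())
    p /= p.sum()
    mean = np.sum(E * p)
    spread = np.sqrt(np.sum((E - mean) ** 2 * p))
    mode = E[np.argmax(p)]
    # The relative width of the peak shrinks like 1 / sqrt(N).
    print(f"N={N:6d}  mode/N={mode/N:.4f}  mean/N={mean/N:.4f}  spread/N={spread/N:.5f}")
```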
But I’m not going to take this limit yet. I will instead examine the three postulates separately, and apply the thermodynamic limits separately to each one, so we can clearly see where different definitions of the same concepts arise, and which simplifying limits lead to their familiar forms.
1. Free Energy in the Modal State
Where is the extremal energy
where
Then the extremal value of
I call this “thermodynamic” because a single formula like this can serve as an entry point into classical thermodynamics, which makes predictions about the properties of materials by taking different derivatives of the Free Energy (which generally would be a function of temperature, volume, particle number, and so on).
Note how the extremization has consumed the
This relationship expresses
Strictly speaking, the above equation is not how the Legendre transform is usually written. The standard form would be
You can also see this as a Legendre transform of
Note that we have still asserted nothing about equilibrium or probabilities! Our expression simply represents the value of the series
This
Let’s now try to interpret it.
First, note that
If we think of “turning on” a reservoir suddenly, the free energy of a state
The thermodynamic free energy
In effect we’ve transformed from an isolated-system version of postulate 1 to an interacting version, i.e. from:
- (isolated) the system is found in the state of maximum entropy for a given energy

to:

- (interacting) the system is found in the state of minimum free energy for a given temperature, which is the state of maximum entropy for the larger system.
Note that we’ve now strung together two approximations: first we linearized the reservoir, then, in adopting postulate 1, we have assumed that a system described by a series
But the l.h.s of the second line is just the partition function
We can read this one of two ways:
- either we are “approximating the extremal term up” to encompass all of the other states,
- or, we are “approximating the partition function down” to forget about all states besides the extrema.
In the very-large-
Finally, the above expression for the large-
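As a numerical check on this modal picture (my own sketch, reusing the invented N-spin system from before), the single largest term of the series already reproduces the free energy up to a correction that is only logarithmic in the system size:

```python
import numpy as np
from scipy.special import gammaln, logsumexp

N, beta = 10000, 1.0
E = np.arange(N + 1)                      # energies of the toy N-spin system
log_omega = gammaln(N + 1) - gammaln(E + 1) - gammaln(N - E + 1)  # entropy S(E)

log_terms = log_omega - beta * E          # log of each term Omega(E) e^{-beta E}

# Postulate 1: keep only the largest term (the mode).
i_star = np.argmax(log_terms)
F_modal = E[i_star] - log_omega[i_star] / beta    # E* - T S(E*)

# Full partition function: sum over all the terms.
F_thermo = -logsumexp(log_terms) / beta           # -T log Z

print(F_modal, F_thermo, F_modal - F_thermo)
# The difference is O(log N), negligible next to F itself, which is O(N).
```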
2. Free Energy in the Mean State
Now onto postulate 2: approximating the system by its mean energy. We don’t need a probabilistic interpretation to do this; a series has a mean too:
(That
This
- We start with the cost at zero and increase it to its final value, representing:
  a. that initially energy is free and all states are equally accessible; each energy corresponds to a number of states—so the “isolated system” we considered originally is an “infinite temperature” system. Then we view the reservoir as turning the temperature down from infinity, or putting a finite “cost” on units of energy.
  b. or that initially …, which would seem to represent a sort of “classical limit” akin to ….
- Or, our system is initially isolated at some energy. Then we introduce a reservoir at exactly the corresponding temperature, and the exchange of energy allows states at other energies to become accessible.
The two cases starting with
The latter case is more interesting: before the reservoir, the system was a specific state with entropy
(Note we’re assuming
The last line brings us one approximation away from the thermodynamic equation
We can now define a “mean” Free Energy
- either we choose the exact entropy of the expanded system, in which case the relation only holds approximately,
- or, we choose the relation to hold exactly, in which case the entropy is only approximate.
In either case we get
- either all states have the mean energy,
- or, the contributions of all states at energies other than the mean vanish.
I find it helpful to see that
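Here is a small numerical check of the mean-energy picture (my own sketch, again on the invented N-spin system): the mean of the series is the same as the logarithmic derivative of the partition function, which is the standard route to it.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

N, beta = 1000, 1.0
E = np.arange(N + 1)
log_omega = gammaln(N + 1) - gammaln(E + 1) - gammaln(N - E + 1)

def log_Z(b):
    return logsumexp(log_omega - b * E)

# Mean of the series, computed directly from the normalized terms...
p = np.exp(log_omega - beta * E - log_Z(beta))
E_mean = np.sum(E * p)

# ...matches -d(log Z)/d(beta), estimated here by a central finite difference.
h = 1e-5
E_from_derivative = -(log_Z(beta + h) - log_Z(beta - h)) / (2 * h)

print(E_mean, E_from_derivative)   # the two agree closely
```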
3. Free Energy of a Distribution of States
Finally we consider the third postulate: that the physical state of the system can take on any of the accessible values
We can take for granted that the conclusions of the previous two sections apply to this distribution:
- the mode of this distribution is the modal energy from before, and we can define the corresponding modal Free Energy,
- the mean of this distribution is the mean energy from before, and we can define the corresponding mean Free Energy,
- if we approximate the entire distribution as a delta function, then the above two equations hold exactly, and so does the Free-Energy relation, and the whole system can in some respects be treated as possessing only a single state with a single multiplicity, entropy, and energy.
To consider our distribution as a distribution, we’ll bring in the Shannon entropy:
The Shannon Entropy of the Boltzmann distribution is:
Clearly this is just another expression for
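To spell that out in one standard notation (my notation, since the original symbols are not reproduced here): writing the Boltzmann distribution over individual states as $p(s) = e^{-\beta E_s}/Z$, its Shannon entropy is

$$
H[p] \;=\; -\sum_s p(s)\log p(s) \;=\; \sum_s p(s)\bigl(\beta E_s + \log Z\bigr) \;=\; \beta\,\langle E\rangle + \log Z,
$$

which rearranges to $-\tfrac{1}{\beta}\log Z = \langle E\rangle - \tfrac{1}{\beta}\,H[p]$, i.e. the Free-Energy relation with the Shannon entropy playing the role of the entropy.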
We also note the fact that the Shannon entropy of any uniform distribution over a set of states is just the logarithm of the number of states. So the entropy here admits interpretation as the Shannon entropy of either:
- the Boltzmann distribution,
- or a uniform distribution over the states at the mean energy,
- while the full entropy is the Shannon entropy of a uniform distribution over the accessible states in the joint system.
Turning on the reservoir appears to have “smeared” the fixed state
If we express the Shannon entropy of a uniform distribution of
which we can read as describing the entropy of two equivalent subdivisions of the system:
- choosing one of the total states of the joint system, or
- choosing a state of the system with its probability, then for each choosing one of the accessible states of the reservoir.
If we had computed the Shannon entropy of the distribution of energies
In the second line we’ve written this in terms of the “full” entropy
If we break this up differently, writing
This now expresses two ways of viewing the states of the combined system
So we see that the Free-Energy equation
We can also express this in terms of the “relative entropy”, “information gain”, or “K-L divergence” between two distributions:
Then:
- The term above can be read as the information gained by moving from a uniform distribution over all states to a Boltzmann distribution.
- The difference of the two granularities of entropy above is a relative entropy, which we can read as the information gained by grouping the states—marking them indistinguishable—according to their energies.
- We can view the Shannon entropy of a uniform distribution as the divergence from it of a distribution localized at any single point. Thus the Shannon entropy of a uniform distribution represents the information to be gained by going from that entire uninformative uniform distribution to any single state.
Hence we can express the Shannon entropies entirely as differences of divergences, capturing the fact that entropy is only ever defined relative to some finer-grained description of a system.
The first line represents the identification of
In summary: when we finally apply the probabilistic interpretation of our distribution of states, we find that the Shannon entropy of this distribution is
- I expected the Shannon entropy of the distribution to equal the thermodynamic entropy, I think, but in hindsight this would be impossible, because the latter is the entropy of a uniform distribution over all of the joint states of the system and reservoir, rather than of the system alone distributed according to the Boltzmann distribution.
- When we translate the Free-Energy relation into information-theoretic terms, it turns out to be nothing but the definition of relative entropy.
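Here is a small numerical check of these identities on an invented toy spectrum (my own sketch; the random energies and the names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.uniform(0, 5, size=50)   # invented spectrum of 50 states
beta = 1.2

w = np.exp(-beta * E)
Z = w.sum()
p = w / Z                             # Boltzmann distribution over states
u = np.full_like(p, 1 / len(p))       # uniform distribution over the same states

H = -np.sum(p * np.log(p))            # Shannon entropy of the Boltzmann distribution
E_mean = np.sum(p * E)

# Shannon entropy equals beta*<E> + log Z: the Free-Energy relation in disguise.
print(H, beta * E_mean + np.log(Z))

# The same quantity as an information gain: D(p || uniform) = log(#states) - H[p].
D = np.sum(p * np.log(p / u))
print(D, np.log(len(p)) - H)
```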
4. Translating the Energy Scale
We have seen three derivations of the same expressions for Free Energy, arising from three postulates as to the physical state of our system.
One final derivation is as follows. If we have some very simple system with a single state with multiplicity
But this is the same as the Free Energy relationship, written as
- … as the entropy before the shift,
- … as the multiplicity after the shift, which will apparently equal …,
- … as the entropy after the shift, i.e. …, which equals …,
- … as the new zero-energy.
It seems this relation is the only sensible way for entropy to transform under a change of energy scale. We can write these as a transformation law between
In this form we can clearly see that, when the effect of a reservoir or a change of energy is to transform each energy’s multiplicity
I interpret this as follows. If you initially describe a system at a fine-grained scale where all states
- the effect of turning on a heat reservoir at some temperature, which adds Boltzmann factors to each state, then summarizing the system by a single total…
- …must be, to lowest order, the same as the effect of summarizing the system by a single entropy and mean energy, then turning on the reservoir and adding a Boltzmann factor to the single state.
There is a “commutator” of this sequence of operations—all the terms we drop in a thermodynamic limit—but again, to lowest order, the requirement that the result transform self-consistently with an energy scale implies there is no other way for these two operations to behave!
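One way to check the transformation law numerically (my own sketch, on an invented spectrum): shifting every energy by the same amount multiplies every Boltzmann factor, and hence the partition function, by a single common factor, shifts the free energy by the same amount, and leaves the distribution over states untouched.

```python
import numpy as np

E = np.array([0.0, 0.3, 0.3, 1.1, 2.0])   # invented spectrum (with a degeneracy)
beta, delta = 2.0, 0.7                    # inverse temperature and energy shift

def Z_and_p(energies):
    w = np.exp(-beta * energies)
    return w.sum(), w / w.sum()

Z0, p0 = Z_and_p(E)
Z1, p1 = Z_and_p(E + delta)               # move the zero of energy by delta

assert np.allclose(p0, p1)                          # probabilities unchanged
assert np.isclose(Z1, Z0 * np.exp(-beta * delta))   # Z picks up one overall factor
F0, F1 = -np.log(Z0) / beta, -np.log(Z1) / beta
assert np.isclose(F1, F0 + delta)                   # free energy shifts by delta
print(F0, F1)
```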
Conclusions
We see, in conclusion, that the same equation
- How the modal entropy is affected by a reservoir
- How the effective entropy of the state of mean energy is affected by a reservoir
- How the Shannon entropy of the state at a given energy is altered by a reservoir
- Or, how the entropy of any single state is altered by a change of energy scale
In each of the first three cases we are approximating our large-
It is also remarkable to me that no probabilities are needed to produce the results of thermodynamics.
Part 3: General Techniques
When studying stat-mech for the first time, many of its tools—partition functions, generating functions, free energies—are encountered on an ad hoc basis in some vaguely historical order, and wind up feeling like a bag of magic tricks with no overall logic to them. Each separate encounter has a separate name; it is as if we gave separate names to each of the countless different Fourier Transforms encountered in a physics education. There are in fact only a handful of techniques being used over and over, and one of my goals in spelling out these derivations has been to disentangle them. We’ve seen in particular that
The following few sections will spell out some of the less-familiar general techniques of which the standard stat-mech arguments are specific examples.
Maximum-Entropy Estimation of Distributions
If you take for granted the probabilistic postulate, you can derive the partition function and free energy formulas via another route. We consider the following constrained optimization problems:
- Maximize the Shannon Entropy w.r.t. a known mean energy and the constraint that the probabilities are normalized.
- Or, maximize the multinomial coefficient over a set of occupation numbers, subject to the analogous constraints on total number and total energy.
The second problem arises when considering stat-mech as arising from an imaginary ensemble of states sharing energy (a trick to make the probabilities palatable to frequentists, I suspect). In the large-number limit, Stirling’s approximation reduces the logarithm of the multinomial coefficient to a multiple of the Shannon entropy of the occupation frequencies
—an observation which helps to interpret the Shannon entropy, but won’t shine any light on the present situation.
I’ll sketch the first constraint-optimization derivation to illustrate the method; the full details can be found in any stat-mech textbook. We add Lagrange multipliers for the constraints to get a Lagrangian function for the problem:
Here
The condition
The normalization (
which sets
Then the average energy constraint (
Finally, evaluating
Interestingly, we find that the thermodynamic entropy
I never learned it in my physics education, but this derivation is a completely general technique—called maximum-entropy—which can be used to estimate a probability distribution from knowledge of some set of expectation values.
The same derivation with
The last line expresses the extremal entropy as a Legendre transform of
This is exactly how one arrives at a “grand canonical partition function”: a second constraint on the total particle number introduces a second Lagrange multiplier, the chemical potential.
Just about all of the properties of the stat-mech
Note the two first-derivative relationships in the first line are how we actually perform the Legendre transforms
These formulas produce most of the zoo of thermodynamic equations like
We can go on to express identities pertaining to the distribution
These properties derive from the fact that
In any case, these identities give measures of dispersion and fluctuations in stat-mech, among them the fact that
For more, see Jaynes. The takeaway from this section is simply that partition functions arise generally when estimating a distribution from knowledge of its expectations like
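As a direct numerical version of that constrained maximization (my own sketch; the spectrum, the target mean energy, and the use of scipy’s SLSQP optimizer are all my choices), maximizing the Shannon entropy at fixed mean energy does recover a Boltzmann-shaped distribution:

```python
import numpy as np
from scipy.optimize import minimize

E = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # invented spectrum
E_target = 1.2                            # the known mean energy

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)
    return np.sum(p * np.log(p))          # minimize -H  <=>  maximize H

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},        # normalization
    {"type": "eq", "fun": lambda p: p @ E - E_target},     # fixed mean energy
]
p0 = np.full(len(E), 1 / len(E))
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(0, 1)] * len(E), constraints=constraints)
p_maxent = np.clip(res.x, 1e-12, None)

# The maximum-entropy solution should be Boltzmann: p proportional to exp(-beta*E).
# Fit beta from the solution and compare.
beta_fit = -np.polyfit(E, np.log(p_maxent), 1)[0]
p_boltz = np.exp(-beta_fit * E)
p_boltz /= p_boltz.sum()
print(np.max(np.abs(p_maxent - p_boltz)))   # small residual
```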
$\log Z$ is a Legendre Transform of Shannon Entropy
In the maximum-entropy derivation we just saw, we ended up finding a Legendre duality between
The outermost maximization represents the Legendre transform between
This will be easier to interpret if we take a detour to look at the relationship between Lagrange multipliers and Legendre Transforms.
If you have some constrained optimization problem “maximize
If
A more general constraint function like
This still looks a lot like “evaluating a function with two Legendre transforms”, with the thorny details of the constraint surface now stashed away in the definition of
This line of thinking also suggests it is sensible to take a Legendre transform w.r.t. any (suitable) function on the space
Now returning to our Lagrange-multiplier expression for
We can read this as:
- Legendre transform the entropy w.r.t. its zeroth moment,
- Legendre transform back, to also evaluate this (this transform doesn’t have a multiplier in the above expression because it has been evaluated),
- then, Legendre-transform the average energy,
- then the final transformation sets the value of the average energy.
If we view
Clearly you could rotate to a basis that isolates the
One question arises: can these planes fail to intersect for some value of
It’s curious that the
Another curiosity: this same line of thinking implies that we can break out the
This thing is perhaps even more natural than
… except that this is really
That’s probably enough about this, although my mind wanders off in a few more directions. I find the equivalence of Lagrange Multipliers and Legendre transforms to be really interesting. It came up in my classes now and then, but I’ve never seen it spelled out as a general relationship between the two techniques, and nobody ever mentioned you could Legendre-transform w.r.t. any constraint function—probably because it’s either equivalent to a reparameterization in the one-to-one case, or very complicated in any other case.
$Z$ is a few kinds of generating functions at once
We have mentioned that
If you stare at this for a minute you’ll realize that
So
We have to divide out
Since I’ve spelled out the
Finally we have the series
Note that these are all just normal Taylor series! They will generalize in the usual way to multiple dimensions, so if
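One concrete reading of the Taylor-series remark (my own sketch, using standard identities and an invented spectrum): the logarithm of the partition function is a cumulant generating function, so its derivatives in the cost variable give the mean and variance of the energy.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
E = rng.uniform(0, 3, size=200)   # invented spectrum
beta = 0.8

def log_Z(b):
    return logsumexp(-b * E)

# Boltzmann distribution at beta, and its moments computed directly.
p = np.exp(-beta * E - log_Z(beta))
mean_E = np.sum(p * E)
var_E = np.sum(p * (E - mean_E) ** 2)

# Derivatives of log Z w.r.t. beta, by central finite differences.
h = 1e-4
d1 = (log_Z(beta + h) - log_Z(beta - h)) / (2 * h)
d2 = (log_Z(beta + h) - 2 * log_Z(beta) + log_Z(beta - h)) / h**2

print(mean_E, -d1)   # first cumulant:  <E> = -d(log Z)/d(beta)
print(var_E, d2)     # second cumulant: Var(E) = d^2(log Z)/d(beta)^2
```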
Factoring a Delta Function with Lagrange Multipliers
When we originally linearized
Perhaps we can do better? Here’s another argument. You could write the term for a given
Now, we often reach for Lagrange multipliers to implement the concept of “constraints”. Let’s try to translate the
where the
On this equation, the condition that
If instead we translate the
then the constraint says
I find this clarifying: it tells us to think of
We can distill a general technique: if you want to analyze the behavior of
which encodes the constraint as the condition
References
- Jaynes “Probability Theory: the Logic of Science”
- “Making Sense of the Legendre Transform”, arXiv
- “What is Entropy?”, John C Baez, ebook
Footnotes
1. Here I’m plotting the cumulative state counts on the axes, not the energies; the energy values are labeling the bins. If the distribution were bounded, this would be like plotting against the quantile of the variable rather than the variable itself, or against the “position on its c.d.f.”. Basically this is a copula plot, a concept I only learned about when I was trying to figure out what this thing I was drawing was.
2. I am led to contemplate whether the energy going into any particular mode in our system, e.g. a wave mode with some definite energy, can itself be seen as an information cost, but one which doesn’t depend on the ambient temperature, being quantized and unable to exchange freely in units smaller than itself. I often think of the canonical momentum as the “information cost of changing Lorentz reference frames”; a bound state of a wave must change its frame at every coordinate within the boundary at once in order to be consistent with the B.C.s; this quantizes it. But, if this view is right, at root all of the energy is just “configuration entropy”…
3. This makes me think it should be possible to add a third expression representing the complete coarse graining of the system down to a single state of some energy, perhaps with an error term capturing the fact that, unlike the first two lines, this alters the system. But I can’t quite see how to do it.