Statistical mechanics is introduced as the unified theory explaining thermodynamics—but nobody understands thermo. Derivations will arrive at some formula for a Helmholtz Free Energy, or Gibbs, or some specific heat—but so what? These things are all too arbitrary and disconnected to compress easily into memory, and the derivations make use of innumerable seemingly-arbitrary partial derivatives: getting from one expression to another amounts to navigating a labyrinth.

I. The Potentials

First, for reference, I’ll write out the mess of objects under discussion: a zoo of different “potential functions”, and their differentials. Each is given with its typical “Legendre Transform” relationship to the energy—much more on this later. The examples are not meant to be meaningful as formulas, but they will serve to demonstrate the kinds of relations we’re talking about here.

| Name | Arguments | Expression | Differential | Ideal Gas Example |
|---|---|---|---|---|
| Internal Energy | \(U(S, V, N)\) | \(S^{-1}_{(U)}\) (see below) | \(dU = TdS - PdV + \mu dN\) | \(U = \frac{3N}{4\pi m} {(\frac{h^3 N}{V})}^{2/3} \exp{[\frac{2S}{3Nk} - \frac{5}{3}]}\) |


\(S\) is the entropy. \(N\) is the particle number and \(\mu\) is the chemical potential (think “energy to add a particle, increasing \(N\)”).

Note that \(U\) and all of its arguments are extensive quantities, proportional to the size of the system. Consequently its derivatives \(T, -P, \mu\) are all intensive, like “densities”. This property perhaps makes \(U\) a natural starting point from which to derive other potentials, but it’s impractical to actually use \(U\) because \(S\) is usually not measurable.
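To be explicit, “its derivatives” here are the coefficients that appear in \(dU\):

\[T = \left(\frac{\partial U}{\partial S}\right)_{V,N}, \qquad -P = \left(\frac{\partial U}{\partial V}\right)_{S,N}, \qquad \mu = \left(\frac{\partial U}{\partial N}\right)_{S,V}\]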

Next we have three potentials which relate to the entropy/temperature and pressure/volume variables.

| Name | Arguments | Expression | Differential | Ideal Gas Example |
|---|---|---|---|---|
| Helmholtz Free Energy | \(A(T, V, N)\) | \(A = U - TS\) | \(dA = -SdT - PdV + \mu dN\) | \(A = NkT( \ln{[\frac{N}{V} \Lambda^3]} - 1)\) |
| Enthalpy | \(H(S, P, N)\) | \(H = U + PV\) | \(dH = TdS + VdP + \mu dN\) | \(H = \frac{3N}{4\pi m} {(\frac{h^3 P}{kT})}^{2/3} \exp{[\frac{2S}{3Nk} - \frac{5}{3}]} + NkT\) |
| Gibbs Free Energy | \(G(T, P, N)\) | \(G = U - TS + PV = A + PV = H - TS\), \(G \stackrel{*}{=}\mu N\) | \(dG = -SdT + VdP + \mu dN\) | \(G = NkT \ln{[\frac{P\Lambda^{3}}{kT}]}\) |

with \(\Lambda(T) = \frac{h}{\sqrt{2 \pi m k T}}\).
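As a quick check on the differentials in this table, each follows from \(dU\) and the product rule; for instance:

\[\begin{align*} dA &= d(U - TS) = dU - T\,dS - S\,dT = -S\,dT - P\,dV + \mu\, dN \\ dG &= d(A + PV) = dA + P\,dV + V\,dP = -S\,dT + V\,dP + \mu\, dN \end{align*}\]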


Think of Helmholtz as the energy in a closed container (fixed \(N, V\)) where you can control the temperature—obviously this is a more useful notion than a function of \(S\) itself. Gibbs additionally gives control over the external pressure—appropriate for a vial of liquid open to the air.

Enthalpy I don’t understand.

The \(*\)-marked relations are only true when \(U\) is a function of no other variables.
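One way to see this: because \(U\) and its arguments are all extensive, \(U(\lambda S, \lambda V, \lambda N) = \lambda\, U(S, V, N)\); differentiating with respect to \(\lambda\) at \(\lambda = 1\) gives the Euler relation

\[U = TS - PV + \mu N,\]

from which \(G = U - TS + PV = \mu N\) (and likewise \(\Pi \stackrel{*}{=} PV\) below).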

Next we alter the \(N/\mu\) relation:

| Name | Arguments | Expression | Differential | Ideal Gas Example |
|---|---|---|---|---|
| Landau Free Energy | \(\Pi(T, V, \mu)\) | \(\Pi = TS - U + \mu N = -A + \mu N\), \(\Pi \stackrel{*}{=}PV\) | \(d\Pi = SdT + PdV + N d\mu\) | \(\Pi = \frac{V}{\Lambda^3}kT e^{\frac{\mu}{kT}}\) |


Landau Free Energy represents an “open” system which can exchange particles with its environment at a fixed chemical potential \(\mu\).
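As with \(A\) and \(G\) above, the differential in the table is just the product rule applied to the definition:

\[d\Pi = d(TS - U + \mu N) = T\,dS + S\,dT - dU + \mu\,dN + N\,d\mu = S\,dT + P\,dV + N\,d\mu\]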

Now an example of a system that’s not a gas or fluid, to see the generality of the method:

| Name | Arguments | Expression | Differential |
|---|---|---|---|
| (Magnetic) Internal Energy | \(U(S, M)\) | | \(dU = TdS + BdM\) |
| (Magnetic) Helmholtz Free Energy | \(A(T, M)\) | \(A = U - TS\) | \(dA = -SdT + BdM\) |
| Magnetic Free Energy | \(F(T, B)\) | \(F = U - TS - BM = A - BM\) | \(dF = -SdT - M dB\) |


Here we started with a different expression for the energy \(U(S, M)\), which is a function only of \(S\) and a magnetization \(M\) arising in some system in response to an external magnetic field \(B\). This demonstrates how the definitions of these potentials actually vary with the system under consideration, and \(A\) shows how the transformations don’t really care which extra variables come along for the ride.

| Name | Arguments | Expression | Differential | Ideal Gas Example |
|---|---|---|---|---|
| Entropy | \(S(U, N, V)\) | \(S = U^{-1}_{(S)}(U, N, V)\) | \(dS = \frac{1}{T}dU + \frac{P}{T}dV - \frac{\mu}{T}dN\) | \(S = Nk(\ln{[\frac{V}{Nh^3} {(\frac{4\pi m U}{3N})}^{\frac{3}{2}}]} + \frac{5}{2})\) |

or, though this uses a \(T\):

\(S = Nk [\ln{\frac{V}{N{\Lambda(T)}^3}} + \frac{5}{2}]\)


And finally there is the entropy \(S\) itself, which is related to the above potentials not via a Legendre transform but as the inverse of \(U(S, V, N)\) w.r.t. its first argument.

(There doesn’t appear to be a great notation for “inverse of a function w.r.t. a single argument”—a glaring omission from mathematics, I think. See this Math Overflow post; the elementary example is the relationship between \(x^y = z\), \(\log_x z = y\), and \(\sqrt[y]{z} = x\). The ForeXiv reference [1] uses Sussman’s notation, which here would be \(S(U, N, V) = \mathcal{V}_1 U(S, N, V)\).)

Note that \(S\) does not have units of energy, but of \(\frac{[\mathrm{energy}]}{[\mathrm{temperature}]}\). And note that, while “solving” a large expression \(U(S)\) for \(S\) may be complicated, solving \(dU = TdS - PdV + \mu dN\) for a local \(dS\) is quite simple.
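(Spelled out: divide \(dU\) through by \(T\) and rearrange,

\[dS = \frac{1}{T}dU + \frac{P}{T}dV - \frac{\mu}{T}dN,\]

which is exactly the differential in the table.)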

II. Legendre Transforms

Obviously all of those definitions are a mess—now to make sense of them.

What we are usually taught is that those expressions like \(A = U - TS\) amount to a Legendre transform from \(U(S)\) to \(A(T)\). That the signs don’t quite adhere to the normal identity \(G(s) = xs - F(x)\) we can chalk up to historical accident—apparently chemists preferred for \(U\) to appear with a positive sign in every expression, and flipped the signs of everything else accordingly. Also, it’s not clear at all whether the expressions for thermo potentials are really true, except in an extreme-\(N\) limit: one would normally expect to have \(U = \int dU\)!

We will need the right sense of what a Legendre transform really is. Unfortunately, the way these are usually taught conveys no “sense” at all. Sometimes these derivations are accompanied by strange diagrams of tangent lines to a function \(F\), with the value \(G(s)\) indicated as an intercept somewhere. No intuition arrives—why those lines? Why do we care?

The “transform” in question turns an \(F(x)\) into a \(G(s)\) as:

\[G(s) = \max_x (xs - F(x))\]

which for suitable \(F\) can be written without the \(\max\) (or \(\min\)).

The ForeXiv reference [1] finally offers a cogent explanation of the Legendre Transform (“Making Sense Of…” [2] comes up short in this respect).

The key insight is this: to take the Legendre transform of a convex function \(F(x)\), you:

  • take a derivative \(F(x) \to \frac{dF}{dx} = f(x)\)
  • invert the derivative \(f(x) = s \to x(s) = f^{-1}(s)\),
  • reintegrate to give a new function \(G(s)\) in the same space as the original \(F\):
\[G(s) = \int^{s} {f}^{-1}(s') ds'\]

That we’re “inverting the derivative” doesn’t completely determine the form of the Legendre transform, but we can deduce the rest of the Legendre formula graphically:

Evidently \(F + G = xs\). Both areas can be parameterized just as easily by \(s\), so we can write:

\[G(s) = s x(s) - F(x(s))\]

In this form, a Legendre transform is clearly an involution, because the inner part of the transform is just a function inverse. The inverse transform produces the original function by reparameterizing both regions by \(x\) again:

\[F(x) = xs(x) - G(s(x))\]

This will sometimes be written symmetrically, with the understanding that one parameterizes all terms by either \(x\) or \(s\) to solve for one function or the other:

\[F(x) + G(s)= sx\]

The explicit procedure to evaluate one of these is:

  1. Find \(f(x) = \frac{dF}{dx}\)
  2. Invert \(f(x) = s\) to get \(x(s) = f^{-1}(s) = {(F')}^{-1}(s)\)
  3. Plug \(x(s)\) into \(G(s) = x(s)s - F(x(s))\). Equivalently, evaluate \(G(s) = \int_{0}^{x(s)} (s - f(x))\,dx\) (taking \(F(0) = 0\)), which corresponds more clearly to the graph above.

One actually performs these operations so rarely that it’s easy to never learn how to do one!
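To make the recipe concrete, here is a small sympy sketch of the three steps (my own illustration, not taken from the references; the `legendre` helper is just hypothetical shorthand, and it assumes the derivative can be inverted in closed form):

```python
# Legendre-transform a convex F(x) into G(s) by the three steps above:
# differentiate, invert the derivative, re-assemble G(s) = x(s)*s - F(x(s)).
import sympy as sp

x, s, a = sp.symbols('x s a', positive=True)

def legendre(F):
    f = sp.diff(F, x)                     # 1. f(x) = dF/dx
    x_of_s = sp.solve(sp.Eq(f, s), x)[0]  # 2. invert f(x) = s (positive branch)
    G = x_of_s * s - F.subs(x, x_of_s)    # 3. G(s) = x(s)*s - F(x(s))
    return sp.simplify(sp.expand_log(G))

print(legendre(a * x**2 / 2))   # s**2/(2*a)
print(legendre(sp.log(x)))      # log(s) + 1  (the example further down)
```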

A few notes:

  • For convex functions, the transform “conserves” all of the information in the function, as can be seen in the diagram above. Thus if the function has multiple parameters \(f(x, y)\), taking a Legendre transform can be thought of as a reparameterization of one of its arguments in terms of its derivative. It’s as if we had a blackbox with an inlet that says “takes \(x\)s at a rate of \(f'\)”, and we swapped it to now take \(f'\)s at a rate of \(x\). It “wants” to be written like \(f(x, y) \to f(\tilde{x}, y)\), but this collides with our normal notation—the new function probably doesn’t have the same functional form.
  • We’re not thinking about the lower bounds of integration, but all three terms in the Legendre formula \(F + G = xs\) are really integrals:
\[\int^x f(x')dx' + \int^s g(s') ds' = \int^{xs} d(x's')\]

It all works out as long as all three integrals are taken over the same region in \((x, s)\) space, as can be seen graphically.

  • We can also see this as “integration by parts” \(\int u dv = \int d(uv) - \int v du\), except from a perspective where the integrated function \(F = \int u dv\) is principal rather than the integrand \(u\). In fact it may make sense to think of Legendre as a transform of differentials \(dF \to dG\): \(dF(x) = s(x)dx \to dG(s) = d(sx) - dF = x(s) ds\)

Legendre transforms tend to arise when working with energies, whose absolute values are not meaningful. Hence it makes sense to think of the differentials as the “real” relationships, while the integrated values are only determined relative to some “reference frame”.

  • Sometimes one sees an expression with a \(\sup\) or \(\inf\) in it, which is needed when Legendre-transforming non-convex functions to project into the smaller space; I’ll skip this.
  • The “gesture” of a Legendre transform is: unwrap—invert—rewrap. We differentiate to expose the derivative, flip the graph, then integrate again. Because the middle step is just a function inverse, the whole thing is an involution on its natural domain (convex functions). Because we discard information in the unwrap step (differentiation throws away constants), we would normally have a free parameter in the rewrap step (the lower bound of integration), but we have to choose it to re-add the constant term discarded at the beginning, such that the whole operation is an involution. This makes \(F, G\) correspond to the two parts of the same square in \(x, s\) space.

    Compare to a matrix inverse, which could be implemented as: rotate to a diagonal basis—invert—unrotate: also an involution on the set of invertible matrices, and here again one unrotates into the original basis to re-add the information that was discarded, such that the combined operation is basis-independent.

    Contrast with a Fourier transform, which is not an involution; instead the forward/inverse Fourier transforms have the senses of “rotate” and “unrotate”.

Some examples:

\[\begin{align*} F(x) & = \frac{ax^2}{2} & G(s) & = \frac{s^2}{2a} \\ F(x, y) &= \frac{ax^2}{2} + b(y) + c & G(s, y) &= \frac{s^2}{2a} - b(y) - c \end{align*}\]

Note that the constant terms flip sign, and that a term not involving \(x\) is a constant from the perspective of the transform.

\[\begin{align*} F(x) &= \ln x &G(s) &= 1 + \ln s \end{align*}\]

\(\ln\) maps back to \(\ln\), up to constants, because its derivative \(\frac{1}{x}\) is its own inverse. This arises in the transformation between \(A \leftrightarrow G\) for the ideal gas.
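Spelling that out with the table’s ideal-gas formulas: the \(V\)-dependence of \(A\) is \(-NkT\ln V\), so

\[-P = \left(\frac{\partial A}{\partial V}\right)_{T,N} = -\frac{NkT}{V} \quad\Rightarrow\quad V = \frac{NkT}{P},\]

and

\[G = A + PV = NkT\left(\ln{\left[\frac{N}{V}\Lambda^3\right]} - 1\right) + NkT = NkT\ln{\left[\frac{P\Lambda^3}{kT}\right]}.\]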

III. Conventional Thermodynamics

Now we’ll tidy all of those thermodynamic functions. (Some of this section is based on ref [2], but that paper doesn’t quite clarify things enough.)

Note again that a Legendre transform operates on a single argument of a function at a time. Let’s look at what happens when you transform one argument followed by another. Let \(F(a, b)\) be a function with \(\alpha, \beta\) its derivatives w.r.t. \(a, b\), such that:

\[dF = \alpha \: da + \beta \: db\]

Then we can either transform \(a \to \alpha\), \(b \to \beta\), or both. The following diagram shows what we get via each path:

\[\begin{matrix} F(a, b) & \to & F(\alpha, b) & = & a\alpha - F \\ \downarrow & & \downarrow & \\ F(a, \beta) & \to & F(\alpha, \beta) & \\ = & & & = & \\ b \beta - F & & & & a \alpha + b \beta - F \end{matrix}\]

Clearly you get \(2 \times 2\) different functions. And we see that you can transform many variables at once by doing \(F(\vec{x}) \to \vec{x}\cdot\vec{s} - F\).

At this point an annoying bit of pedantry comes up which will help to clarify the thermo situation. The above diagram shows what you get if you view the final doubly-transformed function \(F(\alpha, \beta)\) as “the original \(F\) twice-transformed.” But if you stop after one transformation, say \(F(\alpha, b)\), give that a new name \(G(\alpha, b) = F(\alpha, b)\), and then forget where it came from and transform its second variable \(b \to \beta\) to get \(H(\alpha, \beta)\), you get:

\[\begin{align*} H(\alpha, \beta) & = b\beta - G(\alpha, b)\\ & =b\beta - (a\alpha - F)\\ & \ne F(\alpha, \beta) \end{align*}\]

Which is right? Well, both are: you can transform the function \(G\) just as easily as \(-G\) and you’ll get two different results. The cleanest fix here is to name \(G = -F\) instead, but the real point is that the signs of the \(a\alpha\) and \(b\beta\) terms need not be the same. (This makes me wonder about the classical-mechanics transform \(\vec{v} \to \vec{p}\)…)

This approach will act as a map of the thermo potentials. The different combinations of transforms of the three arguments \(U(S, N, V)\) will form a cube, though not every corner of this will have a name.

\(A\), \(H\), and \(G\) transform the \(S/T\) and \(P/V\) variables. We can draw the square (one face of the larger cube):

\[\begin{matrix} U(S, V, N) & \to & -A(T, V, N) & = & TS - U \\ \downarrow & & \downarrow & \\ -H(S, P, N) & \to & -G(T, P, N) & \\ = & & & = & \\ (-P)V - U & & & & TS + (-P)V - U = (-P)V - A = TS- H \end{matrix}\]

It looks rather arbitrary! The only explanation I can see for the minus signs on \(A, H, G\) is that all are defined so \(U\) enters with a positive sign. \(P\) has a negative sign everywhere because \(V\) has the opposite sense of the other arguments of \(U\): higher \(S\) or \(N\) represents a greater internal energy, but greater \(V\) means less energy—less compression.
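To see one edge of the square traversed concretely, here is a sympy sketch (my own check, not from any reference) that Legendre-transforms the ideal-gas \(U(S, V, N)\) from the first table in its \(S\) argument and recovers the tabulated Helmholtz \(A(T, V, N)\):

```python
# Walk one edge of the square: Legendre-transform the ideal-gas U(S,V,N)
# in its S argument to get the Helmholtz free energy A(T,V,N) = U - TS.
import sympy as sp

S, V, N, T, k, h, m = sp.symbols('S V N T k h m', positive=True)

U = (3*N*h**2 / (4*sp.pi*m)) * (N/V)**sp.Rational(2, 3) \
    * sp.exp(2*S/(3*N*k) - sp.Rational(5, 3))

# Invert T = dU/dS to get S(T, V, N), then form A = U - T*S on that branch.
S_of_T = sp.solve(sp.Eq(sp.diff(U, S), T), S)[0]
A = sp.simplify(U.subs(S, S_of_T) - T*S_of_T)

# Compare with the table's A = N k T (ln(N Lambda^3 / V) - 1).
Lam = h / sp.sqrt(2*sp.pi*m*k*T)
A_table = N*k*T*(sp.log(N*Lam**3/V) - 1)
print(sp.simplify(sp.expand_log(A - A_table, force=True)))   # expect 0
```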

If we instead were to take \(-A\) as the starting point, all the signs would come out exactly as in our original schematic:

\[\begin{matrix} -A(T, V, N) & \to & U(S, V, N) & = & TS - (-A) \\ \downarrow & & \downarrow & \\ G(T, P, N) & \to & H(S, P, N) & \\ = & & & = & \\ PV - (-A) & & & & TS + PV - (-A) \end{matrix}\]

Then we have the face spanned by \(S/T\) and \(N/\mu\):

\[\begin{matrix} U(S, V, N) & \to & -A(T, V, N) & = & TS - U \\ \downarrow & & \downarrow & \\ & \to & \Pi(T, V, \mu) & \\ & & & = & \\ & & & & TS + \mu N - U = \mu N - A \end{matrix}\]

The lower left function doesn’t appear in my stat-mech book, but could easily be defined. \(\Pi\), unlike the other potentials, comes out with the “proper” sign as a double-transformation of \(U\).

In all they make a cube:

The signs indicate what you would get if you derived every potential via “\(G = xs - F\)” transforms starting from \(U\).

Finally we can draw a face for the magnetic energies:

\[\begin{matrix} U(S, M) & \to & -A(T, M) & = & TS - U \\ \downarrow & & \downarrow & \\ & \to & -F(T, B) & \\ & & & = & \\ & & & & TS + MB - U \end{matrix}\]

IV. Dimensionless Thermodynamics

What about entropy \(S\)?

As detailed above, entropy is not a Legendre-transform of any of these potentials; instead it is obtained by inverting \(U\) w.r.t. one of its arguments.

The “Making Sense Of…” paper ([2]) suggests it would be more intuitive to use Legendre transforms starting from the entropy, and they suggest a dimensionless entropy \(\mathcal{S} = S/k = \ln \Omega\). In this approach the duality between \(S\) and \(T\) would instead be a duality between inverse-temperature \(\beta = \frac{1}{kT}\) and energy \(U\), because \(\beta = \frac{\partial {\mathcal S}}{\partial U}\). (They use \(E\) rather than \(U\); I’ll skip over that difference for simplicity.)
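(Chaining the definitions: \(\frac{\partial \mathcal{S}}{\partial U} = \frac{1}{k}\frac{\partial S}{\partial U} = \frac{1}{kT} = \beta\), using \(\frac{\partial S}{\partial U} = \frac{1}{T}\) from the entropy differential above.)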

We’re then free to introduce dimensionless analogs of all of the potentials (\(\eta\) is a dimensionless pressure \(\beta P\)):

\[\begin{align*} \mathcal{A}(\beta, N, V) &= \beta A = \beta U - \mathcal {S}\\ \mathcal{G}(\beta, N, \eta) &=\beta G = \beta U + \eta V - \mathcal{S}\\ \end{align*}\]

These two “transformed entropies” are easy to relate back to the normal “transformed energies” because \(T\) is accessible from \(S\) as easily as from \(U\) (\(T = \frac{dU}{dS} = {(\frac{dS}{dU})}^{-1}\)). \(H\) is weirder: ordinarily it is a function of \(S\), so it cannot be reached from the entropy by a Legendre transform, only by a function inverse. Instead we can imagine a dimensionless enthalpy \(\mathcal{H}(U, N, \eta)\), or a dimensionless Landau Free Energy in terms of \(\frac{\partial \mathcal{S}}{\partial N}\). I won’t spell everything out, but I tried the calculations for the ideal gas example and, indeed, all the potentials turn out simpler than their energy-based analogs; all are simple \(\ln\)s.
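For instance, dividing the ideal-gas entries in the tables above by \(kT\):

\[\begin{align*} \mathcal{A} &= \beta A = N\left(\ln{\left[\frac{N}{V}\Lambda^3\right]} - 1\right) \\ \mathcal{G} &= \beta G = N \ln{\left[\frac{P\Lambda^3}{kT}\right]} = N \ln{[\eta \Lambda^3]} \end{align*}\]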

We could create a second cube of “transformed entropies” starting from \(S\). Mostly this approach is only useful to clarify the relationship of \(S\) to everything else—and, somehow, the knowledge that there is a clean way to do this is some relief for my frustration at the version of things I actually had to learn.

References

  1. ForeXiv on Legendre Transforms
  2. Making Sense of the Legendre Transform, which doesn’t go far enough.
  3. Pathria & Beale, Statistical Mechanics.