Entropy and Generating Functions
Table of Contents
- 1. Counting States?
- 2. Shannon?
- 3. Shannon again?
- 4. Composing Entropies
- 5. Counting Again.
- The Entropy of a Generating Function
- In Summary
1. Counting States?
The expression
My immediate instinct is to say that
So “plugging in 1 to count the total states” is not the same operation as “entropy”; they only coincide when all states are distinct. The “information” ought to make use of the fact that
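To make this concrete, here is a minimal sketch (the examples and names are mine): it compares the log of a g.f.'s value at 1 with the Shannon entropy of its coefficient distribution, and the two agree only when every coefficient is 1, i.e. when all counted states are distinct.

```python
# A minimal sketch (examples mine): "log of the g.f. at 1" vs. the Shannon
# entropy of its coefficient distribution. They agree only when every
# coefficient is 1, i.e. when all counted states are distinct.
from math import comb, log

def log_count(coeffs):
    return log(sum(coeffs))                  # log of the g.f. evaluated at 1

def shannon(coeffs):
    total = sum(coeffs)
    return -sum(c / total * log(c / total) for c in coeffs if c > 0)

distinct = [1] * 8                           # e.g. 1 + x + x^2 + ... + x^7
binomial = [comb(6, k) for k in range(7)]    # (1 + x)^6, states grouped by power

print(log_count(distinct), shannon(distinct))   # equal: log 8 both times
print(log_count(binomial), shannon(binomial))   # the entropy is strictly smaller than log 64
```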
2. Shannon?
The Shannon entropy of the distribution represented by the fully-expanded g.f.
This is not a normalized distribution; its sum is—what else?—the value of the g.f. at 1, the total state count from before.
So we get for the normalized distribution, with
These are just two ways of writing the entropy of the distribution
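As a hedged numerical check (the notation a_n, p_n, f(1) is mine, and these may not be the exact two forms meant above): the entropy of the normalized coefficients p_n = a_n / f(1) can be written either directly as -sum p_n log p_n, or in terms of the raw coefficients as log f(1) - (1/f(1)) sum a_n log a_n.

```python
# Sketch (notation mine): two equivalent ways of writing the entropy of the
# distribution encoded by the coefficients a_n of a g.f., with f(1) = sum(a_n).
from math import comb, log

a = [comb(6, k) for k in range(7)]        # example coefficients (the binomial again)
f1 = sum(a)                               # the g.f. evaluated at 1

# Form 1: Shannon entropy of the normalized distribution p_n = a_n / f(1).
H1 = -sum((an / f1) * log(an / f1) for an in a)

# Form 2: written directly in terms of the unnormalized coefficients.
H2 = log(f1) - sum(an * log(an) for an in a) / f1

assert abs(H1 - H2) < 1e-9
print(H1, H2)
```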
3. Shannon again?
But now another idea comes to mind. The Shannon entropy just discussed is arguably not the entropy of the actual distribution represented by the g.f.
Note this is just the earlier entropy
4. Composing Entropies
And one more idea. Doesn’t entropy add across independent systems? A product of g.f.s represents a product of systems, so shouldn’t
Here it turns out I’m just confusing a few things. We have to distinguish between pointwise multiplication of distributions and the Cartesian product.
Pointwise multiplication is like a dot product:
The Cartesian product of two sets
and this thing is simply the sum of the entropies
So my confusion comes from the fact that
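A sketch of the distinction (the numbers and names are mine): the pointwise product pairs coefficients index by index, like a dot product, and is not even a distribution; the Cartesian product pairs every coefficient of one distribution with every coefficient of the other, and its entropy is exactly the sum of the two entropies.

```python
# Sketch (names mine): pointwise product vs. Cartesian product of two
# normalized coefficient lists, and additivity of entropy for the latter.
from math import log

def H(p):
    """Shannon entropy (in nats) of a normalized distribution."""
    return -sum(x * log(x) for x in p if x > 0)

p = [0.5, 0.25, 0.25]
q = [0.2, 0.3, 0.5]

pointwise = [pi * qi for pi, qi in zip(p, q)]     # index-by-index, like a dot product
cartesian = [pi * qi for pi in p for qi in q]     # every term of p against every term of q

print(sum(pointwise))                             # 0.3: not a normalized distribution
print(abs(H(cartesian) - (H(p) + H(q))) < 1e-12)  # True: entropies add across the product
```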
We can add to this list of composition laws the entropy of the sum of two disjoint sets:
- Can the form of the entropy be derived from this fact alone?
- Is there a clear relationship between the way the entropy acts on mixture-products and how it acts on non-disjoint unions?
If the input sets are unnormalized, with totals
That last term is the entropy of a single number,
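Here is a sketch of the law I read this paragraph as describing (the names N1, N2, and lam are mine): for the disjoint union of two unnormalized coefficient lists, the Shannon entropy is the lam-weighted average of the two entropies plus the entropy of the single number lam = N1 / (N1 + N2).

```python
# Sketch (names N1, N2, lam mine): the entropy of the disjoint union of two
# unnormalized coefficient lists equals the weighted entropies plus the
# entropy of the single number lam = N1 / (N1 + N2).
from math import log

def H(weights):
    """Shannon entropy of an unnormalized non-negative list, after normalizing."""
    total = sum(weights)
    return -sum(w / total * log(w / total) for w in weights if w > 0)

a = [1, 6, 15, 20, 15, 6, 1]       # one set of coefficients, total N1 = 64
b = [3, 5, 2]                      # another, total N2 = 10
N1, N2 = sum(a), sum(b)
lam = N1 / (N1 + N2)

lhs = H(a + b)                                            # entropy of the disjoint union
rhs = lam * H(a) + (1 - lam) * H(b) + H([lam, 1 - lam])   # grouping identity
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```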
5. Counting Again.
In a couple of places we have also encountered the Shannon entropy of a single number, like the
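One natural candidate for the “entropy of a single number” (this identification is my guess) is eta(x) = -x log x, which also obeys the derivation property that comes up again in the summary below.

```python
# Sketch (my guess at the intended object): eta(x) = -x*log(x) as the
# "entropy of a single number", and its derivation (Leibniz) property.
from math import log

def eta(x):
    """Entropy of a single non-negative number."""
    return 0.0 if x == 0 else -x * log(x)

x, y = 3.0, 7.0
lhs = eta(x * y)
rhs = x * eta(y) + y * eta(x)        # eta acts like a derivation on products
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)

# The binary entropy of a fraction lam is built from eta term by term:
lam = 0.3
print(eta(lam) + eta(1 - lam))
```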
The Entropy of a Generating Function
Well. Those are the different ideas that come to mind when I wonder about the “information of a g.f.”.
Apparently, however we define the entropy of a g.f., it cannot depend only on the value of the g.f., since the
It must be defined relative to the thing we’ve decided we’re counting in the g.f.—it needs to be affected by the way we choose to group terms. For example, for
It may be that, for a choice of how to group the terms, you can mark this grouping with a specific set of variables
For the binomial, the standard way to mark terms would be by powers of
This is not too revealing, but I see a few things:
- Some kind of chain rule?
- Something that looks like a translation operator, or a Taylor series.
- We could get rid of the s with a , or with some other kind of g.f.
- You could e.g. take an entropy w.r.t. one variable of a two-variable g.f., like . Or w.r.t. both at once.
- We could ask what happens if you plug in different values than , or leave this unevaluated (sketched below). Say, .
- This entropy is now an analytic function of , or of ; perhaps its derivatives mean something.
- This ought to allow us to define an entropy relative to any parameter, like , even if the polynomial in that expression is not well formed. Well, not quite: . Hum. What am I reaching for here?
If instead we mark terms with
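Here is a sketch of the “leave the marking variable unevaluated” idea from the list above (the binomial example and the parameter name t are mine): group the terms of (1 + x)^N by powers of x, substitute x = t, and the entropy becomes an analytic function of t.

```python
# Sketch (example mine): the entropy of (1 + x)^N, grouped by powers of x,
# with x left as a parameter t instead of being evaluated at 1.
from math import comb, log

def entropy_of_binomial_gf(N, t):
    """Shannon entropy of the distribution p_k(t) = C(N,k) t^k / (1+t)^N."""
    weights = [comb(N, k) * t**k for k in range(N + 1)]
    total = sum(weights)                     # the g.f. evaluated at x = t
    return -sum(w / total * log(w / total) for w in weights if w > 0)

N = 6
for t in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(t, entropy_of_binomial_gf(N, t))
```

For the binomial this function is symmetric under swapping t with 1/t and peaks at t = 1, which is the “plug in 1” case from before.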
In Summary
That’s enough musing. I’ll summarize the progress I’ve made on untangling my confusions:
- We cannot simply take the log of a g.f. at 1 to get its “information”; this is a measure of information but it is not the “entropy of the series encoded by the g.f.”.
- We can define a Shannon entropy on a g.f. as a certain functional, where the choice of “which terms we are defining the entropy with-respect-to” is baked into the form of the entropy functional itself, though there might be some “natural” choice for simple g.f.s. This correctly makes the entropy decrease as we set variables equal to each other. We find that the entropy of some g.f. ranges from a maximum, which is like the sum of all its coefficients, down to zero if all states are identified with each other.
- In general the entropy of a g.f. is only defined w.r.t. some choice of terms, and you could always plug in more fine-grained terms and then take an entropy w.r.t. those, which would be even larger (sketched below).
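A sketch of that last point (the example is mine): if every one of the 2^N expansion terms of (1 + x)^N is kept distinct, the entropy is the full N log 2, which is strictly larger than the entropy of the coarser power-of-x grouping.

```python
# Sketch (example mine): finer-grained terms give a larger entropy.
from math import comb, log

N = 6

# Grouped by powers of x: the binomial distribution over k = 0..N.
grouped = [comb(N, k) / 2**N for k in range(N + 1)]
H_grouped = -sum(p * log(p) for p in grouped)

# Every one of the 2^N expansion terms kept distinct: a uniform distribution.
H_fine = log(2**N)                 # = N * log(2)

print(H_grouped, H_fine)           # the fine-grained entropy is strictly larger
```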
And regarding the composition laws of entropies:
- A derivation property holds for pointwise multiplication of distributions, where the multiplication acts like a dot product between the terms of the two distributions. This holds for numbers; the pointwise-multiplication form just acts on each numeric term separately. As a bit of an abuse of notation we can also write this with a dot. This doesn’t mean much of anything on g.f.s.
- A derivation property holds for the Cartesian products of distributions, where the factors outside the entropies are now the sums (the totals of each distribution), etc. This amounts to the previous case for one-element distributions, i.e. numbers. Cartesian products correspond to the product of g.f.s. in unrelated variables (a numerical sketch follows after this list).
- The derivation property on products reduces to a plain sum of entropies for normalized distributions.
- The corresponding composition law holds for the disjoint union of sets/distributions, which corresponds to a sum of g.f.s. in different variables.
- There might be a relationship between the action of the entropy on mixed products (i.e. g.f. convolutions) and non-disjoint unions.
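Finally, a sketch of my reading of the derivation property for Cartesian products (the term-wise functional Ht and all names are mine): with Ht(a) defined as the sum of -a_n log a_n over the unnormalized coefficients, the product of two g.f.s in unrelated variables satisfies a Leibniz-style rule in which the totals appear as the outside factors.

```python
# Sketch (my reading of the derivation property, names mine): the term-wise
# functional Ht on unnormalized coefficients obeys a Leibniz rule under the
# Cartesian product, with the coefficient totals as the outside factors.
from math import log

def eta(x):
    return 0.0 if x == 0 else -x * log(x)

def Ht(a):
    """Term-wise 'entropy' of an unnormalized coefficient list."""
    return sum(eta(x) for x in a)

a = [0.1, 0.3, 0.2]                  # unnormalized coefficients of one g.f.
b = [0.4, 0.1]                       # unnormalized coefficients of another

cartesian = [ai * bj for ai in a for bj in b]     # product of g.f.s in unrelated variables
lhs = Ht(cartesian)
rhs = sum(a) * Ht(b) + sum(b) * Ht(a)             # totals times the entropies
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```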