Drawing Dimensions
I. Vectors
Behold a vector:
On the left is the regular orthonormal basis in 3 dimensions, and on the right are the components of a vector called \(\mathbf{a}\) in this basis. Simple enough.
If there are more than three dimensions, we can’t draw a vector “in perspective” in this way. Instead, let’s collapse the extras onto a single axis:
The double line represents “one or more dimensions”—the rest of the \(N\) dimensions, for any \(N\).
Maybe you even decide to forget how many dimensions are in \(\mathbf{a}_{3{.}{.}N}\), “reducing” the information in those axes into a single scalar \({a}_{3{.}{.}N}\). This will then give a 3-dimensional space as in the first diagram. There are various ways to perform this reduction, but an obvious choice is the one that preserves the vector’s length. \(\vert \mathbf{a} \vert= \sqrt{a_1^2 + a_2^2 + a_{3{.}{.}N}^2}\) will be unchanged if the “reducing” operation takes the remaining dimensions to their length \(\mathbf{a}_{3{.}{.}N} \to \vert \mathbf{a}_{3{.}{.}N}\vert\), since this has \({\vert\mathbf{a}_{3{.}{.}N} \vert}^2 = a_3^2 + \ldots + a_N^2\). But that isn’t the only option; you could, for example, project to a single one of the \((N-2)\) dimensions, such as \(\mathbf{a}_{3{.}{.}N} \to a_3\).
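In code (a minimal NumPy sketch; the function names `reduce_by_length` and `reduce_by_projection` are my own, not anything standard), the two reductions look like this:

```python
import numpy as np

def reduce_by_length(a, keep=2):
    """Collapse dimensions 3..N of `a` into a single scalar equal to their length,
    so that the reduced vector has the same length as the original."""
    return np.append(a[:keep], np.linalg.norm(a[keep:]))

def reduce_by_projection(a, keep=2, axis=2):
    """Collapse dimensions 3..N of `a` by projecting onto just one of them."""
    return np.append(a[:keep], a[axis])

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # a vector in N = 5 dimensions
print(np.linalg.norm(reduce_by_length(a)))        # equals |a| = sqrt(55)
print(np.linalg.norm(reduce_by_projection(a)))    # generally shorter: sqrt(14)
```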
Or you can go in reverse: maybe you thought you were working in 3 dimensions, and then: more pop out! You “unreduce” one dimension into a whole \((N-2)\)-dimensional subspace. This clearly adds information, so you have some choice in how you do it. This choice will in turn determine how operations on the original vector extend to the unreduced vector—what, for example, will happen to a rotation which previously took the “1” dimension into the “3” dimension? How should this map to the larger \(3{.}{.}N\) dimensions? It could rotate into “3” only (the inverse of “projection”), or it could map to all of the new dimensions “equally”, i.e. into the vector \((0, 0, \frac{a_3}{\sqrt{N-2}}, \frac{a_3}{\sqrt{N-2}}, ...)\), which has the same length as the original \(a_3\) but spread evenly over all \(3{.}{.}N\) dimensions (making this the inverse of the reduction to a length). Or something else!
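And the two “unreductions” just described, again with made-up names; the even spread divides by \(\sqrt{N-2}\) so that the length of the original component is preserved:

```python
import numpy as np

def unreduce_to_one_axis(v, N):
    """Unfold the third component of a 3-vector onto a single new dimension
    (the inverse of the projection-style reduction)."""
    out = np.zeros(N)
    out[:2], out[2] = v[:2], v[2]
    return out

def unreduce_evenly(v, N):
    """Spread the third component evenly over dimensions 3..N, preserving
    its length (the inverse of the reduction-to-length)."""
    out = np.zeros(N)
    out[:2], out[2:] = v[:2], v[2] / np.sqrt(N - 2)
    return out

v = np.array([1.0, 2.0, 3.0])
print(np.linalg.norm(unreduce_evenly(v, 10)), np.linalg.norm(v))   # the same length either way
```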
Now, if you have one vector \(\mathbf{a}\), you can decompose some other vector \(\mathbf{b}\) into components parallel and perpendicular to \(\mathbf{a}\).
Here I’ve adopted a few conventions:
- \(\perp a\) and \(\parallel a\) are subspaces, and will not be typeset in boldface. These are labeled in the diagram as lines without arrows, while vectors have arrows at the end. Their “negative” halves are not depicted (what would negative \(\perp a\) mean?)—but it might be useful to depict these in other instances.
- A double line is again used for a subspace of greater than one dimension. Here, in \(N\) dimensions, \(\perp a\) will be \((N-1)\)-dimensional.
- On the left, the scalar projection \({b}_{\parallel a}\) and rejection \({b}_{\perp a}\) are shown. The “sides” of \(\mathbf{b}\) are labeled with something like their lengths, but \({b}_{\perp a}\) should be thought of as standing for \(N-1\) components.
- On the right are shown the vector projection \(\mathbf{b}_{\parallel a}\) and rejection \(\mathbf{b}_{\perp a}\).
- These projections and rejections are written to suggest that they are operations between \(\mathbf{b}\) and the subspaces \(\perp a\) and \(\parallel a\), rather than between \(\mathbf{b}\) and the vector \(\mathbf{a}\) itself. This is helpful because it avoids having to make reference to any particular basis on the subspace \(\perp a\).
The same decomposition as an equation is:
\[\begin{align} \mathbf{b} &= \mathbf{b}_{\parallel a} + \mathbf{b}_{\perp a} \\ &= \frac{\mathbf{b}\cdot\mathbf{a}}{ {\vert \mathbf{a} \vert}^2} \mathbf{a} + \frac{\mathbf{b}\cdot\mathbf{a}_\perp}{ {\vert \mathbf{a}_\perp \vert}^2} \mathbf{a}_\perp \end{align}\]More conventions: \(\mathbf{a}\) is a vector. \(\mathbf{a}_{\perp}\) could only be a specific vector in \(N=2\) dimensions, and even then there’s no particular reason to choose any particular vector on the subspace \(\perp a\). In more than two dimensions, we will take \(\mathbf{a}_\perp\), \(\mathbf{b}\cdot\mathbf{a}_\perp\), and \(\frac{\mathbf{b}\cdot\mathbf{a}_\perp}{ {\vert \mathbf{a}_\perp \vert}^2} \mathbf{a}_\perp\) to mean “whatever they need to” for the above to make sense: a matrix, perhaps, or an oriented area. Once we’ve defined \(\mathbf{a}_\perp\), we can just as easily start over by projecting \(\mathbf{b}\) onto that. Then we’d call the \(\mathbf{b}_{\parallel(a_\perp)}\) term the “projection” (onto an \((N-1)\)-dimensional space) and \(\mathbf{b}_{\perp (a_\perp)}\) the “rejection”. We should get the same result:
\[\mathbf{b} = \mathbf{b}_{\parallel (a_\perp)} + \mathbf{b}_{\perp (a_\perp)} = \mathbf{b}_{\perp a} + \mathbf{b}_{\parallel a}\]So whatever \(\mathbf{a}_\perp\) means, it ought to be able to play the role of “projection” and “rejection” equally well.
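A quick numerical check of this decomposition (a NumPy sketch with ad-hoc names): computing the rejection as \(\mathbf{b} - \mathbf{b}_{\parallel a}\) sidesteps the question of what \(\mathbf{a}_\perp\) “is” while still satisfying the equations above.

```python
import numpy as np

def proj(b, a):
    """Vector projection of b onto the line spanned by a."""
    return (b @ a) / (a @ a) * a

def rej(b, a):
    """Vector rejection: the component of b lying in the subspace perp-a."""
    return b - proj(b, a)

rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)

assert np.allclose(proj(b, a) + rej(b, a), b)    # b = b_parallel + b_perp
assert np.isclose(rej(b, a) @ a, 0.0)            # the rejection is orthogonal to a
```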
The left diagram above depicted a vector in terms of its two components \((b_{\parallel a}, b_{\perp a})\). The latter component could be taken to stand for \(N-1\) components at once, or could represent a “reduction” to a single scalar as discussed earlier. With this convention we could “draw” an \(N\)-dimensional vector in a plane, and at least preserve the apparent “orthogonality” of the components parallel to and perpendicular to \(\bf{a}\).
Let’s now throw out the rule that “right angles represent orthogonal dimensions”. Instead, for the rest of this post, we’ll take every “half-axis” to represent an entire dimension, orthogonal to all the rest, no matter what angle it’s drawn at. Double lines will represent multiple dimensions collapsed into a single half-axis. We can then fit more than two dimensions in a single diagram. Here’s a 10-dimensional space and a vector \(\mathbf{a}\) (which is zero on dimensions 5 through 9). I’m encoding the absolute values of the projections onto each axis as a dotted line.
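For what it’s worth, here’s a matplotlib sketch of this drawing convention (my own improvisation with made-up component values, not the code behind the post’s figures): each half-axis is drawn at an arbitrary angle, with a dotted line marking \(\vert a_i \vert\) along it.

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.array([2.0, 1.5, 1.0, 0.5, 0, 0, 0, 0, 0, 0.8])      # made-up components, N = 10
angles = np.linspace(0, 2 * np.pi, len(a), endpoint=False)  # arbitrary drawing angles

fig, ax = plt.subplots()
for i, (theta, ai) in enumerate(zip(angles, a)):
    u = np.array([np.cos(theta), np.sin(theta)])             # unit direction for this half-axis
    ax.plot([0, 1.1 * a.max() * u[0]], [0, 1.1 * a.max() * u[1]], color="gray", lw=1)
    ax.plot([0, abs(ai) * u[0]], [0, abs(ai) * u[1]], ls=":", color="C0")   # |a_i| along the axis
    ax.annotate(str(i + 1), 1.2 * a.max() * u, ha="center", va="center")

tips = np.abs(a)[:, None] * np.column_stack([np.cos(angles), np.sin(angles)])
tips = np.vstack([tips, tips[:1]])                            # close the polygon of components
ax.plot(tips[:, 0], tips[:, 1], color="C0")
ax.set_aspect("equal"); ax.axis("off")
plt.show()
```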
Is this anything? I’m not sure. There’s no way to draw the “vector itself” with any kind of arrow; we can only depict its components as a kind of polygon, and then only up to their signs and up to the choice of “projection” on any multi-dimensions (the \(3,4\) dimensions here). In another basis the sum of squared components would be unchanged; of course, one such basis is the one in which the vector has only a single nonzero component, in which case the vector would have no “polygon” associated with it at all. But I do wonder if anything can be said about this polygon that is invariant under basis changes and under the arbitrary orientations of the axes in the diagram.
This kind of diagram might be better suited to depicting probabilities, since those can’t be negative anyway. I am hoping to keep open the option of using different “reductions” rather than only the “length” \(a_{3,4} = \vert \mathbf{a}_{3,4} \vert\), but perhaps it would be more sensible to label the graph with a boldface vectorial component in that case.
In this new convention we can read the very first diagram of \(x,y,z\) components as 3 half-axes splayed out, instead of as a drawing in perspective:
We could also omit the axes, and simply depict the vector itself with variable-length sides, which could either be scalar components or vector projections:
II. Matrices
We can do something similar with a matrix. Suppose you have some \(10 \times 10\) real matrix \(A\), which can be block-diagonalized as follows:
\[A = \begin{pmatrix} \lambda_1 \\ & \lambda_2 \\ & & \lambda_{3,4} R_{2 \times 2} \\ & & & \lambda_{5{.}{.}9} \mathbf{I}_{5 \times 5} \\ & & & & 0 \\ \end{pmatrix}\]That is, this matrix:
- scales its first two eigendimensions by \(\lambda_1, \lambda_2\) respectively.
- rotates dimensions 3 and 4 into each other, while scaling by \(\lambda_{3, 4}\).
- scales dimensions 5 through 9 by a common factor \(\lambda_{5{.}{.}9}\).
- annihilates dimension 10.
This can be visualized as follows:
This could be read as a vector along the lines of the previous section, but I don’t think that interpretation would be very meaningful. Instead it should now be thought of as simply standing for the diagonal representation of the matrix itself.
The first two eigenvalues are the biggest, so we could approximate this matrix by only its first two “principal components”, i.e. by zeroing all but the first two eigenvalues:
This isn’t quite a normal “principal component analysis”, but it’s a similar idea. The action of \(A'\) on a given input vector \(\mathbf{x}\) will deviate from that of the original \(A\) in some way that depends on \(\mathbf{x}\)’s components along the zeroed dimensions:
\[(A - A') \begin{pmatrix} x^1 \\ x^2 \\ \mathbf{x}^{3{.}{.}10} \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ \mathbf{\lambda}_{3{.}{.}10} \cdot \mathbf{x}^{3{.}{.}10} \end{pmatrix}\]We could therefore characterize the same PCA as a reduction of the vector space to a three-dimensional space, where the first two dimensions are the principal eigendimensions and the third applies some choice of “reduction” operation, which in some sense characterizes the error of the approximation:
The action of \(A\) in the “reduced” space would then be
\[A \begin{pmatrix} x^1 \\ x^2 \\ x^{3{.}{.}10} \end{pmatrix} = \begin{pmatrix} \lambda_1 x^1 \\ \lambda_2 x^2 \\ \lambda_{3{.}{.}10} x^{3{.}{.}10} \end{pmatrix}\]At this point I’m speaking vaguely—I’m not clear how to make this “joint reduction” precise.
One thought is that the value of \(\lambda_{3{.}{.}10}\) could be taken to be the Frobenius norm of \(A' - A\), whose square is the sum of squared magnitudes of the zeroed eigenvalues:
\[{(\lambda_{3{.}{.}10})}^2 = {\Vert A' - A\Vert}_{F}^2 = 2{(\lambda_{3,4})}^2 + 5{(\lambda_{5{.}{.}9})}^2 + 0^2\]Then \(x^{3{.}{.}10}\) might be defined to be the norm of these components:
\[{(x^{3{.}{.}10})}^2 = {(x^3)}^2 + \ldots + {(x^{10})}^2\]I’m not sure, though, how to argue that this is the “right” answer for the corresponding reduction of \(\mathbf{x}\). Other options might be an eigenvalue-weighted average over the remaining components (though this makes reference to \(A\)), or just the mean square component of \(\mathbf{x}\).
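Here is that Frobenius-norm reduction spelled out numerically (a NumPy sketch with made-up eigenvalues; `A2` plays the role of \(A'\)):

```python
import numpy as np

# Made-up eigenvalues and rotation angle for the block-diagonal A above.
l1, l2, l34, l59, theta = 5.0, 4.0, 1.0, 0.5, 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

A = np.zeros((10, 10))
A[0, 0], A[1, 1] = l1, l2          # scale the first two eigendimensions
A[2:4, 2:4] = l34 * R              # rotate-and-scale dimensions 3 and 4
A[4:9, 4:9] = l59 * np.eye(5)      # scale dimensions 5 through 9 by a common factor
# A[9, 9] stays zero: dimension 10 is annihilated

A2 = np.zeros((10, 10))
A2[0, 0], A2[1, 1] = l1, l2        # A': keep only the first two eigendimensions

lam_rest = np.linalg.norm(A - A2, "fro")       # the reduced eigenvalue lambda_{3..10}
print(lam_rest**2, 2 * l34**2 + 5 * l59**2)    # both are 3.25

x = np.arange(1.0, 11.0)                       # some 10-dimensional input vector
x_rest = np.linalg.norm(x[2:])                 # the reduced component x^{3..10}
print(np.linalg.norm((A - A2) @ x), lam_rest * x_rest)   # not equal in general (the ambiguity above)
```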
Another option for the “reduction” operation might be to define \(\lambda_{3{.}{.}10}\) as the determinant of the minor of \(A\) on the 3…10 dimensions. (In the present example this would be zero if any of the eigs are zero, so this might be more applicable to some class of matrices.) Then the corresponding reduction on \(\mathbf{x}\) might be a geometric mean, a median-absolute-value, or perhaps the norm again. Or perhaps the only natural thing is to consider the action on \(N-2\) vectors at once. Not sure.
Or we could approximate even further, by reducing the matrix down to a single dimension:
Some good candidates for the “one-dimensional reduction” of a matrix are:
- the determinant, or geometric mean, of eigenvalues
- the trace
- the Frobenius norm
- the sum of absolute values of eigenvalues
- the largest eigenvalue, or perhaps its absolute value
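A sketch of these candidates in NumPy (for a generic real matrix the eigenvalues can be complex, so I take magnitudes where the list allows it):

```python
import numpy as np

def one_dim_reductions(A):
    """Candidate ways to collapse a square matrix down to a single number."""
    eigs = np.linalg.eigvals(A)
    return {
        "determinant":          np.linalg.det(A),
        "geometric mean |eig|": np.abs(eigs).prod() ** (1 / len(eigs)),
        "trace":                np.trace(A),
        "Frobenius norm":       np.linalg.norm(A, "fro"),
        "sum of |eig|":         np.abs(eigs).sum(),
        "largest |eig|":        np.abs(eigs).max(),
    }

rng = np.random.default_rng(1)
for name, value in one_dim_reductions(rng.normal(size=(4, 4))).items():
    print(f"{name:22} {value:+.3f}")
```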
This is really the starting point for this whole line of thinking: it seems that there are multiple sensible ways to “reduce” a linear operator to a single number. Each ought to be expressible as an operation on the vector space instead of on the operator, and it should be possible to “partially apply” any of them to give successive “approximations” to the original operator. So what I’m seeking here is a kind of “unified framework” in which to understand a number of disparate linear-algebra concepts.
Now, when we folded \(\mathbf{a}_{3{.}{.}N}\) down to a single scalar \(a_{3{.}{.}N}\), there was an obvious way to do it that preserved the length: \(a_{3{.}{.}N}^2 = a_3^2 + \ldots + a_N^2\). But there’s no reverse operation that brings all that information back: you could unfold the scalar \(a_{3{.}{.}N}\) into any particular vector in your new dimensions \(3{.}{.}N\), but there is no way to distinguish any of the new dimensions from each other unless you also specify that structure.
Or, consider rotating around dimension 2 (rotating dimension 1 into 3), before and after the unfolding. Before, \(\mathbf{e}_1\) mapped to \(\mathbf{e}_3\). After, does it map to any particular vector? To all vectors in \(3{.}{.}N\)? To an equivalence class of vectors? All of these will work, but I think the most sensible and least opinionated target is to map to the volume element on the unfolded space:
\[a_{3{.}{.}N} \to a_{3{.}{.}N}\, \mathbf{e}_3 \wedge \mathbf{e}_4 \wedge \ldots \wedge \mathbf{e}_N\]This looks like the reverse of what we just did with the matrix \(A\), so we can run that backwards to see where to go from here. We arrived at \(\det{A}\) by forgetting the dimensions of the matrix; therefore, anywhere it appears, we can restore the matrix’s full dimensions by pulling eigenvalues out of the \(\det\):
\(\det{A}\) could of course be the determinant of many matrices; knowing it belonged to \(A\) amounts to a choice of how to unpack it, the same choice we would have to make with our \(a_{3{.}{.}N}\). Some applications might be indifferent to the choice; others might depend on the specific choice, or on the same choice being made each time an unfolding occurs.
(Of course you can imagine folding 10 dimensions into 2, or 3, or unfolding 2 into 4—the choices go up.)
All of this isn’t so strange: we do this to get \(\mathbb{C}\) from \(\mathbb{R}\) all the time. “Actually this \(r\) is 2D”:
And there are different ways you can do it. The obvious one is to identify \(r\) with \(r + 0i\), but it could also become any other vector. Or you could do something weirder, with whatever properties it entails: maybe you take \(r\) to the set of all the complex numbers of radius \(r\) (with the sign of \(r\) encoding something), or to the volume element \(1 \wedge i\).
This is all to say that “folding” and “unfolding” dimensions are underspecified. I am not necessarily talking about \(r \to r + 0i\) or \(\mathbf{a} \to \vert \mathbf{a} \vert\); these diagrams don’t care how you do it.
III. Trajectories
Here’s another typical application. You start with the trajectory of a particle moving in 3D:
We can’t see the time axis, so let’s fold the spatial axes into a single dimension:
Schematically this makes sense, but how could we actually reduce the trajectory \(a(t)\) to something meaningful in this diagram? The “natural” way from above, \(\mathbf{a} \to \vert \mathbf{a}\vert\), wouldn’t know about any spatial rotations; movement on the surface of a sphere would appear to stand still, so this reduction forgets the speed. Projections down to one dimension have the same problem: they lose translations and velocities along the other dimensions. (If the trajectory doesn’t vary along the lost dimensions, then go right ahead; the expression for the reduced trajectory will only contain the fixed value, the radius \(R\), say.)
The only way to project down to 1D while retaining the exact speed of the particle is to simply integrate the speed:
\[\int \vert \mathbf{v}(t) \vert\, dt = \int \left\vert \frac{d\mathbf{a}}{dt}(t) \right\vert dt = \int \frac{ds}{dt}\,dt = \int\limits_{a(t)} ds = s[a(t)] = s(t)\]This gives us the arc length \(s\) as a function of \(t\).
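Numerically, this reduction of a sampled trajectory is just an integral of speeds (a NumPy sketch; the helix is a stand-in for an arbitrary \(\mathbf{a}(t)\)):

```python
import numpy as np

t = np.linspace(0, 10, 1001)
a = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])    # an example trajectory a(t): a helix

v = np.gradient(a, t, axis=0)                           # velocity da/dt
speed = np.linalg.norm(v, axis=1)                       # |v(t)|
ds = 0.5 * (speed[1:] + speed[:-1]) * np.diff(t)        # trapezoid-rule arc-length increments
s = np.concatenate([[0.0], np.cumsum(ds)])              # arc length s(t), never decreasing

print(s[-1], np.sqrt(1 + 0.2**2) * 10)                  # total arc length vs. the exact value for this helix
```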
This \(s(t)\) trajectory can never decrease, so if it starts at zero it can never even be negative; like \(\vert \mathbf{a}\vert\), it’s more like half of a dimension. We could map some property like the curvature of \(a\) to a sign, but to remain in 1D we’d have to throw out some other information.
IV. Physics
Flip the axes and it looks like a Minkowski diagram:
For a physical particle the shaded area of the above diagram is inaccessible: a particle at the origin would have to move faster than light to reach those positions at those times. Equivalently: a movement in space \(dr\) requires at least \(dt = \frac{dr}{c}\) change in time. Of course, we can conceive of those coordinates—nothing stopping us from saying “there’s an object at \(r=10, t=0\)”. But for the moment let’s focus on the particle.
The dimensions of a particle trajectory exhibit this dependence, which turns up so often that we might try to find a notation for it:
The arrow from one dimension to another I read as “space depends on time” or “time precedes space”, meaning that “a change in the spatial coordinate always requires a change in the time coordinate”. The reverse doesn’t hold: a change in time could be accompanied by a change in space, but it doesn’t have to be. It doesn’t mean “causes”, because you wouldn’t say “time causes space”. Instead it means that spatial coordinates can be causally related if they can be identified as the dependent variables of an independent time coordinate.
We can get a little quantum-mechanical about this. Let’s also plot the phase of a quantum wave. A wave function advances its phase along any single trajectory by the action \(S[a]\) of the path: the mass-energy of the particle \(\times\) the proper time it experiences in its rest frame, in units of Planck’s constant \(h\).
\[\begin{align} \psi_0 \to e^{-i \phi}\psi_0 &= e^{-i\int d \phi} \psi_0 = e^{iS[a]/\hbar}\psi_0 \\ d\phi &= \frac{mc^2}{\hbar}d\tau = 2\pi \frac{mc^2}{h}d\tau \end{align}\](There’s a \(2\pi\) because it’s a phase; it’s like a unit conversion.)
In some other reference frame parameterized by \(t\) the action is found from a Lagrangian:
\[-i \phi = \frac{i}{\hbar}S[a(t)] = \frac{i}{\hbar}\int L\,dt = \frac{i}{\hbar}\int \left(-mc^2\frac{d\tau}{dt}\right)dt = \frac{i}{\hbar} \int (\mathbf{p}\cdot d\mathbf{x} - E\,dt)\]Hence changes in space and time both cause changes in phase, by amounts proportional to the momentum and energy respectively (the two components of \(m\,d\tau\) as seen in the \(\mathbf{x}, t\) frame). Great. So:
In the left diagram the arrow from \(t\) passing through “space” is meant to apply to “phase” as well; this will come in handy later. So read this as: “time and space precede phase” or “phase depends on space and time”; a change in phase requires an increase in either time or space (which still depends on time).
The right diagram collapses space and time into a single dimension. Then: a change in phase requires a change in spacetime. But you can contrive a change in spacetime which results in no change in phase. All you need is \(\mathbf{p} \cdot d\mathbf{x} - H dt = 0\). That’s going to be hard though, because that quantity (in special relativity) is
\[(\mathbf{p} \cdot \mathbf{v} - H)\,dt = L\,dt = -mc\sqrt{c^2-v^2}\,dt\]so it’s only \(0\) at the speed of light. For our lone massive particle, this is also inaccessible, so these phase and spacetime dimensions aren’t actually independent. Something must have been “unfolded” incorrectly somewhere, which leaves us with two dimensions that represent the same thing. Might as well collapse them further:
All of that was the phase along a single trajectory, but if you propagate multiple trajectories and let them recombine they can cancel out again. This is where you go off and take a full path integral, counting all the ways they can recombine to cancel out, which leads to the rest of physics. For this you do need to track the phase and spacetime dimensions separately, as far as I know.
\(L\) above is just \(-mc^2\frac{d\tau}{dt}\), which is doing the same thing as the arc-length rate \(\frac{ds}{dt}\) in our earlier example; \(-mc\sqrt{c^2 - v^2}\) is just an arc-length rate. Just like with arc length, we’ve projected a whole trajectory down to a single \(t\)-dependent function. And the speed of this curve is constant (it equals \(mc^2\)); only its angle in \(d\mathbf{x}, dt\) can change. So this reduction has thrown out some information that, in fact, wasn’t changing anyway, which makes it a natural choice. We’re left with an \(mc^2\) representing the fixed speed. There’s a Noether theorem in here: any invariant of the system can be folded out cleanly, and the resulting \(L\) depends only on the constant value (\(m\)); similarly, any parameter which can be folded out must be conserved.
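As a numerical sanity check of the chain of equalities above (a short sketch for a free particle at constant velocity, with \(c = 1\) and made-up values for the mass, speed, and duration): the action computed from \(L\), from the proper time, and from \(\mathbf{p}\cdot d\mathbf{x} - E\,dt\) all agree.

```python
import numpy as np

c, m = 1.0, 1.0                        # units with c = 1; made-up mass
v, T = 0.6, 10.0                       # constant velocity and coordinate-time duration
gamma = 1.0 / np.sqrt(1 - v**2 / c**2)

L   = -m * c * np.sqrt(c**2 - v**2)    # Lagrangian of a free relativistic particle
tau = T / gamma                        # proper time elapsed over the trip
p, E = gamma * m * v, gamma * m * c**2 # momentum and energy in this frame

S_from_L   = L * T                     # integral of L dt (constant integrand)
S_from_tau = -m * c**2 * tau           # -mc^2 times the proper time
S_from_pE  = p * (v * T) - E * T       # integral of p dx - E dt

print(S_from_L, S_from_tau, S_from_pE) # all three give -8.0
```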
Hence \(Ldt\) is what we get when we “fold” a trajectory down to a single one-dimensional expression. We get two handles with which to compare to other systems (mass and time) along with a bunch of unit conversions (mass to energy to action to phase).
If we decide to expand our system in terms of any other parameters, \(Ldt\)’s functional dependence on those parameters will encode a lot of information about the system; hence we can expand \(Ldt\) in terms of \(d\mathbf{x}\). But \(L\) itself must be indifferent to the specific coordinate system we use.
This is just like how our determinant \(\det A\) contains much of the information about \(A\) itself. But its expansion in terms of eigenvalues only knows about the set of eigenvalues, not their order, nor the vectors they correspond to. Getting information out of the folded objects \(\det A\) or \(L\) requires adding parameters and indicating how they relate to each other: \(\det A(\lambda_n)\) tells us more than the value of \(\det A\) alone; same for \(L(\mathbf{x}, \mathbf{v})\).
V.
TODO. There will be much more here.