Division 1: Linear Algebra
Table of Contents
- Introduction
- Scalar / Scalar
- Scalar / Vector
- Vector / Vector
- Vector / Matrix I: Orthogonal Matrices
- Vector / Matrix II: Non-orthogonal Matrices
- Closing Remarks
Introduction
The opposite of multiplication is division, right?
This series of posts will attempt to develop a non-standard notation for linear and exterior algebra by making extensive use of “division”. I’m not sure if this is actually a good idea, so the reader should beware.
Scalar / Scalar
If $ac = b$, then what is $c = b/a$?
There are three cases:
- $c = b/a$ is the number such that $ac = b$ …
- … unless $a$ is zero, in which case $ac = 0 \neq b$ and there’s no solution.
- … unless $b$ is also zero, in which case $ac = 0 = b$ and any number $c$ is a solution.
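Here is a minimal sketch of these three cases in code (the function name `solve_scalar` and the sample numbers are just for illustration):

```python
def solve_scalar(a, b):
    """Describe the solution set of a*c == b for scalars (a sketch of the three cases)."""
    if a != 0:
        return f"unique solution c = {b / a}"    # case 1: ordinary division
    if b != 0:
        return "no solution"                     # case 2: division by zero
    return "any c works"                         # case 3: zero over zero

print(solve_scalar(2.0, 6.0))  # unique solution c = 3.0
print(solve_scalar(0.0, 6.0))  # no solution
print(solve_scalar(0.0, 0.0))  # any c works
```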
Easy. Now let’s talk about vectors.
Scalar / Vector
If $\mathbf{a} \cdot \mathbf{x} = b$, then what is $\mathbf{x} = b/\mathbf{a}$?
Three cases again:1
- No unique solution, unless we’re in one dimension. The equation fixes the component of $\mathbf{x}$ onto $\mathbf{a}$ to be $b/|\mathbf{a}|$, but we’re free to add any perpendicular vector without changing the answer.
- … unless $\mathbf{a}$ is zero, in which case $\mathbf{a} \cdot \mathbf{x} = 0 \neq b$ and there are no solutions at all.
- … unless $b$ is also zero, in which case any vector $\mathbf{x}$ is a solution.
Note (2) and (3) are the same special cases as in scalar division.
For (1), the set of solutions is a “generalized inverse” of the function “dot product with the vector $\mathbf{a}$”.
This can be thought of in a few ways:
- a) as a set of solutions,
- b) or as a “standard” solution plus any single element from the orthogonal subspace,
- c) or as a function to solutions, depending on the choice of element in the orthogonal subspace.
Of these, (a) is most commonly seen, but in this post I’ll prefer (c).
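To make view (c) concrete, here’s a minimal numpy sketch (the particular vectors are arbitrary): the standard solution plus any vector perpendicular to $\mathbf{a}$ solves the equation equally well.

```python
import numpy as np

a = np.array([3.0, 4.0, 0.0])
b = 10.0

# "Standard" solution: the multiple of a itself that satisfies a . x = b.
x0 = (b / np.dot(a, a)) * a

# Any vector perpendicular to a parameterizes the rest of the solution set.
w = np.array([0.0, 0.0, 7.0])          # chosen so that a . w == 0
assert np.isclose(np.dot(a, w), 0.0)

for x in (x0, x0 + w, x0 - 2.5 * w):
    print(np.dot(a, x))                # each prints 10.0
```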
My preferred notation will be as follows. The full inverse will be represented by a “division” plus a free term.
This full solution will be called either a “general solution” or a “generalized inverse”. The “standard” term might also be called the “principal value” or “adjoint”.
The “division” notation is the main point of all this. This “generalized inverse”, with a “division” for a “standard part”, will be our 4th “prototype” case of division:
Case 4: Division gives a solution only up to the addition of a term which multiplies to zero, for whatever definition of “multiplies” we’re currently using.
This is non-standard, reader beware, but my aim is to take it as far as I can, as a way of unifying what would otherwise be a number of distinct concepts.
How should “division by a vector” work, then? It is apparently equivalent to multiplication by $\frac{\mathbf{a}}{\mathbf{a} \cdot \mathbf{a}}$, since the standard solution is $\frac{b\,\mathbf{a}}{\mathbf{a} \cdot \mathbf{a}}$. The two copies of $\mathbf{a}$, one in the numerator and one in the denominator, make it tempting to manipulate this like a regular fraction.
But we’ll want to be careful about assuming these act like regular fractions in any other ways—these “fractions” will only have the specific properties we name.
We’ll take on the case of vectors not parallel to
The standard part
The remainder
In 2D we can span the space
Note that I’ve used
In
I’ll use the symbol
If the same basis is used for
This demonstrates that the “generalized inverse” acts like a combination of:
- regular division on the single dimension along $\mathbf{a}$, with solution $b/|\mathbf{a}|$,
- and zero divided by zero on the other $n-1$ dimensions, with the free parameter representing “any solution” to these divisions.
This is rather imprecise, though.
Before we move on, I’ll note one common example that works like this: the indefinite integral $\int f(x)\,dx$, which is only defined up to an added constant $+C$.
The constant function is exactly the term that “multiplies to zero” here: its derivative vanishes, so it can be freely added to any antiderivative without changing anything.
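A quick symbolic check of that example, as a sketch using sympy (the choice of $\cos x$ is arbitrary): the antiderivative returned by `integrate` is just a “standard part”, and adding any constant still differentiates back to the original function.

```python
import sympy as sp

x, C = sp.symbols('x C')
f = sp.cos(x)

F = sp.integrate(f, x)        # sympy returns one antiderivative: sin(x), the "standard part"
general = F + C               # the full "generalized inverse": standard part plus a free constant

print(F)                          # sin(x)
print(sp.diff(general, x) - f)    # 0 -- the constant "multiplies to zero" under d/dx
```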
Vector / Vector
If $c\,\mathbf{a} = \mathbf{b}$ for a scalar $c$, then what is $c = \mathbf{b}/\mathbf{a}$?
Three basic cases:
- If $\mathbf{a}$ and $\mathbf{b}$ are parallel, then $\mathbf{b}/\mathbf{a}$ is the ratio between their lengths. So if $\mathbf{b} = c\,\mathbf{a}$, then $\mathbf{b}/\mathbf{a} = c$. Easy.
- … unless $\mathbf{a} = 0$, in which case there’s no solution.
- … unless $\mathbf{b} = 0$ as well, in which case any $c$ is a solution.
But if $\mathbf{a}$ and $\mathbf{b}$ are not parallel, no scalar $c$ solves the equation exactly, and we get a new prototype case:
Case 5: Division can be defined to give a standard part but is not a “generalized inverse” of multiplication. Instead a remainder is left over.
The remainder case is exactly what occurs for scalar division on integers. If in our original scalar example we restrict ourselves to integers and $a$ does not divide $b$ evenly, the best we can do is a quotient plus a leftover remainder, $b = qa + r$.
We will want to think of the remainder here as the “part” of $b$ that the division cannot account for.
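In code this is just `divmod`; a one-line sketch (with arbitrary numbers):

```python
a, b = 7, 23
q, r = divmod(b, a)        # standard part and remainder
print(q, r)                # 3 2
assert b == q * a + r      # b splits into an "answered" part and a leftover part
```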
Returning to the case of vectors, we can always split $\mathbf{b}$ into a component parallel to $\mathbf{a}$ and a component perpendicular to it.
Evidently the parallel part can be seen as the “best” answer for “division”, with the perpendicular part the “remainder”.
Rather than a floor symbol
In the scalar-over-vector case, multiplying by
This will be a general pattern: division-and-multiplication produce a projection. Apparently we can sort of treat this like a fraction and move the multiplied vector into the numerator, but this will only work for vectors parallel to the denominator.
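Here is a minimal numpy sketch of “division with remainder” for vectors (the vectors themselves are arbitrary): the scalar plays the role of $\mathbf{b}/\mathbf{a}$, and the leftover rejection is the remainder.

```python
import numpy as np

a = np.array([2.0, 0.0, 0.0])
b = np.array([3.0, 4.0, 0.0])

q = np.dot(a, b) / np.dot(a, a)   # "b / a": the standard part (projection coefficient)
projection = q * a                # multiplying back gives the projection of b onto a
remainder = b - projection        # the rejection, perpendicular to a

print(q)                                    # 1.5
print(remainder)                            # [0. 4. 0.]
assert np.isclose(np.dot(a, remainder), 0)  # the remainder "multiplies to zero" against a
assert np.allclose(projection + remainder, b)
```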
In two dimensions we can do the same for the rejection term, dividing by a perpendicular vector $\mathbf{a}^\perp$, letting us write the decomposition of the whole vector $\mathbf{b}$ as a projection onto $\mathbf{a}$ plus a projection onto $\mathbf{a}^\perp$.
In higher dimensions we’ll have to use a matrix in place of $\mathbf{a}^\perp$, since the perpendicular complement is more than one-dimensional.
You can think of division-as-projection as measuring “the number of times $\mathbf{a}$ fits into $\mathbf{b}$”, at least along $\mathbf{a}$’s own direction.
The rule “division means projection” turns out to be exactly what is needed to interpret a standard derivative $\frac{df}{dx}$ as a literal ratio of differentials. This does not work for partial derivatives $\frac{\partial f}{\partial x}$, though; we’ll come back to that point when we reach matrices.
I introduced the projection notation as a natural extension of “division with remainder”, but there are a couple of other ways of looking at it.
The first is to expand
This is clearly now a case of “dividing by zero”. Furthermore these are
Secondly, we could expand the space in which the “quotient” is allowed to live:
Case 6: Division is defined as an exact inverse, but within a larger space of solutions.
Of course this can get out of hand—you could in principle expand the space to anything. The trick is to expand it as little as possible while still getting a division operation that’s well-defined.
For the present example, if we let the quotient $\mathbf{b}/\mathbf{a}$ be a $2 \times 2$ matrix $M$ rather than a scalar, the equation $M\mathbf{a} = \mathbf{b}$ imposes only two constraints on the four entries of $M$,
but this is apparently more degrees of freedom than are actually needed. Two will do, and an obvious choice is to write $M = \alpha I + \beta R$, where $I$ is the identity and $R$ is a rotation by 90°,
which is a true inverse of multiplication: $M\mathbf{a} = \mathbf{b}$ exactly, with no remainder.
This is not the only choice of two “basis” matrices out of which to construct the inverse, but these are particularly nice ones. If we call the two coefficients $\alpha$ and $\beta$, then $\alpha = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}$ and $\beta = \frac{a_x b_y - a_y b_x}{\mathbf{a} \cdot \mathbf{a}}$.
The first term is obviously the projection analogous to the “standard part” from before, while the second term accounts for what was previously the remainder.
In more than two dimensions, any matrix which rotates and scales $\mathbf{a}$ onto $\mathbf{b}$ will do, and there are many such matrices.
One can take this line of thinking much further, but we’ll turn back.
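As a sketch of this “Case 6” move in two dimensions (the coefficient names `alpha` and `beta` follow the construction above, and the sample vectors are arbitrary), the expanded-space quotient really is an exact inverse:

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])

I = np.eye(2)
R = np.array([[0.0, -1.0],     # rotation by 90 degrees
              [1.0,  0.0]])

alpha = np.dot(a, b) / np.dot(a, a)                  # projection-like coefficient
beta = (a[0] * b[1] - a[1] * b[0]) / np.dot(a, a)    # "cross product" coefficient

M = alpha * I + beta * R       # the expanded-space quotient "b / a"
print(M @ a)                   # [1. 2.] -- exactly b, with no remainder
assert np.allclose(M @ a, b)
```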
Now, if we can decompose a vector into a projection and rejection by dividing and multiplying,
we can surely decompose it into an orthonormal basis in the same fashion:
This will work even if the basis vectors are not unit-length, since the lengths divide out, but it won’t work if they’re non-orthogonal.
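A quick numpy check of this claim, using an orthogonal but non-unit basis (the particular basis and vector are arbitrary):

```python
import numpy as np

# An orthogonal (but not orthonormal) basis of R^3.
e1 = np.array([1.0, 1.0, 0.0])
e2 = np.array([1.0, -1.0, 0.0])
e3 = np.array([0.0, 0.0, 2.0])
v = np.array([3.0, 5.0, 7.0])

# "Divide" v by each basis vector, then multiply back and sum the pieces.
reconstructed = sum((np.dot(v, e) / np.dot(e, e)) * e for e in (e1, e2, e3))
print(reconstructed)               # [3. 5. 7.]
assert np.allclose(reconstructed, v)
```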
This decomposition looks something like a single “division and multiplication” on the entire basis at once, i.e. dividing the vector by the whole matrix of basis vectors and then multiplying it back.
But to assign a meaning to this in generality we need the matrix inverse, which brings us to the next section.
Vector / Matrix I: Orthogonal Matrices
We’ve bumped into the matrix inverse three times already:
- when we wanted to put coordinates on the free-parameter term in the generalized inverse,
- when “expanding the space of solutions” of vector-by-vector division to get a matrix quotient,
- and just now, when considering the components of a vector in an arbitrary basis.
The third case is the simplest, so let’s go in with that in mind. And I’ll start by considering the simplest kind of matrix: one whose columns are orthogonal.
Invertible Matrices
For now, suppose the matrix is square and invertible, with mutually orthogonal (nonzero) columns.
The scalar and vector components of
This means we can immediately write down the matrix inverse
The matrix shown is therefore the inverse
That is the simplest case: the matrix is perfectly invertible, so its “generalized inverse” is just the ordinary inverse, with no free parameters and no remainder.
We can also convert the inverse-vectors into regular vectors:
The denominators of these fractions are the diagonal elements of $A^T A$,
so we can also write the inverse as $(A^T A)^{-1} A^T$.
Thus we have, at least for orthogonal columns, the formula that will later reappear as the pseudoinverse.
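Here’s a small numpy sketch of this picture (the matrix is an arbitrary example with orthogonal columns): dividing by each column separately reproduces the true inverse, as does the $(A^T A)^{-1} A^T$ form.

```python
import numpy as np

# A square matrix with mutually orthogonal (but not unit-length) columns.
A = np.column_stack([[1.0, 1.0, 0.0],
                     [1.0, -1.0, 0.0],
                     [0.0, 0.0, 2.0]])

# Each row of the inverse is a column of A divided by that column's squared length.
inv_by_division = np.array([col / np.dot(col, col) for col in A.T])

print(np.allclose(inv_by_division, np.linalg.inv(A)))              # True
print(np.allclose(inv_by_division, np.linalg.inv(A.T @ A) @ A.T))  # True: (A^T A)^{-1} A^T
```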
Before we move on to non-orthogonal columns, let’s look at how the cases of (2) “division by zero” and (3) “zero-over-zero” arise for matrices.
Overdetermined Matrices
“Dividing by zero” is most easily seen in the extremely simple example of a diagonal matrix which has only some nonzero entries on its diagonal, the remaining rows being entirely zero.
This gives one ordinary equation $d_i x_i = b_i$ per nonzero entry,
which can of course be solved, and a leftover equation $0 = b_j$ for each zero row,
which can’t be solved in general, unless the corresponding component of $\mathbf{b}$ happens to be zero.
(This is also the general case: any matrix can be transformed into this form via a “singular value decomposition”, although the vectors involved will be the singular vectors rather than the original coordinate axes.)
When talking about matrices this case is called “overdetermined”, and it arises either when there are more rows than columns or when $\mathbf{b}$ otherwise fails to lie in the span of the columns.
We can view the overdetermined case in a few equivalent ways:
- The system represents more distinct equations than unknowns (which is the source of the word “overdetermined”).
- Or, $\mathbf{b}$ is not in the span of the columns of the matrix and therefore cannot be assigned “coordinates” in the basis of its columns.
- Or, the rows of the matrix, each of which is a constraint of the form $\mathbf{a}_i \cdot \mathbf{x} = b_i$, amount to contradictory constraints on the vector $\mathbf{x}$.
But of course we’ll still try to define a “standard part” of the inverse in the overdetermined case. We’ll treat it like “division with a remainder” and define a “fraction”
For the diagonal example this is simply:
The “rejection” or “remainder” term
In the above example I’ve set
In all we have:
I’ve used two different symbols
The following diagram sketches the full picture:
It’s just as easy to define the standard part of the inverse for the earlier example of a matrix of “orthogonal basis vectors”. If the first
This again inverts each basis vector separately
and we can write this in “numerator form” as shown earlier, which is the pseudoinverse:
The “standard part” notation is conveniently ignoring the zeros in
Given this we can easily write the rejection/remainder/cokernel term as well; it is the term usually appearing in the pseudoinverse equation:
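A minimal numpy sketch of the overdetermined case (using the same sort of diagonal-with-a-zero-row example): the pseudoinverse provides the standard part, and what’s left of $\mathbf{b}$ is perpendicular to every column.

```python
import numpy as np

# Overdetermined: 3 equations, 2 unknowns, with b partly outside the column span.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([2.0, 3.0, 5.0])

x = np.linalg.pinv(A) @ b      # standard part "b / A"
remainder = b - A @ x          # the part of b the division cannot account for

print(x)                                 # [2. 3.]
print(remainder)                         # [0. 0. 5.]
print(np.allclose(A.T @ remainder, 0))   # True: the remainder is orthogonal to every column
```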
Underdetermined Matrices
What about the case of “zero divided by zero”? This typically occurs when there are more columns than rows, so that there are fewer equations than unknowns.
An equivalent case can arise for square matrices when the rank is less than the dimension. In either situation:
- The span of the matrix’s columns is less than the full space, but $\mathbf{b}$ is in this span.
- The row-constraints $\mathbf{a}_i \cdot \mathbf{x} = b_i$ are duplicated constraints on $\mathbf{x}$ (rather than being inconsistent with each other).
In either case an inverse will exist, but there will be “free parameters” (akin to the arbitrary perpendicular vector from the scalar-over-vector case), one for each dimension that the columns fail to span.
The simplest underdetermined case is a diagonal matrix:
It will be easier to write these in block notation, so the above is equivalent to:
The “standard part” of the inverse is clearly
and the generalized inverse is
which is just the component representation of
The “mixed case” can be seen on the simple example of a diagonal matrix, where it looks like
Here
The underdetermined case on “orthogonal basis vectors” works exactly the same as the diagonal case, so I’ll skip it and we’ll move on to non-orthogonal matrices.
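Before moving on, here is a quick numeric sketch of the underdetermined picture (an arbitrary wide matrix): the pseudoinverse supplies a standard part, and adding anything from the null space still solves the equation.

```python
import numpy as np

# Underdetermined: 2 equations, 3 unknowns.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
b = np.array([3.0, 4.0])

x0 = np.linalg.pinv(A) @ b     # standard part of the inverse
n = np.array([0.0, 0.0, 1.0])  # a null-space direction: A @ n == 0 (the free parameter)

assert np.allclose(A @ n, 0)
for t in (0.0, 1.0, -2.5):
    print(A @ (x0 + t * n))    # always [3. 4.], for any value of the free parameter
```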
Vector / Matrix II: Non-orthogonal Matrices
I’ll now assume that the columns of the matrix are no longer necessarily orthogonal to each other.
Invertible Matrices
Conceptually, the simplest way to understand the general matrix inverse makes use of the wedge product
We begin by writing out the above matrix multiplication in a way that suggests an interpretation as “the coordinates of $\mathbf{b}$ in the basis of the matrix’s columns”:
We can then “solve for” any single coordinate by wedging both sides with all of the columns except the one attached to that coordinate.
All of the other terms vanish, because a wedge product containing a repeated vector is zero.
At this point the left and right sides are both multiples of the same top-dimensional wedge product, so their ratio is an ordinary scalar.
This is “Cramer’s rule” in a slightly esoteric notation. The denominator is equal to the determinant of the matrix, and the numerator is the determinant of the matrix with one column replaced by $\mathbf{b}$.
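Here is Cramer’s rule as a small sketch in code, stated with determinants rather than explicit wedge products (the example matrix is an arbitrary one with non-orthogonal columns):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b one coefficient at a time via Cramer's rule (A must be invertible)."""
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b                 # replace the i-th column with b
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # non-orthogonal columns
b = np.array([3.0, 5.0])

print(cramer_solve(A, b))             # [0.8 1.4]
print(np.linalg.solve(A, b))          # same answer
```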
The Cramer expression can be interpreted for now as a ratio of two scalar areas, but it is in fact a “standard part” of a division of two top-dimensional wedge products.
Then the volume divides out, and we can rewrite
The matrix inverse “standard part” is therefore:
This expression is analogous to
We can also identify each row of the inverse as the “dual basis vector” to the corresponding column.
I don’t find the final inverse expression to be very enlightening, though. The clearest expression is the per-coordinate Cramer formula,
which says: to find the component of $\mathbf{b}$ along one basis vector, you have to wedge away the other basis vectors first, rather than simply projecting onto it.
This can be understood visually with the aid of the diagrams following. I’ll consider the two-dimensional case.
As can be seen by studying this diagram, it is not the case that the coefficient on a basis vector is just the projection of $\mathbf{b}$ onto it.
If we had first determined
It is somewhat surprising to me that the two requirements ”
The next diagram depicts the same argument in terms of areas:
I’ve now written
So we’ve found that the general case of the matrix inverse has a standard part that is not simply a projection onto each column,
but the Cramer expression (using wedges with all of the other columns).
This, by the way, is exactly the same as the distinction between a total derivative and a partial derivative: the partial derivative has to hold all of the other coordinates fixed, just as the Cramer expression has to wedge away all of the other columns.
This suggests first that the act of taking a “partial derivative” with respect to
The denominator will clearly give a Jacobian determinant, but I’m not sure what to make of the numerator. I suppose it will turn out to be
Non-Invertible Matrices
What about the cases of over- or under-determined matrices? In either case we know from the preceding section that
where
We can also see this as starting from the view of $\mathbf{b}$ as a weighted sum of the columns, and then wedging both sides of this expression with only a maximal independent subset of the remaining columns.
In either case the generalized inverse will then take
We can go a little further and use Cramer’s rule to write an explicit basis on the subspace
lies in the subspace
The
which is clear enough, although not particularly useful.
The takeaway is that Cramer’s rule can be used for over- or under-determined matrices by applying it only to some maximal set of linearly independent columns (and rows).
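Here’s a sketch of that takeaway (the matrix, and the choice of independent rows and columns, are picked by hand for illustration): Cramer’s rule applied to an invertible submatrix produces a particular solution of the rank-deficient system.

```python
import numpy as np

# Rank-2 matrix: the third column and third row are sums of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])          # chosen to lie in the column span of A

rows, cols = [0, 1], [0, 1]            # a maximal independent set, picked by inspection
A_sub, b_sub = A[np.ix_(rows, cols)], b[rows]

# Cramer's rule on the 2x2 subsystem; the dropped variable is left at zero.
det_sub = np.linalg.det(A_sub)
x = np.zeros(3)
for k, c in enumerate(cols):
    A_k = A_sub.copy()
    A_k[:, k] = b_sub
    x[c] = np.linalg.det(A_k) / det_sub

print(x)                               # [1. 2. 0.]
print(np.allclose(A @ x, b))           # True: a particular solution of the full system
```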
Something about this approach feels unsatisfying, though. If the original Cramer formula worked and let us define a standard part for invertible matrices,
then for a rank-deficient matrix the Cramer denominators are simply zero, and (when $\mathbf{b}$ lies in the column span) so are the numerators.
It feels as though these zeroes could be “cancelled out” to produce the rank-deficient formula directly.
I’m not sure if this notion can be made precise. I think a better view might be to think of the Cramer formula in areas as a “reduced” description of the full matrix
I suppose the “diagonal” representation I’ve been using (which as noted can be seen as the SVD of the original matrix) might be a candidate, but that doesn’t feel quite like the thing I’m looking for. But this has gone on long enough, so let’s leave it for now.
Closing Remarks
In total we have toured the following things which look like “division”.
- $ac = b$ gave us scalar-over-scalar division: $c = b/a$.
- $\mathbf{a} \cdot \mathbf{x} = b$ required a generalized inverse, which we could write as a scalar-over-vector division plus a “free parameter”.
- $c\,\mathbf{a} = \mathbf{b}$ led to vector-over-vector division as a notation for the projection, and also required a “remainder”.
- $A\mathbf{x} = \mathbf{b}$ led to vector-over-matrix division, which could be identified with the “pseudoinverse”, and which required both “free parameters” and “remainders” in general. And we also saw how this could be expressed in terms of wedge powers as a Cramer-type formula.
Six “cases” came up:
- An exact solution
- Division by zero, which had no solution
- Zero-divided-by-zero, which permits any solution
- The free parameter, which was required when (2) or (3) occurred along some dimensions of the problem
- The remainder, which was required when (2) occurred along some dimensions of the problem
- And “moving to a larger space of solutions”, which we touched briefly when solving vector-over-vector division with rotations.
All of this is really preliminary work for a larger project—my aim at this point has been to get my thoughts in order. There are two issues in particular which I did not take on in this post:
- How complex numbers arise when diagonalizing matrices, which can be seen as an instance of “case 6”. I mostly avoided this by considering everything from the view of SVD, but in general it will be interesting to see how complex numbers arise when trying to invert purely-real systems.
- Matrices requiring Jordan Normal form. Whatever I once knew about this I have long since forgotten, but my understanding is that these too can be diagonalized by a “case 6”-type maneuver, now by adding “dual numbers” with $\epsilon^2 = 0$ to the number system.
That is surely enough for now, though. The next post in this series will be an attempt to express the basic constructs of exterior algebra as “divisions”; my hope is that some of these wind up appearing so elementary that they hardly deserve having their own names, but we’ll see.
Footnotes
1. For a thorough exploration of this kind of division, see my post on Elementary Linear Algebra.