Saturday, October 13, 2012

The Complex Numbers, Quaternions and Split-Complex Numbers

Recently it was asked on /r/math on reddit what the quaternions were and where they came from and I figured this was a great opportunity to write a blog post. I will briefly cover what complex numbers are and then jump into the inception of the quaternions. I will also have a brief discussion at the end on the split-complex (or hyperbolic) numbers since they are related to the complex numbers and so are related to the quaternions.

The complex numbers arose in the context of solving polynomials of degree two and higher, specifically when solving equations like $x^2+1=0$ or, when written in a slightly more tangible form, $x^2=-1$. (Note: historically, complex numbers really saw their inception in cubic equations by Cardano and others.) It is easy to see that no such real valued $x$ can solve this equation because we know that when $x\in\Bbb R$, $x^2\ge 0$. (This proof is fairly straightforward using basic axioms of real numbers so I will not do it here.) Despite this fact, mathematicians wanted some way to factorize - solve - the above equation. Enter the imaginary number $i$. It is clear from the above argument that $i$ is most definitely not a real number.

Undoubtedly you have seen this mathematical object in some way, shape or form. $i$ was defined to solve exactly such an equation, i.e. $i^2=-1$. Why that particular equation? Why not $ i^2 = -2 $? The answer is by analogy: we know that the solutions to $ x^2 = 1 $ are $1$ and $-1$, which are fundamental units of the real numbers in a way. For the mathematically mature readers, $1$ is the unity for the ring of real numbers. Intuitively, this suggests that solving $ x^2 = -1 $ should give us an element for a new set of numbers that plays a similar role to $1$ for the real numbers in some way. Additionally, if we looked at a definition not of the form $ i^2 = -1 $, we would have to worry about factors and it would make the construct that much more taxing. This new set of numbers, deemed the "imaginary" numbers (I abhor this moniker, but it is what we are stuck with historically so I will use it), are numbers of the form $ ai $, where $ a $ is a real number, i.e. rescalings of the unit $ i $, just like with $1$ in the real number case.

There are some issues with the set of imaginary numbers; namely, it is not closed under multiplication, i.e. if you multiply two imaginary numbers you do not get an imaginary number. In fact, if you multiply two imaginary numbers together you get a real number. (Check it yourself!) If we wish to create a set that includes the imaginary numbers that is closed under multiplication, we must include the real numbers and doing so gives us the complex numbers. A complex number $ z $ is of the form $ z = a + bi $, where $ a, b \in \mathbb{R} $, and $ a $ is called the real part of $ z $ and $ b $ is called the imaginary part of $ z $. It turns out that the product of two complex numbers is a complex number and the sum of two complex numbers is a complex number.
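For readers who want to see the check written out, here is a short worked computation. It uses only $ i^2 = -1 $ and the distributive property; the symbols $ c $ and $ d $ are extra real numbers introduced here for illustration.

$$ (bi)(di) = bd\,i^2 = -bd, $$

$$ (a + bi)(c + di) = ac + adi + bci + bd\,i^2 = (ac - bd) + (ad + bc)i. $$

The first line shows that a product of two imaginary numbers is real, and the second shows that a product of two complex numbers is again of the form (real) + (real)$i$.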

In fact, the complex numbers form a field: every nonzero complex number has a multiplicative inverse, so one can divide as well as multiply. This fact is the reason why complex analysis is vastly different from calculus on $ \mathbb{R}^2 $ (I will make a brief post about this in the near future because it is worthwhile to discuss). Thus the complex numbers are very nice and it turns out that every non-constant polynomial with complex coefficients factors completely into linear factors over the complex numbers. This is the fundamental theorem of algebra and it can be proved in a wealth of ways using complex analytic techniques. If this were the only use of complex numbers, it might not warrant an attempt at generalizing them, but it is only one of the many uses of complex numbers and, more generally, of complex analysis. Complex analysis is a very beautiful subject that has a wealth of utility in mathematics and physics, and therefore it is very natural to look at how complex numbers can possibly be generalized to develop a more general analysis. The logic being that perhaps a more general structure will be as beautiful and useful, if not more so. This generalization is exactly the quaternions.
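As a quick illustration of division, the inverse of a nonzero complex number can be written down explicitly. This is the standard computation, sketched here with the same $ a $ and $ b $ as before:

$$ \frac{1}{a + bi} = \frac{a - bi}{(a + bi)(a - bi)} = \frac{a - bi}{a^2 + b^2}, \qquad a + bi \neq 0. $$

The denominator $ a^2 + b^2 $ is a positive real number whenever $ a + bi \neq 0 $, which is why the inverse always exists.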

The quaternions sprang out of an idea that Sir William Rowan Hamilton had. He was one of the many people who worked on reformulating classical mechanics with the notion of least action; namely, he worked on translating Lagrangian mechanics into an energy-based formulation, and this reformulation of mechanics bears his name. I will not refer to the Cayley-Dickson construction when developing the quaternions because I feel like one loses the insight into their development. The Cayley-Dickson construction came after and is used to further generalize the quaternions to what are known as the octonions and sedenions.

Initially what Hamilton set out to do was to generalize the complex numbers to three dimensions. Going from two dimensions (the complex plane) to three dimensions seems completely reasonable. There are two ways to do this: introduce another real number (think $(1,1,i)$ in "vector" notation) or introduce another imaginary number (think $(1,i,j)$ in "vector" notation). The first is a bit underwhelming and it doesn't capture the true nature of what Hamilton wanted to do. What he really desired was to have two imaginary units $ i $ and $ j $ so that any number in this new vector space would be written as $ z = a + bi + cj $, where $ a, b, c \in \mathbb{R} $. Further, Hamilton wanted $ i $ and $ j $ to be completely independent quantities that satisfied $ i^2 = j^2 = -1 $. In order to have a proper generalization of complex numbers, we would like to be able to multiply two of these three-tuples. If we were to multiply two of these three-tuples we would have the following:

$$ z_1z_2 = (a_1 + b_1i + c_1j)(a_2 + b_2i + c_2j).$$

Assuming the distributive property and associativity of multiplication we get the following expression for $ z_1 z_2$:

$$ z_1z_2 = (a_1a_2 - b_1b_2 - c_1c_2) + (a_2b_1 + a_1b_2)i + (a_1c_2+a_2c_1)j + b_1c_2ij + c_1b_2ji. $$

The problem becomes how does one then define the product $ ij $ and likewise $ ji $? If we want this space to be closed, we need both $ ij $ and $ ji $ to be a real linear combination of $1$, $i$ and $j$. Let's assume that $ ij = a + bi + cj $ for some real numbers $ a $, $ b $ and $ c $ and see if there are any glaring issues with this.

Notice that I did not assume that $ij = ji$ above because the multiplication may not be commutative. Let's take the previous expression and multiply on the left by $i$ and see what we get: $ iij = -j = -b + ai + cij $. We ended up with $ ij $ again, so let us substitute our expression for $ ij $ into the previous expression to get $ -j = (ac - b) + (a + bc)i + c^2j $. Equating the coefficients of $j$ gives $c^2 = -1$, which no real number $c$ can satisfy. So $ij$ cannot be a real linear combination of $1$, $i$ and $j$. As I am sure you have guessed, trying to define the product $ji$ leads to the same contradiction (multiply on the right by $i$ instead). So we see that within this framework, multiplication of these three-tuples causes contradictions.

What then can be done to salvage this idea? Hamilton eventually came to the conclusion that a third - yes a third - imaginary number $k$ needs to be added to the set. The constraints on the imaginary numbers are as follows: $ i^2 = j^2 = k^2 = -1 $ and $ ijk = -1 $, with the requirement that $i$, $j$ and $k$ be independent imaginary numbers. We also require that multiplication be associative and that the distributive property holds. From this one can easily see that $ij = k$, $jk = i$ and $ki = j$. It can also be shown that the imaginary numbers anti-commute, i.e. $ij + ji = 0$ (and so on). With these definitions, we have established a space that is closed under addition and multiplication.
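For completeness, here is one way to carry out that derivation. It uses only associativity and the defining relations $ i^2 = j^2 = k^2 = -1 $ and $ ijk = -1 $; the order of the steps is my own choice, not necessarily Hamilton's.

$$ (ijk)k = -k \;\Rightarrow\; ij\,k^2 = -k \;\Rightarrow\; -ij = -k \;\Rightarrow\; ij = k, $$

$$ i(ijk) = -i \;\Rightarrow\; i^2\,jk = -i \;\Rightarrow\; -jk = -i \;\Rightarrow\; jk = i, $$

$$ j(jk) = ji \;\Rightarrow\; j^2\,k = ji \;\Rightarrow\; ji = -k = -ij, $$

$$ (ji)i = -ki \;\Rightarrow\; j\,i^2 = -ki \;\Rightarrow\; -j = -ki \;\Rightarrow\; ki = j. $$

The same kind of manipulation gives $ kj = -i $ and $ ik = -j $, so each pair of distinct units anti-commutes.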

The notation $i$, $j$ and $k$ seems very suggestive and may remind you of vectors in $ \mathbb{R}^3 $. In fact, vector calculus sprang out of quaternions. Hamilton battled fiercely to keep quaternions relevant in mathematics but eventually vector notation would dominate. However, quaternions would subtly reemerge, in a way, in 1928 with the Dirac equation. If you notice above, we have the cyclic relation that $ij = k$, $jk = i$ and $ki = j$. These are actually the right-hand rule for cross products and so cross products can be represented with quaternions.

If $ (x_1, y_1, z_1), (x_2, y_2, z_2) $ are vectors in $ \mathbb{R}^3 $, then their cross product is given by

$$ (y_1z_2 - y_2z_1, -x_1z_2 + x_2z_1, x_1y_2 - x_2y_1).$$

If we rewrite these as quaternions with $0$ real component, associate the first component with $i$, second with $j$ and third with $k$ and multiply them we have

$$(-x_1x_2 - y_1y_2 - z_1z_2) + (y_1z_2-y_2z_1)i + (-x_1z_2 + x_2z_1)j + (x_1y_2 - x_2y_1)k.$$

If the real part is ignored, the cross product formula is exactly recreated. The keen observer might also note that the dot product is embedded in the real part (with a $-$ sign thrown in). It then becomes clear that quaternions and vectors in $ \mathbb{R}^3 $ are closely related.
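The computation above can also be checked numerically. The following is a minimal sketch in Python, assuming quaternions are stored as `(real, i, j, k)` tuples; the function names `qmul`, `dot` and `cross` are my own and do not come from any library.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def dot(u, v):
    """Ordinary dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    """Ordinary cross product of two 3-vectors."""
    x1, y1, z1 = u
    x2, y2, z2 = v
    return (y1 * z2 - y2 * z1, -x1 * z2 + x2 * z1, x1 * y2 - x2 * y1)


u = (1, 2, 3)
v = (4, 5, 6)

# Embed the vectors as quaternions with zero real part and multiply.
product = qmul((0, *u), (0, *v))

print(product)                  # (-32, -3, 6, -3)
print(dot(u, v), cross(u, v))   # 32 (-3, 6, -3)
```

The real part of the product is minus the dot product and the remaining three components are the cross product, matching the formula above for this particular pair of vectors.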

Quaternions may seem a bit silly and cumbersome and merely mathematical toys but they are very useful in doing rotations with graphics. When a rotation is built up from three Euler angles (as is often done with rotation matrices), gimbal lock can arise, but unit quaternions represent a rotation as a single object and so allow one to get around this issue with ease.
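To give a flavour of how this works in practice, here is a small sketch of rotating a vector with a unit quaternion. It assumes the standard recipe $ v \mapsto q v q^{-1} $ with $ q = \cos(\theta/2) + \sin(\theta/2)(n_x i + n_y j + n_z k) $ for a unit axis $ n $; the helper names `qmul`, `conjugate` and `rotate` are hypothetical, and a real graphics library will have its own conventions.

```python
import math


def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def conjugate(q):
    """Quaternion conjugate; for a unit quaternion this is also its inverse."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def rotate(v, axis, angle):
    """Rotate the 3-vector v by `angle` radians about a unit-length `axis`."""
    half = angle / 2.0
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)
    # Embed v as a quaternion with zero real part and conjugate it by q.
    _, x, y, z = qmul(qmul(q, (0.0, *v)), conjugate(q))
    return (x, y, z)


# A quarter turn about the z-axis should send the x-axis to the y-axis.
rotated = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print(rotated)  # approximately (0, 1, 0), up to floating-point rounding
```

Composing two rotations then amounts to multiplying their quaternions with `qmul`, with no intermediate angles involved.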

Quaternions also arise when one considers the Klein-Gordon equation in relativistic quantum mechanics. The Klein-Gordon equation was the first relativistic wave equation that was derived - in fact, Schrödinger first derived it but abandoned it for the equation that now bears his name - but it appeared to have some deep philosophical issues. Namely, it appeared to allow for negative probabilities. This presumed issue arose from the fact that there exists a second derivative with respect to time in the equation. Paul Dirac identified this "issue" and sought out a solution. His solution would be to "factor" the Klein-Gordon equation into a product of operators acting on what are now referred to as spinors (which is the constructive way to arrive at spin states in quantum mechanics). In doing so, he ended up with matrices that satisfied a very specific set of identities, namely the ones satisfied by the quaternions (up to a scale factor). His equation would come to be known as the Dirac equation. In this context one often says that the Pauli matrices are isomorphic to the quaternions; more precisely, after multiplying each Pauli matrix by the complex number $-i$, the resulting matrices multiply exactly like the quaternion units $i$, $j$ and $k$.
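The relationship between the Pauli matrices and the quaternion units can be verified directly. The sketch below assumes the usual Pauli matrices $ \sigma_1, \sigma_2, \sigma_3 $ and the identification of $ i, j, k $ with $ -i\sigma_1, -i\sigma_2, -i\sigma_3 $ (one common choice of sign; others work too). The helpers `matmul` and `scale` are hypothetical names, written in plain Python to keep the example self-contained.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[r][m] * B[m][c] for m in range(2)) for c in range(2))
        for r in range(2)
    )


def scale(s, A):
    """Multiply every entry of the matrix A by the scalar s."""
    return tuple(tuple(s * x for x in row) for row in A)


identity = ((1, 0), (0, 1))
minus_identity = scale(-1, identity)

# The three Pauli matrices.
sigma1 = ((0, 1), (1, 0))
sigma2 = ((0, -1j), (1j, 0))
sigma3 = ((1, 0), (0, -1))

# Candidate matrix versions of the quaternion units (assumed sign convention).
qi = scale(-1j, sigma1)
qj = scale(-1j, sigma2)
qk = scale(-1j, sigma3)

# Each unit squares to -1 ...
print(all(matmul(m, m) == minus_identity for m in (qi, qj, qk)))
# ... the cyclic products hold ...
print(matmul(qi, qj) == qk, matmul(qj, qk) == qi, matmul(qk, qi) == qj)
# ... and the product of all three is -1.
print(matmul(matmul(qi, qj), qk) == minus_identity)
```

All of the comparisons come out true because the entries involved are exact small integers and multiples of `1j`, so no rounding enters.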

The quaternions are a very rich structure which is somewhat unappreciated by mathematicians and physicists alike. However, they receive more notice than the split-complex numbers. The split-complex (or hyperbolic) numbers come from considering the equation $x^2 = 1 $ and finding solutions to it. Of course $1$ and $-1$ are solutions, but it is posited that there exists another solution $j$ such that $j^2 = 1$ and $j$ is neither $1$ nor $-1$ ($j$ is also not a complex number, because the only complex numbers you can square to get $1$ are $-1$ and $1$). The split-complex numbers can be used to represent Minkowski spacetime in $1+1$ dimensions - one spatial dimension and one time dimension. These can also be generalized further to the split-quaternions, which have one imaginary unit squaring to $-1$ and two hyperbolic units squaring to $+1$. Note that their natural quadratic form has two plus signs and two minus signs, so it is not quite the $3+1$ signature of Minkowski spacetime with three spatial dimensions and one time dimension.
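A short worked computation makes the flavour of the split-complex numbers clearer. It uses only the rule $ j^2 = 1 $ and the distributive property; $ a, b, c, d $ are real numbers introduced here for illustration.

$$ (a + bj)(c + dj) = ac + adj + bcj + bd\,j^2 = (ac + bd) + (ad + bc)j, $$

$$ (a + bj)(a - bj) = a^2 - b^2, \qquad (1 + j)(1 - j) = 1 - j^2 = 0. $$

The quantity $ a^2 - b^2 $ has the same shape as the spacetime interval $ t^2 - x^2 $ in $1+1$ dimensions, which is where the connection to Minkowski spacetime comes from. The last product shows that two nonzero split-complex numbers can multiply to zero, so unlike the complex numbers they do not form a field.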

This ends the post. The brief digression into the hyperbolic numbers was merely to expose the reader to them because it is a very strange and cool idea, just like the complex numbers. While the complex numbers, quaternions, hyperbolic numbers and split-quaternions are rich objects in abstract algebra, they each find their way into physics as a means to describe the world around us. There is much more that could be said about them in the language of abstract algebra but it can become overly dense and dry for the casual reader and so I shall end the discussion here. My next post will be a shorter one - I hope - about the nature of calculus on $ \mathbb{R}^2 $ and why it is so different from calculus on $\mathbb{C} $.

4 comments:

  1. This is incredible, and you have no idea how much I appreciate you taking the time out to write this.

    1. I'm glad you enjoyed it. I actually learned a lot while writing this post because I had to convince myself first why one couldn't have only two complex numbers and why it was natural to introduce a third. I'm trying now to find topics that challenge me to think a bit and not just regurgitate information one can find in a textbook. In fact my next post will (hopefully) be one of this kind. I'll be looking at differentiable functions on what are known as division rings with bilinear forms and will tie into my post on complex differentiable and split-complex differentiable functions.

  2. How did c squared = -1 become c = -1 when you computed -j from iij?

    1. Apologies for the delay as I did not get any email notifications on my end, that was a typo on my part. Good spot. The analysis is mostly unharmed, so I'll leave it.
