Appendix: Mathematical Miscellany

1. Vector Products

The dot and cross products are often introduced via trigonometric functions and/or matrix operations, but they also arise quite naturally from simple considerations of Pythagoras' theorem. Given two points a and b in the three-dimensional vector space with Cartesian coordinates (a_x, a_y, a_z) and (b_x, b_y, b_z) respectively, the squared distance between these two points is

S^2 = (a_x - b_x)^2 + (a_y - b_y)^2 + (a_z - b_z)^2

If (and only if) these two vectors are perpendicular, the distance between them is the hypotenuse of a right triangle with edge lengths equal to the lengths of the two vectors, so we have

S^2 = (a_x^2 + a_y^2 + a_z^2) + (b_x^2 + b_y^2 + b_z^2)

if and only if a and b are perpendicular. Equating these two expressions and canceling terms, we arrive at the necessary and sufficient condition for a and b to be perpendicular

a_x b_x + a_y b_y + a_z b_z = 0

This motivates the definition of the left hand quantity as the "dot product" (also called the scalar product) of the arbitrary vectors a = (a_x, a_y, a_z) and b = (b_x, b_y, b_z) as the scalar quantity

a · b = a_x b_x + a_y b_y + a_z b_z
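
As a quick illustration (this Python sketch is my own addition, not part of the original text; the names dot, sq_dist, and sq_len are arbitrary), the following fragment checks numerically that the dot product vanishes exactly when the Pythagorean relation between the two squared-distance expressions holds:

    # Sketch: the dot product as the "defect" in the Pythagorean relation.
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def sq_len(a):
        return dot(a, a)

    a = (1.0, 2.0, 2.0)
    b = (2.0, 2.0, -3.0)          # chosen so that a.b = 2 + 4 - 6 = 0

    # S^2 equals |a|^2 + |b|^2 exactly because a.b = 0
    print(sq_dist(a, b), sq_len(a) + sq_len(b), dot(a, b))   # 26.0 26.0 0.0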

At the other extreme, suppose we seek an indicator of whether or not the vectors a and b are parallel. In any case we know the squared length of the vector sum of these two vectors is

S^2 = (a_x + b_x)^2 + (a_y + b_y)^2 + (a_z + b_z)^2

We also know that S = |a| + |b| if and only if a and b are parallel, in which case we have

S^2 = |a|^2 + 2|a||b| + |b|^2 = (a_x^2 + a_y^2 + a_z^2) + (b_x^2 + b_y^2 + b_z^2) + 2|a||b|

Equating these two expressions for S^2, canceling terms, and squaring both sides gives the necessary and sufficient condition for a and b to be parallel

(a_x b_x + a_y b_y + a_z b_z)^2 = (a_x^2 + a_y^2 + a_z^2)(b_x^2 + b_y^2 + b_z^2)

Expanding these expressions and canceling terms, this becomes

2 a_x b_x a_y b_y + 2 a_x b_x a_z b_z + 2 a_y b_y a_z b_z = (a_x b_y)^2 + (a_x b_z)^2 + (a_y b_x)^2 + (a_y b_z)^2 + (a_z b_x)^2 + (a_z b_y)^2

Notice that we can gather terms and re-write this equality as

(a_x b_y - a_y b_x)^2 + (a_z b_x - a_x b_z)^2 + (a_y b_z - a_z b_y)^2 = 0

Obviously a sum of squares can equal zero only if each term is individually zero, which of course was to be expected, because two vectors are parallel if and only if their components are in the same proportions to each other, i.e.,

a_x/b_x = a_y/b_y = a_z/b_z

which represents the vanishing of the three terms in the previous expression. This motivates the definition of the cross product (also known as the vector product) of two vectors a = (a_x, a_y, a_z) and b = (b_x, b_y, b_z) as consisting of those three components, ordered symmetrically, so that each component is defined in terms of the other two components of the arguments, as follows

a × b = [(a_y b_z - a_z b_y), (a_z b_x - a_x b_z), (a_x b_y - a_y b_x)]

By construction, this vector is null if and only if a and b are parallel. Furthermore, notice that the dot products of this cross product and each of the vectors a and b are identically zero, i.e.,

a · (a × b) = a_x(a_y b_z - a_z b_y) + a_y(a_z b_x - a_x b_z) + a_z(a_x b_y - a_y b_x) = 0

b · (a × b) = b_x(a_y b_z - a_z b_y) + b_y(a_z b_x - a_x b_z) + b_z(a_x b_y - a_y b_x) = 0

As we saw previously, the dot product of two vectors is 0 if and only if the vectors are perpendicular, so this shows that a × b is perpendicular to both a and b. There is, however, an arbitrary choice of sign, which is conventionally resolved by the "right-hand rule". It can be shown that if θ is the angle between a and b, then a × b is a vector with magnitude |a||b|sin(θ) and direction perpendicular to both a and b, according to the right-hand rule. Similarly the scalar a · b equals |a||b|cos(θ).
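
The following short Python sketch (again my own illustration, with arbitrarily chosen vectors) verifies these perpendicularity relations and the stated magnitude |a||b|sin(θ) numerically:

    import math

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        ax, ay, az = a
        bx, by, bz = b
        return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

    def norm(a):
        return math.sqrt(dot(a, a))

    a = (1.0, -2.0, 3.0)
    b = (4.0, 0.0, -1.0)
    c = cross(a, b)

    print(dot(a, c), dot(b, c))        # both 0: a x b is perpendicular to a and b
    theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
    print(norm(c), norm(a) * norm(b) * math.sin(theta))   # equal (up to rounding)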

 

2. Differentials

In Chapter 5.2 we gave an intuitive description of differentials such as dx and dy as incremental quantities, but strictly speaking the actual values of differentials are arbitrary, because only the ratios between them are significant. Differentials for functions of multiple variables are just a generalization of the usual definitions for functions of a single variable. For example, if we have z = f(x) then the differentials dz and dx are defined as arbitrary quantities whose ratio equals the derivative of f(x) with respect to x. Consequently we have dz/dx = f'(x), where f'(x) signifies the partial derivative ∂z/∂x, so we can express this in the form

dz = (∂z/∂x) dx

In this case the partial derivative is identical to the total derivative, because this f is entirely a function of the single variable x.

If, now, we consider a differentiable function z = f(x,y) with two independent variables, we can expand this into a power series consisting of a sum of (perhaps infinitely many) terms of the form A x^m y^n. Since x and y are independent variables we can suppose they are each functions of a parameter t, so we can differentiate the power series term-by-term, with respect to t, and each term will contribute a quantity of the form

A m x^(m-1) y^n (dx/dt) + A n x^m y^(n-1) (dy/dt)

where, again, the differentials dx, dy, dz, dt are arbitrary variables whose ratios alone are constrained by this relation. The coefficient of dy/dt is the partial derivative of A x^m y^n with respect to y, and the coefficient of dx/dt is the partial with respect to x, and this will apply to every term of the series. So we can multiply through by dt to arrive at the result

dz = (∂z/∂x) dx + (∂z/∂y) dy

The same approach can be applied to functions of arbitrarily many independent variables.
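
This term-by-term reasoning can be checked mechanically; here is a rough sympy sketch (the sample function f and the names used are mine, chosen only for illustration):

    import sympy as sp

    # Verify d/dt f(x(t), y(t)) = (df/dx)(dx/dt) + (df/dy)(dy/dt)
    # for a sample finite "power series".
    t = sp.symbols('t')
    x = sp.Function('x')(t)
    y = sp.Function('y')(t)
    xs, ys = sp.symbols('x y')

    f = 3*xs**2*ys + xs*ys**3
    lhs = sp.diff(f.subs({xs: x, ys: y}), t)
    rhs = (sp.diff(f, xs).subs({xs: x, ys: y}) * sp.diff(x, t)
           + sp.diff(f, ys).subs({xs: x, ys: y}) * sp.diff(y, t))
    print(sp.simplify(lhs - rhs))      # 0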

A simple application of total differentials occurs in Section 3 of Einstein's 1905 paper "On the Electrodynamics of Moving Bodies". In the process of deriving the function τ(x',y,z,t) as part of the Lorentz transformation, Einstein arrives at his equation 3.1

(1/2)[τ(0,0,0,t_0) + τ(0,0,0, t_0 + x'/(c-v) + x'/(c+v))] = τ(x', 0, 0, t_0 + x'/(c-v))

where I've replaced his "t" with t_0 to emphasize that this is just the arbitrary value of t at the origin of the light pulse. At this point Einstein says "Hence, if x' be chosen infinitesimally small," and then he writes his equation 3.2

(1/2)[1/(c-v) + 1/(c+v)] ∂τ/∂t = ∂τ/∂x' + [1/(c-v)] ∂τ/∂t

Various explications of this step have appeared in the literature. For example, Miller says "Einstein took x' to be infinitesimal and expanded both sides of [3.1] into a series in x'. Neglecting terms higher than first order the result is [3.2]." To put this differently, Einstein simply evaluated the total differentials of both sides of the equation. For any arbitrary differentiable function τ(x',y,z,t) we have

dτ = (∂τ/∂x') dx' + (∂τ/∂y) dy + (∂τ/∂z) dz + (∂τ/∂t) dt

Since the arguments of the first τ function on the left hand side of 3.1 are all constants, we have dx' = dy = dz = dt = 0, so it contributes nothing to the total differential of the left hand side. The arguments of the second τ function on the left are all constants except for the t argument, which equals

t_0 + x'/(c-v) + x'/(c+v)

so we have

dt = [1/(c-v) + 1/(c+v)] dx'

It follows that the total differential of the second τ function is

dτ = (∂τ/∂t) [1/(c-v) + 1/(c+v)] dx'

Likewise the total differential of the τ function on the right hand side of 3.1 is

dτ = (∂τ/∂x') dx' + (∂τ/∂t) [1/(c-v)] dx'

So, equating the total differentials of the two sides of 3.1 gives

(1/2)(∂τ/∂t) [1/(c-v) + 1/(c+v)] dx' = (∂τ/∂x') dx' + (∂τ/∂t) [1/(c-v)] dx'

and dividing out the factor of dx' gives Einstein's equation 3.2.
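
Einstein's step can also be checked mechanically. The sympy sketch below is my own rough illustration (it models τ as a function of just x' and t, since the y and z arguments stay fixed at zero); differentiating both sides of 3.1 with respect to x' and setting x' = 0 should reproduce the structure of equation 3.2:

    import sympy as sp

    xp, t0, c, v = sp.symbols("xp t0 c v", real=True)
    tau = sp.Function("tau")          # stand-in for tau(x', t)

    lhs = sp.Rational(1, 2) * (tau(0, t0) + tau(0, t0 + xp/(c - v) + xp/(c + v)))
    rhs = tau(xp, t0 + xp/(c - v))

    # Total differential of each side with respect to x', evaluated at x' = 0
    eq32 = sp.Eq(sp.diff(lhs, xp).subs(xp, 0), sp.diff(rhs, xp).subs(xp, 0))
    print(eq32)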

 

3. Differential Operators

The standard differential operators are commonly expressed as formal "vector products" involving the ∇ ("del") symbol, which is defined as

∇ = u_x ∂/∂x + u_y ∂/∂y + u_z ∂/∂z

where u_x, u_y, u_z are again unit vectors in the x, y, z directions. The scalar product of ∇ with an arbitrary vector field V is called the divergence of V, and is written explicitly as

∇ · V = ∂V_x/∂x + ∂V_y/∂y + ∂V_z/∂z

The vector product of ∇ with an arbitrary vector field V is called the curl, given explicitly by

∇ × V = (∂V_z/∂y - ∂V_y/∂z) u_x + (∂V_x/∂z - ∂V_z/∂x) u_y + (∂V_y/∂x - ∂V_x/∂y) u_z

Note that the curl is applied to a vector field and returns a vector, whereas the divergence is applied to a vector field but returns a scalar. For completeness, we note that a scalar field Q(x,y,z) can be simply multiplied by the ∇ operator to give a vector, called the gradient, as follows

∇Q = (∂Q/∂x) u_x + (∂Q/∂y) u_y + (∂Q/∂z) u_z

Another common expression is the sum of the second derivatives of a scalar field with respect to the three directions, since this sum appears in the Laplace and Poisson equations. Using the "del" operator this can be expressed as the divergence of the gradient (or the "div grad") of the scalar field, as shown below.

∇ · ∇Q = ∂^2 Q/∂x^2 + ∂^2 Q/∂y^2 + ∂^2 Q/∂z^2

For convenience, this operation is often written as ∇^2, and is called the Laplacian operator. All the above operators apply to 3-vectors, but when dealing with 4-vectors in Minkowski spacetime the analog of the Laplacian operator is the d'Alembertian operator

∂^2/∂x^2 + ∂^2/∂y^2 + ∂^2/∂z^2 - (1/c^2) ∂^2/∂t^2
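
For concreteness, here is a small sympy sketch (my own example fields, using the sympy.vector module) showing the divergence, curl, gradient, and Laplacian in action:

    from sympy.vector import CoordSys3D, divergence, curl, gradient

    N = CoordSys3D('N')
    x, y, z = N.x, N.y, N.z

    V = x*y*N.i + y*z*N.j + z*x*N.k      # a sample vector field
    Q = x**2 + y**2 + z**2               # a sample scalar field

    print(divergence(V))                 # x + y + z
    print(curl(V))                       # -y i - z j - x k
    print(gradient(Q))                   # 2x i + 2y j + 2z k
    print(divergence(gradient(Q)))       # "div grad" (Laplacian) of Q = 6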
 

4. Differentiation of Vectors and Tensors

The easiest way to understand the motivation for the definitions of absolute and covariant differentiation is to begin by considering the derivative of a vector field in three-dimensional Euclidean space. Such a vector can be expressed in either contravariant or covariant form as a linear combination of, respectively, the basis vectors u_1, u_2, u_3 or the dual basis vectors u^1, u^2, u^3, as follows

A = A^i u_i = A_i u^i

where A^i are the contravariant components and A_i are the covariant components of A, and the two sets of basis vectors satisfy the relations

u_i · u^j = δ_i^j        u_i · u_j = g_{ij}        u^i · u^j = g^{ij}

where g_{ij} and g^{ij} are the covariant and contravariant metric tensors. The differential of A can be found by applying the product rule to either of the two forms, as follows

dA = u_i dA^i + A^i du_i        (1a)

dA = u^i dA_i + A_i du^i        (1b)

If the basis vectors u_i and u^i have a constant direction relative to a fixed Cartesian frame, then du_i = du^i = 0, so the second term on the right vanishes, and we are left with the familiar differential of a vector as the differential of its components. However, if the basis vectors vary from place to place, the second term on the right is non-zero, so we must not neglect this term if we are to allow curvilinear coordinates.

As we saw in Part 2 of this Appendix, for any quantity Q = f(x) and coordinate x^j we have

dQ = (∂Q/∂x^j) dx^j

so we can substitute for the three differentials in (1) and cancel dx^j to give a relation between partial derivatives (relabeling the summation index as k)

∂A/∂x^j = (∂A^k/∂x^j) u_k + A^k (∂u_k/∂x^j)        (2a)

∂A/∂x^j = (∂A_k/∂x^j) u^k + A_k (∂u^k/∂x^j)        (2b)

If we now let A^i_j and A_{ij} denote the projections of the jth derivatives (2a) and (2b) respectively onto the ith basis vectors u^i and u_i, we have

A^i_j = (∂A/∂x^j) · u^i        A_{ij} = (∂A/∂x^j) · u_i

and it can be verified that these are the components of second-order tensors of the types indicated by their indices (superscripts being contravariant indices and subscripts being covariant indices). If we multiply through (using the dot product) each term of (2a) by u^i, and each term of (2b) by u_i, and recall that u_k · u^i = δ_k^i, we have

A^i_j = ∂A^i/∂x^j + A^k (∂u_k/∂x^j) · u^i        (3a)

A_{ij} = ∂A_i/∂x^j + A_k (∂u^k/∂x^j) · u_i        (3b)

For convenience we now define the three-index symbol

Γ^i_{kj} = (∂u_k/∂x^j) · u^i

which is called the Christoffel symbol of the second kind. Although the Christoffel symbol is not a tensor, it is very useful for expressing results on a metrical manifold with a given system of coordinates. We also note that since the components of u^i · u_j are constants (either 0 or 1), it follows that ∂(u^i · u_j)/∂x^k = 0, and expanding this partial derivative by the product rule we find that

(∂u^i/∂x^k) · u_j = - u^i · (∂u_j/∂x^k) = - Γ^i_{jk}

Therefore, equations (3) can be written in terms of the Christoffel symbol as

A^i_j = ∂A^i/∂x^j + Γ^i_{kj} A^k        (4a)

A_{ij} = ∂A_i/∂x^j - Γ^k_{ij} A_k        (4b)

These are the covariant derivatives of, respectively, the contravariant and covariant forms of the vector A. Obviously if the basis vectors are constant (as in Cartesian or oblique coordinate systems) the Christoffel symbols vanish, and we are left with just the first terms on the right sides of these equations. The second terms are needed only to account for the change of the basis vectors with position in general curvilinear coordinates.
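
As a concrete sanity check (my own sketch, not part of the text), the following sympy fragment computes the Christoffel symbols of plane polar coordinates directly from the definition Γ^i_{kj} = (∂u_k/∂x^j) · u^i used above:

    import sympy as sp

    r, th = sp.symbols('r theta', positive=True)
    coords = (r, th)
    pos = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])     # Cartesian position vector

    u = [sp.diff(pos, q) for q in coords]             # basis vectors u_k
    g = sp.Matrix(2, 2, lambda i, j: (u[i].T * u[j])[0])   # g_ij = u_i . u_j
    ginv = g.inv()
    udual = [ginv[i, 0]*u[0] + ginv[i, 1]*u[1] for i in range(2)]   # dual basis u^i

    def gamma(i, k, j):      # Gamma^i_kj = (du_k/dx^j) . u^i
        return sp.simplify((sp.diff(u[k], coords[j]).T * udual[i])[0])

    print(gamma(0, 1, 1))    # Gamma^r_{theta theta} = -r
    print(gamma(1, 0, 1))    # Gamma^theta_{r theta} = 1/r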

It might seem that these definitions of covariant differentiation depend on the fact that we worked in a fixed Euclidean space, which enabled us to assign absolute meaning to the components of the basis vectors in terms of an underlying Cartesian coordinate system. However, it can be shown that the Christoffel symbols we've used here are the same as the ones defined in Section 5.4 in the derivation of the extremal (geodesic) paths on a curved manifold, wholly in terms of the intrinsic metric coefficients g_{ij} and their partial derivatives with respect to the general coordinates on the manifold. This should not be surprising, considering that the definition of the Christoffel symbols given above was in terms of the basis vectors u_j and their derivatives with respect to the general coordinates, and noting that the metric tensor is just g_{ij} = u_i · u_j. Thus, with a bit of algebra we can show that

Γ^i_{kj} = (1/2) g^{il} (∂g_{lk}/∂x^j + ∂g_{lj}/∂x^k - ∂g_{kj}/∂x^l)        (5)

in agreement with Section 5.4. We regard equations (4) as the appropriate generalization of differentiation on an arbitrary Riemannian manifold essentially by formal analogy with the flat manifold case, by the fact that applying this operation to a tensor yields another tensor, and perhaps most importantly by the fact that in conjunction with the developments of Section 5.4 we find that the extremal metrical path (i.e., the geodesic path) between two points is given by using this definition of "parallel transport" of a vector pointed in the direction of the path, so the geodesic paths are locally "straight".

Of course, when we allow curved manifolds, some new phenomena arise. On a flat manifold the metric components may vary from place to place, but we can still determine that the manifold is flat, by means of the Riemann curvature tensor described in Section 5.7. One consequence of flatness, obvious from the above derivation, is that if a vector is transported parallel to itself around a closed path, it assumes its original orientation when it returns to its original location. However, if the metric coefficients vary in such a way that the Riemann curvature tensor is non-zero, then in general a vector that has been transported parallel to itself around a closed loop will undergo a change in orientation. Indeed, Gauss showed that the amount of deflection experienced by a vector as a result of being parallel-transported around a closed loop is exactly proportional to the integral of the curvature over the enclosed region.

The above definition of covariant differentiation immediately generalizes to tensors of any order. In general, the covariant derivative of a mixed tensor T consists of the ordinary partial derivative of the tensor itself with respect to the coordinates x^k, plus a term involving a Christoffel symbol for each contravariant index of T, minus a term involving a Christoffel symbol for each covariant index of T. For example, if r is a contravariant index and s is a covariant index, we have

T^r_{s,k} = ∂T^r_s/∂x^k + Γ^r_{mk} T^m_s - Γ^m_{sk} T^r_m

It's convenient to remember that each Christoffel symbol in this expression has the index of x^k in one of its lower positions, and also that the relevant index from T is carried by the corresponding Christoffel symbol at the same level (upper or lower), and the remaining index of the Christoffel symbol is a dummy that matches with the relevant index position in T.

One very important result involving the covariant derivative is known as Ricci's Theorem. The covariant derivative of the metric tensor g_{ij} is

g_{ij,k} = ∂g_{ij}/∂x^k - Γ^m_{ik} g_{mj} - Γ^m_{jk} g_{im}

If we substitute for the Christoffel symbols from equation (5), and recall that

g^{ij} g_{jk} = δ^i_k

we find that all the terms cancel out and we're left with g_{ij,k} = 0. Thus the covariant derivative of the metric tensor is identically zero, which is what prompted Einstein to identify it with the gravitational potential, whose divergence vanishes, as discussed in Section 5.8.
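
As an illustration (my own sketch, using the metric of a 2-sphere of radius R as an arbitrary example), the following sympy fragment builds the Christoffel symbols from equation (5) and confirms that every component of the covariant derivative of the metric vanishes:

    import sympy as sp

    th, ph, R = sp.symbols('theta phi R', positive=True)
    x = (th, ph)
    g = sp.Matrix([[R**2, 0], [0, R**2*sp.sin(th)**2]])
    ginv = g.inv()
    n = 2

    def Gamma(i, k, j):      # equation (5): Gamma^i_kj
        return sp.Rational(1, 2) * sum(
            ginv[i, l] * (sp.diff(g[l, k], x[j]) + sp.diff(g[l, j], x[k])
                          - sp.diff(g[k, j], x[l]))
            for l in range(n))

    def cov_deriv_g(i, j, k):    # g_{ij,k} as given above
        return sp.simplify(
            sp.diff(g[i, j], x[k])
            - sum(Gamma(m, i, k) * g[m, j] for m in range(n))
            - sum(Gamma(m, j, k) * g[i, m] for m in range(n)))

    print(all(cov_deriv_g(i, j, k) == 0
              for i in range(n) for j in range(n) for k in range(n)))   # True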

 

5. Notes on Curvature Derivations

Direct substitution of the principal q values into the curvature formula of Section 5.3 gives a somewhat complicated expression, and it may not be obvious that it reduces to the expression given in the text. Even some symbolic processors seem to be unable to accomplish the reduction. So, to verify the result, recall that we have

k = 2(a + bq + cq^2)/(1 + q^2)        and        q^2 - 2mq - 1 = 0

where m = (c - a)/b. The roots of the quadratic in q are

q, q' = m ± √(m^2 + 1)

and of course qq' = -1. From the 2nd equation we have q^2 = 1 + 2mq, so we can substitute this into the curvature equation to give

k = [(a + c) + (b + 2cm)q] / (1 + mq)

Adding and subtracting c in the numerator, this can be written as

k = 2c + [(a - c) + bq] / (1 + mq)

Now, our assertion in the text is that this quantity equals (a + c) + b√(m^2 + 1). If we subtract 2c from both of these quantities and multiply through by 1 + mq, our assertion is

(a - c) + bq = (1 + mq)[ (a - c) + b√(m^2 + 1) ]

Since q = m + √(m^2 + 1), the right hand term in the square brackets can be written as bq - bm, so we claim that

(a-c) + bq = (1 + mq)[ (a-c) + bq - bm ]

Expanding the right hand side and cancelling terms and dividing by m gives

b = q(a - c) + bq(q - m) = q(a - c) + bq(m - q')

Now we multiply by the conjugate quantity q' to give

bq' = (c - a) - b (m - q')

The quantities bq' cancel, and we are left with m = (c - a)/b, which is the definition of m. Of course the same derivation applies to the other principal curvature if we swap q and q'.
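
Consistent with the remark above about symbolic processors, a fully symbolic simplification can be stubborn, but the claim is easy to spot-check numerically. The sympy sketch below (my own, with arbitrary sample values of a, b, c) confirms that the curvature at q = m + √(m^2 + 1) equals (a + c) + b√(m^2 + 1):

    import sympy as sp

    a, b, c, q = sp.symbols('a b c q')
    k = 2*(a + b*q + c*q**2)/(1 + q**2)        # curvature formula of Section 5.3

    for av, bv, cv in [(1, 2, 3), (2, 5, 1), (3, 1, 4)]:
        mv = sp.Rational(cv - av, bv)
        qv = mv + sp.sqrt(mv**2 + 1)           # a principal direction
        lhs = k.subs({a: av, b: bv, c: cv, q: qv})
        rhs = (av + cv) + bv*sp.sqrt(mv**2 + 1)
        print(sp.simplify(lhs - rhs))          # 0 in each case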

Section 5.3 also states that the Gaussian curvature of the surface of a sphere of radius R is 1/R^2. To verify this, note that the surface of a sphere of radius R is described by x^2 + y^2 + z^2 = R^2, and we can consider a point at the South pole, where the surface is tangent to a plane of constant z. Then we have

z = ±√(R^2 - x^2 - y^2)

Taking the negative root (for the South Pole), factoring out R, and expanding the radical into a power series in the quantity (x^2 + y^2)/R^2 gives

z = -R [ 1 - (x^2 + y^2)/(2R^2) - (x^2 + y^2)^2/(8R^4) - ... ]

Without changing the shape of the surface, we can elevate the sphere so the South pole is just tangent to the xy plane at the origin by adding R to all the z values. Omitting all powers of x and y above the 2nd, this gives the quadratic equation of the surface at this point

z = (x^2 + y^2)/(2R)

Thus we have z = ax^2 + bxy + cy^2 where

a = 1/(2R)        b = 0        c = 1/(2R)

from which we compute the curvature of the surface

K = 4ac - b^2 = 1/R^2

as expected.
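
The expansion itself can be delegated to sympy; this is my own sketch (symbol names arbitrary), reproducing the coefficients a, b, c and the resulting curvature 1/R^2:

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    R = sp.symbols('R', positive=True)
    z = R - sp.sqrt(R**2 - x**2 - y**2)        # sphere raised to touch z = 0 at the pole

    quad = z.series(x, 0, 3).removeO().series(y, 0, 3).removeO()
    a = quad.coeff(x, 2).subs(y, 0)
    b = quad.coeff(x, 1).coeff(y, 1)
    c = quad.coeff(y, 2).subs(x, 0)
    print(a, b, c)                             # 1/(2*R), 0, 1/(2*R)
    print(sp.simplify(4*a*c - b**2))           # Gaussian curvature = R**(-2)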

 

6. Odd Compositions

It's interesting to review the purely formal constraints on a velocity composition law (such as discussed in Section 1.9) to clarify what distinguishes the formulae that work from those that don't. Letting v12, v23, and v13 denote the pairwise velocities (in geometric units) between three co-linear particles P1, P2, P3, a composition formula relating these speeds can generally be expressed in the form

f(v13) = f(v12) + f(v23)

where f is some function that transforms speeds into a domain where they are simply additive. It's clear that f must be an "odd" function, i.e., f(-x) = -f(x), to ensure that the same composition formula works for both positive and negative speeds. This rules out transforms such as f(x) = x^2, f(x) = cos(x), and all other "even" functions.

The general "odd" function expressed as a power series is a linear combination of odd powers, i.e.,

f(x) = c_1 x + c_3 x^3 + c_5 x^5 + c_7 x^7 + ...

so we can express any such function in terms of the coefficients [c_1, c_3, ...]. For example, if we take the coefficients [1, 0, 0, ...] we have the simple transform f(x) = x, which gives the Galilean composition formula v13 = v12 + v23. For another example, suppose we "weight" each term in inverse proportion to the exponent by using the coefficients [1, 1/3, 1/5, 1/7, ...]. This gives the transform

f(x) = x + x^3/3 + x^5/5 + ... = atanh(x)

leading to Einstein's relativistic composition formula

atanh(v13) = atanh(v12) + atanh(v23)

From the identity atanh(x) = ln[(1+x)/(1-x)]/2 we also have the equivalent multiplicative form

(1 + v13)/(1 - v13) = [(1 + v12)/(1 - v12)] [(1 + v23)/(1 - v23)]

which is arguably the most natural form of the relativistic speed composition law. The velocity parameter p = (1+v)/(1-v) also gives very natural expressions for other observables, including the relativistic Doppler shift, which equals p^(1/2), and the spacetime interval between two inertial particles each one unit of proper time past their point of intersection, which equals p^(1/4) - p^(-1/4). Incidentally, to give an equilateral triangle in spacetime, this last equation shows that two particles must have a mutual speed of √5/3 = 0.745...
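
A small numerical sketch (mine, with arbitrary sample speeds) ties these forms together and checks the equilateral-triangle speed quoted above:

    import math

    def compose(v12, v23):                 # Einstein's composition via atanh
        return math.tanh(math.atanh(v12) + math.atanh(v23))

    def p(v):                              # velocity parameter (1+v)/(1-v)
        return (1 + v) / (1 - v)

    v12, v23 = 0.6, 0.7
    v13 = compose(v12, v23)
    print(v13, (v12 + v23) / (1 + v12*v23))    # both 0.9154929...
    print(p(v13), p(v12) * p(v23))             # multiplicative form agrees

    v = math.sqrt(5) / 3                       # the "equilateral" speed 0.745...
    print(p(v)**0.25 - p(v)**-0.25)            # 1.0 (up to rounding)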

