5.2 Tensors, Contravariant and Covariant

One of the most important relations involving continuous functions of multiple continuous variables (such as coordinates) is the formula for the total differential. In general if we are given a smooth continuous function y = f(x^1, x^2, ..., x^n) of n variables, the incremental change dy in the variable y resulting from incremental changes dx^1, dx^2, ..., dx^n in the variables x^1, x^2, ..., x^n is given by

$$dy = \frac{\partial y}{\partial x^1}\,dx^1 + \frac{\partial y}{\partial x^2}\,dx^2 + \cdots + \frac{\partial y}{\partial x^n}\,dx^n \qquad (1)$$

where ∂y/∂x^i is the partial derivative of y with respect to x^i. (The superscripts on x are just indices, not exponents.) The scalar quantity dy is called the total differential of y. This formula just expresses the fact that the total incremental change in y equals the sum of the "sensitivities" of y to the independent variables multiplied by the respective incremental changes in those variables. (See the Appendix for a slightly more rigorous definition.)

If we define the vectors

$$g = \left[\frac{\partial y}{\partial x^1},\; \frac{\partial y}{\partial x^2},\; \ldots,\; \frac{\partial y}{\partial x^n}\right] \qquad\qquad d = \left[dx^1,\; dx^2,\; \ldots,\; dx^n\right]$$

then dy equals the scalar (dot) product of these two vectors, i.e., we have dy = g ⋅ d. Regarding the variables x^1, x^2, ..., x^n as coordinates on a manifold, the function y = f(x^1, x^2, ..., x^n) defines a scalar field on that manifold, g is the gradient of y (often denoted as ∇y), and d is the differential position of x (often denoted as dx), all evaluated about some nominal point [x^1, x^2, ..., x^n] on the manifold.
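The relation dy = g ⋅ d is easy to check numerically. Here is a minimal sketch in Python (the function f, the nominal point, and the displacement are arbitrary choices for illustration, not from the text): it compares the first-order estimate g ⋅ d against the actual change in f.

```python
import numpy as np

def f(x):
    # An arbitrary smooth function of three variables, chosen for illustration.
    return x[0]**2 * np.sin(x[1]) + np.exp(x[2])

def gradient(f, x, h=1e-6):
    # Numerical gradient: the vector g of partial derivatives of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

x = np.array([1.0, 0.5, -0.3])      # nominal point
d = np.array([1e-4, -2e-4, 3e-4])   # incremental changes dx^i

dy_estimate = gradient(f, x) @ d    # dy = g . d
dy_actual = f(x + d) - f(x)
print(dy_estimate, dy_actual)       # agree to first order in d
```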

The gradient g = ∇y is an example of a covariant tensor, and the differential position d = dx is an example of a contravariant tensor. The difference between these two kinds of tensors is how they transform under a continuous change of coordinates. Suppose we have another system of smooth continuous coordinates X^1, X^2, ..., X^n defined on the same manifold. Each of these new coordinates can be expressed (in the region around any particular point) as a function of the original coordinates, X^i = F^i(x^1, x^2, ..., x^n), so the total differentials of the new coordinates can be written as

$$dX^i = \frac{\partial X^i}{\partial x^1}\,dx^1 + \frac{\partial X^i}{\partial x^2}\,dx^2 + \cdots + \frac{\partial X^i}{\partial x^n}\,dx^n$$

Thus, letting D denote the vector [dX^1, dX^2, ..., dX^n], we see that the components of D are related to the components of d by the equation

$$dX^i = \sum_{j=1}^{n} \frac{\partial X^i}{\partial x^j}\,dx^j \qquad (2)$$
This is the prototypical transformation rule for a contravariant tensor of the first order. On the other hand, the gradient vector g = ∇y is a covariant tensor, so it doesn't transform in accord with this rule. To find the correct transformation rule for the gradient (and for covariant tensors in general), note that if the system of functions F^i is invertible (which it is if and only if the determinant of the Jacobian is non-zero), then the original coordinates can be expressed as some functions of these new coordinates, x^i = f^i(X^1, X^2, ..., X^n) for i = 1, 2, ..., n. This enables us to write the total differentials of the original coordinates as

$$dx^i = \frac{\partial x^i}{\partial X^1}\,dX^1 + \frac{\partial x^i}{\partial X^2}\,dX^2 + \cdots + \frac{\partial x^i}{\partial X^n}\,dX^n$$
If we now substitute these expressions for the total coordinate differentials into equation (1) and collect by differentials of the new coordinates, we get

$$dy = \left(\sum_{j=1}^{n}\frac{\partial y}{\partial x^j}\,\frac{\partial x^j}{\partial X^1}\right)dX^1 + \cdots + \left(\sum_{j=1}^{n}\frac{\partial y}{\partial x^j}\,\frac{\partial x^j}{\partial X^n}\right)dX^n$$
Thus, the components of the gradient of y with respect to the X^i coordinates are given by the quantities in parentheses. If we let G denote the gradient of y with respect to these new coordinates, we have

$$G_i = \frac{\partial y}{\partial X^i} = \sum_{j=1}^{n}\frac{\partial y}{\partial x^j}\,\frac{\partial x^j}{\partial X^i}$$
This is the prototypical transformation rule for covariant tensors of the first order. Comparing this with the contravariant rule given by (2), we see that both rules define the transformed components as linear combinations of the original components, but in the contravariant case the coefficients are the partials of the new coordinates with respect to the old, whereas in the covariant case the coefficients are the partials of the old coordinates with respect to the new.
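To make the two first-order rules concrete, the following sketch (using a Cartesian-to-polar change of coordinates and sample gradient components, both illustrative assumptions) transforms d by the partials of the new coordinates with respect to the old, transforms g by the partials of the old with respect to the new, and confirms that the scalar dy = g ⋅ d is the same in both systems.

```python
import numpy as np

# Old coordinates: Cartesian (x1, x2).  New coordinates: polar (r, theta).
x = np.array([1.0, 2.0])
r = np.hypot(x[0], x[1])

# Jacobian of the new coordinates w.r.t. the old: J[i,j] = dX^i/dx^j
J = np.array([[ x[0] / r,     x[1] / r],
              [-x[1] / r**2,  x[0] / r**2]])

d = np.array([1e-3, -2e-3])    # contravariant components: differentials dx^j
g = np.array([3.0, -1.0])      # covariant components: sample gradient dy/dx^j

D = J @ d                      # contravariant rule: dX^i = (dX^i/dx^j) dx^j
G = np.linalg.inv(J).T @ g     # covariant rule: G_i = (dx^j/dX^i) g_j

print(g @ d, G @ D)            # the same scalar dy in both coordinate systems
```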

The key attribute of a tensor is that its representations in different coordinate systems depend only on the relative orientations and scales of the coordinate axes at the relevant point, not on the absolute values of the coordinates. This is why the absolute position vector pointing from the origin to a particular object in space is not a tensor: the components of its representation depend on the absolute values of the coordinates. In contrast, the coordinate differentials transform based solely on local information.

So far we have discussed only first-order tensors, but we can define tensors of any order. One of the most important examples of a second-order tensor is the metric tensor. Recall that the generalized Pythagorean theorem enables us to relate the squared differential distance ds along a path on the spacetime manifold to the corresponding differential components dt, dx, dy, dz as a general quadratic function of those differentials, as follows

$$(ds)^2 = g_{00}\,dt\,dt + g_{01}\,dt\,dx + g_{02}\,dt\,dy + g_{03}\,dt\,dz$$
$$\qquad + \; g_{10}\,dx\,dt + g_{11}\,dx\,dx + g_{12}\,dx\,dy + g_{13}\,dx\,dz$$
$$\qquad + \; g_{20}\,dy\,dt + g_{21}\,dy\,dx + g_{22}\,dy\,dy + g_{23}\,dy\,dz$$
$$\qquad + \; g_{30}\,dz\,dt + g_{31}\,dz\,dx + g_{32}\,dz\,dy + g_{33}\,dz\,dz$$
Naturally if we set g_00 = −g_11 = −g_22 = −g_33 = 1 and all the other g_ij coefficients to zero, this reduces to the Minkowski metric. However, a different choice of coordinate system (or a different intrinsic geometry, which will be discussed in subsequent sections) requires the use of the full formula. To simplify the notation, it's customary to use the indexed variables x^0, x^1, x^2, x^3 in place of t, x, y, z respectively. This allows us to express the above metrical relation in the abbreviated form

$$(ds)^2 = \sum_{\mu=0}^{3}\sum_{\nu=0}^{3} g_{\mu\nu}\,dx^\mu\,dx^\nu$$
To abbreviate the notation even more, we adopt Einstein's convention of omitting the summation symbols altogether and simply stipulating that summation from 0 to 3 is implied over any index that appears more than once in a given product. With this convention the above expression is written as

$$(ds)^2 = g_{\mu\nu}\,dx^\mu\,dx^\nu \qquad (5)$$
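In code, the implied double sum is a single contraction. A one-line sketch with numpy's einsum (the diagonal Minkowski-type coefficients and the differentials are arbitrary sample values):

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])     # Minkowski-type coefficients g_uv
dx = np.array([2e-3, 1e-3, 0.0, -1e-3])  # differentials dx^u

ds2 = np.einsum('uv,u,v->', g, dx, dx)   # (ds)^2 = g_uv dx^u dx^v
print(ds2)
```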
Notice that this formula expresses something about the intrinsic metrical relations of the space, but it does so in terms of a specific coordinate system. If we considered the metrical relations at the same point in terms of a different system of coordinates (such as changing from Cartesian to polar coordinates), the coefficients g_μν would be different. Fortunately there is a simple way of converting the g_μν from one system of coordinates to another, based on the fact that they describe a purely local relation among differential quantities. Suppose we are given the metrical coefficients g_μν for the coordinates x^α, and we are also given another system of coordinates y^α that are defined in terms of the x^α by some arbitrary continuous functions

$$y^\alpha = f^\alpha(x^0, x^1, x^2, x^3), \qquad \alpha = 0, 1, 2, 3$$
Assuming the Jacobian determinant of this transformation isn't zero, we know that the transformation is invertible, and so we can just as well express the original coordinates as continuous functions (at this point) of the new coordinates

$$x^\alpha = F^\alpha(y^0, y^1, y^2, y^3), \qquad \alpha = 0, 1, 2, 3$$
Now we can evaluate the total differentials of the original coordinates in terms of the new coordinates. For example, dx^0 can be written as

$$dx^0 = \frac{\partial x^0}{\partial y^0}\,dy^0 + \frac{\partial x^0}{\partial y^1}\,dy^1 + \frac{\partial x^0}{\partial y^2}\,dy^2 + \frac{\partial x^0}{\partial y^3}\,dy^3$$
and similarly for dx^1, dx^2, and dx^3. The product of any two of these differentials, dx^μ and dx^ν, is of the form

$$dx^\mu\,dx^\nu = \frac{\partial x^\mu}{\partial y^\alpha}\,\frac{\partial x^\nu}{\partial y^\beta}\;dy^\alpha\,dy^\beta$$
(remembering the summation convention). Substituting these expressions for the products of x differentials into the metric formula (5) gives

$$(ds)^2 = g_{\mu\nu}\,\frac{\partial x^\mu}{\partial y^\alpha}\,\frac{\partial x^\nu}{\partial y^\beta}\;dy^\alpha\,dy^\beta$$
The first three factors on the right hand side obviously represent the coefficient of dy^α dy^β in the metric formula with respect to the y coordinates, so we've shown that the array of metric coefficients transforms from the x to the y coordinate system according to the equation

$$g'_{\alpha\beta} = g_{\mu\nu}\,\frac{\partial x^\mu}{\partial y^\alpha}\,\frac{\partial x^\nu}{\partial y^\beta} \qquad (6)$$
Notice that each component of the new metric array is a linear combination of the old metric components, and the coefficients are the partials of the old coordinates with respect to the new. Arrays that transform in this way are called covariant tensors.

On the other hand, if we define an array A^μν with the components (dx^μ/ds)(dx^ν/ds), where s denotes a path parameter along some particular curve in space, then equation (2) tells us that this array transforms according to the rule

$$A'^{\alpha\beta} = A^{\mu\nu}\,\frac{\partial y^\alpha}{\partial x^\mu}\,\frac{\partial y^\beta}{\partial x^\nu} \qquad (8)$$
This is very similar to the previous formula, except that the partial derivatives are of the new coordinates with respect to the old. Arrays whose components transform according to this rule are called contravariant tensors.
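Both second-order rules can be verified numerically. The sketch below (again using a Cartesian-to-polar coordinate change as an arbitrary test case) transforms the metric by rule (6), transforms a sample contravariant array by rule (8), and confirms that the fully contracted scalar g_μν A^μν is unchanged.

```python
import numpy as np

x = np.array([1.0, 2.0])          # a point in the old (Cartesian) coordinates
r = np.hypot(x[0], x[1])

# J[i,j] = dy^i/dx^j for the new coordinates y = (r, theta)
J = np.array([[ x[0] / r,     x[1] / r],
              [-x[1] / r**2,  x[0] / r**2]])
Jinv = np.linalg.inv(J)           # Jinv[j,i] = dx^j/dy^i

g = np.eye(2)                     # Cartesian metric g_mn
A = np.array([[1.0, 2.0],         # a sample contravariant array A^mn
              [2.0, 5.0]])

g_new = Jinv.T @ g @ Jinv         # covariant rule (6)
A_new = J @ A @ J.T               # contravariant rule (8)

print(np.einsum('mn,mn->', g, A))          # scalar g_mn A^mn, old coordinates
print(np.einsum('mn,mn->', g_new, A_new))  # same value in the new coordinates
print(g_new)                               # diag(1, r^2): the polar metric
```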

When we speak of an array being transformed from one system of coordinates to another, it's clear that the array must have a definite meaning independent of the system of coordinates. We could, for example, have an array of scalar quantities, whose values are the same at a given point, regardless of the coordinate system. However, the components of the array might still be required to change for different systems. For example, suppose the temperature at the point (x,y,z) in a rectangular tank of water is given by the scalar field T(x,y,z), where x,y,z are Cartesian coordinates with origin at the geometric center of the tank. If we change our system of coordinates by moving the origin, say, to one of the corners of the tank, the function T(x,y,z) must change to the shifted function T(x+x0, y+y0, z+z0), where (x0,y0,z0) are the old coordinates of the new origin. But at a given physical point the value of T is unchanged.

Notice that g_20 is the coefficient of (dy)(dt), and g_02 is the coefficient of (dt)(dy), so without loss of generality we could combine them into the single term (g_20 + g_02)(dt)(dy). Thus the individual values of g_20 and g_02 are arbitrary for a given metrical equation, since all that matters is the sum (g_20 + g_02). For this reason we're free to specify each of those coefficients as half the sum, which results in g_20 = g_02. The same obviously applies to all the other diagonally symmetric pairs, so for the sake of definiteness and simplicity we can set g_αβ = g_βα. It's important to note, however, that this symmetry property doesn't apply to all tensors. In general we have no a priori knowledge of the symmetries (if any) of an arbitrary tensor.
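The symmetrization just described is simply the replacement g → (g + gᵀ)/2, which leaves every quadratic form unchanged; a tiny sketch (with arbitrary sample coefficients):

```python
import numpy as np

g = np.array([[1.0, 0.3],      # an arbitrary, non-symmetric coefficient array
              [0.7, 2.0]])
g_sym = (g + g.T) / 2          # g_ab -> (g_ab + g_ba)/2

dx = np.array([0.4, -1.1])
print(dx @ g @ dx, dx @ g_sym @ dx)   # identical quadratic forms
```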

Incidentally, when we refer to a vector (or, more generally, a tensor) as being either contravariant or covariant we're abusing the language slightly, because those terms really just signify two different conventions for interpreting the components of the object with respect to a given coordinate system, whereas the essential attributes of a vector (or tensor) are independent of the particular coordinate system in which we choose to express it. In general, any given vector or tensor can be expressed in both contravariant and covariant form with respect to any given coordinate system. For example, consider the vector that represents the position of point P relative to point O as shown below.

We should note that when dealing with a vector (or tensor) field on a manifold, each element of the field exists entirely at a single point of the manifold, with a direction and a magnitude, rather than extending from one point in the manifold to another. (For example, we might have a vector field describing the direction and speed of the wind at each point in a given volume of air.) However, for the purpose of illustrating the relation between contravariant and covariant components, we are focusing on simple displacement vectors in a flat metrical space.

  Figure 1

Figure 1 shows an arbitrary coordinate system, with the axes X1 and X2, as well as the contravariant and covariant components of the position vector P with respect to these coordinates. As can be seen, the jth component of the "contravariant path" from O to P consists of a segment parallel to the jth coordinate axis, whereas the jth component of the "covariant path" consists of a segment perpendicular to all the axes other than the jth. This is the essential distinction (up to scale factors) between the contravariant and covariant ways of expressing a vector or, more generally, a tensor.

By the way, it may seem that the naming convention is backwards, because the "contra" components go with the axes, whereas the "co" components go against the axes. Historically these names were given on the basis of the transformation laws that apply to these two different interpretations. In any case, we're stuck with the names.

It's also worth noting that if our coordinate system is "orthogonal" (meaning that the coordinate axes are mutually perpendicular), then the contravariant and covariant interpretations are identical (up to scale factors). This can be seen by imagining that we make the angle ω in Figure 1 equal to 90 degrees. Since we almost always use orthogonal coordinates, we can say that we're essentially using both contravariant and covariant coordinates all the time, because in such a context the only difference between them (at any given point) is scale factors. Bear in mind that "orthogonal" doesn't necessarily imply "rectilinear". For example, polar coordinates are not rectilinear, i.e., the axes are not straight lines, but they are orthogonal, because as we vary the angle θ we are always moving perpendicular to the local radial axis. Thus the metric of a polar coordinate system is diagonal, just as is the metric of a Cartesian coordinate system, and so the contravariant and covariant forms at any given point differ only by scale factors (although these scale factors may vary as a function of position). Only when we consider systems of coordinates that are not mutually perpendicular do the contravariant and covariant forms differ (at a given point) by more than just scale factors.
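To illustrate the point numerically, here is a short sketch (taking the diagonal polar metric diag(1, r²) as given, with arbitrary sample values): lowering an index in an orthogonal system only rescales each component, without mixing them.

```python
import numpy as np

r, theta = 2.0, 0.7
g = np.diag([1.0, r**2])   # polar-coordinate metric at this point (diagonal)

v = np.array([0.3, -0.5])  # contravariant components v^u of some vector
v_cov = g @ v              # covariant components v_u = g_uv v^v

print(v_cov / v)           # [1.0, r^2]: pure scale factors, no mixing
```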

It's worthwhile to examine this in some detail, to understand how the representations of vectors in different coordinate systems are related to each other. Consider a vector x^u in a flat 2-dimensional space. This vector is defined by the components x^1 and x^2 based on an orthogonal Cartesian coordinate system, as shown below.

  Figure 2

How would we determine the direction and length of the vector? The answer may seem obvious, because we've been given the vector's components, but those components can be interpreted in different ways. The most common approach is to begin at the origin of the coordinate system and move x^1 units in the direction parallel to the X1 axis, and then x^2 units in the direction parallel to the X2 axis. Assuming we know the "units" in which the given components are expressed, this gives us a unique definition of the vector.

However, there are other possible interpretations. For example, we could begin at the origin and move x^1 units in the direction perpendicular to the X2 axis, and then x^2 units in the direction perpendicular to the X1 axis. When applied to an orthogonal coordinate system such as the one shown in Figure 2, this interpretation is obviously equivalent to the previous one, because the X1 axis is perpendicular to the X2 axis, and vice versa. But suppose we were using a "skewed" coordinate system such as the one shown below:

  Figure 3

where "w" is the angle between the two (positive) axes. In this case there's a difference between the directions "parallel to the X1 axis" and "perpendicular to the X2 axis", so our two interpretations lead to two different vectors. Which one is "correct"? The answer is clearly a matter of convention, but it turns out that both of these interpretations can be very useful. The first leads to the contravariant form of a vector, while the second leads to the covariant form. It's interesting to consider the relationships between these two methods of defining a vector (or, more generally, a tensor).

Notice that applying the second interpretation to the skewed coordinate system of Figure 3 is equivalent to applying the first interpretation to the alternate coordinate system shown below

  Figure 4

where this X2 axis is perpendicular to the previous X1 axis, and this X1 axis is perpendicular to the previous X2 axis. The angle between the positive X1 and X2 axes in Figure 4 is ω′, where ω + ω′ = π. Conversely, applying the second interpretation to the system of Figure 4 is equivalent to applying the first interpretation to the system of Figure 3. For this reason the two coordinate systems in Figures 3 and 4 are called "duals" of each other.

In each of the three coordinate systems shown above, how would we compute the length of a vector given its components? Obviously the length s of a vector in the orthogonal coordinate system of Figure 2 is given by the ordinary Pythagorean theorem

$$s^2 = (x^1)^2 + (x^2)^2$$

On the other hand, the length of a vector in terms of the coordinates in Figure 3 is given by

$$s^2 = (x^1)^2 + 2\cos(\omega)\,x^1 x^2 + (x^2)^2$$

and the length of a vector in terms of the coordinates in Figure 4 is

$$s^2 = (x^1)^2 + 2\cos(\omega')\,x^1 x^2 + (x^2)^2 = (x^1)^2 - 2\cos(\omega)\,x^1 x^2 + (x^2)^2$$

In general the squared length of an arbitrary vector x on a (flat) 2-dimensional surface can be expressed in the form

$$s^2 = g_{11}\,x^1 x^1 + g_{12}\,x^1 x^2 + g_{21}\,x^2 x^1 + g_{22}\,x^2 x^2$$

where the coefficients g_uv are the components of the "metric tensor". This tensor is always symmetrical, meaning that g_uv = g_vu, so there are really only three independent elements. With Einstein's summation convention we can express the preceding equation more succinctly as

$$s^2 = g_{uv}\,x^u x^v$$

From the preceding formulas we can see that the metric tensor for the orthogonal coordinate system in Figure 2 is just

$$g_{uv} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

whereas for the skewed coordinates of Figure 3 the metric is

$$g'_{uv} = \begin{bmatrix} 1 & \cos(\omega) \\ \cos(\omega) & 1 \end{bmatrix}$$

and for the coordinates of Figure 4 we have

$$g''_{uv} = \begin{bmatrix} 1 & -\cos(\omega) \\ -\cos(\omega) & 1 \end{bmatrix}$$
The determinants g, g′, and g″ of the above three matrices are 1, sin²(ω), and sin²(ω) respectively. We will find that the inverses of the metric tensors are also very useful, so let's use the superscripted symbol g^uv to denote the inverse of a given g_uv. The inverse metric tensors for the coordinate systems of Figures 2, 3, and 4, respectively, are

$$g^{uv} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad g'^{uv} = \frac{1}{\sin^2(\omega)}\begin{bmatrix} 1 & -\cos(\omega) \\ -\cos(\omega) & 1 \end{bmatrix} \qquad g''^{uv} = \frac{1}{\sin^2(\omega)}\begin{bmatrix} 1 & \cos(\omega) \\ \cos(\omega) & 1 \end{bmatrix}$$

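These matrices, their determinants, and their inverses can be checked mechanically; a brief sketch (ω is an arbitrary sample angle):

```python
import numpy as np

w = 1.1                                  # an arbitrary skew angle omega (radians)
c = np.cos(w)

g1 = np.eye(2)                           # Figure 2 (orthogonal)
g2 = np.array([[1.0,  c], [ c, 1.0]])    # Figure 3 (skewed by omega)
g3 = np.array([[1.0, -c], [-c, 1.0]])    # Figure 4 (the dual system)

for g in (g1, g2, g3):
    print(np.linalg.det(g), np.linalg.inv(g))
# determinants: 1, sin(w)^2, sin(w)^2
```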
Now let's consider a vector x^u whose components relative to the orthogonal axes of Figure 2 are x^1, x^2. If we let x′^u denote this same vector expressed in terms of the skewed coordinates of Figure 3 (taking the X1 axes of Figures 2 and 3 to be aligned), it's not hard to show that

$$x'^1 = x^1 - \frac{x^2}{\tan(\omega)} \qquad\qquad x'^2 = \frac{x^2}{\sin(\omega)}$$
Also, letting x″^u denote the same vector expressed in terms of the coordinate system of Figure 4, we have

$$x''^1 = \frac{x^1}{\sin(\omega)} \qquad\qquad x''^2 = \frac{x^1}{\tan(\omega)} + x^2$$
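Under the stated assumption about how the axes of the three figures are aligned, these component formulas are easy to sanity-check: the sketch below converts a sample Cartesian vector into both skewed systems and confirms that each system's metric assigns it the same squared length.

```python
import numpy as np

w = 1.1                       # skew angle omega
x1, x2 = 1.0, 2.0             # Cartesian components (Figure 2)

# Contravariant components in the skewed system (Figure 3) and its dual
# (Figure 4), assuming the axis alignment described in the text.
xp = np.array([x1 - x2 / np.tan(w), x2 / np.sin(w)])
xpp = np.array([x1 / np.sin(w), x1 / np.tan(w) + x2])

g2 = np.array([[1.0,  np.cos(w)], [ np.cos(w), 1.0]])   # metric, Figure 3
g3 = np.array([[1.0, -np.cos(w)], [-np.cos(w), 1.0]])   # metric, Figure 4

print(x1**2 + x2**2, xp @ g2 @ xp, xpp @ g3 @ xpp)      # identical squared lengths
```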
Now let's create a new vector, denoted x_v, by "multiplying" the vector x^u by the corresponding metric tensor

$$x_v = g_{uv}\,x^u$$

Remember that summation is implied over the repeated index u, whereas the index v appears only once (in any given product), so this expression applies for any value of v. Thus it represents the two equations

$$x_1 = g_{11}\,x^1 + g_{21}\,x^2 \qquad\qquad x_2 = g_{12}\,x^1 + g_{22}\,x^2$$
If we carry out this multiplication with the metric of Figure 2 we find

$$x_1 = x^1 \qquad\qquad x_2 = x^2$$

so the covariant components x_u are identical to the contravariant components x^u for the orthogonal coordinate system of Figure 2. However, if we perform this same operation based on the skewed coordinates of Figure 3 we have

$$x'_v = g'_{uv}\,x'^u$$

which gives

$$x'_1 = x'^1 + \cos(\omega)\,x'^2 \qquad\qquad x'_2 = \cos(\omega)\,x'^1 + x'^2$$

Notice that these components are exactly proportional to the components of x″^u, and the scale factor is sin(ω). Similarly, if we construct the vector x″_v by the operation

$$x''_v = g''_{uv}\,x''^u$$

we have

$$x''_1 = x''^1 - \cos(\omega)\,x''^2 \qquad\qquad x''_2 = -\cos(\omega)\,x''^1 + x''^2$$

which are exactly proportional to the components of x′^u, again with a scale factor of sin(ω). Clearly the coordinate systems of Figures 3 and 4 have a reciprocal relationship, which is why they are called "duals" of each other. Of course, the orthogonal system of Figure 2 is self-dual, with a scale factor of 1.

The vectors with superscripted indices, x^u, x′^u, and x″^u, are called contravariant vectors, meaning they are defined by the "first interpretation" based on the coordinate systems of Figures 2, 3, and 4 respectively. The entities with subscripted indices, x_u, x′_u, and x″_u, are the covariant versions of those three contravariant vectors. Notice that we can convert from the contravariant to the covariant version of a given vector simply by multiplying by the covariant metric tensor, and we can convert back simply by multiplying by the inverse of the metric tensor. These operations are called "raising and lowering of indices", because they convert x from a superscripted to a subscripted variable, or vice versa.
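Raising and lowering indices are just matrix products with g_uv and its inverse g^uv. A minimal sketch using the Figure 3 metric (the angle and the components are arbitrary sample values):

```python
import numpy as np

w = 1.1
g = np.array([[1.0, np.cos(w)],    # covariant metric g_uv (Figure 3)
              [np.cos(w), 1.0]])
g_inv = np.linalg.inv(g)           # contravariant metric g^uv

v_contra = np.array([0.8, -0.2])   # contravariant components v^u
v_cov = g @ v_contra               # lowering: v_u = g_uv v^v
v_back = g_inv @ v_cov             # raising: recovers v^u

print(v_cov, v_back)               # v_back equals v_contra
```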

We've seen that the coordinate systems in Figures 3 and 4 have a reciprocal relationship, i.e.,

$$x'_v = \sqrt{g'}\;x''^v \qquad\qquad x''_v = \sqrt{g''}\;x'^v$$

where √g′ = √g″ = sin(ω),
which means that the covariant version of a given vector is just a scaled version of that vector's contravariant representation based on the dual coordinate system. In the special case where the determinant of the metric tensor is 1, the scale factor drops out and we can say that the contravariant and covariant versions of a vector are really both just ordinary contravariant representations of the same vector based on mutually dual coordinate systems.

We've also seen that the (squared) length of a vector is given by

$$s^2 = g_{uv}\,x^u x^v$$

Notice that we essentially "lower" the indices on both components to produce the scalar magnitude. Not surprisingly, we can also express the length of the same vector by "raising" the indices of its covariant components, making use of the inverse metric tensor, as follows

$$s^2 = g^{uv}\,x_u x_v$$

Of course, in view of the fact that x_u = g_{uv} x^v, it follows that we also have

$$s^2 = x_u x^u$$

Many other useful relations can be expressed in this way. For example, the angle θ between two vectors a and b is given by

$$\cos(\theta) = \frac{g_{uv}\,a^u b^v}{\sqrt{(g_{uv}\,a^u a^v)(g_{uv}\,b^u b^v)}}$$
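Here is a numerical version of the angle formula (with the Figure 3 metric and the contravariant basis vectors as assumed inputs); for the two coordinate axes it recovers the skew angle ω itself:

```python
import numpy as np

w = 1.1
g = np.array([[1.0, np.cos(w)], [np.cos(w), 1.0]])  # metric g_uv (Figure 3)

a = np.array([1.0, 0.0])   # contravariant components along the X1 axis
b = np.array([0.0, 1.0])   # contravariant components along the X2 axis

cos_theta = (a @ g @ b) / np.sqrt((a @ g @ a) * (b @ g @ b))
print(np.arccos(cos_theta), w)   # the angle between the axes: omega
```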

These techniques immediately generalize to any number of dimensions, and to tensors with any number of indices, including "mixed tensors" that have some contravariant and some covariant indices. In addition, we need not restrict ourselves to flat spaces or coordinate systems whose metrics are constant (as in the above examples). Of course, if the metric is variable then we can no longer express finite interval lengths in terms of finite component differences. However, the above distance formulas still apply, provided we express them in differential form, i.e., the incremental distance ds along a path is related to the incremental components dx^u according to

$$(ds)^2 = g_{uv}\,dx^u dx^v$$

so we need to integrate this over a given path to determine the length of the path. These are exactly the formulas used in 4-dimensional spacetime to determine the spatial and temporal "distances" between events in general relativity.
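As a concrete illustration of such an integration, the following sketch (the radius and discretization are arbitrary choices) accumulates ds along a quarter circle expressed in polar coordinates, where the metric is diag(1, r²), and compares the result with the exact length πR/2.

```python
import numpy as np

R = 2.0
theta = np.linspace(0.0, np.pi / 2, 1001)  # parametrize a quarter circle
r = np.full_like(theta, R)

length = 0.0
for i in range(len(theta) - 1):
    dr = r[i + 1] - r[i]
    dth = theta[i + 1] - theta[i]
    g = np.diag([1.0, r[i]**2])            # polar metric at this point
    dx = np.array([dr, dth])
    length += np.sqrt(dx @ g @ dx)         # ds = sqrt(g_uv dx^u dx^v)

print(length, np.pi * R / 2)               # numerical vs exact
```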

As noted above, we can raise and lower indices to create mixed tensors, i.e., tensors that are contravariant in some of their indices and covariant in others. Also, for any given index we could generalize the idea of contravariance and covariance to include mixtures of these two qualities in a single index. This is not ordinarily done, but it is possible. Recall that the contravariant components are measured parallel to the coordinate axes, and the covariant components are measured normal to all the other axes. These are the two extreme cases, but we could define components with respect to directions that make a fixed angle relative to the coordinate axes and normals. The transformation rule for such representations is more complicated than either (6) or (8), but each component can be resolved into sub-components that are either purely contravariant or purely covariant, so these two extreme cases suffice to express all transformation characteristics of tensors.

