Meeting Probabilities

Given n independent random variables, each uniformly distributed over the interval from 0 to 1, the probability that all n are within q of each other (for any q < 1) is

$$P = n q^{n-1} - (n-1) q^{n} \qquad (1)$$

This can be derived in several different ways. For example, we can divide the unit interval into k equal segments, and note that the probability of n randomly selected points all falling within j consecutive segments corresponds approximately to the probability that all n points fall within q = (j/k) of each other. This correspondence becomes exact in the limit as k goes to infinity (holding q constant). Equation (1) can also be derived from a geometrical point of view. Given a unit "cube" in n dimensions, equation (1) represents the fraction of the cube's content ("volume") consisting of points with orthogonal coordinates [x₁, x₂, ..., x_n] such that |x_i - x_j| < q for all i,j. (See The Shape of Coincidence for more on the geometrical aspects of this equation, and its relation to the rhombic dodecahedron.)
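The segment-counting argument can be checked with exact arithmetic. The Python sketch below uses hypothetical helper names. It counts the ways n points can fall into k segments so that all of them lie within some block of j consecutive segments, using the inclusion-exclusion count (k - j + 1)jⁿ - (k - j)(j - 1)ⁿ. Each qualifying tuple lies in exactly one more window of j segments than windows of j - 1 segments. The sketch checks that count against brute-force enumeration for small cases, then shows the fraction approaching nq^{n-1} - (n-1)qⁿ as k grows with q = j/k held fixed.

```python
from fractions import Fraction
from itertools import product


def eq1(n, q):
    """Equation (1): probability that n uniform points on [0, 1] lie within q of each other."""
    return n * q ** (n - 1) - (n - 1) * q ** n


def discrete_fraction(n, k, j):
    """Fraction of the k**n placements of n points into k segments that fit in j consecutive segments.

    A qualifying tuple lies in exactly one more j-segment window than (j-1)-segment
    windows, so subtracting the (j-1)-window count counts each tuple once.
    """
    count = (k - j + 1) * j ** n - (k - j) * (j - 1) ** n
    return Fraction(count, k ** n)


def brute_force_fraction(n, k, j):
    """Direct enumeration over all k**n tuples (only feasible for tiny n and k)."""
    hits = sum(1 for pts in product(range(k), repeat=n) if max(pts) - min(pts) <= j - 1)
    return Fraction(hits, k ** n)


if __name__ == "__main__":
    n, q = 3, Fraction(1, 4)
    for k in (8, 80, 800, 8000):
        j = int(q * k)  # keep q = j/k fixed while refining the segments
        print(k, float(discrete_fraction(n, k, j)))
    print("limit", float(eq1(n, q)))
```

The counts are exact fractions, so the only difference between the printed values and the limit comes from the finite number of segments k.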

One possible generalization of this is to allow different tolerances for the different events. For example, suppose each of n people is to arrive at a certain location at some randomly chosen time between 1:00 PM and 2:00 PM, and each person will wait a certain amount of time before leaving. With n = 2 people, say one can wait w₁ and the other can wait w₂ (both expressed as fractions of the total time interval). What is the probability that they will meet? Geometrically, this is just the area of the shaded region in the unit square shown below, consisting of the arrival times (t₁, t₂) with t₁ - w₂ ≤ t₂ ≤ t₁ + w₁:

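The shaded area equals the total area of the square minus the two excluded corner triangles, whose legs are 1 - w₁ and 1 - w₂, so we have

$$P_2 = 1 - \frac{(1-w_1)^2}{2} - \frac{(1-w_2)^2}{2} = w_1 + w_2 - \frac{w_1^2 + w_2^2}{2}$$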

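More generally, for n people with waiting times w₁, w₂, ..., w_n, we need to integrate 1 over the region of arrival times t₁, t₂, ..., t_n in which every person arrives before every other person has left:

$$P_n = \int \cdots \int_{R} dt_1 \, dt_2 \cdots dt_n, \qquad R = \{ (t_1, \ldots, t_n) \in [0,1]^n : t_i - t_k \le w_k \ \text{for all } i, k \}$$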
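Before working through the exact integrals, a Monte Carlo estimate is a handy cross-check. In this minimal sketch (hypothetical function names, Python's standard `random` module), person i is present on [t_i, min(t_i + w_i, 1)], and a meeting is counted when the latest arrival is no later than the earliest departure. The two-person value is computed from the square-minus-two-triangles area described above.

```python
import random


def meets(arrivals, waits):
    """True if all people are present at some common instant.

    Person i is present on [t_i, min(t_i + w_i, 1)], so the presence intervals
    share a point exactly when the latest arrival is no later than the earliest departure.
    """
    latest_arrival = max(arrivals)
    earliest_departure = min(min(t + w, 1.0) for t, w in zip(arrivals, waits))
    return latest_arrival <= earliest_departure


def simulate(waits, trials=200_000, seed=1):
    """Estimate the meeting probability by sampling independent uniform arrival times."""
    rng = random.Random(seed)
    n = len(waits)
    hits = sum(meets([rng.random() for _ in range(n)], waits) for _ in range(trials))
    return hits / trials


def p2(w1, w2):
    """Two people: unit square minus corner triangles with legs 1 - w1 and 1 - w2."""
    return 1 - (1 - w1) ** 2 / 2 - (1 - w2) ** 2 / 2


if __name__ == "__main__":
    print(simulate([0.25, 0.5]), p2(0.25, 0.5))
```

The estimate has a statistical error of roughly 0.001 at the default number of trials, which is enough to catch a wrong region boundary but not to confirm an exact fraction.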

Obviously this becomes extremely complicated as n increases. To illustrate, consider the case n = 3. We need to divide the problem into 12 separate double integrals over t₁ and t₂ in order to specify fixed integration limits for each one. Letting w₁ denote the smallest waiting time, and w₃ the largest, these 12 regions are illustrated below as the regions denoted by A through L.

Within each of these regions we need to integrate the difference between the high and low integration limits for t₃. The integration limits and integrands are listed below

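Evaluating all twelve of these integrals and simplifying gives the result (with w₁ ≤ w₂ ≤ w₃)

$$P_3 = w_1 w_2 + w_1 w_3 + w_2 w_3 - \frac{w_1^3}{3} - \frac{w_1 w_2^2}{2} - \frac{w_2^3}{6} - \frac{(w_1 + w_2)\, w_3^2}{2}$$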

For example, with 3 people who can wait w₁ = 1/4, w₂ = 1/3, and w₃ = 1/2 (15 minutes, 20 minutes, and 30 minutes, respectively) the probability of all three of them being at the location simultaneously during the hour is exactly 1435/5184, which is about 0.27681.
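Because the twelve-region integration is error-prone, an exact check is useful. This sketch evaluates a closed form for n = 3, w₁w₂ + w₁w₃ + w₂w₃ - w₁³/3 - w₁w₂²/2 - w₂³/6 - (w₁ + w₂)w₃²/2. That expression is our own simplification, assumed equivalent to the result above, and it reproduces the 1435/5184 example. It uses Python's `fractions` module so the value comes out exactly. The name `p3` is hypothetical.

```python
from fractions import Fraction


def p3(w1, w2, w3):
    """Probability that three people with waiting times w1, w2, w3 all meet.

    The inputs are sorted first, so the formula always sees w1 <= w2 <= w3,
    which is the ordering it assumes.
    """
    w1, w2, w3 = sorted((w1, w2, w3))
    return (w1 * w2 + w1 * w3 + w2 * w3
            - w1 ** 3 / 3 - w1 * w2 ** 2 / 2 - w2 ** 3 / 6
            - (w1 + w2) * w3 ** 2 / 2)


if __name__ == "__main__":
    print(p3(Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)))  # 1435/5184
```

Setting all three arguments equal gives 3w² - 2w³, which is equation (1) with n = 3.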

Naturally both the formulas for n = 2 and n = 3 reduce to equation (1) when all the waiting times are equal. It's also worth noting that the formula for n = 3 is not symmetrical in the three waiting times, because the boundaries of the 3-dimensional solid representing the meeting region in phase space depend on a set of min and max selections, so the result depends on the ordering of the parameters, with w₁ the smallest and w₃ the longest waiting time.

As n increases, the direct integration method for arbitrary unequal waiting times rapidly becomes more complicated, because a waiting time must be truncated whenever someone arrives with less than that waiting time remaining before the end of the interval, and a similar truncation occurs at the beginning of the hour. Geometrically, we're dealing with the volume of a complex, asymmetrical n-dimensional polyhedron whose volume is not easily found by direct integration, except in the highly symmetrical case when all the w values are equal, or when n is very small (like the cases n = 2 and n = 3 given above).

To derive a more efficient method of determining the probabilities for larger values of n, we may begin by considering a similar question whose answer can immediately be given by a fairly simple formula for arbitrary n. Suppose we eliminate the boundary truncation in the original problem by wrapping the 1-hour interval around a circle of unit circumference, and stipulating that the jth person is present for a continuous span of exactly w_j on that circle, with the location of each span uniformly and independently distributed around the circle. Now we can ask for the probability that n spans of lengths w₁, w₂, ..., w_n will have some common overlap. Let's also assume that the w values are all fairly small relative to the entire interval, so that w₁ + w₂ + ... + w_n is less than 1. (This allows us to avoid complications due to wrap-around.)

Consider first the discrete case, i.e., suppose the circumference of the circle is divided into N equal increments, and the n randomly placed spans have discrete lengths of m₁, m₂, ..., m_n increments respectively. The total number of possible arrangements of the spans is Nⁿ, so we need to determine how many of those contain at least one increment of overlap among all the spans. Each configuration with overlap can be translated into N different positions, so we really just need to find the number of distinct intrinsic overlap configurations of these n spans. If we fix one particular increment on the circle and require that all the spans contain that increment, then the number of arrangements that satisfy this requirement is the product m₁m₂...m_n. However, because of translation, this counts the intrinsic arrangements with two increments of overlap twice, those with three increments of overlap three times, and so on. Thus the product of lengths overestimates the number of intrinsic arrangements with overlap.

But suppose we fix two particular consecutive increments on the circle and require that all the spans contain both of them. The number of arrangements that satisfy this requirement is the product (m₁ - 1)(m₂ - 1)...(m_n - 1). This counts the arrangements with two increments of overlap only once, those with three increments of overlap twice, and so on. Therefore, if we subtract this product from the previous product, the result counts each arrangement with any number of increments of overlap exactly once. Hence the number of such intrinsic arrangements is

$$\prod_{k=1}^{n} m_k - \prod_{k=1}^{n} (m_k - 1)$$

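Since each intrinsic arrangement can be placed in N positions, it follows that the probability of overlap for these n spans placed randomly around the circle is

$$P_n = \frac{N \left[ \prod_{k=1}^{n} m_k - \prod_{k=1}^{n} (m_k - 1) \right]}{N^n}$$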
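The translation-counting argument can be checked by brute force on small circles. This sketch uses hypothetical names, and each span is modeled as a set of increment indices taken modulo N. It enumerates all Nⁿ placements and compares the number that share a common increment with N[m₁m₂⋯m_n - (m₁ - 1)(m₂ - 1)⋯(m_n - 1)]. The test cases keep m₁ + ... + m_n < N, as the text assumes. `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod


def circle_overlap_count(N, lengths):
    """Brute force: placements of spans on an N-increment circle that share an increment."""
    count = 0
    for starts in product(range(N), repeat=len(lengths)):
        common = set(range(N))
        for a, m in zip(starts, lengths):
            common &= {(a + i) % N for i in range(m)}  # increments covered by this span
            if not common:
                break
        if common:
            count += 1
    return count


def circle_overlap_formula(N, lengths):
    """N times (product of lengths minus product of lengths reduced by one)."""
    return N * (prod(lengths) - prod(m - 1 for m in lengths))


if __name__ == "__main__":
    print(circle_overlap_count(8, (2, 3, 2)), circle_overlap_formula(8, (2, 3, 2)))
```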

For example, with n = 3 the result is

$$P_3 = \frac{m_1 m_2 + m_1 m_3 + m_2 m_3 - m_1 - m_2 - m_3 + 1}{N^2}$$

If we hold the ratios w₁ = m₁/N, w₂ = m₂/N, and w₃ = m₃/N constant and increase N, all the terms in the numerator of degree less than two drop out, and we're left with the result for the continuous case

$$P_3 = w_1 w_2 + w_1 w_3 + w_2 w_3$$

In general, for n spans of lengths w₁, w₂, ..., w_n distributed uniformly at random around a circle of unit circumference, with the condition that the sum w₁ + w₂ + ... + w_n is less than 1, the probability of some common overlap among all n of the spans is the sum of all products of n - 1 of the lengths:

$$P_n = \sum_{k=1}^{n} \prod_{i \ne k} w_i$$
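The passage from the discrete count to the continuous formula can be seen numerically. This sketch uses hypothetical names. It computes the continuous result as the sum, over each k, of the product of all w_i with i ≠ k. It compares that with the exact discrete probability N[Π m_k - Π(m_k - 1)]/Nⁿ for m_k = w_k N as N increases.

```python
from fractions import Fraction
from math import prod


def circle_probability(waits):
    """Continuous circle result: the sum of all products of n - 1 of the w's."""
    n = len(waits)
    return sum(prod(w for i, w in enumerate(waits) if i != k) for k in range(n))


def circle_probability_discrete(N, lengths):
    """Discrete result N * (prod m - prod (m - 1)) / N**n as an exact fraction."""
    n = len(lengths)
    return Fraction(N * (prod(lengths) - prod(m - 1 for m in lengths)), N ** n)


if __name__ == "__main__":
    waits = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
    print(circle_probability(waits))  # 11/100
    for N in (10, 100, 1000, 10000):
        lengths = [int(w * N) for w in waits]  # m_k = w_k * N
        print(N, float(circle_probability_discrete(N, lengths)))
```

The lower-degree terms of the numerator shrink like 1/N, so the discrete values settle onto 11/100 as N grows.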

Now let's return to our original problem: finding the probability of a meeting among n people with distinct waiting times in a fixed interval. Again we can start with the discrete case, and then go over to the continuous case at the end. We divide the unit interval into N segments, and let the integers m_k for k = 1, 2, ..., n denote the waiting times (corresponding to Nw_k), arranged so that 1 ≤ m₁ ≤ m₂ ≤ ... ≤ m_n ≤ N. We apply the same reasoning as in the case of the circle, except that now each fixed segment of overlap must be treated separately, because the waiting intervals near the starting time are truncated by the boundary. For example, the first waiting interval, of length m₁, has only one position that includes the first segment, and (if m₁ ≥ 2) only two positions that include the second segment, and so on. Only when we reach segments at least m_k from the boundary are there m_k positions of the kth waiting interval that include them. In general, the number of positions of the kth waiting interval that intersect the jth segment is min(j, m_k). (No correction is needed for truncation at the end of the interval, since any waiting interval that includes the jth segment must begin at or before that segment.) Hence, instead of having N times the product of the m_k values, we must evaluate the sum over j from 1 to N of the products of the min(j, m_k) values. As in the case of the circle, this is an overestimate, because it counts each configuration multiple times. As before, this is corrected by subtracting the products of the quantities [min(j, m_k) - 1].

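Consequently, the probability of meeting for n people with waiting times m₁, m₂, ..., m_n is given by

$$P_n = \frac{1}{N^n} \sum_{j=1}^{N} \left[ \prod_{k=1}^{n} \min(j, m_k) - \prod_{k=1}^{n} \bigl( \min(j, m_k) - 1 \bigr) \right]$$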
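The discrete counting formula for the fixed interval can also be checked by brute force. In this sketch (hypothetical names), person k arrives in segment a (1 ≤ a ≤ N) and occupies segments a through a + m_k - 1, cut off at segment N. A meeting is counted when the latest arrival is no later than the earliest last-occupied segment. The count is compared with the sum over j of ∏ min(j, m_k) - ∏ (min(j, m_k) - 1).

```python
from fractions import Fraction
from itertools import product
from math import prod


def meeting_count_formula(N, waits):
    """Sum over segments j of prod(min(j, m_k)) - prod(min(j, m_k) - 1)."""
    total = 0
    for j in range(1, N + 1):
        covers = [min(j, m) for m in waits]  # positions of each interval that include segment j
        total += prod(covers) - prod(c - 1 for c in covers)
    return total


def meeting_count_brute(N, waits):
    """Enumerate every choice of arrival segments (1..N) and test for a common segment."""
    count = 0
    for starts in product(range(1, N + 1), repeat=len(waits)):
        latest_arrival = max(starts)
        # person k occupies segments a .. a + m_k - 1, cut off at the last segment N
        earliest_exit = min(min(a + m - 1, N) for a, m in zip(starts, waits))
        if latest_arrival <= earliest_exit:
            count += 1
    return count


def meeting_probability_discrete(N, waits):
    """Exact discrete meeting probability: the count divided by N**n."""
    return Fraction(meeting_count_formula(N, waits), N ** len(waits))


if __name__ == "__main__":
    print(meeting_count_brute(6, (1, 2, 4)), meeting_count_formula(6, (1, 2, 4)))
    # m_k = w_k * N with w = (1/4, 1/3, 1/2); tends toward 1435/5184 as N grows
    print(float(meeting_probability_discrete(1200, (300, 400, 600))))
```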

After expanding the products, the terms of degree n cancel, leaving only terms of degree n - 1 and lower. Bringing 1/N^{n-1} in from the leading factor, and letting w_k and t denote the ratios m_k/N and j/N respectively, the resulting expression is purely a function of the w_k and the variable t, plus terms divided by some power of N. The latter vanish in the limit as N goes to infinity, so only the terms of degree n - 1 remain. (These consist of the n products of n - 1 elements of the set {min(j, m_k) : k = 1, 2, ..., n}.) In the continuous limit, the summation over the index j becomes an integration over the variable t. Noting that dj = N dt, the remaining power of 1/N in the leading factor is cancelled, and we have in the continuous limit

$$P_n = \int_0^1 s_{n-1}\bigl( \min(t, w_1), \min(t, w_2), \ldots, \min(t, w_n) \bigr) \, dt$$

where s_{n-1}(x₁, x₂, ..., x_n) denotes the sum of all products of n - 1 of its arguments. Splitting this integral into the ranges from 0 to w₁, from w₁ to w₂, and so on, we can eliminate the min functions and write it as a sum of elementary integrals. To simplify the expressions, it's convenient to define the following symbols for the kth and (k-1)th symmetric functions of the k smallest waiting times:

$$p_k = w_1 w_2 \cdots w_k, \qquad s_k = \sum_{i=1}^{k} \prod_{\substack{l=1 \\ l \ne i}}^{k} w_l$$

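with p₀ = 1 and s₀ = 0. Also, let w₀ = 0 and w_{n+1} = 1. On the range from w_j to w_{j+1}, the first j arguments of s_{n-1} equal w₁, ..., w_j and the remaining n - j equal t, so the products that omit one of the t's contribute (n - j)p_j t^{n-j-1}, and those that omit one of the w's contribute s_j t^{n-j}. In terms of these parameters we can write

$$P_n = \sum_{j=0}^{n} \int_{w_j}^{w_{j+1}} \left[ (n-j)\, p_j\, t^{\,n-j-1} + s_j\, t^{\,n-j} \right] dt$$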

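Evaluating the integrals, this becomes

$$P_n = \sum_{j=0}^{n} p_j \left( w_{j+1}^{\,n-j} - w_j^{\,n-j} \right) + \sum_{j=0}^{n} \frac{s_j}{n-j+1} \left( w_{j+1}^{\,n-j+1} - w_j^{\,n-j+1} \right)$$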

In view of the identity p_{j+1} = p_j w_{j+1}, the argument of the first summation can be written as

$$p_j w_{j+1}^{\,n-j} - p_j w_j^{\,n-j} = p_{j+1} w_{j+1}^{\,n-(j+1)} - p_j w_j^{\,n-j}$$

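Hence these terms constitute a telescoping sequence (the j = n term vanishes, and the lower end contributes p₀w₀ⁿ = 0), with the net sum p_n. Since s₀ = 0, the second summation can start at j = 1. Therefore, we have the result

$$P_n = p_n + \sum_{j=1}^{n} \frac{s_j}{n-j+1} \left( w_{j+1}^{\,n-j+1} - w_j^{\,n-j+1} \right) \qquad (2)$$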

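Notice that this formula reduces to equation (1) if all the waiting times are equal. Also, it's easily verified that this formula gives the results previously derived for n = 2 and n = 3. For another illustration, consider the case n = 4. Using this expression, the probability of all parties meeting is

$$P_4 = p_4 + \frac{s_1}{4}\left( w_2^4 - w_1^4 \right) + \frac{s_2}{3}\left( w_3^3 - w_2^3 \right) + \frac{s_3}{2}\left( w_4^2 - w_3^2 \right) + s_4 \left( 1 - w_4 \right)$$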

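Inserting the explicit expressions for the symmetric functions, and simplifying, this gives the formula

$$\begin{aligned} P_4 = {} & w_1 w_2 w_3 + w_1 w_2 w_4 + w_1 w_3 w_4 + w_2 w_3 w_4 - \frac{w_1^4}{4} - \frac{w_1 w_2^3}{3} - \frac{w_1 w_2 w_3^2}{2} \\ & - \frac{w_2^4}{12} - \frac{(w_1 + w_2)\, w_3^3}{6} - \frac{(w_1 w_2 + w_1 w_3 + w_2 w_3)\, w_4^2}{2} \end{aligned}$$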
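To check the general formula and the four-person case together, this sketch computes p_j and s_j directly and evaluates equation (2) with exact fractions. It takes equation (2) to be p_n + Σ_{j=1}^{n} s_j(w_{j+1}^{n-j+1} - w_j^{n-j+1})/(n - j + 1), with w₀ = 0 and w_{n+1} = 1. It compares that value with a midpoint-rule estimate of the integral of s_{n-1}(min(t, w₁), ..., min(t, w_n)) over [0, 1], and with an explicit n = 4 expression. The function names are hypothetical. The midpoint rule is only a numerical cross-check, not part of the derivation.

```python
from fractions import Fraction


def esym(values, r):
    """Elementary symmetric function e_r(values): sum of all products of r of the values."""
    e = [1] + [0] * r
    for x in values:
        for k in range(r, 0, -1):
            e[k] += e[k - 1] * x
    return e[r]


def prob_eq2(waits):
    """Equation (2) with w sorted ascending, p_n = w_1...w_n and s_j = e_{j-1}(w_1..w_j)."""
    n = len(waits)
    w = [0] + sorted(waits) + [1]  # w[0] = 0 and w[n+1] = 1 as in the text
    total = esym(w[1:n + 1], n)    # p_n
    for j in range(1, n + 1):
        s_j = esym(w[1:j + 1], j - 1)
        total += s_j * (w[j + 1] ** (n - j + 1) - w[j] ** (n - j + 1)) / (n - j + 1)
    return total


def prob_integral(waits, steps=20_000):
    """Midpoint-rule estimate of the integral over [0, 1] of s_{n-1}(min(t, w_1), ..., min(t, w_n))."""
    n = len(waits)
    h = 1.0 / steps
    return h * sum(esym([min((i + 0.5) * h, float(x)) for x in waits], n - 1)
                   for i in range(steps))


def prob_four_explicit(w1, w2, w3, w4):
    """Explicit n = 4 expression; inputs are sorted so that w1 <= w2 <= w3 <= w4."""
    w1, w2, w3, w4 = sorted((w1, w2, w3, w4))
    return (w1 * w2 * w3 + w1 * w2 * w4 + w1 * w3 * w4 + w2 * w3 * w4
            - w1 ** 4 / 4 - w1 * w2 ** 3 / 3 - w1 * w2 * w3 ** 2 / 2
            - w2 ** 4 / 12 - (w1 + w2) * w3 ** 3 / 6
            - (w1 * w2 + w1 * w3 + w2 * w3) * w4 ** 2 / 2)


if __name__ == "__main__":
    ws = [Fraction(1, 5), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)]
    print(prob_eq2(ws), prob_four_explicit(*ws), prob_integral(ws))
```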

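To obtain the explicit formula more directly, we can collect the terms of the summation in equation (2) by w_j. Separating out the upper-limit term for j = n (using w_{n+1} = 1) and the lower-limit term for j = 1, this gives

$$P_n = p_n + s_n - \frac{s_1 w_1^n}{n} + \sum_{j=1}^{n-1} \left[ \frac{s_j\, w_{j+1}^{\,n-j+1}}{n-j+1} - \frac{s_{j+1}\, w_{j+1}^{\,n-j}}{n-j} \right]$$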

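Making use of the identity s_{j+1} = w_{j+1}s_j + p_j, the summation splits into a sum over the p_j and a sum over the s_j as follows

$$P_n = p_n + s_n - \frac{s_1 w_1^n}{n} - \sum_{j=1}^{n-1} \frac{p_j\, w_{j+1}^{\,n-j}}{n-j} - \sum_{j=1}^{n-1} \frac{s_j\, w_{j+1}^{\,n-j+1}}{(n-j)(n-j+1)}$$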

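The argument of the first summation when j = n-1 is simply w_n p_{n-1} = p_n, so this cancels with the leading p_n term. Also, since s₁ = p₀ = 1, we can bring the term involving w₁ into the first summation (as its j = 0 term), shift the indices, and combine one power of w_j with p_{j-1}, to give the result

$$P_n = s_n - \sum_{j=1}^{n-1} \frac{p_j\, w_j^{\,n-j}}{n-j+1} - \sum_{j=1}^{n-1} \frac{s_j\, w_{j+1}^{\,n-j+1}}{(n-j)(n-j+1)}$$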

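(For comparison, recall that the circular case gave simply P_n = s_n.) Each s_j represents j terms, so, counting the n terms of s_n, the n - 1 single products in the p_j summation, and the j terms of each s_j in the last summation, the total number of terms (i.e., individual products) in P_n is

$$n + (n-1) + \sum_{j=1}^{n-1} j = \frac{n^2 + 3n - 2}{2}$$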

However, even though the number of terms increases as the square of n, the computation of P_n can be completed in O(n) steps, because the p_j and s_j quantities can be computed recursively, using p_{j+1} = p_j w_{j+1} and s_{j+1} = w_{j+1}s_j + p_j, in O(n) steps.
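The single-pass evaluation can be sketched as follows. The function name is hypothetical. The code sorts the waiting times and updates p_j and s_j with the two recursions just mentioned. It accumulates the two sums of P_n = s_n - Σ_{j=1}^{n-1} p_j w_j^{n-j}/(n - j + 1) - Σ_{j=1}^{n-1} s_j w_{j+1}^{n-j+1}/((n - j)(n - j + 1)). Exact fractions are used here only so that the results can be compared exactly; floats work the same way.

```python
from fractions import Fraction


def meeting_probability(waits):
    """Meeting probability for n people with waiting times `waits` in a unit interval.

    Evaluates P_n = s_n - sum_{j<n} p_j w_j^(n-j) / (n-j+1)
                        - sum_{j<n} s_j w_{j+1}^(n-j+1) / ((n-j)(n-j+1))
    in one pass over the sorted waiting times.
    """
    w = sorted(waits)
    n = len(w)
    p, s = 1, 0  # p_0 = 1, s_0 = 0
    total = 0
    for j in range(1, n + 1):
        wj = w[j - 1]                # w_j (the list is 0-indexed)
        p, s = p * wj, wj * s + p    # p_j = p_{j-1} w_j, s_j = w_j s_{j-1} + p_{j-1}
        if j < n:
            total -= p * wj ** (n - j) / (n - j + 1)
            total -= s * w[j] ** (n - j + 1) / ((n - j) * (n - j + 1))  # w[j] is w_{j+1}
    return s + total  # s now holds s_n


if __name__ == "__main__":
    print(meeting_probability([Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)]))  # 1435/5184
```

With a single person the loop adds nothing and the function returns s₁ = 1, which agrees with equation (1) at n = 1.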
