Integrating the Bell Curve (Asymptotic Series)

We frequently need to calculate the area under the small tail of
the Gaussian distribution G(x) when x is large.  Unfortunately, 
there is no exact "closed form" expression for the value of this 
integral over an arbitrary range.  There are, however, various 
techniques for evaluating this integral.  In particular, this is 
a classical example of how divergent series (or "asymptotic series") 
can be useful.

The Normal (Gaussian) density function G(t) is

                           exp{ (-t^2)/2 }
                  G(t)  =  ----------------
                             sqrt(2pi)

Writing down the Taylor series expansion of G(t) and integrating term 
by term is one simple way of deriving an expression for the integral 
over a specific range.  For example,

     x
    /                1      /     x^3      x^5      x^7         \
    |  G(t)dt  = --------- | x - ------ + ------ - ------ + ...  |
    /            sqrt(2pi)  \    3*2*1!   5*4*2!   7*8*3!       /
   t=0

The first ten terms of this series give the area under the curve from
t = 0 to 1 with a precision of ten significant digits.  However, to
achieve the same precision up to t = 3 requires 25 terms, so this
approach is not very practical for determining areas under the normal
curve far from the mean.
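
The term counts quoted above can be checked with a short sketch.  The
function names below are illustrative (they are not from the original
text), and the reference value is assumed to come from the standard
library's math.erf rather than from any method described here.

```python
import math

def normal_area_taylor(x, n_terms):
    """Area under the standard normal density from 0 to x, using the
    first n_terms terms of the term-by-term integrated Taylor series."""
    total = 0.0
    for k in range(n_terms):
        # k-th term: (-1)^k * x^(2k+1) / ((2k+1) * 2^k * k!)
        denom = (2 * k + 1) * 2 ** k * math.factorial(k)
        total += (-1) ** k * x ** (2 * k + 1) / denom
    return total / math.sqrt(2.0 * math.pi)

def normal_area_reference(x):
    """Reference value for the same area, via the error function."""
    return 0.5 * math.erf(x / math.sqrt(2.0))

# Compare a few (x, number of terms) combinations against the reference.
for x, n in [(1.0, 10), (3.0, 10), (3.0, 25)]:
    approx = normal_area_taylor(x, n)
    exact = normal_area_reference(x)
    print(f"x={x}  terms={n}  series={approx:.12f}  erf={exact:.12f}")
```

With ten terms the series is still far from the reference value at
x = 3, which is the slow convergence the paragraph above describes.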

One interesting approach (leading to the concept of asymptotic series) 
is to make use of the fact that, although G(t) cannot be integrated 
in closed form, there are functions that asymptotically approach G(t) 
in certain ranges and that DO have closed form integrals.  For example, 
consider the two functions

                 /     3  \                     /     15  \
    L(t) = G(t) | 1 - ---  |       U(t) = G(t) | 1 + ----  |       (1)
                 \    t^4 /                     \     t^6 /

As t becomes large, these functions obviously converge on G(t), with 
L(t) approaching from below and U(t) from above.  These functions have 
nice closed-form integrals, so they can be used to provide bounds on 
the integral of G(t).  Let A(x) denote the integral of G(t), i.e., the
area under the "tail" of the Normal curve from t=x to t=+oo.  The
integrals of L and U from x to +oo give the following lower and upper bounds,
respectively:

        1     1           A(x)         1     1     3
       --- - ---    <    ------   <   --- - --- + ---
        x    x^3          G(x)         x    x^3   x^5

We can refine these further by noting that a characteristic of G(t) is 
that almost all of the tail area from x to +oo lies very close to x.  
Therefore, if we multiply the lower bound by the ratio G(x)/L(x) and 
the upper bound by the ratio G(x)/U(x), we will just slightly 
overcompensate in both cases, so the former lower bound becomes an 
upper bound and vice versa.  This gives the improved bounds

      / x^2 - 1 \           A(x)              / x^4 - x^2 + 3 \
   x | --------  |    >    ------     >    x | --------------- |
      \ x^4 - 3 /           G(x)              \   x^6 + 15    /

For example, the bounds on the "6-sigma" tail area given by these two
equations are 9.8680e-10 > A(6) > 9.8654e-10.
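
As a rough check of these figures, the improved bounds can be evaluated
directly.  This is only a sketch: the helper names are hypothetical,
the comparison value is assumed to come from math.erfc, and the
formulas are intended for large x (the upper bound needs x^4 > 3).

```python
import math

def gaussian_density(t):
    """Standard normal density G(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def tail_bounds(x):
    """Return (lower, upper) improved bounds on the tail area A(x)."""
    g = gaussian_density(x)
    upper = g * x * (x**2 - 1) / (x**4 - 3)
    lower = g * x * (x**4 - x**2 + 3) / (x**6 + 15)
    return lower, upper

def tail_reference(x):
    """Tail area from x to +oo via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

lower, upper = tail_bounds(6.0)
print(f"{upper:.4e} > {tail_reference(6.0):.4e} > {lower:.4e}")
```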

Returning to the upper and lower approximation functions given by
equations (1), notice that they are just special cases of a family
of exact integrals of functions that asymptotically approach G(t).
We have the following indefinite integrals:

   /        /     1  \            G(t)
   |  G(t) ( 1 + ---  ) dt   =  - ---- ( 1 )
   /        \    t^2 /             t


   /        /    1*3 \            G(t)
   |  G(t) ( 1 - ---  ) dt   =  - ---- ( t^2 - 1 )
   /        \    t^4 /             t^3 


   /        /    1*3*5 \           G(t)
   |  G(t) ( 1 + -----  ) dt  =  - ---- ( t^4 - t^2 + 3 )
   /        \     t^6  /            t^5


   /        /    1*3*5*7 \            G(t)
   |  G(t) ( 1 - -------  ) dt   =  - ---- ( t^6 - t^4 + 3t^2 - 15 )
   /        \      t^8   /             t^7


Evaluating these from t=x to +inf gives an alternating sequence
of upper and lower bounds on the integral of G(t) from x to +inf.
Note that the denominator on the left-hand side in each successive
formula increases by a factor of t^2, but the numerator increases
factorially.  As a result, it's clear that for any fixed value
of t there is a limit to how much precision we can achieve by
going to higher orders.
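
The four indefinite integrals listed above are easy to spot-check
numerically: differentiating each right-hand side should reproduce the
corresponding integrand.  The sketch below does this with a central
difference; the names and the step size h are assumptions chosen for
illustration, not part of the original derivation.

```python
import math

def gaussian_density(t):
    """Standard normal density G(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

# Each pair is (correction factor c(t), antiderivative F(t)); the claim
# being checked is dF/dt = G(t) * c(t).
formulas = [
    (lambda t: 1 + 1 / t**2,
     lambda t: -gaussian_density(t) / t),
    (lambda t: 1 - 3 / t**4,
     lambda t: -gaussian_density(t) / t**3 * (t**2 - 1)),
    (lambda t: 1 + 15 / t**6,
     lambda t: -gaussian_density(t) / t**5 * (t**4 - t**2 + 3)),
    (lambda t: 1 - 105 / t**8,
     lambda t: -gaussian_density(t) / t**7 * (t**6 - t**4 + 3 * t**2 - 15)),
]

def max_relative_mismatch(t, h=1e-5):
    """Largest relative gap between the numerical derivative of F and
    G(t)*c(t) over the four formulas, at the point t."""
    worst = 0.0
    for correction, antiderivative in formulas:
        numeric = (antiderivative(t + h) - antiderivative(t - h)) / (2 * h)
        expected = gaussian_density(t) * correction(t)
        worst = max(worst, abs(numeric - expected) / abs(expected))
    return worst

print(max_relative_mismatch(2.0))
```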

For example, if we want to evaluate the tail of the normal distribution
above 3 (standard deviations), each successive denominator increases
by a factor of at least 9, so the optimum approximation we can achieve
(in this direct way) would be the next one after those shown above, for
which the numerator would be 1*3*5*7*9.  Subsequent approximations would
have the numerator multiplied by 11, 13, and so on, whereas the
denominator would still just be increased by a factor of 9 on each
step, so the error would get bigger rather than smaller.

This is a characteristic of asymptotic series: for a given argument
the error becomes smaller as terms are added, until reaching a
minimum, beyond which the error becomes larger.  Thus, the series
representation

   inf
   /                    / 1     1    1*3   1*3*5   1*3*5*7      \
   | G(t) dt  ~=  G(x) ( --- - --- + --- - ----- + ------- - ... )
   /                    \ x    x^3   x^5    x^7      x^9        /
  t=x

is actually divergent for all x, but the error is smaller than the
first neglected term (recalling that consecutive partial sums give
strict upper and lower bounds), so by selecting the appropriate number
of terms for a given x we can achieve good results.
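
The behavior described above (error shrinking, reaching a minimum, then
growing) can be seen by tabulating partial sums.  The following is a
minimal sketch with illustrative function names, again assuming
math.erfc as the reference for the true tail area.

```python
import math

def gaussian_density(t):
    """Standard normal density G(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def tail_asymptotic(x, n_terms):
    """Partial sum G(x) * (1/x - 1/x^3 + 1*3/x^5 - 1*3*5/x^7 + ...)
    using the first n_terms terms of the bracketed series."""
    term = 1.0 / x
    total = 0.0
    for k in range(n_terms):
        total += term
        # Next term: flip the sign, multiply the numerator by the next
        # odd number (1, 3, 5, ...) and the denominator by x^2.
        term *= -(2 * k + 1) / (x * x)
    return gaussian_density(x) * total

def tail_reference(x):
    """Tail area from x to +inf via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def errors(x, max_terms):
    """Absolute errors of the partial sums with 1..max_terms terms."""
    exact = tail_reference(x)
    return [abs(tail_asymptotic(x, n) - exact)
            for n in range(1, max_terms + 1)]

for n, err in enumerate(errors(3.0, 12), start=1):
    print(f"terms={n:2d}  error={err:.3e}")
```

At x = 3 the smallest error in this table occurs with five terms, which
corresponds to the approximation whose integrand has the numerator
1*3*5*7*9; adding further terms makes the result worse.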


A different approach, and one that is convergent for all x > 0, is to
use the continued fraction, as listed in Abramowitz and Stegun's
"Handbook of Mathematical Functions":

   inf              -x^2
    /  -t^2        e              1
    | e     dt  =  -----   ------------------
    /                2                1/2
   t=x                       x + ----------------
                                          1
                                   x + ------------
                                               3/2
                                         x + -----------
                                                     2
                                               x + --------
                                                      etc.

This expansion was originally found by Laplace, and can also be 
written in the form


   inf                         -x^2/2
    /  -t^2/2                 e
    | e      dt    =     ------------------
    /                               1
   t=x                     x + ----------------
                                        2
                                 x + ------------
                                             3
                                       x + -----------
                                                   4
                                             x + --------
                                                    etc.
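
One way to use this second form numerically is to cut the fraction off
at some depth and evaluate it from the innermost level outward.  The
sketch below does that; the function names and the default depth of 200
are assumptions for illustration, and the comparison value is taken
from math.erfc.

```python
import math

def tail_continued_fraction(x, depth=200):
    """Approximate the integral of exp(-t^2/2) from x to infinity with
    exp(-x^2/2) / (x + 1/(x + 2/(x + 3/(x + ...)))), truncated after
    `depth` levels.  This sketch assumes x > 0."""
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    denom = x                      # innermost level: the rest is dropped
    for k in range(depth, 0, -1):  # work outward: x + k / (inner level)
        denom = x + k / denom
    return math.exp(-0.5 * x * x) / denom

def tail_reference(x):
    """Same integral via erfc: sqrt(pi/2) * erfc(x / sqrt(2))."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

for x in (1.0, 3.0, 6.0):
    print(f"x={x}  cf={tail_continued_fraction(x):.12e}  "
          f"erfc={tail_reference(x):.12e}")
```

Smaller values of x need a deeper truncation for the same accuracy, so
the depth is best treated as a tunable parameter.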

