The Variance of Posterior Beliefs

Suppose a given coin has a fixed probability of EITHER b > 1/2 OR 
1 - b < 1/2 of coming up heads.   In other words, we know that 
we have one of two possible types of unfair coins, and we know the 
constant b, but we don't know whether we have a "b coin" or a "1-b
coin".  Let's say the prior probability of the coin being of the
"b" type is p.  We toss the coin several times, and with each result 
we update our probability of the coin being a "b type" or a "1-b 
type" coin.  After N tosses, what is the VARIANCE of our posterior 
beliefs?

As N increases the posterior probability will quickly approach 
either 1 or 0 and the variance will approach 0, especially if b 
is significantly different from 1/2.  Also, although there may 
be different interpretations of "the variance of our posterior 
beliefs", the way I would interpret it there are really two 
distinct distributions of posterior sequences (and hence two 
variances), depending on whether heads actually has a probability 
of b or 1-b.

To illustrate, suppose b=3/4 and p=7/10, which means that initially 
we think there's a 70% chance that 'heads' has a probability of 3/4, 
and a 30% chance that 'heads' has a probability of 1/4.  Now we 
toss the coin, and the probability of heads on this first trial is 
given by 

    Pr{H} = Pr{b=3/4} Pr{H|(b=3/4)}  +  Pr{b=1/4} Pr{H|(b=1/4)}

           =  (7/10)(3/4) + (3/10)(1/4)   =   24/40  =  3/5

Suppose it comes up heads on this first trial.  What then is our
new confidence that heads has a probability of b=3/4?  Our prior 
sample space was

                        1-b=1/4   b=3/4
                         -----    -----
           1-p = 3/10      H        T

             p = 7/10      T        H

So if we get H on this trial we reduce the sample space to just
the H portions of this table, and in that space the probability
that Pr{H} equals b rises to 7/8.  On the other hand, if we 
happened to get a T on this trial, our confidence that Pr{H} 
equals b drops to 7/16.
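These numbers are easy to check with exact rational arithmetic; the 
following sketch (the variable names are my own) reproduces the 
3/5, 7/8, and 7/16 values:

```python
from fractions import Fraction

b = Fraction(3, 4)   # probability of heads for the "b coin"
p = Fraction(7, 10)  # prior probability that we hold the "b coin"

# Total probability of heads on the first toss
pr_h = p * b + (1 - p) * (1 - b)

# Bayes' rule: updated confidence that the coin is the "b" type,
# given a Head or a Tail on the first toss
post_given_h = p * b / pr_h
post_given_t = p * (1 - b) / (1 - pr_h)

print(pr_h)          # 3/5
print(post_given_h)  # 7/8
print(post_given_t)  # 7/16
```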

In general, given the prior value of p, the posterior value after
tossing the coin is given by the linear fractional transformation

                          b p
            p  -->  ----------------              (1)
                    (1-b) + (2b-1) p

if the result is a Head, and

                         (1-b) p
            p  -->   --------------               (2)
                      b + (1-2b) p

if the result is a Tail.  So each possible sequence of N results 
(such as HHTHTTHHHTH) will leave us with a certain posterior 
probability that can be computed by applying the two linear 
fractional transformations for H and T in the respective 
sequence.  
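To sketch this, here is a small routine (names are mine) that applies 
transformations (1) and (2) along the example string HHTHTTHHHTH, 
starting from p = 7/10 with b = 3/4:

```python
from fractions import Fraction

def update(p, b, toss):
    """One Bayesian update: transformation (1) for a Head, (2) for a Tail."""
    if toss == 'H':
        return b * p / ((1 - b) + (2 * b - 1) * p)
    return (1 - b) * p / (b + (1 - 2 * b) * p)

b = Fraction(3, 4)
p = Fraction(7, 10)
for toss in "HHTHTTHHHTH":   # 7 Heads, 4 Tails
    p = update(p, b, toss)

print(p)  # 63/64
```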

The distribution of those strings will be governed by the actual 
probability Pr{H}, so if Pr{H}=b the weight assigned to each 
string of length N containing exactly m Heads is b^m (1-b)^(N-m).
On the other hand, if Pr{H}=1-b then the weight of such a string
would be (1-b)^m b^(N-m).  So, the variance after N trials depends
on whether Pr{H} is b or 1-b.  This stands to reason, because if
our initial 'prior' is in the right direction, our posterior
results won't change very much, but if our initial prior is wrong,
our sequence of posterior estimates will have to migrate farther
and therefore will show a larger variance.

Let's suppose the effect on our "belief" of m Heads and N-m Tails 
is the same, regardless of the order in which they occur.  Then 
we can lump together all the C(N,m) strings containing exactly m 
Heads, and they each have a weight of  b^m (1-b)^(N-m)  (assuming 
Pr{H}=b).  We can represent the posterior probability after these 
N trials as the result of applying m Heads in a row followed by 
(N-m) tails in a row.  The effect of m consecutive Heads is given 
by iterating (1) m times, which has the closed form expression

                            b^m p
      H_m(p)  =  ---------------------------
                 [b^m - (1-b)^m] p + (1-b)^m

Similarly the effect of n=N-m consecutive Tails is given by 
iterating (2) n times, which can be written as

                        (1-b)^n p
      T_n(p)  =  -----------------------
                 [(1-b)^n - b^n] p + b^n
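As a consistency check (sketch code, with function names mirroring the 
notation above), iterating (1) and (2) agrees with these closed forms:

```python
from fractions import Fraction

def H_m(p, b, m):
    """Closed form for m consecutive Heads."""
    bm, qm = b**m, (1 - b)**m
    return bm * p / ((bm - qm) * p + qm)

def T_n(p, b, n):
    """Closed form for n consecutive Tails."""
    bn, qn = b**n, (1 - b)**n
    return qn * p / ((qn - bn) * p + bn)

b, p = Fraction(3, 4), Fraction(7, 10)

# Iterate transformation (1) five times and compare with H_m
q = p
for _ in range(5):
    q = b * q / ((1 - b) + (2 * b - 1) * q)
assert q == H_m(p, b, 5)

# Likewise iterate transformation (2) four times and compare with T_n
q = p
for _ in range(4):
    q = (1 - b) * q / (b + (1 - 2 * b) * q)
assert q == T_n(p, b, 4)
```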

Composing these two (in either order) gives the total effect of
m Heads and n Tails

                                         b^m (1-b)^n p
T_n(H_m(p))=H_m(T_n(p)) = -------------------------------------------
                          [b^m (1-b)^n - b^n (1-b)^m] p + b^n (1-b)^m

If we define R=(1-b)/b, the final posterior probability "F" after m 
Heads and n Tails, beginning with an initial prior probability of p, 
can be expressed as
                                 p
               F  =   -------------------------               (3)
                      [1 - R^(m-n)] p + R^(m-n)

and there are C(m+n,m) ways of this happening, each of which has
a probability of  b^m (1-b)^n  if  Pr{H}=b,  or  b^n (1-b)^m  
if  Pr{H}=1-b.
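A short sketch confirming that equation (3) depends only on the 
difference m - n, i.e., that the Heads and Tails can come in any 
order (the function name F is mine):

```python
from fractions import Fraction

def F(p, b, m, n):
    """Equation (3): posterior after m Heads and n Tails."""
    R = (1 - b) / b
    rk = R ** (m - n)
    return p / ((1 - rk) * p + rk)

b, p = Fraction(3, 4), Fraction(7, 10)

# Only m - n matters: (m,n) = (5,2) and (9,6) both have m - n = 3
assert F(p, b, 5, 2) == F(p, b, 9, 6)

# Equal numbers of Heads and Tails leave the prior unchanged
assert F(p, b, 8, 8) == p
```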

Notice from (3) that if m=n the final posterior probability is
simply p, meaning that if we get an equal number of Heads and 
Tails we will end up with the same "belief" as at the start.  Our
belief varies only to the extent that m differs from n.  We can
solve (3) for the difference k = (m-n).  Rearranging (3) gives
R^(m-n) = p(1-F) / [F(1-p)], and therefore

                   ln[ p(1-F) / (F(1-p)) ]
         k    =    -----------------------
                            ln(R)
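Numerically (an illustrative sketch), the round trip from k to F via 
(3), and back via the relation R^k = p(1-F)/[F(1-p)], recovers k:

```python
import math
from fractions import Fraction

b, p = Fraction(3, 4), Fraction(7, 10)
R = (1 - b) / b

k_true = 5
rk = R ** k_true
F = p / ((1 - rk) * p + rk)          # equation (3) with m - n = 5

# Inverse: R^k = p(1-F)/[F(1-p)], so k = ln[p(1-F)/(F(1-p))] / ln(R)
k = math.log(float(p * (1 - F) / (F * (1 - p)))) / math.log(float(R))

print(round(k))  # 5
```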

If the probability of Heads is actually b, then the density of F 
at any given k is proportional to  C(m+n,n) b^m (1-b)^n  for m,n 
such that m+n=N and m-n=k, so we have  m = (N+k)/2  and  n = (N-k)/2.  
Thus the density is

      d(k) = C(N,(N-k)/2) b^((N+k)/2) (1-b)^((N-k)/2)

On the other hand, if the probability of Heads is actually 1-b, then
the density is proportional to C(m+n,n) (1-b)^m b^n, leading to a
similar expression for d(k).  In either case we can substitute for 
k to give an expression for the density of F as a function of F.
For example, if Pr{H}=b we have

d(F)  =

          /      N      \    (N+k)/2       (N-k)/2
         (               )  b         (1-b)
          \   (N-k)/2   /

where  k  =  ln[ p(1-F) / (F(1-p)) ] / ln(R),


and the analogous formula in case Pr{H}=1-b.  So, for any given values 
of N and b, this d(F) is proportional to the density of the posterior 
probabilities F.  As N increases, the valid values of F become more 
dense and d(F) approaches a continuous density function.
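Finally, returning to the question posed at the outset, the mean and 
variance of the posterior after N tosses can be computed directly by 
summing over the C(N,m) outcome classes, weighted by whichever true 
probability of Heads applies (a sketch; the function and variable 
names are my own):

```python
import math
from fractions import Fraction

def posterior_moments(N, b, p, heads_prob):
    """Mean and variance of the posterior F after N tosses, when
    the true probability of Heads is heads_prob (b or 1-b)."""
    R = (1 - b) / b
    mean = second = Fraction(0)
    for m in range(N + 1):
        n = N - m
        w = math.comb(N, m) * heads_prob**m * (1 - heads_prob)**n
        rk = R ** (m - n)
        F = p / ((1 - rk) * p + rk)   # equation (3)
        mean += w * F
        second += w * F * F
    return mean, second - mean * mean

b, p = Fraction(3, 4), Fraction(7, 10)
for N in (5, 20, 50):
    _, v_right = posterior_moments(N, b, p, b)      # prior leans the right way
    _, v_wrong = posterior_moments(N, b, p, 1 - b)  # prior leans the wrong way
    print(N, float(v_right), float(v_wrong))
```

Both variances shrink toward 0 as N grows, as noted at the start; 
exact rational arithmetic avoids any rounding questions along the way.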
