The Variance of Posterior Beliefs
Suppose a given coin has a fixed probability of EITHER b > 1/2 OR
1 - b < 1/2 of coming up heads. In other words, we know that
we have one of two possible types of unfair coins, and we know the
constant b, but we don't know whether we have a "b coin" or a "1-b
coin". Let's say that prior probability of the coin being of the
"b" type is p. We toss the coin several times, and with each result
we update our probability of the coin being a "b type" or a "1-b
type" coin. After N tosses, what is the VARIANCE of our posterior
beliefs?
As N increases the posterior probability will quickly approach
either 1 or 0 and the variance will approach 0, especially if b
is significantly different from 1/2. Also, although there may
be different interpretations of "the variance of our posterior
beliefs", according to the way I would interpret it, there are
really two distinct distributions of sequences of posterior results
and variances, depending on whether heads actually has a probability
of b or 1-b.
To illustrate, suppose b=3/4 and p=7/10, which means that initially
we think there's a 70% chance that 'heads' has a probability of 3/4,
and a 30% chance that 'heads' has a probability of 1/4. Now we
toss the coin, and the probability of heads on this first trial is
given by
Pr{H} = Pr{b=3/4} Pr{H|(b=3/4)} + Pr{b=1/4} Pr{H|(b=1/4)}
= (7/10)(3/4) + (3/10)(1/4) = 24/40 = 3/5
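In Python (used here just to make the arithmetic explicit; the
variable names are my own), this total-probability computation is

    from fractions import Fraction

    b = Fraction(3, 4)    # Pr{H} for the "b" coin
    p = Fraction(7, 10)   # prior probability that we hold the "b" coin

    # Pr{H} = Pr{b coin} Pr{H | b coin} + Pr{1-b coin} Pr{H | 1-b coin}
    pr_heads = p*b + (1 - p)*(1 - b)
    print(pr_heads)       # 3/5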
Suppose it comes up heads on this first trial.  What then is our
new confidence that heads has a probability of b=3/4?  The joint
probabilities of coin type and first-toss outcome are

                                      H        T
                                    -----    -----
         1-b coin  (1-p = 3/10):     3/40     9/40
           b coin    (p = 7/10):    21/40     7/40

So if we get H on this trial we reduce the sample space to just
the H column of this table, and in that space the probability
that Pr{H} equals b rises to (21/40)/(24/40) = 7/8.  On the other
hand, if we happened to get a T on this trial, our confidence that
Pr{H} equals b drops to (7/40)/(16/40) = 7/16.
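Here is a small Python check of these two posterior values, just
applying Bayes' rule directly to the H and T columns of the table:

    from fractions import Fraction

    b = Fraction(3, 4)
    p = Fraction(7, 10)

    # condition on the H column, then on the T column
    post_H = p*b / (p*b + (1 - p)*(1 - b))
    post_T = p*(1 - b) / (p*(1 - b) + (1 - p)*b)
    print(post_H, post_T)   # 7/8 7/16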
In general, given the prior value of p, the posterior value after
tossing the coin is given by the linear fractional transformation
b p
p --> ---------------- (1)
(1-b) + (2b-1) p
if the result is a Head, and
(1-b) p
p --> -------------- (2)
b + (1-2b) p
if the result is a Tail. So each possible sequence of N results
(such as HHTHTTHHHTH) will leave us with a certain posterior
probability that can be computed by applying the two linear
fractional transformations for H and T in the respective
sequence.
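For instance, the following Python sketch (my own illustration)
applies (1) and (2) along the sample string HHTHTTHHHTH, with
b=3/4 and p=7/10 as before:

    from fractions import Fraction

    def update(p, b, toss):
        # transformation (1) for a Head, (2) for a Tail
        if toss == 'H':
            return b*p / ((1 - b) + (2*b - 1)*p)
        return (1 - b)*p / (b + (1 - 2*b)*p)

    b, p = Fraction(3, 4), Fraction(7, 10)
    for toss in "HHTHTTHHHTH":
        p = update(p, b, toss)
    print(p)   # 63/64, since this string has 7 Heads and 4 Tails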
The distribution of those strings will be governed by the actual
probability Pr{H}, so if Pr{H}=b the weight assigned to each
string of length N containing exactly m Heads is b^m (1-b)^(N-m).
On the other hand, if Pr{H}=1-b then the weight of such a string
would be (1-b)^m b^(N-m). So, the variance after N trials depends
on whether Pr{H} is b or 1-b. This stands to reason, because if
our initial 'prior' is in the right direction, our posterior
results won't change very much, but if our initial prior is wrong,
our sequence of posterior estimates will have to migrate farther
and therefore will show a larger variance.
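This can be examined by brute force for small N.  The sketch below
(my own illustration, in exact rational arithmetic) enumerates all
2^N strings, computes the posterior for each, and weights the
results by the true bias, so the two variances can be compared:

    from fractions import Fraction
    from itertools import product

    def posterior(p, b, tosses):
        for t in tosses:
            if t == 'H':
                p = b*p / ((1 - b) + (2*b - 1)*p)
            else:
                p = (1 - b)*p / (b + (1 - 2*b)*p)
        return p

    def variance(true_h, b, p, N):
        # variance of the posterior over all length-N strings,
        # each weighted by its probability under the true bias true_h
        mean = mean_sq = Fraction(0)
        for s in product('HT', repeat=N):
            m = s.count('H')
            w = true_h**m * (1 - true_h)**(N - m)
            f = posterior(p, b, s)
            mean += w*f
            mean_sq += w*f*f
        return mean_sq - mean*mean

    b, p = Fraction(3, 4), Fraction(7, 10)
    for N in (1, 2, 4, 8):
        print(N, float(variance(b, b, p, N)), float(variance(1 - b, b, p, N)))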
Let's suppose the effect on our "belief" of m Heads and N-m Tails
is the same, regardless of the order in which they occur. Then
we can lump together all the C(N,m) strings containing exactly m
Heads, and they each have a weight of b^m (1-b)^(N-m) (assuming
Pr{H}=b). We can represent the posterior probability after these
N trials as the result of applying m Heads in a row followed by
(N-m) tails in a row. The effect of m consecutive Heads is given
by iterating (1) m times, which has the closed form expression
b^m p
H_m(p) = ---------------------------
[b^m - (1-b)^m] p + (1-b)^m
Similarly the effect of n=N-m consecutive Tails is given by
iterating (2) n times, which can be written as
(1-b)^n p
T_n(p) = -----------------------
[(1-b)^n - b^n] p + b^n
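As a quick sanity check, a few lines of Python confirm that this
closed form for H_m matches m-fold iteration of (1); the check for
T_n against (2) is identical in form:

    from fractions import Fraction

    b, p = Fraction(3, 4), Fraction(7, 10)

    def H_step(x):
        return b*x / ((1 - b) + (2*b - 1)*x)     # transformation (1)

    def H_closed(x, m):
        return b**m * x / ((b**m - (1 - b)**m)*x + (1 - b)**m)

    q = p
    for m in range(1, 8):
        q = H_step(q)
        assert q == H_closed(p, m)
    print("closed form agrees with iteration")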
Composing these two (in either order) gives the total effect of
m Heads and n Tails
b^m (1-b)^n p
T_n(H_m(p))=H_m(T_n(p)) = -------------------------------------------
[b^m (1-b)^n - b^n (1-b)^m] p + b^n (1-b)^m
If we define R=(1-b)/b, the final posterior probability "F" after m
Heads and n Tails, beginning with an initial prior probability of p,
can be expressed as
p
F = ------------------------- (3)
[1 - R^(m-n)] p + R^(m-n)
and there are C(m+n,m) ways of this happening, each of which has
a probability of b^m (1-b)^n if Pr{H}=b, or b^n (1-b)^m
if Pr{H}=1-b.
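The sketch below tabulates F from (3) for each possible Head count
m, and confirms that the C(m+n,m)-weighted probabilities sum to 1
(taking Pr{H}=b; again the code is just my own illustration):

    from fractions import Fraction
    from math import comb

    b, p = Fraction(3, 4), Fraction(7, 10)
    R = (1 - b) / b
    N = 10

    total = Fraction(0)
    for m in range(N + 1):                 # m Heads, n = N - m Tails
        n = N - m
        F = p / ((1 - R**(m - n))*p + R**(m - n))
        weight = comb(N, m) * b**m * (1 - b)**n
        total += weight
    print(total)                           # 1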
Notice from (3) that if m=n the final posterior probability is
simply p, meaning that if we get an equal number of Heads and
Tails we will end up with the same "belief" as at the start. Our
belief varies only to the extent that m differs from n.  We can
solve (3) for the difference k = (m-n), since (3) implies
R^k = p(1-F)/((1-p)F), and hence

                       ln( p(1-F) / ((1-p)F) )
                  k =  -----------------------
                               ln(R)
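A short round-trip check of this inversion (floating point, my own
sketch): compute F from k via (3), then recover k from F:

    from math import log, isclose

    b, p = 0.75, 0.7
    R = (1 - b) / b

    def F_of_k(k):                      # formula (3)
        return p / ((1 - R**k)*p + R**k)

    def k_of_F(F):                      # the inversion above
        return log(p*(1 - F) / ((1 - p)*F)) / log(R)

    for k in (-4, -1, 0, 2, 5):
        assert isclose(k_of_F(F_of_k(k)), k)
    print("inversion recovers k")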
If the probability of Heads is actually b, then the density of F
at any given k is proportional to C(m+n,n) b^m (1-b)^n for m,n
such that m+n=N and m-n=k, so we have m = (N+k)/2 and n = (N-k)/2
(note that k necessarily has the same parity as N).  Thus the
density is
         d(k)  =  C(N,(N-k)/2) b^((N+k)/2) (1-b)^((N-k)/2)
On the other hand, if the probability of Heads is actually 1-b, then
the density is proportional to C(m+n,n) (1-b)^m b^n, leading to a
similar expression for d(k). In either case we can substitute for
k to give an expression for the density of F as a function of F.
For example, noting that b^((N+k)/2) (1-b)^((N-k)/2) can be written
as [b(1-b)]^(N/2) R^(-k/2), if Pr{H}=b we have

                 /    N    \           N/2    -k/2
        d(F) =  (           )  [b(1-b)]      R
                 \ (N-k)/2 /

where k = ln( p(1-F) / ((1-p)F) ) / ln(R) as above, and the
analogous formula (with b and 1-b exchanged) holds in case
Pr{H}=1-b.  So, for any given values of N, b, and p, this d(F) is
proportional to the density of the posterior probabilities F.  As
N increases, the attainable values of F become more densely spaced
and d(F) approaches a continuous density function.
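To see this concretely, one can tabulate the (F, d) pairs directly
from d(k), stepping k over -N, -N+2, ..., N (a sketch under the
same assumptions b=3/4, p=7/10, and taking Pr{H}=b):

    from fractions import Fraction
    from math import comb

    b, p = Fraction(3, 4), Fraction(7, 10)
    R = (1 - b) / b
    N = 6

    for k in range(-N, N + 1, 2):           # k has the same parity as N
        m, n = (N + k)//2, (N - k)//2
        F = p / ((1 - R**k)*p + R**k)
        d = comb(N, n) * b**m * (1 - b)**n   # d(k), the weight of this F
        print(float(F), float(d))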