Darts: An Introduction to Probability with Calculus

We begin by throwing some number (n) of darts at a unit circle dartboard. A cross marks each place where a dart lands.

We'll keep track of the random variable R, which is the distance from the center at which the dart lands.

Some Questions:

How many darts fell within a distance of 1/2 from the center?

What do you think the average value of R will be?

What if we do this repeatedly, say k times? What will the histogram of the averages look like?

What happens to the average values of each sampling when n and k are taken as very large numbers?

For now let's return to the simple experiment.

With the darts falling at random anywhere in the circle it should seem reasonable that:

The probability that the dart falls into any particular region inside the circle is the ratio of the area A of that region to the area of the unit circle, i.e., A/π. With a concentric circle of radius 1/2, the ratio of the area of the circle with radius 1/2 to the area of the unit circle is π(1/4)/π = 1/4.
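
The area argument above can be checked with a small simulation. The following Python sketch is only an illustration and not part of the original demonstration: the function names `throw_darts` and `fraction_within` are made up here, and the darts are assumed to be uniform over the disk, which is arranged by throwing at the surrounding square and keeping only the darts that hit the board.

```python
import random


def throw_darts(n, seed=0):
    """Return the distances R from the center for n darts on the unit disk.

    Assumption: darts are uniform over the disk. Points are drawn uniformly
    in the square [-1, 1] x [-1, 1] and kept only if they land on the board.
    """
    rng = random.Random(seed)
    distances = []
    while len(distances) < n:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        r_squared = x * x + y * y
        if r_squared <= 1.0:  # the dart hit the board
            distances.append(r_squared ** 0.5)
    return distances


def fraction_within(distances, radius):
    """Fraction of the darts that landed within the given distance of the center."""
    return sum(1 for r in distances if r <= radius) / len(distances)


if __name__ == "__main__":
    rs = throw_darts(100_000)
    # The area ratio predicts a value close to 1/4.
    print(fraction_within(rs, 0.5))
```

With a large number of darts the printed fraction should settle near 1/4, although any single run will differ slightly from it.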

We generalize and define the probability distribution function F for the random variable R by F(A) = probability that R <= A, for any number A.

In the case of the dart variable R,

       { 0    when A <= 0
F(A) = { A²   when 0 < A < 1
       { 1    when A >= 1

The probability that the darts would fall in any particular band (called an annulus) formed by concentric circles is also easy from the areas.

The probability that A < R <= B is just F(B) - F(A).
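
As a small illustration, the distribution function for the dart variable and the band probability F(B) - F(A) can be written in a few lines of Python. This is a sketch only; the name `prob_between` is introduced here for convenience.

```python
def F(a):
    """Distribution function of the dart variable R: probability that R <= a."""
    if a <= 0:
        return 0.0
    if a >= 1:
        return 1.0
    return a * a


def prob_between(a, b):
    """Probability that a < R <= b, computed as F(b) - F(a)."""
    return F(b) - F(a)


if __name__ == "__main__":
    print(prob_between(1 / 8, 1 / 4))  # 3/64 = 0.046875
    print(prob_between(3 / 4, 7 / 8))  # 13/64 = 0.203125
```

The two bands have the same width, yet the outer band is more than four times as likely to be hit.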

With this analysis it should be clear that the probability that R = A is zero since the circle of radius A is a region in the plane with area zero.

This result can be interpreted as saying that the dart has no chance of landing exactly on a circle of a given radius. In this sense, any specified number A from 0 to 1 is equally likely (each with probability zero) to occur as the value of R in an experiment.

Yet the formula above also suggests that the probability that the value of R will lie between 1/8 and 1/4 is not as large as the probability that R will lie between 3/4 and 7/8.

This leads to the concepts of average probability density and point probability density.

The average probability density for an interval [A,B] is the ratio of the probability that R will fall in the interval [A,B] to the length of that interval, B-A. That is,

[F(B)-F(A)]/ [B-A].

Comparing the average densities for the intervals [1/4,3/8] and [3/4,7/8], which have the same length, illustrates why larger values of R are more likely.

For the interval [1/4,3/8] the density is

[9/64-4/64]/[3/8-1/4] = 5/8,

while for the interval [3/4,7/8] the density is

[49/64-36/64]/[7/8-3/4] = 13/8.
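
These two average densities can be reproduced exactly with Python's `fractions` module. The sketch below is an illustration only, and the function name `average_density` is chosen here, not taken from the text.

```python
from fractions import Fraction


def F(a):
    """Distribution function of the dart variable R, using exact fractions."""
    a = Fraction(a)
    if a <= 0:
        return Fraction(0)
    if a >= 1:
        return Fraction(1)
    return a * a


def average_density(a, b):
    """Average probability density on [a, b]: [F(b) - F(a)] / [b - a]."""
    return (F(b) - F(a)) / (Fraction(b) - Fraction(a))


if __name__ == "__main__":
    print(average_density(Fraction(1, 4), Fraction(3, 8)))  # 5/8
    print(average_density(Fraction(3, 4), Fraction(7, 8)))  # 13/8
```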

The point probability density of the random variable R at the point A, dF(A), is the limit as B->A of the average probability densities for intervals with endpoints A and B. So

dF(A) = lim (as B -> A) of [F(B)-F(A)]/[B-A]

= F'(A) = the derivative of the function F at A.
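
For the dart variable the limit can be watched numerically. When A and B are both between 0 and 1, the average density is [B² - A²]/[B - A] = A + B, so it should approach 2A as B approaches A. The Python sketch below is illustrative only; the name `densities_near` and the choice of step sizes 1/10, 1/100, ... are assumptions made for the example.

```python
from fractions import Fraction


def average_density(a, b):
    """[F(b) - F(a)] / [b - a] for F(x) = x², assuming 0 < a < b < 1."""
    return (b * b - a * a) / (b - a)


def densities_near(a, steps=6):
    """Average densities on [a, a + h] for h = 1/10, 1/100, ..., 1/10**steps."""
    a = Fraction(a)
    return [average_density(a, a + Fraction(1, 10 ** k)) for k in range(1, steps + 1)]


if __name__ == "__main__":
    for value in densities_near(Fraction(1, 2)):
        # The values approach 2 * (1/2) = 1 as the interval shrinks.
        print(value, float(value))
```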

This is the key relation between the distribution function F and the probability density function of a random variable.

REMARKS on the DENSITY FUNCTION.

dF(A) >= 0 for all A provided it exists, since F never decreases as A increases.

DENSITY AND THE FUNDAMENTAL THEOREM OF CALCULUS.

NOTE THAT

F(B) - F(A) = INTEGRAL ( dF(x) dx, x = A to x = B ).

So to find the probability that a random variable is between A and B we need only integrate its density from A to B.
Using the area interpretation of the definite integral, to find the probability that a random variable is between A and B we find the area of the region bounded by the X-axis, the graph of the density function for the random variable, and the lines X=A and X=B.

Density and darts. One way to estimate the definite integral of the density function over an interval [A,B] is to THROW DARTS at the entire region under the graph of the density function and above the X-axis for the range of the random variable. Then determine the proportion of darts that have fallen with first coordinate between A and B. Since the entire region has area 1, this proportion estimates the integral.
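
A minimal Python sketch of this dart-throwing estimate, for the dart variable R with density 2x on the interval from 0 to 1, might look as follows. It is an illustration under stated assumptions: the function names are invented here, and darts are thrown at the rectangle 0 <= x <= 1, 0 <= y <= 2 and kept only when they land under the graph of the density.

```python
import random


def density(x):
    """Density dF(x) = 2x of the dart variable R on (0, 1), and 0 elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0


def estimate_probability(a, b, n_darts=100_000, seed=0):
    """Estimate the probability that a < R <= b by throwing darts under the density graph."""
    rng = random.Random(seed)
    under_graph = 0  # darts that landed under the graph of the density
    between = 0      # of those, darts with first coordinate in (a, b]
    while under_graph < n_darts:
        x = rng.uniform(0.0, 1.0)
        y = rng.uniform(0.0, 2.0)  # 2 is the largest value of the density
        if y <= density(x):
            under_graph += 1
            if a < x <= b:
                between += 1
    return between / under_graph


if __name__ == "__main__":
    # Compare with F(1/2) - F(1/4) = 3/16 = 0.1875.
    print(estimate_probability(0.25, 0.5))
```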

MEDIAN AND MODES of a random variable.

Let's return to our darts. One problem is to find a value for A so that the probability that R <= A is the same as the probability that R > A. It should seem reasonable that this probability is 1/2.
This number, A, is called the MEDIAN of R.
So...
we want F(A) = A² = 1/2. Not too hard with a little algebra...

A = sqrt(2)/2 is the MEDIAN of R.
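
When the algebra is harder than it is here, the median can also be located numerically. The following Python sketch uses bisection on F, a technique swapped in for illustration and not the method used in the text; the name `median_by_bisection` is made up for the example.

```python
import math


def F(a):
    """Distribution function of the dart variable R."""
    if a <= 0:
        return 0.0
    if a >= 1:
        return 1.0
    return a * a


def median_by_bisection(tolerance=1e-12):
    """Find A with F(A) = 1/2 by repeatedly halving the interval [0, 1]."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if F(mid) < 0.5:
            low = mid   # the median lies to the right of mid
        else:
            high = mid  # the median lies at or to the left of mid
    return (low + high) / 2


if __name__ == "__main__":
    print(median_by_bisection())  # numerical estimate of the median
    print(math.sqrt(2) / 2)       # value from the algebra, for comparison
```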

Another problem is to find the point with the highest density. This number is called the MODE of the random variable. It is the number where, for any given small interval length, the probability of R being in that interval is highest.

For the darts random variable R, the density is dF(A) = 2A, so the mode is the A with the largest value of 2A in the interval from 0 to 1, that is, the mode of R is 1!

Finally, let's return to the question of the average, or what is called the MEAN of the random variable.

First let's note that F(B)-F(A) is approximately dF(A) * (B-A)
or dF(B)*(B-A) or dF(M)*(B-A) where M = (A+B)/2 when B is close to A.

Let's cut the interval from 0 to 1 into N pieces, each of length 1/N. For example, if N = 4, we might consider 16 darts thrown to determine roughly in which of the four sections the darts would fall.

Region      Value used for R    Number of darts
0-1/4       1/8                 1
1/4-1/2     3/8                 3
1/2-3/4     5/8                 5
3/4-1       7/8                 7

This hypothetical experiment would give an average value for R of

[(1/8)(1)+(3/8)(3)+(5/8)(5)+(7/8)(7)]/16
= (1/8)(1/16)+(3/8)(3/16)+(5/8)(5/16)+(7/8)(7/16)
= (1/8)(F(1/4)-F(0))+(3/8)(F(1/2)-F(1/4))+(5/8)(F(3/4)-F(1/2))+(7/8)(F(1)-F(3/4)),

which is approximately

(1/8)dF(1/8)(1/4)+(3/8)dF(3/8)(1/4)+(5/8)dF(5/8)(1/4)+(7/8)dF(7/8)(1/4).

Now when N is large the length of these intervals will get small and there won't be much variation of the value of R in each interval. To estimate the average from the theory, it would seem sensible to

throw N² darts and estimate the number of darts that land in each interval based on the average probability density for each interval;
use the midpoint of each interval to estimate the value that the dart would give for R in that interval;
add those numbers up and divide by N², the number of darts, to find an estimate for the average.

For example when N = 4 we would find

estimated average
= (1/8)(1/16)+(3/8)(3/16)+(5/8)(5/16)+(7/8)(7/16)
= (1+9+25+49)/128
= 84/128
= 21/32
This would be an underestimate for the average. Can you see why?
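
The same estimate can be computed for any number of intervals. The Python sketch below is illustrative only; `midpoint_estimate` is a name chosen for this example, and exact fractions are used so that the N = 4 case reproduces 21/32.

```python
from fractions import Fraction


def F(a):
    """Distribution function of the dart variable R for 0 <= a <= 1."""
    return a * a


def midpoint_estimate(n_intervals):
    """Estimate the average of R from n_intervals equal pieces of [0, 1].

    Each piece contributes (its midpoint) times (the probability that R
    falls in that piece).
    """
    total = Fraction(0)
    for k in range(1, n_intervals + 1):
        left = Fraction(k - 1, n_intervals)
        right = Fraction(k, n_intervals)
        midpoint = (left + right) / 2
        total += midpoint * (F(right) - F(left))
    return total


if __name__ == "__main__":
    print(midpoint_estimate(4))  # 21/32
    for n in (8, 16, 32):
        print(n, midpoint_estimate(n), float(midpoint_estimate(n)))
```

Running it for larger values of N shows how the estimate changes as the pieces get smaller.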

THE MEAN MEETS THE RIEMANN INTEGRAL:

Now remember that F(A_k) - F(A_(k-1)) is approximately dF(M_k)*(1/N) by our earlier remark, where A_k = k/N is the right endpoint and M_k is the midpoint of the k-th interval. So to estimate the average value of R theoretically we could consider

Σ M_k * (F(A_k) - F(A_(k-1)))
≈ Σ M_k * dF(M_k) * (1/N)
= Σ M_k * (2M_k) * (1/N)
= Σ 2M_k² (1/N)

Aha! This last expression is precisely a Riemann sum that estimates the definite integral

INTEGRAL ( 2t² dt, t = 0 to t = 1 ) = 2/3.
So the mean of R must be 2/3.
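
To see the Riemann sums closing in on 2/3, here is a small Python sketch. It is an illustration only: the function name `riemann_sum` is invented for the example, and it uses the midpoints M_k = (k - 1/2)/N of N equal intervals as in the discussion above.

```python
def riemann_sum(n_intervals):
    """Midpoint Riemann sum of 2t² over [0, 1] with n_intervals equal pieces."""
    total = 0.0
    for k in range(1, n_intervals + 1):
        m_k = (k - 0.5) / n_intervals  # midpoint of the k-th interval
        total += 2.0 * m_k * m_k * (1.0 / n_intervals)
    return total


if __name__ == "__main__":
    for n in (4, 10, 100, 1000):
        # The sums should move toward 2/3 as n grows.
        print(n, riemann_sum(n))
```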

AND...
in general the MEAN of the random variable R must be the integral of x*dF(x) over the interval for which dF(x) > 0.

Oh ... by the way, when n is large and k is large the histogram of the averages will have the appearance of a bell-shaped curve, which is the density function for a random variable sometimes called the GAUSS Normal random variable.

The distribution for that variable is the solution to the differential equation

dF(x) = k exp(-1/2*x²), where k is a constant (unrelated to the k used earlier), with the conditions
that lim F(x) = 0 as x -> - infinity and
that lim F(x) = 1 as x -> infinity.

It is an interesting problem to find the value of k here, but that's another story which perhaps you'll find in some further course in calculus.....
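
Returning to the repeated experiment from the opening questions, the bell shape of the histogram of averages can be seen in a rough simulation. The Python sketch below is an illustration under stated assumptions, not the original applet: all function names are invented here, and instead of throwing darts at the disk it draws R directly as the square root of a uniform random number, which has the same distribution function F(A) = A².

```python
import math
import random
import statistics


def sample_average(n, rng):
    """Average of R over n darts, drawing R as sqrt(U) with U uniform on [0, 1)."""
    return sum(math.sqrt(rng.random()) for _ in range(n)) / n


def averages(n, k, seed=0):
    """Repeat the n-dart experiment k times and return the k averages."""
    rng = random.Random(seed)
    return [sample_average(n, rng) for _ in range(k)]


def text_histogram(values, bins=15, width=50):
    """Return a simple text histogram; assumes the values are not all equal."""
    low, high = min(values), max(values)
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / (high - low) * bins), bins - 1)
        counts[index] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = low + i * (high - low) / bins
        lines.append(f"{left_edge:.3f} " + "#" * round(count / peak * width))
    return "\n".join(lines)


if __name__ == "__main__":
    avgs = averages(n=100, k=5000)
    print("mean of the averages:", statistics.mean(avgs))  # expected near 2/3
    print(text_histogram(avgs))
```

The printed bars should pile up around 2/3 and fall away on both sides, and the pile should narrow as n grows.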