Darts and Averages:
An Introduction to the Fundamental Theorem of Calculus through Probability
© 2005 M. Flashman

We begin by throwing some darts at a unit circle dartboard.

We'll keep track of a random variable R, which measures the distance from the center of the board to the point where the dart lands.


The Key Question: What do you think the average value of R will be?



For now let's return to the simple experiment.

With the darts falling at random anywhere in the circle, it should seem reasonable that:

The probability that the dart falls into any particular region inside the circle equals the ratio of the area of that region to the area `pi` of the unit circle; for a region of area `a`, that is `a/pi`. For example, for the concentric circle of radius 1/2, the ratio of its area to the area of the unit circle is `{pi/4} / pi = 1/4`.
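The area argument can be checked by imitating the experiment on a computer. The short Python sketch below assumes, as the text does, that darts land uniformly anywhere on the board, and models this by rejection sampling: pick a point uniformly in the square around the board and keep it only if it lands inside the circle. The helper `throw_darts`, the sample size, and the seed are illustrative choices, not part of the original page.

```python
import math
import random


def throw_darts(n, seed=0):
    """Illustrative helper (hypothetical name): distances from the center of n darts.

    Assumes darts land uniformly in the unit disk, modeled by rejection sampling:
    draw a point uniformly in the square [-1, 1] x [-1, 1] and keep it only if
    it lies inside the circle.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    distances = []
    while len(distances) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            distances.append(math.hypot(x, y))
    return distances


darts = throw_darts(100_000)
# Fraction of darts inside the concentric circle of radius 1/2;
# the area argument predicts (pi/4)/pi = 1/4.
inside_half = sum(1 for r in darts if r <= 0.5) / len(darts)
print(f"fraction within radius 1/2: {inside_half:.4f}")
```

With many darts the printed fraction should sit close to 1/4, though not exactly on it, since it is a random estimate.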


We generalize and define the probability distribution function F for the random variable R by

F(A) = the probability that `R <= A`.

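In the case of the dart variable R, the disk of radius A (for `0 < A < 1`) has area `pi A^2`, so `F(A) = {pi A^2}/pi = A^2`. In full,

`F(A) = {(0, " when " A <= 0), (A^2, " when " 0 < A < 1), (1, " when " A >= 1):}`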
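Written as a small Python function, the distribution function reads as follows. This is an illustrative sketch; it uses exact fractions (Python's standard `fractions` module) so that values such as `A^2` come out exactly rather than as rounded decimals.

```python
from fractions import Fraction


def F(a):
    """Distribution function of the dart distance R: the probability that R <= a."""
    if a <= 0:
        return Fraction(0)        # no dart lands at a negative distance
    if a >= 1:
        return Fraction(1)        # every dart lands within distance 1
    return Fraction(a) ** 2       # area ratio (pi * a^2) / pi


print(F(Fraction(1, 2)))  # 1/4
```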

The probability that a dart falls in any particular band (called an annulus) formed by concentric circles is also easy to calculate from the areas.

The probability that `A < R <= B` is just `F(B) - F(A)`.

With this analysis it should be clear that the probability that R = A is zero, since the circle of radius A is a region in the plane with area zero.

This result says that the dart has zero probability of landing exactly on a circle of any given radius. In that sense, every specified number A from 0 to 1 is equally likely (each with probability zero) to occur as the value of R.
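One way to see that a single value has probability zero is to squeeze a band around the circle of radius A. The Python sketch below (exact fractions; the radius `A = 1/2` and the half-widths `h` are illustrative choices) computes `P(A - h < R <= A + h) = F(A + h) - F(A - h)`. While the band stays on the board this equals `(A+h)^2 - (A-h)^2 = 4Ah`, so here it is `2h`, and it shrinks to zero along with h.

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a (unit-circle board)."""
    a = Fraction(a)
    # Clamp a to [0, 1]: F is 0 for a <= 0 and 1 for a >= 1, and a^2 in between.
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


A = Fraction(1, 2)  # illustrative radius
for h in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # Probability of the thin band A - h < R <= A + h around the circle of radius A.
    band = F(A + h) - F(A - h)
    print(f"h = {h}: band probability = {band}")
```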


Yet the formula for F also shows that the probability that R lies between `1/8` and `1/4` is smaller than the probability that R lies between `3/4` and `7/8`, even though both intervals have the same length `1/8`.
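The comparison can be checked exactly. This Python sketch uses the formula for F with exact fractions; the helper name `band_probability` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


def band_probability(a, b):
    """Hypothetical helper: probability that a < R <= b (dart lands in the annulus)."""
    return F(b) - F(a)


inner = band_probability(Fraction(1, 8), Fraction(1, 4))  # 1/16 - 1/64
outer = band_probability(Fraction(3, 4), Fraction(7, 8))  # 49/64 - 36/64
print(inner, outer)  # 3/64 13/64
```

The outer band is more than four times as likely as the inner one.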

This leads to the concepts of average probability density and point probability density.


The average probability density for an interval [A,B] is the ratio of the probability that R falls in [A,B] to the length B - A of that interval. That is,

`barF(A,B) = {F(B)-F(A)}/ {B-A}`.

The intervals `[1/4,3/8]` and `[3/4,7/8]` have the same length, so comparing their average densities shows why larger values of R are more likely.

For the interval `[1/4,3/8]` the average density is

`barF(1/4,3/8) = {9/64 - 4/64}/{3/8 - 1/4} = 5/8`;

while for the interval `[3/4,7/8]` the average density is

`barF(3/4,7/8) = {49/64 - 36/64}/{7/8 - 3/4} = 13/8`
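Both computations can be reproduced with a hypothetical helper `average_density`, a direct transcription of `barF(A,B)` into Python with exact fractions:

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


def average_density(a, b):
    """Hypothetical helper: barF(a, b) = (F(b) - F(a)) / (b - a), for a < b."""
    return (F(b) - F(a)) / (b - a)


print(average_density(Fraction(1, 4), Fraction(3, 8)))  # 5/8
print(average_density(Fraction(3, 4), Fraction(7, 8)))  # 13/8
```

For `0 <= A < B <= 1` the quotient simplifies to `{B^2 - A^2}/{B - A} = A + B`, which matches `5/8 = 1/4 + 3/8` and `13/8 = 3/4 + 7/8`.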

The point probability density of the random variable R at the point A, dF(A), is the limit as `B rarr A` of the average probability densities for intervals with endpoints A and B. So

`dF(A) = lim_{B rarr A} {F(B) - F(A)}/{B - A} = F'(A)`, the derivative of the function F at A.

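Thus in the case of the darts, differentiating `F(A) = A^2` gives
dF(A) = 2A for `0 <= A < 1`,
and dF(A) = 0 for `A < 0` and for `A > 1`. At `A = 1` the one-sided limits (2 from the left, 0 from the right) disagree, so dF(1) does not exist.


The relation `dF(A) = F'(A)` is the key link between the distribution function F and the probability density function of a random variable.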
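The limit can be watched numerically. The Python sketch below (exact fractions; the point `A = 1/2` and the step sizes are illustrative choices) computes average densities over `[A, A+h]` and `[A-h, A]`. Since the quotient equals the sum of the endpoints on the board, they come out as `1 + h` and `1 - h`, closing in on `dF(1/2) = 2*(1/2) = 1` from both sides.

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


def average_density(a, b):
    """Average probability density over [a, b], for a < b."""
    return (F(b) - F(a)) / (b - a)


A = Fraction(1, 2)  # illustrative point
for h in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # Right-hand and left-hand average densities squeeze in on dF(A) = 2A = 1.
    print(h, average_density(A, A + h), average_density(A - h, A))
```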

REMARKS on the DENSITY FUNCTION.

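  1. `dF(A) >= 0` for all A, provided it exists, since F never decreases: every average density `{F(B)-F(A)}/{B-A}` is `>= 0`, and so is their limit.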


  2. DENSITY AND NET CHANGE:
  If G is any function with `G'(A) = dF(A)` for all A,
then
the probability that `A < R <= B` is just `G(B) - G(A) = F(B) - F(A)`.

 So, to find the probability that a random variable is between A and B, we need only find the net change from A to B in any function whose derivative is the density function.
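As a concrete check of Remark 2, take `G(x) = x^2 + 5`. The constant 5 is an arbitrary, illustrative choice, and this G has `G'(x) = 2x = dF(x)` only on the board's range, so the Python sketch below compares net changes only for `0 <= A < B <= 1`.

```python
from fractions import Fraction


def F(a):
    """Distribution function of the dart distance R."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


def G(x):
    """Hypothetical antiderivative of the density 2x on [0, 1]; the + 5 is arbitrary."""
    return Fraction(x) ** 2 + 5


a, b = Fraction(1, 4), Fraction(3, 8)
# Net change in G and in F over [a, b]: both give the probability that a < R <= b.
print(G(b) - G(a), F(b) - F(a))  # 5/64 5/64
```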


Let's return to the key question: finding the average, or what is called the MEAN, of the random variable R.


First let's note that when `B ~~ A`, `F(B) - F(A) ~~ dF(A)*(B-A)`. This is just the "differential estimate" applied to the distribution function F at A.
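For the dart distribution the quality of this estimate can be measured exactly. With `dF(A) = 2A`, the error is `F(B) - F(A) - 2A(B - A) = B^2 - A^2 - 2A(B - A) = (B - A)^2` whenever A and B lie in `[0, 1]`. A Python sketch with exact fractions (the point `A = 1/2` and the step sizes are illustrative):

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


A = Fraction(1, 2)  # illustrative point
for h in (Fraction(1, 10), Fraction(1, 100)):
    B = A + h
    exact = F(B) - F(A)          # true probability that A < R <= B
    estimate = 2 * A * (B - A)   # differential estimate dF(A) * (B - A), with dF(A) = 2A
    print(f"B - A = {h}: exact {exact}, estimate {estimate}, error {exact - estimate}")
```

The error shrinks like the square of the step `B - A`, much faster than the step itself.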

Let's cut the interval from 0 to 1 into N pieces of equal length, with endpoints `A_k = k/N` for `k = 0, 1, ..., N`, so that the k-th piece is `[A_{k-1}, A_k]`. For example, if N = 5 the pieces are `[0,1/5], [1/5,2/5], ..., [4/5,1]`, each of length 1/5.

Now when N is large these pieces are short, so the values of R within any one piece vary little.

 

To estimate the average from the theory, it seems sensible to choose one number to represent the values in each piece; call it `r_k`. Next, estimate the probability that the dart's distance falls in that piece; call it `p_k`. Then multiply each representative number by its probability and add the products to estimate the average, i.e.,

Average value of R `~~ sum_{k=1}^{N} r_k*p_k`.

 

If we choose the left-hand endpoint of each piece as `r_k`, we get an underestimate.
Can you see why?

For example, when N = 5 we find

underest `= r_1*p_1 + r_2*p_2 + r_3*p_3 + r_4*p_4 + r_5*p_5`

`= 0*1/25 + 1/5*3/25 + 2/5*5/25 + 3/5*7/25 + 4/5*9/25`
`= {3 + 10 + 21 + 36}/125 = 70/125 = 14/25 = 0.56`
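The N = 5 computation, together with the matching overestimate from right-hand endpoints, can be sketched in Python with exact fractions. The sketch assumes the endpoints of the pieces are `A_k = k/N`; the helper name `endpoint_sum` is hypothetical.

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


def endpoint_sum(n, use_left=True):
    """Hypothetical helper: estimate the mean of R as sum of r_k * p_k.

    Assumes n equal pieces of [0, 1] with endpoints A_k = k/n, so
    p_k = F(A_k) - F(A_{k-1}); r_k is the left endpoint A_{k-1}
    (underestimate) or the right endpoint A_k (overestimate).
    """
    total = Fraction(0)
    for k in range(1, n + 1):
        left, right = Fraction(k - 1, n), Fraction(k, n)
        p_k = F(right) - F(left)
        r_k = left if use_left else right
        total += r_k * p_k
    return total


print(endpoint_sum(5), endpoint_sum(5, use_left=False))  # 14/25 19/25
```

The two estimates always differ by exactly `1/N`: raising each `r_k` by `1/N` adds `1/N * sum p_k = 1/N`.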

THE MEAN MEETS EULER SUMS AND ESTIMATES FOR NET CHANGE:

 

Now recall that `p_k = F(A_k) - F(A_{k-1}) ~~ dF(A_{k-1})*1/N`, by the differential estimate with `B - A = 1/N`.
So, to estimate the average value of R theoretically, we could use the left endpoints `r_k = A_{k-1}`:

`sum_{k=1}^{k=N} r_k*p_k  ~~  sum_{k=1}^{k=N} A_{k-1} *dF(A_{k-1})*1/N`.
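To see how close the Euler sum on the right is to the sum it approximates, the Python sketch below evaluates both exactly with fractions, again assuming `A_k = k/N`. The function names are hypothetical.

```python
from fractions import Fraction


def F(a):
    """Probability that the dart distance R is at most a."""
    a = Fraction(a)
    return min(max(a, Fraction(0)), Fraction(1)) ** 2


def dF(x):
    """Density of R on [0, 1): the derivative 2x of F(x) = x^2."""
    return 2 * Fraction(x)


def exact_left_sum(n):
    """Hypothetical name: sum of r_k * p_k with r_k = A_{k-1}, p_k = F(A_k) - F(A_{k-1})."""
    return sum(Fraction(k - 1, n) * (F(Fraction(k, n)) - F(Fraction(k - 1, n)))
               for k in range(1, n + 1))


def euler_sum(n):
    """Hypothetical name: sum of A_{k-1} * dF(A_{k-1}) * 1/n, the Euler sum for x * dF(x)."""
    return sum(Fraction(k - 1, n) * dF(Fraction(k - 1, n)) * Fraction(1, n)
               for k in range(1, n + 1))


for n in (5, 100, 1000):
    print(n, float(exact_left_sum(n)), float(euler_sum(n)))
```

For N = 5 the two sums are 14/25 = 0.56 and 12/25 = 0.48. Working out the difference in general gives `(N-1)/(2N^2)`, which goes to 0 as N grows, so both sums head toward the same limit.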



Aha! This last expression is precisely an Euler sum that estimates the net change from 0 to 1 in a function S where

`S'(x) = x * dF(x) = x * 2x  = 2 x^2` 


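As N grows, these estimates of the average improve and the Euler sums approach the net change `S(1) - S(0)`. Thus the MEAN of the random variable R must be `S(1) - S(0)`, where

`S(x) = {2x^3}/3`,
and so the MEAN of R is `S(1) - S(0) = 2/3`!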
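The answer can also be compared with the experiment that started the discussion. This Python sketch simulates darts landing uniformly on the board (rejection sampling from the surrounding square, an assumption matching "at random anywhere in the circle"), averages their distances, and prints the result next to `S(1) - S(0)`. The helper name `mean_distance`, the sample size, and the seed are illustrative.

```python
import math
import random


def mean_distance(n, seed=1):
    """Illustrative helper (hypothetical name): average distance of n simulated darts.

    Assumes darts land uniformly in the unit disk, modeled by rejection sampling
    from the square [-1, 1] x [-1, 1]; the seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    while count < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            total += math.hypot(x, y)
            count += 1
    return total / n


def S(x):
    """Antiderivative of x * dF(x) = 2x^2 from the text."""
    return 2 * x ** 3 / 3


theory = S(1) - S(0)
experiment = mean_distance(200_000)
print(f"theory {theory:.4f}, simulation {experiment:.4f}")
```

The simulated average is random, so it should land near `2/3 ~~ 0.667` rather than exactly on it.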


Final Comments: For any random variable X with values between A and B, F is used to denote the distribution function and f is used to denote the density function.

When F and f are continuous functions, the mean of the random variable is `S(B)-S(A)`, where S is any function with `S'(x) = x f(x)`.

The connection between (Euler) sums and differential equations will be discussed further in the calculus course and is sometimes described as the Fundamental Theorem of Calculus.