Chain Rule Proof

Theorem (Chain Rule): If g is differentiable at a and f is differentiable at b = g(a), and P(x) = f(g(x)) for all x in an interval I with a in I, then P is differentiable at a and P'(a) = f '(g(a)) cdot g'(a).

   
PROOF of Chain Rule:
Part I: Assume there is some interval I containing a where, for all x in I with x ne a, g(x) ne g(a).
Let b = g(a) and k = g(a+h) - g(a) = g(x) - g(a), where x = a + h, for h ne 0.
Note that g(x) = g(a+h) = g(a) + k = b + k. See Figure 1.

From the assumption that g is differentiable at a, we have that g is also continuous at a. Thus we can conclude that as h to 0, k to 0.

We’ll follow the usual steps in finding the derivative of P(x) = f(g(x)) at a:
Step I: P(a+h) = f(g(a+h)) and P(a) = f(g(a)).
Step II: P(a+h) - P(a) = f(g(a+h)) - f(g(a)) = f(b + k) - f(b).
Now we assumed that k ne 0 for h ne 0. [Note: This is a major assumption for some functions, since it fails whenever g takes the value g(a) at points arbitrarily close to a.]
So
P(a+h) - P(a) = {f(b + k) - f(b)}/k cdot k
Therefore
P'(a) = lim_{h to 0} { P(a+h) - P(a)}/h
= lim _{h to 0,  k to 0} {f(b + k) - f(b)}/k  cdot  k/h
= lim _{h to 0,  k to 0} {f(b + k) - f(b)}/k  cdot {g(a+h)-g(a) }/h
= f '(b)  cdot g'(a)
= f '(g(a)) cdot g'(a).
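
As a numerical illustration of Part I, the following Python sketch splits the difference quotient of P into the two factors used above and compares their product with f '(g(a)) cdot g'(a). The choices f = sin, g(x) = x^2 + 1 and a = 1 are assumptions made only for this example and are not part of the proof; g is increasing near a = 1, so k ne 0 for small h ne 0.

```python
import math

# Illustrative choices only; they are not part of the proof.
def g(x):
    return x * x + 1.0      # g'(x) = 2x

def f(u):
    return math.sin(u)      # f'(u) = cos(u)

def P(x):
    return f(g(x))

a = 1.0
b = g(a)
exact = math.cos(b) * 2.0 * a   # f'(g(a)) * g'(a)

def quotients(h):
    """Return ({f(b+k)-f(b)}/k, k/h, {P(a+h)-P(a)}/h) for a step h != 0."""
    k = g(a + h) - g(a)         # nonzero here because g is increasing near a = 1
    outer = (f(b + k) - f(b)) / k
    inner = k / h
    total = (P(a + h) - P(a)) / h
    return outer, inner, total

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    outer, inner, total = quotients(h)
    print(f"h={h:g}  outer*inner={outer * inner:.6f}  total={total:.6f}  exact={exact:.6f}")
```

As h shrinks, the product of the two factors should track the full difference quotient and both should approach the exact value, up to floating-point rounding.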

Part II:  Recall that we had k = g(a+h)-g(a) for h ne 0.
Suppose that  k = 0 for values of h arbitrarily close to 0.
Since we assume that g is differentiable we know that
lim_{h to 0} {g(a+h)-g(a)}/ h must exist. Our assumption that k = 0 for h arbitrarily close to 0 means that
there is a sequence of  values of h, {h_n} with h_n to 0 and  g(a+h_n) -g(a) = 0 for all n.
Thus lim_{n  to oo} {g(a+h_n)-g(a)}/ {h_n } = lim_{n  to oo} 0/ {h_n }= 0 . [See Figure 2 ]
Since the limit defining g'(a) exists, it must equal its value along this sequence. Thus g'(a) = lim_{h to 0} {g(a+h)-g(a)}/h = 0. [0 is the only possible limit.]

Since g'(a) = 0, the formula to be proved reads P'(a) = f '(g(a)) cdot 0 = 0. So to complete the argument we need only show that P'(a) = 0.
But for precisely the same h values that had k = g(a+h)-g(a) = 0, we have g(a+h) = g(a). Thus for these values of h
P(a+h) - P(a) = f(g(a+h)) - f(g(a)) = f(b + k) - f(b) = f(b) - f(b) = 0
and hence {P(a+h) - P(a)}/h = 0. [See Figure 3]

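Now for any h where k ne 0 (see Figure 1), the factorization of Part I is still valid: {P(a+h) - P(a)}/h = {f(b + k) - f(b)}/k cdot k/h. As h to 0 through such values, the first factor approaches f '(b) and the second approaches g'(a) = 0, so {P(a+h) - P(a)}/h to 0.

[This is primarily because g'(a) = 0.]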

In summary then, as h approaches 0, {P(a+h)-P(a)}/h either is close to 0 or actually is 0.
Thus P'(a) = lim_{h to 0} {P(a+h) - P(a)}/h = 0 = f '(g(a)) cdot g'(a). EOP.
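
The situation in Part II can be checked numerically. The sketch below uses a hypothetical g chosen only for illustration: g(0) = 0 and, for x ne 0, g(x) = x^2 times the distance from 1/x to the nearest integer. Then g(2^{-n}) = 0 exactly, and g'(0) = 0 because |g(x)| is at most x^2/2. The choices f = exp and a = 0 are likewise assumptions. Along h = 2^{-n} the quotient of P is exactly 0, and along the other steps tried here (where k ne 0) it is merely small.

```python
import math

def g(x):
    # Hypothetical example: g(0) = 0 and, for x != 0,
    # g(x) = x^2 * (distance from 1/x to the nearest integer).
    if x == 0.0:
        return 0.0
    t = 1.0 / x
    return x * x * abs(t - round(t))

def f(u):
    return math.exp(u)      # illustrative outer function

def P(x):
    return f(g(x))

a = 0.0

def quotient(h):
    """Difference quotient {P(a+h) - P(a)}/h for a step h != 0."""
    return (P(a + h) - P(a)) / h

# Steps h = 2^-n: 1/h is an exact integer in floating point, so k = g(a+h) - g(a) = 0.
zero_k_quotients = [quotient(2.0 ** -n) for n in range(1, 20)]

# Steps h = 1.5 * 2^-n: 1/h is not an integer, so k != 0 and Part I's factorization applies.
nonzero_k_quotients = [quotient(1.5 * 2.0 ** -n) for n in range(1, 20)]

print(zero_k_quotients[:3], zero_k_quotients[-1])
print(nonzero_k_quotients[:3], nonzero_k_quotients[-1])
```

Powers of 2 are used for the steps so that 1/h is computed without rounding error; with other steps the zeros of g would only be hit approximately.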