In the last section we saw that if we know the derivatives of a pair of functions, then we can quickly find the derivative of their sum, difference, or product. We also saw that we can quickly find the derivative of a constant times any function whose derivative we already know. But we still haven't covered all the ways to combine functions.

I have no cute stories about animals or animated tattoos for quotients. Fortunately, they are easy once you know the product rule. So here is a straight-ahead explanation.

In the answer to exercise 3 of section 4.2 we turned a difference into a sum so that we could derive a difference rule from the sum rule. Here we shall turn a quotient into a product to derive a quotient rule from the product rule. We would like to know the derivative of:

$$h(x) = \frac{f(x)}{g(x)} \qquad \text{(eq. 4.3-1)}$$

Simply by multiplying both sides of 4.3-1 by `g(x)`, we get:

$$h(x)\,g(x) = f(x) \qquad \text{(eq. 4.3-2)}$$

These two equations, 4.3-1 and 4.3-2, say exactly the same thing, don't they? But in 4.3-2 we have a product on the left and no quotient anywhere. And we already know how to take the derivative of a product. So by applying the product rule to the left hand side of 4.3-2 and simply taking the derivative of the right hand side (remember, if two things are equal, their derivatives are equal as well), we get:

$$h'(x)\,g(x) + h(x)\,g'(x) = f'(x) \qquad \text{(eq. 4.3-3)}$$

All we have to do to finish this derivation is get `h'(x)` by itself. First subtract `h(x) g'(x)` from both sides:

$$h'(x)\,g(x) = f'(x) - h(x)\,g'(x) \qquad \text{(eq. 4.3-4)}$$

Now you can divide out `g(x)`:

$$h'(x) = \frac{f'(x) - h(x)\,g'(x)}{g(x)} \qquad \text{(eq. 4.3-5)}$$

And this would be a suitable answer, except that we'd like the right hand side of 4.3-5 to be in terms of `f(x)` and `g(x)` alone. So substitute 4.3-1 for `h(x)`:

$$h'(x) = \frac{f'(x) - \left(\dfrac{f(x)}{g(x)}\right) g'(x)}{g(x)} \qquad \text{(eq. 4.3-6)}$$

Your algebra instructor would probably have you simplify this by multiplying
numerator and denominator by `g(x)`.

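$$h'(x) = \frac{g(x)\,f'(x) - f(x)\,g'(x)}{g^{2}(x)}$$

(where `g^{2}(x)` indicates the square of `g(x)`). This is the quotient rule, and it applies whenever

$$h(x) = \frac{f(x)}{g(x)}$$

In words, the derivative of a quotient is the denominator times the derivative of the numerator, minus the numerator times the derivative of the denominator, all divided by the square of the denominator.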
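If you would like to convince yourself that the quotient rule gives sensible numbers, here is a minimal Python sketch that compares it against a small-step slope estimate. The test functions `f(x) = x^3` and `g(x) = x^2 + 1`, the helper names, and the step size are all assumptions made up for illustration; they are not part of the derivation above.

```python
def numerical_derivative(func, x, step=1e-6):
    """Estimate func'(x) from the slope between two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)


def quotient_rule(f, f_prime, g, g_prime, x):
    """Derivative of f/g at x: (g f' - f g') / g^2."""
    return (g(x) * f_prime(x) - f(x) * g_prime(x)) / g(x) ** 2


# Hypothetical example functions, chosen only for illustration.
f = lambda x: x**3
f_prime = lambda x: 3 * x**2
g = lambda x: x**2 + 1
g_prime = lambda x: 2 * x

# The quotient itself, so we can estimate its slope directly.
h = lambda x: f(x) / g(x)

x0 = 2.0
rule_value = quotient_rule(f, f_prime, g, g_prime, x0)
estimate = numerical_derivative(h, x0)
print(rule_value, estimate)
```

The two printed numbers should agree to several decimal places. The small remaining difference comes from the slope estimate, not from the rule.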

You can also go to a lot more trouble to derive this same rule using the limit method we used to derive the sum and product rules, but why bother when you can derive it using an existing rule and a little old fashioned algebra? But some instructors, for reasons known but to themselves, will ask you on an exam to derive this thing the hard way. So if you need to learn the hard way, click here. As far as I'm concerned, though, math is about doing things the easiest way you can find.

Find the derivative of:

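$$h(x) = \frac{x + 1}{x - 1}$$

We can see that this is the quotient of `f(x) = x + 1` and `g(x) = x - 1`, each of which has a derivative of `1`. Applying the quotient rule gives:

$$h'(x) = \frac{(x - 1)(1) - (x + 1)(1)}{(x - 1)^{2}}$$

When you gather up like terms in the numerator, you end up with:

$$h'(x) = \frac{-2}{(x - 1)^{2}}$$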

Figure 4-6 shows both `h(x)` and its derivative, `h'(x)`.
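As a rough check on this example, the sketch below uses exact fractions to watch the difference quotient of `h(x) = (x + 1)/(x - 1)` close in on `-2/(x - 1)^2` as the step shrinks. The point `x = 3` and the step sizes are arbitrary choices for illustration.

```python
from fractions import Fraction


def h(x):
    return (x + 1) / (x - 1)


def h_prime(x):
    """The quotient-rule result: -2 / (x - 1)^2."""
    return -2 / (x - 1) ** 2


def difference_quotient(func, x, step):
    """Slope of the line through (x, func(x)) and (x + step, func(x + step))."""
    return (func(x + step) - func(x)) / step


x0 = Fraction(3)  # arbitrary test point, kept exact with Fraction
for step in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    print(float(difference_quotient(h, x0, step)), float(h_prime(x0)))
```

Each row prints the slope estimate next to the quotient-rule value, and the first column should drift toward the second as the step gets smaller.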

We have seen by two separate proofs that if
`g(x) = x^{n}`, then `g'(x) = nx^{n-1}`.

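Recall from algebra that when we say
`h(x) = x^{-n}`, we mean

$$h(x) = \frac{1}{x^{n}} \qquad \text{(eq. 4.3-8)}$$

This is something we can apply the quotient rule to. If we take the numerator to be `f(x) = 1` and the denominator to be `g(x) = x^{n}`, then `f'(x) = 0` and `g'(x) = nx^{n-1}`, so

$$h'(x) = \frac{(x^{n} \times 0) - (1 \times nx^{n-1})}{x^{2n}} \qquad \text{(eq. 4.3-9)}$$

The left hand half of the numerator goes away because it is multiplied by zero. The right hand half is multiplied by one, which leaves:

$$h'(x) = \frac{-nx^{n-1}}{x^{2n}} \qquad \text{(eq. 4.3-10)}$$

You can see that we have a multiple of a power of `x` in the numerator and a power of `x` in the denominator. Dividing numerator and denominator by `x^{n-1}` gives:

$$h'(x) = \frac{-n}{x^{n+1}} \qquad \text{(eq. 4.3-11)}$$

But when we have a power of `x` in the denominator, our negative exponent rule from 4.3-8 says we can use a negative exponent to write it:

$$h'(x) = -nx^{-n-1} \qquad \text{(eq. 4.3-12)}$$

Look at the rule we have for taking the derivative of `x^{n}`, which is `nx^{n-1}`. Equation 4.3-12 is that same rule with `-n` in place of `n`.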
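Here is a small sketch that evaluates the derivative of `x^{-n}` two ways: once in the quotient form of equation 4.3-10 and once in the negative-exponent form of equation 4.3-12. The function names and the sample values of `n` and `x` are assumptions chosen for illustration.

```python
def negative_power_form(n, x):
    """Eq. 4.3-12: derivative of x**(-n) written as -n * x**(-n - 1)."""
    return -n * x ** (-n - 1)


def quotient_form(n, x):
    """Eq. 4.3-10: the same derivative written as -n x^(n-1) / x^(2n)."""
    return -n * x ** (n - 1) / x ** (2 * n)


x = 2.0  # arbitrary nonzero test point
for n in (1, 2, 5):
    print(n, negative_power_form(n, x), quotient_form(n, x))
```

For each `n` the two columns should match, since the algebra between 4.3-10 and 4.3-12 only rearranges the same expression.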

**1)** Find the derivatives of the following:

a) $$h(x) = \frac{x^{2} + 2x + 1}{x^{2} - 3x + 2}$$

b) $$h(x) = \frac{x(x - 3)}{x^{2} + 1}$$

c) $$h(x) = 3x^{-3}$$

d) $$h(x) = \frac{f(x)}{x^{3} - 4x^{2} + 3x - 1}$$

e) $$h(x) = \frac{x^{2}}{g(x)}$$

f) $$h(x) = \frac{1}{x^{-2} - x^{-1}}$$

g) $$h(x) = \frac{1}{f(x)}$$

**3)** We know from algebra that if

$$h(x) = \frac{f(x)}{f(x)}$$

then `h(x) = 1` wherever `f(x)` is not zero.

We have already learned how to take the derivative of pairs of functions combined by any of the four arithmetic operations. You have known those four operations since you were in grammar school, so once you understood what a derivative is, it was fairly easy to understand what the sum or product or quotient rules were trying to say. Each one said, "If you have two functions combined by this arithmetic operation and you want to know the derivative of the combination, then apply the following rule:" And each one went on to prescribe some arithmetic munging of the two functions and their derivatives that, if you did that munging, you'd end up with the derivative you were after.

In grammar school, they didn't teach you about composites. That's because you can't take the composite of two numbers, only the composite of two functions. But now you know that the composite of two functions is the same as applying both functions, in succession, to some value.

So, for example, if

$$f(x) = \sqrt{x}$$

and

$$g(x) = 1 - x^{2}$$

then

$$f(g(x)) = \sqrt{1 - x^{2}}$$

Notice that this is simply applying `g` to `x` first and then applying `f` to the result.
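If it helps to see a composite as something you can actually run, here is a minimal Python sketch of the square root example. The helper `compose` and the sample input `0.6` are assumptions added for illustration.

```python
import math


def f(x):
    return math.sqrt(x)


def g(x):
    return 1 - x**2


def compose(outer, inner):
    """Return the function that applies inner first, then outer."""
    return lambda x: outer(inner(x))


f_of_g = compose(f, g)  # sqrt(1 - x^2)
g_of_f = compose(g, f)  # 1 - (sqrt(x))^2

print(f_of_g(0.6))
print(g_of_f(0.6))
```

The two results differ, which shows that swapping the order of the two functions gives a different composite.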

Because composition of functions is something you didn't study until
recently, you may have more difficulty with the concept than you do
with addition, subtraction, multiplication, or division. Here is an
everyday analogy. When you bake bread, you knead the dough. Think
of that as a function. The dough is changed by kneading it. Then
you let the kneaded dough rise. That is another function that causes
a change in the dough. Once you have kneaded the dough *and*
let it rise, you have performed a composite function on the original
dough. And notice that the order in which you performed these functions
matters. If you let the dough rise before you knead it, you would
end up with something less than bread. Of course there is another function
that you must combine with kneading and rising to get bread, and that
is baking. So bread is made by a composition of three functions on
the dough. And only *one* order of applying those three functions
works to make bread. Any other order makes something, but not many
folks would want to eat it.

There is one more concept involving composites that it is useful for
you to know, and that is the idea of functions that are *inverses* of each
other. If `f(x)` is the inverse of `g(x)`, then
`f(g(x)) = x` and `g(f(x)) = x`.

If `f(x)` is the inverse of `g(x)`, then we typically
write `f(x) = g^{-1}(x)`.

An example of inverse functions is `f(x) = x^{2}` and `g(x) = √x`, as long as `x` is not negative.

Some functions are their own inverses. For example,
`f(x) = C - x`, where `C` is any constant, is
its own inverse. If you apply that function twice to any `x` you'll
always get back your original `x`. Likewise

$$f(x) = \frac{C}{x}$$

is its own inverse whenever neither `C` nor `x` is zero.
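The claim that these two functions undo themselves is easy to try out. The sketch below applies each one twice; the constant `C = 7` and the sample inputs are arbitrary assumptions for illustration.

```python
C = 7.0  # any nonzero constant would do; 7 is an arbitrary choice


def reflect(x):
    """f(x) = C - x"""
    return C - x


def reciprocal(x):
    """f(x) = C / x, defined only for nonzero x"""
    return C / x


for x in (0.5, 2.0, -3.0):
    # Applying either function twice should hand back the original x.
    print(x, reflect(reflect(x)), reciprocal(reciprocal(x)))
```

Each row should show the starting value three times, up to ordinary floating point rounding.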

In section 3.1, we had a little story about the professor's aberrant watch. This watch, though set at noon to match the station clock, would sometimes run fast and surge ahead of the station clock. Other times it would run slow and straggle behind the station clock.

The professor's
watch never stopped, ran backward, or skipped ahead. We gave the
name, `t`, to the time on the station clock and `t _{p}`
to the time on the professor's watch.

You will recall that the professor understood well the idiosyncrasies of
his watch. He understood them well enough that he could infer the
station clock time by looking at his watch. He knew that for any
time, `t`, that the station clock showed, there was a unique time,
`t _{p}`, that his watch would show at that same time.
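Hence, he knew what function it was that gave the watch time in terms of the station clock time. We shall call that function `w(t)`, so that `t_{p} = w(t)`.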

But knowing the function, `w(t)`, cannot tell you or the professor
the station clock time by reading the watch. It can only tell you the
watch time knowing the time on the station clock. To know the station
clock time from the watch time, we and the professor have to know
what the *inverse* of `w(t)` is. That is, we want
`t = w^{-1}(t_{p})`.

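Figure 4-7 shows `w(t)`, `w^{-1}(t_{p})`,
and both of their derivatives.
Let's begin by looking at the blue traces. For `w(t)`, the horizontal axis is the station clock time, `t`, and the vertical axis is the time on the professor's watch, `t_{p}`.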

Notice there is a pink trace on the graph that goes up in a straight line.
Each time the blue trace, `w(t)`, crosses the pink, the professor's watch
momentarily agrees with the station clock.

There is a second blue trace that traverses the bottom of the graph
horizontally in a wavy pattern. This is the derivative of `w(t)`.
Again, the horizontal axis is the station clock time. The vertical axis,
in this case, is the *rate* at which the professor's watch advances
as a fraction of the station clock time.

At noon (extreme left) we see that the professor's watch is going at a rate of about 1.88 minutes forward for every minute the station clock goes forward. In other words, the professor's watch is running about 88% fast. At about 47 seconds past noon, the professor's watch is running at one minute for every minute on the station clock. In other words, at that moment, the professor's watch is running at the right speed, even though it reads half a minute ahead of the station clock. But at 12:01:34, the professor's watch is running at only 12% the rate of the station clock, that is, the professor's watch only advances 0.12 minutes for every minute the station clock advances. To put it another way, at that moment, the station clock is running about eight times as fast as the professor's watch. Of course, shortly after that, the professor's watch begins to speed up again, and at about 12:02:21, it is running at the same rate as the station clock, although it is still behind for having run so slow in the minute and a half just past. After that, the professor's watch continues to speed up until (at 12:03:08) it is once again running 1.88 minutes for every minute on the station clock. And then the cycle repeats itself.

Now to the green traces. The wavy green trace that ramps up from lower
left to upper right is `w ^{-1}(t_{p})`.
So it represents what the station clock reads knowing the time on
the professor's watch.

When interpreting this trace, the horizontal axis is the time
shown on the professor's watch, `t_{p}`, and the vertical
axis is the station clock time, `t`.

The station clock continues to outpace the professor's watch for just a short while, then begins to drop back relative to the watch, until they agree again at about 12:03:08. Then the cycle begins again.

The green spiky-looking trace is the derivative of
`w^{-1}(t_{p})`. Again, when interpreting
this trace, the horizontal axis is minutes on the professor's watch.
The trace shows the rate at which the station clock advances as a fraction of the time on the professor's watch.

Recall that in section 3.1, we were interested in the train's position as
a function of time, and particularly as a function of time as measured
by the professor's watch. The train's position as a function of the
station clock time we shall call `x(t)`. Its position as
a function of the professor's watch time we shall call
`x _{p}(t_{p})`. And remember we came up with
the relationship:

$$x_{p}(t_{p}) = x(w^{-1}(t_{p})) \qquad \text{(eq. 4.3-13)}$$

You can see that the right hand part of 4.3-13 is a composite of two functions.

Figure 4-8 shows the train's position as it leaves the station as a function
of station clock time in blue and as a function of the professor's watch
time in green.
Observe that as the train leaves the station at noon, the professor's
watch is outrunning the station clock, or in other words, the station
clock is falling behind the professor's watch. So according to the professor's
watch, the train is not going as fast as it would be going by the station
clock. Later, the station clock surges ahead of the professor's watch.
When the station clock appears to be gaining time on the professor's
watch, the train appears, by the watch, to be going faster than it does
by the station clock. In fact, knowing that the speed of the train by
the professor's watch is the slope of the green curve, we can see that
the train appears to be going quite a bit faster during the surge than
the speed as measured by the station clock. Perhaps even `8` times
as fast?

**And that is the point** this whole story has been leading to. If
the train moves at half a mile per minute by the station clock, how fast
is it going by the professor's watch at the moment the watch is running
slow by a factor of `8`? In that brief time, with each second ticked
off by the professor's
watch, the station clock ticks off `8` seconds. And the train
moves
`0.5` miles per minute × `8` seconds / `60` seconds per minute, or about `0.067` miles, in those `8` seconds. To the professor,
who is looking at his watch and carrying out rate calculations in his
head, the train appears to be going eight times its actual speed at
that moment.
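The arithmetic in this part of the story can be laid out in a few lines. This is only a sketch: it takes the half mile per minute and the 12% watch rate from the text, and the variable names are made up for illustration.

```python
train_speed = 0.5   # miles per station-clock minute (from the story)
watch_rate = 0.12   # watch minutes per station-clock minute at 12:01:34

# How many station-clock minutes go by for each minute on the watch.
station_per_watch = 1 / watch_rate

# Speed the professor computes using his watch as the timepiece.
apparent_speed = train_speed * station_per_watch

print(station_per_watch)  # the "about eight times" factor
print(apparent_speed)     # miles per watch minute
```

The first number is the "about eight times" factor from the story. The second is the train's speed in miles per minute as the professor would reckon it from his watch.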

And what function is it that we have already seen that goes up by a factor
of eight during that brief moment? It's
`w ^{-1}'(t_{p})` (see figure 4-7). Figure 4-9
shows the speed of the train as a function of time according to both
the station clock (blue) and the professor's watch (green). We can see
that according to the station clock (which is the accurate timepiece here),
the train accelerates smoothly from the station and settles at a steady
speed.

Now recall that speed is the time derivative of position. So the two
traces on figure 4-9 are `x'(t)` (blue) and
`x_{p}'(t_{p})` (green). That is, they are
the derivatives of `x(t)` and `x_{p}(t_{p})`, the two position traces in figure 4-8.

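In the example suggested by the story, we found the derivative of the
composite,
`x( w^{-1}(t_{p}) )`, by taking the train's speed according to the station clock and multiplying it by `w^{-1}'(t_{p})`, the rate at which the station clock advances relative to the watch.

And that is the chain rule. If

$$h(x) = f(g(x)) \qquad \text{(eq. 4.3-14a)}$$

then

$$h'(x) = f'(g(x))\,g'(x) \qquad \text{(eq. 4.3-14b)}$$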

Here is an example. Suppose

$$h(x) = (x^{2} + 1)^{3} \qquad \text{(eq. 4.3-15a)}$$

There are two ways we can find the derivative of this. One is to multiply it out and use the rules we have already established. The other is to apply the chain rule. To apply the chain rule, we observe that if

$$f(x) = x^{3} \qquad \text{(eq. 4.3-15b)}$$

and

$$g(x) = x^{2} + 1 \qquad \text{(eq. 4.3-15c)}$$

then `h(x) = f(g(x))`. We also know that

$$f'(x) = 3x^{2} \qquad \text{(eq. 4.3-15d)}$$

and that

$$g'(x) = 2x \qquad \text{(eq. 4.3-15e)}$$

Hence we have

$$f'(g(x)) = 3(x^{2} + 1)^{2} \qquad \text{(eq. 4.3-15f)}$$

(make sure you understand how I got that by substituting 4.3-15c for `x` in 4.3-15d).

Equation 4.3-15f does step
one of the chain rule, which is making the substitution into the derivative
of the outer function of the composite.
The second step says to multiply the result of
the first step by `g'(x)`, that is by the derivative of the inner
function of the composite. So from 4.3-15e and the chain rule, we have

$$h'(x) = 3(x^{2} + 1)^{2}(2x) \qquad \text{(eq. 4.3-15g)}$$

I mentioned at the beginning of this example that there are two ways of doing it. And you'd expect that they both arrive at the same answer. The second way of doing it is to multiply out 4.3-15a and find the derivative of the result. When you multiply out 4.3-15a, you get

$$h(x) = x^{6} + 3x^{4} + 3x^{2} + 1 \qquad \text{(eq. 4.3-16a)}$$

and taking the derivative of the sum, term by term, we have:

$$h'(x) = 6x^{5} + 12x^{3} + 6x \qquad \text{(eq. 4.3-16b)}$$

If you use a little algebra and factor a `3(2x)` out of each term, you get:

$$h'(x) = 3(x^{4} + 2x^{2} + 1)(2x) \qquad \text{(eq. 4.3-16c)}$$

Finally if you use a little more algebra, you notice that `x^{4} + 2x^{2} + 1` is the square of `x^{2} + 1`, so

$$h'(x) = 3(x^{2} + 1)^{2}(2x) \qquad \text{(eq. 4.3-16d)}$$

which is the same expression as in 4.3-15g. So the chain rule comes up with the same answer as the methods we already know in this example. Can we be sure it will come up with the right answer all the time? Here is the proof that it will (and this might be on an exam):
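If you would rather see the agreement in numbers than in algebra, the sketch below evaluates the chain rule form and the multiplied-out form at a handful of integers. The function names and sample points are assumptions made for illustration.

```python
def chain_rule_form(x):
    """Eq. 4.3-15g: 3 (x^2 + 1)^2 (2x)."""
    return 3 * (x**2 + 1) ** 2 * (2 * x)


def expanded_form(x):
    """Eq. 4.3-16b: 6x^5 + 12x^3 + 6x."""
    return 6 * x**5 + 12 * x**3 + 6 * x


for x in (-2, 0, 1, 3):
    print(x, chain_rule_form(x), expanded_form(x))
```

Because the inputs are integers, the arithmetic is exact, and the last two columns should be identical in every row.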

We know that the derivative of an arbitrary function, `u(x)`,
is defined by

$$u'(x) = \lim_{h \to 0} \frac{u(x + h) - u(x)}{h} \qquad \text{(eq. 4.3-17)}$$

(here I have used `u` to name the function because `h` now stands for the small increment). If `u(x) = f(g(x))`, then

$$u'(x) = \lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h} \qquad \text{(eq. 4.3-18)}$$

Using the same substitution that we used to derive the sum and product rules, that is that in the limit, `g(x + h) = g(x) + h g'(x)`, we have:

$$u'(x) = \lim_{h \to 0} \frac{f(g(x) + h\,g'(x)) - f(g(x))}{h} \qquad \text{(eq. 4.3-19)}$$

Think about the term, `h g'(x)`. It is `g'(x)` multiplied by `h`, which goes toward zero. So `h g'(x)` goes toward zero as well.

To make things easier to decipher, let's make up two new variables,

$$h^{*} = h\,g'(x) \qquad \text{(eq. 4.3-20a)}$$

and

$$x^{*} = g(x) \qquad \text{(eq. 4.3-20b)}$$

If we substitute them into 4.3-19 we get:

$$u'(x) = \lim_{h \to 0} \frac{f(x^{*} + h^{*}) - f(x^{*})}{h} \qquad \text{(eq. 4.3-21)}$$

Using the same principle as we used to substitute for `g(x + h)`, we can replace `f(x^{*} + h^{*})` with `f(x^{*}) + h^{*} f'(x^{*})`:

$$u'(x) = \lim_{h \to 0} \frac{f(x^{*}) + h^{*} f'(x^{*}) - f(x^{*})}{h} \qquad \text{(eq. 4.3-22)}$$

Taking the cancellation we have:

$$u'(x) = \lim_{h \to 0} \frac{h^{*} f'(x^{*})}{h} \qquad \text{(eq. 4.3-23)}$$

Now simply substitute back to the original variables, `x` and `h`:

$$u'(x) = \lim_{h \to 0} \frac{h\,g'(x)\,f'(g(x))}{h} \qquad \text{(eq. 4.3-24)}$$

Then using our old friend, the rule from section 2.5, we can take the limit and find that

$$u'(x) = g'(x)\,f'(g(x)) \qquad \text{(eq. 4.3-25)}$$

And thanks to the commutativity of multiplication, this is the same as what we stated as the chain rule earlier.

Your instructor might have used an approach more similar to what follows than to what I just gave above. Look at the following proof, and if it seems more familiar to you than the above, then it is the one you should reproduce on an exam if asked.

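We start again with the limit formula. If
`u(x) = f(g(x))`, then

$$u'(x) = \lim_{h \to 0} \frac{f(g(x + h)) - f(g(x))}{h} \qquad \text{(eq. 4.3-26)}$$

Whatever the difference between `g(x + h)` and `g(x)` is, call it `k`. Then

$$g(x + h) = g(x) + k \qquad \text{(eq. 4.3-27)}$$

and we substitute that into equation 4.3-26 to get

$$u'(x) = \lim_{h \to 0} \frac{f(g(x) + k) - f(g(x))}{h} \qquad \text{(eq. 4.3-28)}$$

Now multiply numerator and denominator by `k`:

$$u'(x) = \lim_{h \to 0} \frac{f(g(x) + k) - f(g(x))}{k} \cdot \frac{k}{h} \qquad \text{(eq. 4.3-29a)}$$

It's clear that when `h` goes to zero, `k` goes to zero as well. So we can write the limit as

$$u'(x) = \lim_{k \to 0} \frac{f(g(x) + k) - f(g(x))}{k} \cdot \frac{k}{h} \qquad \text{(eq. 4.3-29b)}$$

We know that

$$f'(g(x)) = \lim_{k \to 0} \frac{f(g(x) + k) - f(g(x))}{k} \qquad \text{(eq. 4.3-30)}$$

Why? Because this is the limit formula for that derivative, except that `k` plays the role of `h` and `g(x)` plays the role of `x`. Substituting 4.3-30 into 4.3-29b gives

$$u'(x) = \lim_{k \to 0} f'(g(x))\,\frac{k}{h} \qquad \text{(eq. 4.3-31)}$$

All that remains is to find out what `k/h` is in the limit. We know that

$$g'(x) = \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} \qquad \text{(eq. 4.3-32)}$$

but remember from equation 4.3-27, we had `g(x + h) = g(x) + k`. Substitute that in:

$$g'(x) = \lim_{h \to 0} \frac{g(x) + k - g(x)}{h} \qquad \text{(eq. 4.3-33)}$$

and we get cancellation of the `g(x)` terms:

$$g'(x) = \lim_{h \to 0} \frac{k}{h} \qquad \text{(eq. 4.3-34a)}$$

As stated before, whenever `h` goes to zero, so does `k`:

$$g'(x) = \lim_{k \to 0} \frac{k}{h} \qquad \text{(eq. 4.3-34b)}$$

So where `k/h` appears in 4.3-31, we can substitute `g'(x)`:

$$u'(x) = \lim_{k \to 0} f'(g(x))\,g'(x) \qquad \text{(eq. 4.3-35)}$$

And since nothing inside the limit depends on `k` any longer,

$$u'(x) = f'(g(x))\,g'(x) \qquad \text{(eq. 4.3-36)}$$

which is what we set out to prove.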
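To watch the two factors of this proof at work, here is a small numerical sketch. It splits the difference quotient of a composite into the `k`-quotient and `k/h` and multiplies them back together for shrinking `h`. The outer function (sine), the inner function `x^2 + 1`, and the test point are all hypothetical choices for illustration, picked so that `k` is never zero.

```python
import math

# Hypothetical outer and inner functions, chosen only for illustration.
f = math.sin
f_prime = math.cos
g = lambda x: x**2 + 1
g_prime = lambda x: 2 * x

x = 0.7
exact = f_prime(g(x)) * g_prime(x)  # what the chain rule predicts

for h in (1e-1, 1e-3, 1e-5):
    k = g(x + h) - g(x)                         # change in the inner function
    outer_factor = (f(g(x) + k) - f(g(x))) / k  # heads toward f'(g(x))
    inner_factor = k / h                        # heads toward g'(x)
    print(h, outer_factor * inner_factor, exact)
```

As `h` shrinks, the product in the middle column should settle toward the chain rule value in the last column.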

The
Leibniz "`d`" notation makes it
easy to remember how to apply the chain rule. Remember that the Leibniz
notation always shows what variable a derivative is taken with respect to.
We want to find the derivative with respect to `x` of
`f(g(x))`. In the Leibniz notation we have

$$g'(x) = \frac{dg}{dx} \qquad \text{(eq. 4.3-37a)}$$

and

$$f'(g(x)) = \frac{df}{dg} \qquad \text{(eq. 4.3-37b)}$$

Notice that in the latter equation, the derivative of `f` is taken with respect to `g`, not with respect to `x`.

What we are interested in finding is the derivative of `f` with
respect to `x`, which, in the Leibniz notation is

$$\frac{df}{dx}$$

We can find

$$\frac{dg}{dx} \quad \text{and} \quad \frac{df}{dg}$$

using equations 4.3-37a & b respectively. Now look what happens when we multiply them together:

$$\frac{df}{dg}\,\frac{dg}{dx} = \frac{df}{dx} \qquad \text{(eq. 4.3-38)}$$

Just as Leibniz envisioned, we can treat the ratios of the "`d`" quantities like ordinary fractions, so the `dg` in the denominator of the first factor cancels the `dg` in the numerator of the second. Hence

$$\frac{df}{dg}\,\frac{dg}{dx} = f'(g(x))\,g'(x) = \frac{df}{dx} \qquad \text{(eq. 4.3-39)}$$

If you use the Leibniz notation to check your use of the chain rule, it will never steer you wrong. When the cancellations get you the derivative you are interested in with respect to the variable you are interested in taking it, then you have used the correct chain rule product.


email me at *hahn@netsrq.com*