This video was produced by The Kaizen Effect\(^{[1]}\).
The derivation begins by expressing the problem, which is to find the minimum value of a functional \(S(q_j(x),q_j'(x),x)\), in the language of single-variable calculus. That is, we’ll want to express the functional \(S(q_j(x),q_j'(x),x)\) as a function of the single variable \(ε\) (which I’ll describe later) so that we can use the techniques of single-variable calculus to find the minimum value of \(S(ε)\), which occurs at a point where \(\frac{d}{dε}(S(ε))=0\). Later on, we’ll deal with the more general case in which we solve for the stationary points of \(S(ε)\). Let the set of coordinates \(q_j(x)\) be generalized coordinates which are dependent variables of the independent variable \(x\). Let the quantity \(S\) be a parametric quantity whose magnitude is equal to the length of the curve \(c\), where \(c\) can be any arbitrary curve. (This length specifies the magnitude of our parametric quantity, which isn’t limited to being just physical length but can also be an action, a period of time, and so on.)
Let the two coordinates \((q_j(x_1),x_1)\) and \((q_j(x_2),x_2)\) denote the initial and final coordinate values associated with a system, respectively. In many physics problems, these coordinate values are typically taken to denote one time coordinate (in which case we’d replace the independent variable \(x\) with \(t\)) and the rest of the coordinates are typically taken to denote whichever spatial coordinates are the most convenient for a given problem; but in geometrical problems the generalized coordinates are, of course, taken to be all spatial coordinates. The choice of what kinds of generalized coordinates to use really just depends on the problem you’re trying to solve.
We’ll let \(S\) be any parametric quantity associated with a system going from \((q_j(x_1),x_1)\) to \((q_j(x_2),x_2)\), even those which are not minimized. Now, the whole purpose of this section will be to find the minimum value of \(S\), which occurs at those points where the parametric quantity does not change, to first order, with respect to the variables it depends on. But to do this, we must first write an expression which determines the length \(S\) of any arbitrary curve\(^1\). How does one calculate the magnitude \(S\)? To do this, let’s divide the curve \(c\) into infinitely many, infinitesimally small line segments of length \(dS\). By taking the infinite sum (which is to say, by taking the integral) of all these small lengths \(dS\), we find that the magnitude of \(S\) is given by
$$S=\int{dS}.\tag{1}$$
Equation (1) is nice and all, but we should re-express it in terms of something which can be calculated in terms of the independent variable \(x\). As a first step towards doing this, we can rewrite the length \(dS\) using the Pythagorean Theorem to obtain \(dS=\sqrt{dx^2+dq_j^2}\). Let’s substitute this equation into Equation (1) to get
$$S=\int_?^?\sqrt{dx^2+dq_j^2}.\tag{2}$$
I have written the question marks in the limits of integration to denote that I’m leaving them out for the moment. Using algebraic manipulations, we can express the integral with respect to the independent variable \(x\) to obtain
$$S(q_j,q_j',x)=\int_{x_1}^{x_2}\sqrt{1+\biggl(\frac{dq_j}{dx}\biggr)^2}dx=\int_{x_1}^{x_2}L(q_j,q_j',x)dx,\tag{3}$$
where the integrand is some function of \(q_j(x)\), \(q_j'(x)\) and \(x\) and is denoted by \(L(q_j(x),q_j'(x),x)\). The quantity \(S\), which takes a whole curve \(q_j(x)\) as its input, is a functional. (A functional is something which is a function of a function.) To find the minimum of \(q_j(x)\) would involve a procedure which you are already familiar with: the minimum occurs at the point where \(q_j(x)\) will not change (up to the first order\(^2\)) with a small change in \(x\); or, written in another way, where \(\frac{dq_j(x)}{dx}=0\). Finding the minimum value of \(S\) isn’t quite so simple. The minimum value of \(S\) corresponds to a curve for which \(S\) does not change, up to the first order, with small changes in \(q_j\) and \(q_j'\). To find this minimum, we must use a technique known as the calculus of variations: this is, basically, a procedure in which we use clever techniques to express \(S\) as a function of a single independent variable so that we can use the techniques of single-variable calculus in order to find its minimum value.
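To make Equations (2) and (3) concrete, here is a minimal numerical sketch in Python. It approximates a curve by many short straight segments, adds up \(\sqrt{dx^2+dq_j^2}\) for each one, and compares the result with a midpoint-rule estimate of \(\int\sqrt{1+q_j'^2}\,dx\). The test curves (a straight line and a parabola between \(x=0\) and \(x=1\)), the function names and the number of segments are all assumptions chosen for illustration.

```python
import math

def polyline_length(q, x1, x2, n=10_000):
    """Add up sqrt(dx^2 + dq^2) over n short straight segments (Equation (2))."""
    dx = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        xa = x1 + i * dx
        xb = xa + dx
        dq = q(xb) - q(xa)
        total += math.sqrt(dx * dx + dq * dq)
    return total

def integral_length(dq_dx, x1, x2, n=10_000):
    """Midpoint-rule estimate of the integral of sqrt(1 + q'(x)^2) dx (Equation (3))."""
    dx = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        x_mid = x1 + (i + 0.5) * dx
        total += math.sqrt(1.0 + dq_dx(x_mid) ** 2)
    return total * dx

# Hypothetical test curves joining x = 0 and x = 1.
s_line = polyline_length(lambda x: x, 0.0, 1.0)
s_parabola_sum = polyline_length(lambda x: x * x, 0.0, 1.0)
s_parabola_int = integral_length(lambda x: 2.0 * x, 0.0, 1.0)

print("straight line, segment sum:", s_line)
print("parabola, segment sum:     ", s_parabola_sum)
print("parabola, integral form:   ", s_parabola_int)
```

The two estimates for the parabola should agree closely, which is the content of the step from Equation (2) to Equation (3).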
The first step necessary to accomplish this goal will be to assume that there is a curve \(\bar{q}_j(x)\) which is that particular curve whose arc length \(S(\bar{q}_j(x),\bar{q}_j'(x),x)\) is minimized. As previously mentioned, we shall let \(q_j(x)\) represent any curve between \(q_j(x_1)\) and \(q_j(x_2)\) so long as it is everywhere smooth and continuous. We shall, however, require the two constraints that \(\bar{q}_j(x_1)=q_j(x_1)\) and \(\bar{q}_j(x_2)=q_j(x_2)\). We shall now define a new function \(\eta(x)\) which we will let be any smooth curve such that \(\eta(x_1)=0\) and \(\eta(x_2)=0\). Let’s also define a parameter, which we'll call \(\epsilon\), by the equation
$$q_j(x)=\bar{q}_j(x)+\epsilon\eta(x).\tag{4}$$
The product \(\epsilon\eta(x)\) is the error between the “correct path” \(\bar{q}_j(x)\) (the one whose arc length is minimized) and the arbitrarily chosen path \(q_j(x)\). By letting \(\eta(x)\) be a particular function (pick any you like; I have chosen the one illustrated in Figure #) that satisfies the aforementioned constraints, we can vary \(q_j\) with the single parameter \(\epsilon\) and write \(q_j(\epsilon)\). The previous sentence requires a little explanation. For the two fixed endpoints \((q_j(x_1),x_1)\) and \((q_j(x_2),x_2)\) (the initial and final coordinates), the function \(q_j(x)\) does not vary with the two functions \(\bar{q}_j(x)\) and \(\eta(x)\). The reason why \(q_j(x)\) does not vary with \(\bar{q}_j(x)\) is that \(\bar{q}_j(x)\) will not change regardless of what \(q_j(x)\) is; \(\bar{q}_j(x)\) changes only when the endpoints \((q_j(x_1),x_1)\) and \((q_j(x_2),x_2)\) change. It is easy to see on a graph that choosing two different endpoints forces the shortest path (\(\bar{q}_j\)) connecting those two points to be different as well.
Lastly, since we let \(\eta(x)\) be a particular function, it follows that it also only depends on the initial conditions. (As you move the two points \((q_j(x_1),x_1)\) and \((q_j(x_2),x_2)\) apart or towards each other, you could imagine \(\eta(x)\) having to elongate or contract.) It follows that \(q_j(x)\) is, therefore, not a function of \(\eta(x)\). I have shown in Figure 1 how \(\eta\) (due to the way in which we defined it by Equation (4)) varies with \(x\) in such a way that by adding \(\epsilon\eta(x)\) to the "correct function" \(\bar{q}_j(x)\), we always manage to land on \(q_j(x)\). Now, \(q_j(x)\) represents "any" arbitrary curve; indeed, we could change \(q_j(x)\) to whatever we wanted and \(\epsilon\) would still satisfy Equation (4). In other words, we could just add a different function \(\epsilon\eta(x)\) (where \(\epsilon\) changed a little but \(\eta(x)\) did not) to \(\bar{q}_j(x)\) and land on \(q_j(x)\) again as in Figure 1. What all of this means is that the only thing which \(q_j\) depends on in Equation (4) is \(\epsilon\); therefore, we can write
$$q_j(\epsilon)=\bar{q}_j+\epsilon\eta.\tag{5}$$
By taking the derivative with respect to \(x\) on both sides, we get
$$q_j'(\epsilon)=\bar{q}_j'+\epsilon\eta'.\tag{6}$$
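Equations (5) and (6) can be explored numerically. The sketch below is only an illustration under stated assumptions: the endpoints are taken to be \((0,0)\) and \((1,1)\), the "correct path" is taken to be the straight line \(\bar{q}_j(x)=x\), and the perturbation is taken to be \(\eta(x)=\sin(\pi x)\), which vanishes at both endpoints. With those choices the arc length becomes an ordinary function of the single number \(\epsilon\).

```python
import math

X1, X2 = 0.0, 1.0  # assumed endpoints of the interval

def q_bar_prime(x):
    """Slope of the assumed correct path q_bar(x) = x."""
    return 1.0

def eta_prime(x):
    """Derivative of the assumed perturbation eta(x) = sin(pi * x)."""
    return math.pi * math.cos(math.pi * x)

def S(eps, n=2000):
    """Arc length of q(x) = q_bar(x) + eps * eta(x), by the midpoint rule."""
    dx = (X2 - X1) / n
    total = 0.0
    for i in range(n):
        x = X1 + (i + 0.5) * dx
        q_prime = q_bar_prime(x) + eps * eta_prime(x)  # Equation (6)
        total += math.sqrt(1.0 + q_prime ** 2) * dx
    return total

for eps in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(f"eps = {eps:+.1f}   S(eps) = {S(eps):.6f}")
```

For this particular choice of \(\eta(x)\), the printed values should be smallest at \(\epsilon=0\), where the curve is the straight line itself.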
At this point, we are now able to express the functional \(S(q_j(x),q_j'(x),x)\) as the function \(S(\epsilon)\). The minimum value of \(S(ε)\) occurs at a point where \(\frac{dS(ε)}{dε}=0\). In order to investigate the mathematical relationships which satisfy this condition (the condition that \(S(ε)\) is minimized), let’s differentiate both sides of Equation (3) with respect to \(ε\), set the result equal to zero, and then proceed to use algebra to find mathematical relationships which satisfy this condition. Starting with the first step, we have
$$\frac{dS(ε)}{dε}=\int_{x_1}^{x_2}\frac{∂}{∂ε}[L(q_j,q_j’,x)]dx=0.\tag{7}$$
(To clarify any potential confusion, I took the partial derivative \(∂_ε\) on both sides; since the function \(S(ε)\) on the left-hand side is a single-variable function, it follows that \(∂_εS(ε)=\frac{dS(ε)}{dε}\).) Since \(L(q_j,q_j',x)\) depends on \(ε\) only through \(q_j\) and \(q_j'\), in order to evaluate the partial derivative \(∂_εL(q_j,q_j',x)\), we must use the chain rule to get
$$\frac{dS(ε)}{dε}=\int_{x_1}^{x_2}\biggl(\frac{∂L}{∂q_j}\frac{∂q_j}{∂ε}+\frac{∂L}{∂q_j'}\frac{∂q_j'}{∂ε}\biggr)dx=0.\tag{8}$$
Let’s evaluate the partial derivatives \(∂/∂ε[q_j(\epsilon)]\) and \(∂/∂ε[q_j’(\epsilon)]\) to get
$$\frac{∂q_j(\epsilon)}{∂ε}=\frac{∂}{∂ε}(\bar{q}_j(x)+ε\eta(x))=\eta(x)$$
and
$$\frac{∂q_j’(\epsilon)}{∂ε}=\frac{∂}{∂ε}(\bar{q}_j’(x)+ε\eta’(x))=\eta’(x).$$
Let’s substitute these results into Equation (8) to get
$$\frac{dS(ε)}{dε}=\int_{x_1}^{x_2}\biggl(\frac{∂L}{∂q_j}\eta(x)+\frac{∂L}{∂q_j'}\eta'(x)\biggr)dx=\int_{x_1}^{x_2}\frac{∂L}{∂q_j}\eta(x)dx+\int_{x_1}^{x_2}\frac{∂L}{∂q_j'}\eta'(x)dx=0.\tag{9}$$
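The integral expression for \(\frac{dS(ε)}{dε}\) in Equation (9) can be sanity-checked numerically. The following sketch assumes the arc-length integrand \(L=\sqrt{1+q_j'^2}\) from Equation (3), for which \(∂L/∂q_j=0\) and \(∂L/∂q_j'=q_j'/\sqrt{1+q_j'^2}\), together with the illustrative choices \(\bar{q}_j(x)=x\) and \(\eta(x)=\sin(\pi x)\) on the interval from 0 to 1. It compares the integral with a plain finite-difference estimate of the slope of \(S(ε)\); the function names and step sizes are assumptions, not part of the derivation.

```python
import math

X1, X2, N = 0.0, 1.0, 4000
DX = (X2 - X1) / N
MIDPOINTS = [X1 + (i + 0.5) * DX for i in range(N)]

def eta_prime(x):
    """Derivative of the assumed perturbation eta(x) = sin(pi * x)."""
    return math.pi * math.cos(math.pi * x)

def q_prime(x, eps):
    """Slope of q = q_bar + eps * eta, with the assumed q_bar(x) = x."""
    return 1.0 + eps * eta_prime(x)

def S(eps):
    """Arc length as a function of eps (midpoint rule)."""
    return sum(math.sqrt(1.0 + q_prime(x, eps) ** 2) for x in MIDPOINTS) * DX

def dS_deps_formula(eps):
    """Right-hand side of Equation (9); the dL/dq term is zero for arc length."""
    total = 0.0
    for x in MIDPOINTS:
        qp = q_prime(x, eps)
        dL_dqprime = qp / math.sqrt(1.0 + qp * qp)
        total += dL_dqprime * eta_prime(x)
    return total * DX

def dS_deps_finite_difference(eps, h=1e-5):
    """Central-difference estimate of dS/d(eps)."""
    return (S(eps + h) - S(eps - h)) / (2.0 * h)

for eps in (0.0, 0.3):
    print(eps, dS_deps_formula(eps), dS_deps_finite_difference(eps))
```

At \(ε=0\) both numbers should be close to zero, since the straight line is the minimizing curve; away from \(ε=0\) they should agree with each other but no longer vanish.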
There is great value in employing integration by parts on the second integral in Equation (9), since it’ll allow us to rewrite the condition in the form \(\int\text{‘some stuff’ times }\eta\,dx=0\); this form has the equations of motion right in front of our face, as we shall see. From the standpoint of physics, the motivation for this is apparent, as the equations of motion will allow us to determine the motion of a system. Recall that the equation for integrating by parts is given by
$$\int_{x_1}^{x_2}u\,dv=uv\Big|_{x_1}^{x_2}-\int_{x_1}^{x_2}v\,du.$$
If we let \(u=∂L/∂q_j'\) and \(dv=\eta'(x)dx\), so that \(v=\eta(x)\) and \(du=\frac{d}{dx}\frac{∂L}{∂q_j'}dx\), then our second integral can be simplified to
$$\int_{x_1}^{x_2}\eta'(x)\frac{∂L}{∂q_j'}dx=\frac{∂L}{∂q_j'}\eta(x)\Big|_{x_1}^{x_2}-\int_{x_1}^{x_2}\eta(x)\frac{d}{dx}\frac{∂L}{∂q_j'}dx=-\int_{x_1}^{x_2}\eta(x)\frac{d}{dx}\frac{∂L}{∂q_j'}dx.$$
The boundary term vanishes because \(\eta(x_1)=0\) and \(\eta(x_2)=0\). Let’s substitute this result into Equation (9) to get
$$\frac{dS(ε)}{dε}=\int_{x_1}^{x_2}\eta(x)\biggl[\frac{∂L}{∂q_j}-\frac{d}{dx}\frac{∂L}{∂q_j'}\biggr]dx=0.\tag{10}$$
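Since the integral in Equation (10) must vanish for every admissible \(\eta(x)\) (any smooth function which is zero at \(x_1\) and \(x_2\)), and such a function is, in general, not equal to zero in between, the other term in the product must be zero everywhere on the interval and we have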
$$\frac{∂L}{∂q_j}-\frac{d}{dx}\frac{∂L}{∂q_j’}=0.\tag{11}$$
Equation (11) is known as the Euler-Lagrange equation and it is the mathematical consequence of minimizing a functional \(S(q_j(x),q_j’(x),x)\). It is a differential equation which can be solved for the dependent variable(s) \(q_j(x)\) such that the functional \(S(q_j(x),q_j’(x),x)\) is minimized. The next few sections will be concerned with different problems in which the question starts off as: find the minimum value of some quantity \(S\). These problems start off with a little math to express the quantity as a functional. All of the problems boil down to solving for the coordinates \(q_j(x)\) which minimize \(S\); this will be accomplished by solving Equation (11). Although simple to say, we shall see that this can, sometimes, involve a lot of algebra and tinkering—the math will sometimes get a little hairy.
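As a quick check of Equation (11), here is a sketch of how it applies to the arc-length integrand of Equation (3), \(L=\sqrt{1+q_j'^2}\). The constants \(C\), \(m\) and \(b\) are symbols introduced here purely for illustration. Since this \(L\) does not depend on \(q_j\) itself,

$$\frac{∂L}{∂q_j}=0,\qquad\frac{∂L}{∂q_j'}=\frac{q_j'}{\sqrt{1+q_j'^2}},$$

so Equation (11) reduces to

$$\frac{d}{dx}\Biggl(\frac{q_j'}{\sqrt{1+q_j'^2}}\Biggr)=0\quad\Longrightarrow\quad\frac{q_j'}{\sqrt{1+q_j'^2}}=C\quad\Longrightarrow\quad q_j'=\frac{C}{\sqrt{1-C^2}}\equiv m.$$

Integrating once more gives \(q_j(x)=mx+b\), a straight line, with \(m\) and \(b\) fixed by the two endpoints \((q_j(x_1),x_1)\) and \((q_j(x_2),x_2)\). This agrees with the expectation that the shortest curve between two points in a plane is a straight line.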
This article is licensed under a CC BY-NC-SA 4.0 license.
References
1. The Kaizen Effect. "Lagrangian Mechanics - Lesson 1: Deriving the Euler-Lagrange Equation & Introduction". Online video clip. YouTube. YouTube, 04 May 2016. Web. 18 May 2017.
Notes
1. When we think about the curve \(q_j(x)\) which minimizes the quantity \(S=\int\sqrt{dq_j^2+dx^2}\), it is important not to lose track of the generality of our choice of coordinates \(q_j\) and \(x\). In some problems, we'll just choose \(q_j\) and \(x\) to be spatial coordinates in which case \(S=\int\sqrt{dq_j^2+dx^2}\) is a measure of distance; but in other problems, we'll choose \(x\) to be a time coordinate in which case \(S=\int\sqrt{dq_j^2+dx^2}\) is not a measure of distance. I wanted to mention this early on because a common source of confusion is whether or not the derivation in this section applies only to functionals \(S\) which measure length. Be reassured that this is not the case; \(S\) can measure many other things besides length, as we'll see in subsequent sections where we solve some problems using the analysis we developed in this section.
2. The minimum value of some arbitrary single variable function, say \(y(t)\), occurs when \(\frac{dy(t)}{dt}=0\). This condition implies that for a very small change in time \(dt\), the change in the function is \(dy(t)=0\). You might be wondering: “if \(t\) changed by a very small amount, then why didn’t \(y(t)\) change by a very small amount as well?” In reality, \(y(t)\) did in fact change a little, but this change is captured only by the second-order (and higher) terms and, according to Feynman, “the deviation of the function from its minimum value is only second order [or higher].” The full expression describing the change in \(y(t)\) is, in general, a Taylor series containing a term for each order of derivative. In this example, the term involving the first-order derivative is zero. The terminology used to describe this is as follows: we say that “the function \(y(t)\) does not change up to the first order.”