Historical and philosophical contexts of the calculus of variations

The calculus of variations is concerned with finding functions that extremise (maximise or minimise) a particular quantity. A classic example is the catenary problem: what shape does a chain take when hung between two points? It is the unique shape that minimises the potential energy of the chain; this shape is called a catenary, and is described by the cosh function. The idea that the potential energy of a hanging chain should be minimised is a variational principle. Another example of a variational principle is the notion that a soap bubble or water balloon should take the shape of minimal surface area for the volume it encloses, namely a sphere. The variational principles in both examples predict the same shapes as those that one would find by constructing force-balance arguments on line or surface elements, but the variational formulations are far simpler to describe and implement.
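
In symbols (a sketch, writing y(x) for the height of the chain, \rho for its linear density and g for gravity), the catenary problem asks for the function y(x) that minimises the potential energy

 \displaystyle E[y] = \rho g \int_{x_1}^{x_2} y \sqrt{1 + y'^2} \, \mathrm{d}x

subject to the chain having a fixed length \int_{x_1}^{x_2} \sqrt{1 + y'^2} \, \mathrm{d}x = \ell; the minimiser is y(x) = a \cosh((x - x_0)/a) + c, with the constants a, x_0 and c fixed by the endpoints and the length.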

The idea that physical theories might be summarised by neat variational principles has been around since antiquity. Such theories are aesthetically pleasing in their simplicity, and in line with the principle of parsimony (or Occam’s razor).

However, there is a major difference between the above examples and the principle of least action. In the above problems, the independent variables are spatial and the problems concern a steady state. The principle of least action, which concerns the evolution of particle motion in time, appears to require knowledge about the future. This is metaphysically troubling even today.

Optics and Fermat’s principle

In the early 1600s, a number of scientists, including Willebrord Snellius in 1621, independently discovered an empirical relationship between the angles of incidence and refraction when a beam of light passes through a boundary between two materials, which we now know as Snell’s law. In a 1662 letter, Pierre de Fermat showed that, under certain assumptions about the speed of light in different media, Snell’s law implies that the path taken by a ray between two given points is the path of minimal travel time, and conversely, that a ray taking a path of minimal travel time obeys Snell’s law at the interface. Fermat’s argument, however, assumes that light travels more slowly in denser media. We now know this to be true, but actual experimental evidence that light in vacuo travels at a finite speed was not available until 1676.
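
The minimisation itself is short (a sketch, with the interface along a horizontal line, the ray crossing it at horizontal position x, the endpoints at heights a and b on either side and separated horizontally by d, and v_1, v_2 the speeds of light in the two media). The travel time is

 \displaystyle T(x) = \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (d - x)^2}}{v_2},

and setting T'(x) = 0 gives \sin\theta_1 / v_1 = \sin\theta_2 / v_2, which is Snell’s law.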

Fermat’s principle of minimal time was criticised by the prevalent Cartesian school on two grounds. Firstly, the above assumption about the speed of light was unjustified, and not compatible with René Descartes’ notions that the speed of light in vacuo is infinite, and higher still in dense media. (These are not necessarily contradictory statements: the mathematical machinery for comparing infinite or infinitesimal quantities was concurrently being developed, although Newton’s Principia was not yet published and the calculus would not be formalised for another century or two.) A more fundamental criticism of Fermat’s principle was that it is teleological: why does light ‘choose’ to take a time-minimising path, and how does it ‘know’ how to find such a path in advance? Why should it ‘choose’ to minimise travel time and not some other quantity, such as distance (which would give a straight line)? Claude Clerselier, a Cartesian critic of Fermat, wrote in reply:

… The principle which you take as the basis for your proof, namely that Nature always acts by using the simplest and shortest paths, is merely a moral, and not a physical one. It is not, and cannot be, the cause of any effect in Nature.

In other words, although Fermat’s principle was mathematically equivalent to Snell’s law, and supported by experiment, it was not considered a satisfactory description of a physical basis behind Snell’s law, as no physical mechanism had been offered.

Particle mechanics and the principle of least action

Newton’s Principia was published in 1687. After some initial controversy of their own, Newton’s ideas had become accepted by the time of Maupertuis and Euler. Newton’s formulation of particle mechanics, including the law of motion F = ma and the inverse square law for gravitation, gives a mathematical foundation for Kepler’s (empirical) laws of planetary motion.

An important development came in the 1740s with the formulation of the principle of least action by Pierre Louis Maupertuis and Leonhard Euler. Maupertuis defined action S as an ‘amount of motion’: for a single particle, the action is the momentum mv multiplied by the distance s travelled; for constant speed, s = vt, so the action is S = mv^2 t. In the absence of a potential, this matches our modern definition of action, up to a factor of 2. (Maupertuis referred to the quantity mv^2 as the vis viva, or ‘living force’, of the particle.) Studying the velocities of two colliding bodies before and after collision, Maupertuis showed that the law of conservation of momentum (by now well-established) is equivalent to the statement that the final velocities are such that the action of this process is minimised.
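
For comparison (a sketch in modern notation): a free particle moving at constant speed v for a time t has modern action

 \displaystyle S = \int_0^t \tfrac{1}{2} m v^2 \, \mathrm{d}t' = \tfrac{1}{2} m v^2 t,

whereas Maupertuis’ momentum-times-distance prescription gives mv \cdot vt = m v^2 t, twice as large.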

Euler is generally credited with inventing the calculus of variations in an early form, applying it to the study of particle trajectories. (The modern form was developed in 1755 by Lagrange, with whom Euler corresponded.) Euler generalised Maupertuis’ definition of action into the modern action integral, and included a new term for the potential energy. He showed in 1744 that a particle subject to a central force (as in planetary motion) takes a path (the one calculated by Newton) that extremises this action, and vice versa. Lagrange later showed more generally that the principle of least action is mathematically equivalent to Newton’s laws.
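
The equivalence is worth seeing once (a sketch for a single particle in one dimension, moving in a potential V). The action of a trajectory x(t) is

 \displaystyle S[x] = \int_0^T \left( \tfrac{1}{2} m \dot{x}^2 - V(x) \right) \mathrm{d}t,

and requiring S to be stationary under small variations of the path (with the endpoints held fixed) yields the Euler–Lagrange equation m\ddot{x} = -V'(x), which is Newton’s F = ma with force F = -V'(x).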

But why is this a sensible definition of action? In fact, what is action?

Maupertuis’ reasoning was that ‘Nature is thrifty in all its actions’, positing that action is a sort of ‘effort’. He was happy to attribute the principle of least action to a God seeking to minimise the effort of motions in the universe. But how does one know to choose this definition of action and not some other? As with refraction, why minimise travel time and not distance? Maupertuis argued that one cannot know to begin with, and that the correct functional simply has to be identified.

Fermat and Euler took a rather weaker view, and refused to make any metaphysical interpretations of their variational principles. Fermat stated that his principle is ‘a mathematical regularity from which the empirically correct law can be derived’ (Sklar 2012): this is an aesthetic statement about the theory, but says nothing about its origins.

Why do we find the principle of least action problematic?

Everyone agrees that the principle of least action is mathematically equivalent to Newton’s laws of motion, and both have equivalent status when compared against experiments. However, Newton’s laws are specified as differential equations with initial values (‘start in this state, and forward-march in time, with no memory about your past and no information about your future’). In contrast, the principle of least action is formulated as a boundary value problem (‘get from A to B in time T, accumulating as little action as possible’), governed by the Euler–Lagrange equations. Why are we less comfortable with the latter?

One reason is the question: given that we are at the initial position A, how can we know that we will be at B after time T? This can be resolved by realising that when we solve the Euler–Lagrange equations, we have not been told what the initial velocity is, and have the freedom to choose it such that the final position will be B. Thus, one can convert between an IVP and a BVP: this is the approach taken by the shooting method for solving BVPs numerically.
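
As a concrete illustration, here is a minimal sketch of the shooting idea in Python (the pendulum-like equation \ddot{x} = -\sin x and the endpoint values are hypothetical choices made purely for the example):

# Shooting method: solve x'' = -sin(x) with x(0) = A and x(T) = B
# by adjusting the unknown initial velocity v0.
import math

def simulate(v0, A=0.0, T=2.0, steps=2000):
    """Integrate x'' = -sin(x) from t = 0 to t = T with x(0) = A, x'(0) = v0 (RK4)."""
    dt = T / steps
    x, v = A, v0
    for _ in range(steps):
        # One RK4 step for the first-order system (x, v)' = (v, -sin x).
        k1x, k1v = v, -math.sin(x)
        k2x, k2v = v + 0.5 * dt * k1v, -math.sin(x + 0.5 * dt * k1x)
        k3x, k3v = v + 0.5 * dt * k2v, -math.sin(x + 0.5 * dt * k2x)
        k4x, k4v = v + dt * k3v, -math.sin(x + dt * k3x)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x  # final position x(T)

def shoot(B=1.0, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisect on v0 until the trajectory lands at x(T) = B."""
    miss = lambda v0: simulate(v0) - B
    assert miss(lo) * miss(hi) < 0, "initial bracket must straddle the target"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if miss(lo) * miss(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

v0 = shoot()
print("initial velocity that reaches B:", v0)
print("final position with that velocity:", simulate(v0))

The boundary value problem (get from A to B in time T) is thereby solved as a family of initial value problems, searching over the one piece of initial data that was not specified.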

Another reason perhaps is cultural: most of us are taught Newtonian physics before Lagrangian physics. This is pedagogically reasonable: the Newtonian formulation requires far less mathematical machinery. There is also a technical reason for feeling more comfortable with describing physics through an IVP than a BVP: by the Picard–Lindelöf theorem, an IVP is guaranteed to have a unique solution, at least locally in time, provided the right-hand side is sufficiently well-behaved; no such general guarantee exists for a BVP.
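
A standard example of the difference: the boundary value problem

 \displaystyle \ddot{x} = -x, \qquad x(0) = 0, \quad x(\pi) = 0

is solved by x(t) = A \sin t for every value of the amplitude A, so it has infinitely many solutions, whereas the corresponding initial value problem (specifying x(0) and \dot{x}(0)) has exactly one.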

Acknowledgements

The above essay has been guided by Lawrence Sklar’s book, Philosophy and the Foundations of Dynamics.

Type inference for lazy LaTeXing

I am doing some work with asymptotic expansions of the form

 h = h^{(0)} + \epsilon h^{(1)} + O(\epsilon^2)

and I don’t care about second-order terms. The parentheses are there to indicate that these are term labels, not powers. But actually, there’s no need to have them, because if I ever need to raise something to the zeroth power, I can just write 1; and if I need to raise something to the first power, I don’t need to write the power at all. So, there’s no confusion at all by writing h^0 instead of h^{(0)}! If I need to square it, I can write h^{02}. If I need to square h^{(1)}, then I can write h^{12}; it’s unlikely I’ll need to take anything to the 12th power.

It’s an awful idea and a sane reviewer would reject it, but it does save time when LaTeXing…

Colourblindness and probability

A female acquaintance of mine was recently surprised to find that both of her sons were colourblind, despite neither parent being colourblind. A natural question to ask is ‘What are the odds?’ This question turns out to be open to interpretation, depending on what we mean by probability and odds.


Chinese proverbs

I’ve noticed an annoying and persistent tendency for people to inaccurately claim that certain sayings are Chinese proverbs. ‘Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime’ is one such example. ‘A picture is worth a thousand words’ is another.

These are admittedly only a couple of examples, so I may be going a bit far, but nonetheless, I claim that the following proverbs are true:

  • C0: For any proverb P, ‘P is a Chinese proverb’ is a proverb.
  • C1: For any proverb P, P is not a Chinese proverb if and only if P is claimed to be a Chinese proverb.

Since it is unnecessary for the Chinese to claim that a statement is a Chinese proverb (we need merely claim it to be a proverb), I also make the following claim:

  • C2: For any proverb P, ‘P is a Chinese proverb’ is not a Chinese proverb.

Can these claims be consistent, and which (if any) can I consistently claim to be Chinese proverbs?

Addendum: Oftentimes, the claim that a proverb is Chinese is used by orientalist woo-peddlers to lend credence to their claims. Allow me therefore to go so far as to claim:

  • C3: For any proverb P, if P is claimed to be a Chinese proverb then P is false.

Is this consistent?

Retinal detachment and Bayes’ theorem

I had my eyes tested yesterday, having put it off for several years. Happily, my vision seems not to have deteriorated in the last couple of years.

After the test, the optometrist told me that my short-sightedness meant that I was at risk of retinal detachment (RD). I asked if this was something to be worried about on a day-to-day basis. They said no, it was just something to be aware of: retinal detachment affects about 1 in 10,000 people, but 40% of cases happen in people with severe myopia.

I didn’t feel very comforted by this, since this information doesn’t directly tell you about my personal risk of retinal detachment given that I have severe myopia. To make sense of that figure, you need to know the prevalence of severe myopia.

According to Haimann et al. (1982) and Larkin (2006), the figure of 1 in 10,000 is actually an annual incidence: in a population of 10,000 healthy people, on average one new case of RD will develop each year; the lifetime risk is therefore about 1 in 300. The prevalence of severe myopia (beyond −5 dioptres) amongst Western Europeans aged 40 or over is about 4.6% (Kempen et al. 2004).

A calculation using Bayes’ theorem would predict that RD has an incidence, amongst people (Western Europeans aged 40 or over) with severe myopia, of about 1 in 1,000 per year, which corresponds to a lifetime risk of about 1 in 30.
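
The calculation itself is short (a sketch using the figures above, treating the ‘40% of cases’ as P(severe myopia | RD)):

 \displaystyle P(\text{RD this year} \mid \text{myopia}) = \frac{P(\text{myopia} \mid \text{RD}) \, P(\text{RD this year})}{P(\text{myopia})} = \frac{0.4 \times 1/10000}{0.046} \approx \frac{1}{1150},

and repeating it with the lifetime figure of 1 in 300 gives 0.4 \times (1/300) / 0.046 \approx 1/35.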

This lifetime risk is surprisingly high, and not nearly as comforting as ‘1 in 10,000’. It is so much higher than the base incidence because severe myopia is fairly uncommon, and also because people live quite long lives; the exact relationship between lifetime risk and annual incidence depends on one’s lifespan, and the incidence is not uniform with age. Fortunately, the annual incidence of 1 in 1,000 is still quite small, so no, it’s not something to worry about every day.

This is an extremely simplified calculation using figures drawn from across different populations; the Haimann study was for Iowans of all ages. Myopia is much more common in China, but it’s unlikely that there’s any data out there specifically on people of Chinese ethnicity living in Western Europe (both genetics and environment affect myopia). I’ve been unable to find any more detailed information on the prevalence of retinal detachment as a function of the strength of myopia.

Gariano and Kim (2004) describe the mechanism by which severe myopia might cause retinal detachment.

TL;DR: Opticians don’t understand conditional probabilities, causing me to stay up late browsing optometry and epidemiology papers.

Calculus Beyond College: Analysis and modelling

This post is motivated by a number of discussions that I had at Cambridge open days last week, when I talked to school students who were interested in doing maths at university, and who may have come across the unfamiliar term ‘analysis’ when looking at our course syllabus.

Mathematical analysis is a very large area, but, broadly speaking, it is the study of limiting processes and of approximations. The basic concept of a limiting process is that, as you make a certain parameter of a problem smaller and smaller (or larger and larger), an answer that you get, which depends on the parameter, will tend towards a certain value. This idea underlies a lot of the assumptions that we make when we model the real world:

  • All materials are deformable to some extent, but we may assume that a rod or surface is perfectly rigid if the stiffness of its material is sufficiently high, so that any deformations are negligible, compared to the lengthscales of interest in the problem. When considering a block sliding down a slope, we do not care about deformations on a nanometre scale.
  • We might ignore air resistance when studying the motion of a projectile. This approximation works provided that the projectile’s inertia and its weight (due to gravity) dominate the effects of air resistance. Air resistance is roughly proportional to surface area, while weight is proportional to volume (for a given density), so this dominance occurs in the limit of the projectile’s surface-area-to-volume ratio being small: think of a cannonball rather than a feather.
  • While the wave-like properties of light are more fundamental (they are directly governed by Maxwell’s equations), its particle-like properties come from the limit of the wavelength (about 700 nm for red light) being much smaller than other lengthscales of interest. This is why a laser beam acts much like a particle when shone across a room (it is localised, and can reflect cleanly off surfaces), while its wave-like properties may be seen in a double-slit experiment involving narrow slits.

These approximations are quite simple to understand and apply, and they give good agreement with empirical results. However, things are not always so straightforward, especially when there are two limiting processes which have opposite effects.

Analysis gives us the tools to study how these competing limiting processes interact with each other. I won’t discuss their resolution in detail, but I will give a few examples below.

Division by zero

Consider the function f(x) = a/x where a is some fixed positive real number. This function is defined for x \neq 0, and it is positive whenever x is positive. When x is positive and very small, f(x) is very large, since x appears in the denominator. In fact, as x gets closer and closer to 0 (from above, through positive values), f(x) becomes unbounded. We therefore say that the limit of f(x) as x approaches 0 from above is infinite: we can write this as

 \displaystyle\lim_{x\rightarrow 0^+} f(x) = \infty.

Note that we can talk about this limit even though f(0) itself is not actually defined. We talk about the limit of x going to 0, rather than actually setting x = 0, by using the arrow symbol.

But now consider the function g(x) = (x^2 + x) / x, again defined for x \neq 0, since when x = 0 the denominator is zero and division by zero is undefined. This function is also positive whenever x is positive, but it behaves very differently under the limit x \rightarrow 0. In this limit, the numerator also goes to zero. Now 0/0 is undefined, but note that

 g(x) = \displaystyle\frac{x (x + 1)}{x} = x + 1.

Therefore,

 \displaystyle\lim_{x\rightarrow 0} g(x) = 1.

We can also say that g(x) converges to 1 as x tends to 0.

So why not simply define 0/0 as 1? This might seem sensible given that x/x = 1 for all nonzero values of x, but have a think about the similar function h(x) = (x^2 + 3x)/x. Again, both numerator and denominator go to zero as x\rightarrow 0, but the limit of the fraction is 3, not 1.
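
The same factorisation trick as before shows why:

 \displaystyle h(x) = \frac{x(x + 3)}{x} = x + 3, \qquad \text{so} \qquad \lim_{x\rightarrow 0} h(x) = 3.

There is therefore no single value that 0/0 could consistently be given.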

Infinite sums

You may be familiar with Zeno’s paradoxes. In order to run 100 metres, one must first run 50 metres, then 25 metres, then 12.5 metres, and so on. That is, one must complete infinitely many tasks, each of which requires a non-zero amount of time. How can this be possible?

The metaphysical implications of this and related paradoxes are still being debated some 2,400 years after Zeno. Mathematically, one has to argue that

 \displaystyle \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1.

While the argument given above, which was known to the ancients, is a good illustration of why the geometric series above should equal 1, it doesn’t help us understand other sums, which can behave in rather different ways. For example, the ancients also knew that the harmonic series

 \displaystyle 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots

is divergent: that is, it cannot be said to take on any particular value. This is because the answer that we get after adding n terms together keeps growing without bound as we increase n. However, the Basel series

 \displaystyle 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} + \cdots

turns out to be convergent; the series takes the value \pi^2 / 6.

In each of these series, as we add on more and more terms, there are two limiting processes at work. The number of terms is tending towards infinity, but the size of each term is tending towards zero. Whether or not a series converges depends on the rate at which the terms converge to zero. The terms in the Basel series drop off faster than the ones in the harmonic series (since the denominators have squares), allowing the Basel series to converge.
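
To make ‘rate’ concrete (a sketch of the standard arguments): after n terms, the geometric series above has partial sum

 \displaystyle \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} = 1 - \frac{1}{2^n},

which converges to 1 as n grows; whereas grouping the harmonic series as

 \displaystyle 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots \geq 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots

shows that its partial sums eventually exceed any bound.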

The precise conditions needed for a series to converge, as well as methods for calculating or estimating the values of convergent series, are studied in analysis. An understanding of series is useful not just for pure mathematics, but also in many fields of theoretical physics, including quantum mechanics.

‘Ghosts of departed quantities’

Newton discovered calculus in the late 1600s, giving us the notion of an integral as the accumulated change to a quantity, given the small changes made over time. For example, a particle may move continuously with a certain time-dependent velocity. At each instant, the current velocity causes some displacement; the net displacement of the particle is the accumulation of these little displacements.

The traditional definition of the integral is as follows (although Newton himself would not have used the following language or notation). The area under the curve y = f(x) between x = a and x = b is to be denoted as I = \int_a^b f(x) \, \mathrm{d}x. To evaluate this area, one divides it into a number of rectangles of width \Delta x, plus small ‘errors’ for the bits of the area that do not fit into the rectangles:

 \displaystyle I = f(a) \Delta x + f(a+\Delta x) \Delta x + \cdots + f(b-\Delta x) \Delta x + \text{errors} = \sum_i f(x_i) \Delta x + \text{errors},
where the sum runs over the sample points x_i = a, a + \Delta x, \ldots, b - \Delta x.

One then makes the rectangles narrower and narrower, taking more and more of them. It is then argued that, as \Delta x gets smaller, the errors will vanish, and the sum approaches the value of the integral I. The symbol \mathrm{d}x represents an ‘infinitesimal narrowness’; the integral symbol \int is an elongated ‘S’, showing the link between integration and summation.
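
One can watch the errors vanish numerically; here is a minimal sketch in Python, using f(x) = x^2 on [0, 1] (whose area is exactly 1/3) purely as an illustration:

# Left-endpoint Riemann sums for f(x) = x^2 on [0, 1]; the exact area is 1/3.
def riemann_sum(f, a, b, n):
    """Sum of n rectangles of width (b - a)/n, sampled at left-hand endpoints."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

exact = 1 / 3
for n in (10, 100, 1000, 10000):
    approx = riemann_sum(lambda x: x * x, 0.0, 1.0, n)
    print(f"n = {n:6d}   sum = {approx:.6f}   error = {exact - approx:.6f}")

As \Delta x = 1/n shrinks, the error shrinks roughly in proportion to it; it is exactly this limiting behaviour that the rigorous definition pins down.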

Despite giving correct mathematical answers, Newton’s calculus was attacked, both on its metaphysical foundations (as with Zeno’s paradoxes) and on the idea of the errors becoming small. For any nonzero width \Delta x, these errors are present. Surely, then, when the width is taken to be infinitesimal but nonzero, the errors would also be nonzero and infinitesimal?

It turns out that terms such as ‘infinitesimal’ are difficult to use: in the system of real numbers, there is no such thing as an ‘infinitesimal’ number. A more rigorous definition of the integral was given by Riemann almost 200 years after Newton’s calculus. This definition will be studied in a first course in analysis.

Stability theory

Often it is not possible to solve a problem exactly, and it is necessary to make approximations that hold in certain limits. As the mechanical examples above showed, such approximations can be very useful in simplifying a problem, stripping away unnecessary details. However, it is sometimes important to consider what the effect of those details may be: we should be sure that any such effects are negligible compared to the effect that we care about.

Stability theory is the study of how the answer to a problem changes when a small change, or perturbation, is made to the problem. A ball on a sloped surface tends to roll downwards. If the ball is sitting at the bottom of a valley, then a displacement to the ball may cause it to move slightly uphill, but then gravity will act to restore the ball to its original place. This system is said to be stable. On the other hand, a ball at the top of a hill will, if knocked, roll away from that hill and not return; this is an example of an instability.
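
This can be made quantitative (a sketch for a particle of mass m moving in a potential V(x), with an equilibrium at x_0 where V'(x_0) = 0). Writing x = x_0 + \epsilon for a small perturbation \epsilon and expanding,

 \displaystyle m\ddot{\epsilon} \approx -V''(x_0)\, \epsilon,

so if V''(x_0) > 0 (the bottom of a valley) the perturbation oscillates and stays small, while if V''(x_0) < 0 (the top of a hill) it grows exponentially.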

While this is a trivial example, more complicated instabilities are responsible for many types of pattern formation, such as billows in the clouds, the formation of sand dunes from a seemingly flat desert, and the formation of spots or stripes on leopards and tigers. In biological systems, departures from homeostasis or from a population equilibrium may be mathematically modelled as instabilities. It is important to understand these instabilities, as they can lead respectively to disease or extinction.

Analysis provides the language needed to make the above statements precise, as well as methods for determining whether a system of differential equations (for example governing a mechanical or biological system) has stable or unstable behaviour. A particularly important subfield is chaos theory, which considers equations that are highly sensitive to changes to their initial conditions, such as weather systems.

Summary

Infinite or limiting processes, such as series and integrals, can have behaviours that seem mysterious. A first course in analysis will define concepts such as convergence, differentiation and (Riemann) integration in a rigorous way. However, before that is possible, one must look at more basic facts about the system of real numbers, and indeed give a proper definition of this system: it is not enough to think of real numbers simply as extended strings of decimals.

Having placed all this on a firm footing, it is then possible to answer more fundamental questions about calculus, such as ‘Why do differential equations have solutions?’ or ‘Why does the Newton–Raphson method work?’. It also allows us to use approximations more carefully, and stability theory helps us to decide whether the error introduced by an approximation will dramatically change the results of calculations.

Mathematical hairstyling: Braid groups

At a recent morning coffee meeting, I was idly playing with my hair when this was noticed by a couple of other people. This led to a discussion of different braiding styles and, because we were mathematicians, a discussion of braid theory. I went on to spend a lot of time reading about it. (Nerd-sniped.)

I didn’t know much about braid theory (or indeed group theory) before, but it turned out to be a very rich subject. I remember being introduced to group theory for the first time and finding it very hard to visualise abstract objects like generators, commutators, conjugates or normal subgroups. Braid groups may be a very useful way of introducing these: they can be demonstrated very visually and hands-on.
