## Tipping the scales: some of the mathematics behind music

If you look at a piano keyboard, you’ll see that the keys are arranged in a repeating pattern, and those of you who play an instrument will recognise the unit of repetition as the octave. A chromatic scale gets from one end of the octave to another in twelve steps ($C\ C\sharp\ D\ E\flat\ E\ F\ F\sharp\ G\ G\sharp\ A\ B\flat\ B\ C$); a major scale does the journey in seven (e.g. $C\ D\ E\ F\ G\ A\ B\ C$), while a minor scale uses a slightly different set of seven steps (e.g. $C\ D\ E\flat\ F\ G\ A\flat\ B\flat\ C$). (For those not familiar with musical notation: $C\sharp$ (“C sharp”) and $D\flat$ (“D flat”) are both names for the “black” note that lies between C and D on the keyboard; the same applies for the other sharp and flat notes. Contrary to appearances, the black notes and the white notes are not fundamentally different: the distinction is mostly to make them easy to locate on the keyboard.)

A piano keyboard with the names of the white notes labelled.

But what is an octave, and is there any reason to divide it into twelve (or seven) steps? And what do we mean by “dividing” it anyway? The answers to these questions will take us into some interesting mathematical territory…

### The mathematical problem

A little bit of physics to get us started. A musical note, physically speaking, is a roughly periodic variation of air pressure. By Fourier analysis, we know that such a variation can be broken down into sinusoidal components with different frequencies. The frequency that dominates this decomposition is called the pitch of the note, and it is the main thing that we perceive about the note. (The human brain seems to be rather good at Fourier analysis!) For example, the A above middle C is generally taken to have a frequency of 440 Hz. For our purposes, then, a note is simply a real number corresponding to the frequency; the higher this number, the higher-pitched the note.

Examples of sound waves (left) and their corresponding frequency spectra (right).

Very few people can accurately identify the absolute pitch of a note (this ability is called perfect pitch), but almost everyone can detect relationships between notes, whether they’re played consecutively or concurrently. The most fundamental of these relationships is when the pitch of one note is precisely twice that of another: in this case, it is said to be an octave higher. The octave relationship is so fundamental that when we discuss scales we will give the same name to two notes which are separated by one or more octaves — for example, the lower and upper Cs in the first paragraph.

It is possible to write tunes that do nothing except bounce up and down between notes an integer number of octaves apart, but these tunes are not terribly exciting. To write more interesting music, we need to choose some additional frequencies that lie between our original ones: in other words, we need to build a scale. (We will refer to these original notes, such as C in the first paragraph, as the tonics of our scale.) Ideally we would like these additional notes to have the following properties. First, there should be plenty of ways to choose two notes from our scale that sound good together. Second, our scale should allow us to work our way up the octave from tonic to tonic in a regular manner. It turns out that we have to make compromises between these two properties, and this is where the mathematical interest comes in.

On the whole, notes that sound good together seem to have frequencies that are rational multiples of each other. After the octave, the most “natural” combination of notes occurs when the frequencies are in the ratio 3:2, in which case the notes are said to be a (perfect) fifth apart. (We will refer to the ratio of frequencies between two notes as the interval between them; and yes, the terminology is unhelpful!) This is — approximately, at least — the interval between two adjacent strings on a violin, or between C and the G above it on the piano keyboard. Another ratio that seems to be easy on the ear is the major third, where the frequency ratio is 5:4.

Musical instruments are probably as old as human civilisation, but the first known attempt to construct a scale on mathematical principles seems to be due to the ancient Greeks, and is often attributed to our old friend Pythagoras. The approach is to start with the tonic (which we will take to have frequency 1 in some suitable units) and then to increase the pitch by a fifth each time. This would of course rapidly overshoot the octave, so when a new note is formed we divide its pitch by two as many times as we need to find ourselves back in the original octave. (Remember that, as far as the ear’s sense of harmony is concerned, two notes an octave apart are essentially the same note.) So, denoting the ith note we construct by $f_i$, the construction proceeds as follows.

• Starting pitch: $f_0 = 1$.
• Go up by a fifth: $f_1 = \dfrac{3}{2}$. This is less than 2, so we’re fine.
• Go up by another fifth: $f_2 = \dfrac{3}{2}\times\dfrac{3}{2} = \dfrac{9}{4}$. This is greater than 2, so we reduce it by an octave to get $f_2 = \dfrac{9}{8}$.
• Go up by another fifth: $f_3 = \dfrac{9}{8}\times\dfrac{3}{2} = \dfrac{27}{16} < 2$.
• Go up by another fifth, and again reduce by an octave: $f_4 = \left(\dfrac{27}{16}\times\dfrac{3}{2}\right)\times\dfrac{1}{2} = \dfrac{81}{64}$.
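• Go up by another fifth: $f_5 = \dfrac{81}{64}\times\dfrac{3}{2} = \dfrac{243}{128} < 2$.
• Go up by another fifth, and again reduce by an octave: $f_6 = \left(\dfrac{243}{128}\times\dfrac{3}{2}\right)\times\dfrac{1}{2} = \dfrac{729}{512}$.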

The seven notes we have constructed form a scale known as the Lydian mode. Arranged in increasing order, they are $f_0$, $f_2$, $f_4$, $f_6$, $f_1$, $f_3$, $f_5$, and they correspond — again approximately — to the notes F G A B C D E on the piano.
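As a rough check on the arithmetic, the construction can be sketched in a few lines of Python using exact fractions. The function name `pythagorean_notes` is an illustrative choice, not part of the historical method.

```python
from fractions import Fraction

def pythagorean_notes(count):
    """Stack perfect fifths, folding each new pitch back into the octave [1, 2)."""
    notes = [Fraction(1)]                      # f_0, the tonic
    for _ in range(count - 1):
        pitch = notes[-1] * Fraction(3, 2)     # go up by a perfect fifth
        while pitch >= 2:                      # reduce by octaves if we overshoot
            pitch /= 2
        notes.append(pitch)
    return notes

notes = pythagorean_notes(7)
# Pair each pitch with its index i, then sort by pitch to get the scale order.
ordered = sorted(enumerate(notes), key=lambda pair: pair[1])
for i, f in ordered:
    print(f"f_{i} = {f} (about {float(f):.4f})")
```

Sorting by pitch reproduces the ordering of the $f_i$ quoted above.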

It would be nice to repeat this process and complete the octave, so we try going one step further along the sequence: $f_7 = \dfrac{729}{512}\times\dfrac{3}{2} = \dfrac{2187}{1024} \approx 2.14$. If we want to complete the octave so that $f_7 = 2$, then the interval between $f_6$ and $f_7$ has to be rather smaller than a perfect fifth. This difference is easily audible, and is an example of a wolf interval — a dissonant noise that sounds to the imaginative like the howling of a wolf.

In fact, it’s easy to see that we can never go up an integer number of octaves by taking successive perfect fifths, because there are no positive whole numbers $m$ and $n$ such that $(3/2)^m = 2^n$: this would require $3^m = 2^{m+n}$, with an odd number on the left and an even number on the right. This leaves us with a dilemma. Either we relax our requirement that the perfect fifth should be built into our scale, and we try to find some regular sequence of intervals that allows us to approximate the perfect fifth; or we insist on keeping the perfect fifth and try to find some way of constructing the octave from that and other reasonably harmonious intervals. These approaches lead to rather different definitions of the scale…

### Equal temperament

The usual solution that has been used in “classical” music for roughly the last 200 years, and that is also widely used in genres such as jazz that have roots in the classical tradition, is called equal temperament. The idea here is to split the octave into $k$ equal intervals, so that the ratio of frequencies between two successive notes is $r$, and we have $r^k = 2$. We would like $k$ not to be too large, since the smaller the intervals between notes the harder they are for the human ear to distinguish; but we would also like to be able to approximate the most harmonious intervals — in particular, the perfect fifth — closely enough that only a very sensitive ear will detect the approximation.

In other words, we want to find whole numbers $k$ and $l$ such that $r^k = 2$ exactly, and $r^l \approx 3/2$ to a fair degree of accuracy, i.e. $2^l \approx (3/2)^k$. This corresponds to fitting a total of $k$ (approximate) fifths into $l$ octaves. How can we do this?

Following a paper by Schechter (1980), we rearrange the relation between $l$ and $k$ as

$(3/2)^k \approx 2^l \iff l/k \approx \rho \equiv \log(3/2)/\log(2).$
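
Spelling out the intermediate step (taking logarithms of both sides, in any base), with an approximate numerical value added for orientation:

$k\log(3/2) \approx l\log(2) \iff \dfrac{l}{k} \approx \dfrac{\log(3/2)}{\log(2)} \approx 0.585.$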

The better the approximation to $\rho$, the more closely we will be able to approximate the perfect fifth within our scale. Now, $\rho$ is an irrational number so we will never be able to express it exactly as a fraction, but there are, obviously, many ways we could approximate it. How do we decide which of these ways are most appropriate?

To do so, we need to define more tightly what we mean by the most appropriate approximation.

Definition: $l/k$ is a best approximation to $\rho$ if, for any $l'$ and $k'$ such that $\dfrac{l'}{k'} \neq \dfrac{l}{k}$ and $|l'-k'\rho| \leq |l-k\rho|$, it follows that $k' > k$.

In plain language, $l/k$ is a best approximation to $\rho$ if any closer approximation must have a denominator larger than $k$. (For example, the rational number $25/8 = 3.125$ is not a bad approximation to $\pi$, but it is not a best approximation because there is another rational number, $22/7 \approx 3.143$, which is a closer approximation and which has a smaller denominator.) In the musical context, we don’t want to split the octave into $k'$ intervals if we can approximate the perfect fifth better by using a smaller number $k$ of intervals — this would certainly be a waste of effort!
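
The definition can be tried out directly by brute force. The sketch below is only an illustration (the function name and the cut-off of 60 are arbitrary choices, and it relies on floating-point arithmetic): for each denominator $k$ it finds the closest numerator $l$ and keeps the pair only if $|l - k\rho|$ is strictly smaller than for every smaller denominator.

```python
import math

RHO = math.log2(3 / 2)  # log(3/2)/log(2), about 0.585

def best_approximations(target, max_denominator):
    """Return (l, k) pairs for which |l - k*target| beats every smaller denominator."""
    found = []
    smallest_error = math.inf
    for k in range(1, max_denominator + 1):
        l = round(k * target)            # the numerator closest to k*target
        error = abs(l - k * target)
        if error < smallest_error:       # strictly better than all denominators below k
            found.append((l, k))
            smallest_error = error
    return found

print(best_approximations(RHO, 60))
```

With this cut-off the surviving denominators are 1, 2, 5, 12, 41 and 53.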

Fortunately, there is a branch of mathematics that will enable us to develop best approximations in this sense, not just to $\rho$ but to any irrational number: the theory of continued fractions. A digression follows, but it’s worth it.

### Continued fractions

We don’t have space here to go into the theory of continued fractions in any detail — which is a shame, because it is beautiful stuff. The interested reader is referred to the book by Khinchin (1997): the first two chapters contain all the results we will need, and need no more than first-year mathematical knowledge to follow. For now, we will merely give some (slightly informal) definitions and then a result without proof.

A simple continued fraction is an expression of the form

$a_0 + \dfrac{1}{a_1 + \dfrac{1}{a_2 + \dfrac{1}{a_3 + \dots}}},$

which we will denote by $[ a_0; a_1, a_2, a_3, \dots ]$.

We will take all the terms $a_i$ for $i > 0$ to be positive integers, and we will not worry for now about the properties required to ensure that this infinite expression converges meaningfully.

The first important result for us is that any irrational number can be represented by a simple continued fraction in exactly one way. It is easy to construct the elements $a_i$ systematically. Given the irrational number $\alpha$ that we want to write as a continued fraction, we have $a_0 = \left\lfloor\alpha\right\rfloor$, where the brackets denote the usual floor function (the largest integer that is smaller than or equal to $\alpha$). We can now write $\alpha = a_0 + \frac{1}{r_1}$, where $r_1 > 1$ and can be written in continued fraction form as $r_1 = [a_1; a_2, a_3, \dots]$. But now $a_1 = \left\lfloor r_1 \right\rfloor$, and we just keep on repeating the process to find as many terms as we like.
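
The repeated "take the floor, then invert the remainder" procedure might be sketched as follows. This is a floating-point illustration only, so it can be trusted for the first handful of terms but not indefinitely, since rounding errors grow at every inversion; the function name is an assumption made for the example.

```python
import math

def continued_fraction_terms(alpha, count):
    """Return up to `count` terms a_0, a_1, ... of the continued fraction of alpha."""
    terms = []
    remainder = alpha
    for _ in range(count):
        a = math.floor(remainder)
        terms.append(a)
        fractional_part = remainder - a
        if fractional_part == 0:          # expansion terminated (rational in floating point)
            break
        remainder = 1 / fractional_part   # this plays the role of r_1, r_2, ... in the text
    return terms

print(continued_fraction_terms(math.sqrt(2), 6))
print(continued_fraction_terms(math.log2(3 / 2), 7))
```

The first call uses $\sqrt{2} = [1; 2, 2, 2, \dots]$ as a sanity check; the second applies the same routine to the number $\rho$ from the previous section.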

The convergents of a simple continued fraction are the numbers we get by terminating the expansion after a finite number of terms. We write the ith convergent as $p_i/q_i$ where $p_i$ and $q_i$ are integers with $q_i > 0$. The first two are $p_0/q_0 = a_0/1$ and $p_1/q_1 = (a_0a_1 + 1)/a_1$, and it is easy to show by induction that for $i \geq 2$, we can calculate the convergents by the recurrence relations $p_i = a_ip_{i-1}+p_{i-2}$ and $q_i = a_iq_{i-1} + q_{i-2}$.

For example, take the simplest possible continued fraction, in which $a_i = 1$ for all values of i. This can be written as

$\phi = [1; 1, 1, 1, \dots] = 1 + \dfrac{1}{1 + \dfrac{1}{1 + \dfrac{1}{1 + \dots}}},$

and its first few convergents are

$\dfrac{p_0}{q_0} = \dfrac{1}{1}, \quad\dfrac{p_1}{q_1} = 1 + \dfrac{1}{1} = \dfrac{2}{1}, \quad \dfrac{p_2}{q_2} = 1 + \dfrac{1}{1 + \dfrac{1}{1}} = \dfrac{3}{2},$

$\dfrac{p_3}{q_3} = 1 + \dfrac{1}{1 + \dfrac{1}{1 + \dfrac{1}{1}}} = \dfrac{5}{3}, \dots$

(In fact, this particular continued fraction represents the golden ratio $(1+\sqrt{5})/2 \approx 1.618$. It has some other lovely properties, such as the fact that the terms $p_i$ and $q_i$ are members of the Fibonacci sequence, but we haven’t time to explore them here…)
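
The recurrence relations for $p_i$ and $q_i$ translate almost directly into code. The sketch below uses the common convention of seeding the recurrence with $p_{-1}/q_{-1} = 1/0$ and $p_{-2}/q_{-2} = 0/1$ so that the same formula also produces the first two convergents; that seeding is an addition for convenience, not something stated in the text.

```python
from fractions import Fraction

def convergents(terms):
    """Return the convergents p_i/q_i of the continued fraction [a_0; a_1, a_2, ...]."""
    p_prev, q_prev = 1, 0      # conventional seed p_{-1}/q_{-1}
    p_prev2, q_prev2 = 0, 1    # conventional seed p_{-2}/q_{-2}
    result = []
    for a in terms:
        p = a * p_prev + p_prev2
        q = a * q_prev + q_prev2
        result.append(Fraction(p, q))
        p_prev2, q_prev2 = p_prev, q_prev
        p_prev, q_prev = p, q
    return result

# The all-ones expansion from the example above.
print([str(c) for c in convergents([1, 1, 1, 1])])
```

For the all-ones expansion this gives 1, 2, 3/2 and 5/3, matching the convergents worked out by hand.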

Now we can state the crucial result that we need to construct our musical scales.

Theorem. Every best approximation, in the sense defined above, is a convergent of the continued fraction.

This is Khinchin’s Theorem 16: the proof is about a page long and remarkably easy to follow. (Or you could try proving it for yourself!)

Back to music…

### Convergents and splitting up the octave

Recall that the number we were trying to approximate was $\rho = \log(3/2)/\log(2)$. It is easy to calculate the first few elements in the continued fraction by the method described above: we obtain $a_0 = 0$, $a_1 = 1$, $a_2 = 1$, $a_3 = 2$, $a_4 = 2$, $a_5 = 3$, $a_6 = 1$, and so on. The corresponding convergents are:

$\dfrac{p_0}{q_0} = \dfrac{0}{1},\quad \dfrac{p_1}{q_1} = \dfrac{1}{1},\quad \dfrac{p_2}{q_2} = \dfrac{1}{2},\quad \dfrac{p_3}{q_3} = \dfrac{3}{5},\quad \dfrac{p_4}{q_4} = \dfrac{7}{12},\quad \dfrac{p_5}{q_5} = \dfrac{24}{41},\quad \dfrac{p_6}{q_6} = \dfrac{31}{53}, \dots$

What does this mean musically? Note first that the fourth convergent has denominator 12, so we split the octave into 12 equal intervals. This corresponds to the modern “equal-tempered” scale: starting from C and working up one note (one semitone; an increase of the frequency by $r = 2^{1/12}$) at a time, we have $C\ C\sharp\ D\ E\flat\ E\ F\ F\sharp\ \mathbf{G}\ G\sharp\ A\ B\flat\ B\ C$: twelve steps, and the seventh step takes us to G and our approximation to a perfect fifth: $2^{7/12} \approx 1.4983$. (The approximation is out by about $1.7\times 10^{-3}$, which is apparently within the sensitivity of some people’s hearing.)

The next convergent gives us $k = q_5 = 41$ and a 41-note division of the octave; the error in the fifth is then about $4\times 10^{-4}$. The next again gives us a 53-note octave and an error of about $5.9\times10^{-5}$. Remarkably, the first proposal for a 41- or 53-note octave seems to have come from a Chinese scholar named King Fang, in about 40 BC, and it’s been rediscovered several times since. (A 53-note musical instrument called the Enharmonic Harmonium, designed by R. H. M. Bosanquet, was exhibited in London in 1876, and can apparently still be found in the Science Museum. The designer’s own description is also available online.) The next convergent, by the way, would lead to a 306-note scale. As far as I know this has never been seriously proposed by anybody.
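
The sizes of these errors are easy to recompute. In the sketch below the error is taken to be the difference between the tempered fifth $2^{l/k}$ and the pure ratio $3/2$, which is the measure used for the 12-note case above; the function name is illustrative.

```python
def fifth_error(fifth_steps, steps_per_octave):
    """Difference between the tempered fifth 2**(l/k) and the pure ratio 3/2."""
    return 2 ** (fifth_steps / steps_per_octave) - 3 / 2

for l, k in [(7, 12), (24, 41), (31, 53)]:
    tempered = 2 ** (l / k)
    print(f"{k}-note octave: fifth = {tempered:.6f}, error = {fifth_error(l, k):+.2e}")
```

The signs alternate: the 12-note and 53-note fifths come out slightly flat, and the 41-note fifth slightly sharp.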

### Pros and cons of the equal-tempered scale

Although the search for an optimal scale has interested scholars for millennia, it is only in the last couple of centuries that a standard has really emerged. This is partly because many instruments allow the musician to “bend” the pitch of notes slightly away from their theoretical value in order to improve the harmony: on a fretless stringed instrument such as the violin this can be done simply by shifting the fingers slightly, and on many wind instruments it can be done by careful (or careless) use of the embouchure or a mute. The ancient Chinese interest in equal-tempering may be connected with their use of bells and lithophones (the stone equivalent of xylophones), neither of which permits this kind of cheating, and European interest seems to have developed with the rise of keyboard instruments such as the harpsichord and the piano which could be tuned to any scale system but then had to remain in that system throughout the piece.

The most famous musician to have taken an interest in tempering was Johann Sebastian Bach, who was a keyboard virtuoso as well as a composer and who acted as a consultant for the pioneering piano builder Gottfried Silbermann. (Unlike his pianos, Silbermann’s organs were built using a distinctly unequal-tempered scale, which Bach detested. According to Bosanquet (1876), pp. 29–31, Bach determined that $A\flat$ was the key in which Silbermann’s organs sounded worst, and although he never wrote any organ compositions in this key, there is a story that “when Silbermann came to listen, Bach would strike up on $A\flat$ as soon as he saw him, saying ‘you tune the organ as you please, and I play as I please'”.)

Although it is unlikely that Bach employed the equal-tempered scale, his famous 48 Preludes and Fugues were written for “das wohltemperierte Klavier”, the “well-tempered” instrument, and they show off a key feature of equal temperament and related systems: all keys are equal. In a perfectly equal-tempered scale, one can transpose a melody by shifting every note up or down by the same amount, and because all the intervals — the ratios between notes — are preserved, it will sound “the same, but higher” (or lower). This is approximately true in any of the other “well-tempered” systems of Bach’s day, but it is not true of systems such as the Lydian mode of the Pythagoreans. (To demonstrate this to yourself, pick any tune that can be played on the white notes of a piano, remembering that this mode corresponds roughly to the white notes from F to F. Play the tune through once, then shift your fingers one key to the right and try again. The result should be fairly awful.)
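
To see numerically why shifting a tune along the white notes changes it, one can compare the ratios between neighbouring notes. The sketch below takes the seven Pythagorean pitches constructed earlier, with the octave added on top, and computes each step; the variable names are illustrative.

```python
from fractions import Fraction

# The seven Pythagorean pitches in increasing order, followed by the octave.
pitches = [Fraction(1), Fraction(9, 8), Fraction(81, 64), Fraction(729, 512),
           Fraction(3, 2), Fraction(27, 16), Fraction(243, 128), Fraction(2)]

# Ratio between each note and the next one up.
steps = [upper / lower for lower, upper in zip(pitches, pitches[1:])]
print([str(s) for s in steps])

# The first three steps of a tune, before and after shifting one key to the right.
print([str(s) for s in steps[0:3]], "versus", [str(s) for s in steps[1:4]])
```

Two different step sizes appear, 9/8 and 256/243, so a melody shifted by one key meets them in a different order. In an equal-tempered scale every step is the same ratio $2^{1/12}$, so no such change can occur.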

This ability to transpose, which also makes it possible to use complicated harmonies and dissonances with impunity, is what has gradually established the equal-tempered scale as standard. It has its disadvantages as well, though. Because every single note is slightly “out of tune” (i.e. slightly displaced from being in a simple ratio to the tonic), it is difficult to tune instruments accurately to such a scale, or indeed to design instruments accurately to produce it; a cappella groups such as barbershop quartets have a tendency to slip away from equal temperament and instead to latch on to simple ratios of tones. It also loses a trick that earlier composers used, which was to choose a particular key, with its distinctive set of intervals, to give a composition a particular mood. Finally, in some musical traditions there are alternative, well-developed conventions which have so far fought off the challenge of equal temperament: an example from close to home is the Highland bagpipe, and traditional Indian music also uses its own scale. (A rock-guitarist friend of mine also assures me that he employs complicated tunings because “every note’s slightly out of tune, which adds to the charm of the instrument”.)

We’ll now look briefly at one of these sets of conventions, just tuning.

### Just tuning

A just tuning is a scale that is constructed, so far as possible, from primary intervals, i.e. those that are most appealing to the human ear: the octave, the perfect fifth (the frequency ratio 3:2) and the major third (the frequency ratio 5:4). The idea of a just tuning is that every note is related to at least one other note by an exact primary interval.

A paper by Silver (1971) provides a nice way of visualising just tunings. We set out the pitches on a grid: a node of the grid corresponds to a pitch; a horizontal step to the right means an increase of a perfect fifth, and a vertical step up means an increase of a major third. To construct a just tuning on twelve notes, we select a set of adjacent grid points that includes all the twelve notes of the chromatic scale. For example, the figure below illustrates the Pythagorean tuning in which we simply increase by a perfect fifth every time (decreasing by an octave whenever appropriate).

Grid diagram for the Pythagorean tuning
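
The grid can also be explored numerically. The sketch below assumes that the tonic sits at the grid point $(0, 0)$ and that the Pythagorean row runs from the tonic to the note eleven fifths above it; the function name `grid_pitch` and this choice of coordinates are illustrative assumptions, not taken from Silver's paper.

```python
from fractions import Fraction

def grid_pitch(fifths, thirds):
    """Pitch of the grid node reached by the given numbers of fifth and major-third steps."""
    pitch = Fraction(3, 2) ** fifths * Fraction(5, 4) ** thirds
    while pitch >= 2:    # fold down by octaves
        pitch /= 2
    while pitch < 1:     # fold up by octaves (for steps to the left or downwards)
        pitch *= 2
    return pitch

# A single row of twelve nodes: the Pythagorean tuning, sorted into scale order.
pythagorean_row = sorted(grid_pitch(a, 0) for a in range(12))
print([str(p) for p in pythagorean_row])

# Four steps to the right and one step up land close together, but not on the same pitch.
print(grid_pitch(4, 0), grid_pitch(0, 1))
```

The last line prints 81/64 and 5/4: two grid points that would carry the same note name but differ slightly in pitch, which is why the choice of grid points matters.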

Note that this choice defines what we mean by C, D and so on relative to what we pick as the tonic of the scale. If we were to select a set from the above figure that included two notes labelled as (say) F, these two notes would not be an integer number of octaves apart (because, as we’ve noted, there is no way to obtain a power of 2 by multiplying powers of 3/2 and 5/4). Thus, if we tried to continue this scale by increasing by another perfect fifth, we would not land back on C. Instead, we have an imperfect interval from F to the C above it, sometimes (as we’ve seen) called a “wolf interval”, and sometimes called a Procrustean fifth after the charming Greek blacksmith who cut off his guests’ feet to ensure they fitted properly in the bed.

We can now start constructing more complicated just tunings. The simplest variation of the Pythagorean tuning is to include a single major third among the intervals. We can’t do this arbitrarily: we have to leave off either the leftmost or the rightmost of our single row of twelve notes, and this means we have either to add an interval of a major third above the note seven fifths above the tonic, or to add a major third below the note three fifths above the tonic. Taking the first choice, we have

Once we have started with this, we can build up the rest of the eleven possible “two-row” tunings systematically. For example, the next couple are

and

and the process works its way along systematically until we reach

From this point onward, the bottom note of the scale is E rather than C, and we work our way along building up the upper row and reducing the lower row until we reach

and we’ve worked our way through all 11 essentially different two-row arrangements. (Obviously we could also shift the whole pattern up or down by any number of perfect fifths or major thirds and obtain an essentially equivalent pattern, which we will not regard as a distinct tuning: it corresponds to a different key within that tuning.)

According to Silver (1971), there are 43 three-row patterns, 55 four-row patterns and 8 five-row patterns. You might like to try constructing and counting these systematically for yourself. You might also like to visit the Wikipedia page on just intonation, which has some audio files that compare equal- and justly-tempered tunings, and see whether you can detect the difference!

Graphical representation of various temperings. Horizontal lines represent the 12-note equal-tempering. From left to right, the columns are: Lydian; Pythagorean; two "two-row" just temperings (numbers 6 and 11 in the construction described above); a three-row just tempering (figure 4 from Silver 1971); a four-row just tempering (figure 5 from Silver 1971). In each case, the tonic note C is taken to be the lowest note in the construction. The lower plot uses a vertical log scale.

In case you can’t hear the differences between these different tunings, the accompanying figures illustrate them graphically. The y-position of each note corresponds to its frequency relative to the bottom note (the tonic). Note that, as they’ve been constructed to, all these tunings agree closely on the perfect fifth (G), but there is more variation for other notes, most noticeably $A\sharp$ (two steps below top C) and F (one step above E). One can imagine the dissonance that would result if two instruments tuned to the fifth and sixth systems in these figures attempted to play in harmony; in fact, anyone who’s been in an amateur choir or a school music class probably doesn’t need to imagine it. It’s also noticeable that in the Pythagorean system every note except C and G is slightly sharper (slightly higher) than in the equal-tempered system, while most other just tunings contain a mixture of sharper and flatter notes.

### Some final thoughts

This brief tour of musical scales has, with any luck, thrown up more questions than it answers. Not all of these lie within the province of mathematics. For example, why should the human brain find certain simple mathematical relationships between sounds appealing, and find others unappealing? Why should we even be able to identify the pitch of a note among the jumble of frequencies that make it up? Mathematically, you might like to think about whether it’s possible to compute the numbers of possible just tunings without listing them all (and once you’ve done it for 12-note systems, try for 41- or 53-note systems…). Then there are other aspects of music in which one can immediately spot a mathematical element: rhythms and chord progressions, to name just two.

Leibniz famously wrote in a letter to Goldbach that “Musica est exercitium arithmeticae occultum nescientis se numerare animi”: “music is the pleasure the soul experiences from counting without being aware that it is counting”. The remarkable thing is just how much more than just counting is going on when we listen to music, and just how unaware we are of most of it!

(DP)