COMPSCI 120:
Mathematics for Computer Science

Algorithms and Functions

Exercise 4.1. Two processes that you can use to multiply two nonnegative integers $a, b$ together are listed below:

Algorithm 4.1.

  1. Define a new number $prod$, and initialize it (i.e. set it equal) to 0.
  2. If $a = 0$, stop, and return the number $prod$.
  3. Otherwise, add $b$ to $prod$, and subtract 1 from $a$. Then go to 2.

An example run of Algorithm 4.1 when $a = 3, b = 12$:

$$\begin{array}{c||c|c|c} \textrm{step} & a & b & prod \\ \hline 1 & 3 & 12 & 0 \\ 2 & & & \\ 3 & 2 & & 12 \\ 2 & & & \\ 3 & 1 & & 24 \\ 2 & & & \\ 3 & 0 & & \fbox{36} \\ 2 & (\textrm{halt!}) & & \\ \end{array}$$

Algorithm 4.2.

  1. Define a new number $prod$, and initialize it (i.e. set it equal) to 0.
  2. If $a = 0$, stop, and return the number $prod$.
  3. Otherwise, if $a$ is odd, subtract 1 from $a$ and set $prod = prod + b$.
  4. Divide $a$ by 2, and multiply $b$ by 2.
  5. Go to step 2.

Similarly, an example run of Algorithm 4.2 when $a = 3, b = 12$:

$$\begin{array}{c||c|c|c} \textrm{step} & a & b & prod \\ \hline 1 & 3 & 12 & 0 \\ 2 & & & \\ 3 & 2 & & 12 \\ 4 & 1 & 24 & \\ 2 & & & \\ 3 & 0 & & \fbox{36} \\ 4 & 0 & 48 & \\ 2 & (\textrm{halt!}) & & \\ \end{array}$$

In general, which of these is the faster way to multiply two numbers? Why?
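If you'd like to experiment with both processes, here is a minimal C sketch of them (the function names are ours); note how step 3's "subtract 1" and step 4's "divide by 2" in Algorithm 4.2 combine into a single integer division.

/* Algorithm 4.1: multiplication as repeated addition. */
unsigned long mult_repeated_addition(unsigned long a, unsigned long b)
{
    unsigned long prod = 0;          /* step 1 */
    while (a != 0) {                 /* step 2 */
        prod += b;                   /* step 3 */
        a -= 1;
    }
    return prod;
}

/* Algorithm 4.2: multiplication by doubling and halving. */
unsigned long mult_double_and_halve(unsigned long a, unsigned long b)
{
    unsigned long prod = 0;          /* step 1 */
    while (a != 0) {                 /* step 2 */
        if (a % 2 == 1)              /* step 3: a is odd */
            prod += b;
        a /= 2;                      /* step 4: integer division also drops the subtracted 1 */
        b *= 2;
    }
    return prod;
}

Counting how many times each while loop runs, as a function of $a$, is a good way to start this exercise.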

Functions in General

In your high-school mathematics classes, you've likely seen functions described as things like "$f(x) = 2x+3$" or "$g(x, y) = \max(x, y)$." When we're writing code, however, we don't do this! That is: in most programming languages, you can't just type in expressions like the ones before and trust that the computer will understand what you mean. Instead, you'll often write something like the following:

/* function returning the max between two numbers */
int max(int num1, int num2) 
{
    /* local variable declaration */
    int result;
 
    if (num1 > num2)
        result = num1;
    else
        result = num2;
 
    return result; 
}

Notice how in this example we didn't just get to define the rules for our function: we also had to specify the kind of inputs the function should expect, and also the type of outputs the function will generate! On one hand, this creates more work for us at the start: we can't just tell people our rules, and we'll often find ourselves having to go back and edit our functions as we learn what sorts of inputs we actually want to accept.

On the other hand, this lets us describe a much broader class of functions than we could before! Under our old high-school definition of a function, we just assumed that functions took in numbers and returned numbers. With the above idea, though, we can have functions take in and output anything: multiple numbers, arrays, text files, whatever!

On the third(?) hand, enforcing certain restrictions on the types of inputs and outputs to a function is also a much more secure way to think about functions in computer science. If you're writing code in a real-life situation, you should always expect malicious or just clueless users to try to input the worst possible data to your functions. As a result, you can't just have a function defined by the rule $f(x) = \frac{1}{x}$ and trust that your users out of the goodness of their hearts will never input 0 just to see what happens! They'll do it immediately (as well as lots of other horrifying inputs, like 🐥, $\frac{1}{0}$, $1-0.\overline{9}$, $\ldots$) just to see what happens.

(from https://xkcd.com/327/)

This is why many programming languages enforce type systems: i.e. rules around their functions that specifically force you to declare the kinds of inputs and outputs ahead of time, like we’ve done above! Doing this is an important part of writing bug-free and secure code.

As this is a computer science class, we should have a definition of function that matches this concept. We provide this here:

Definition 4.1. Formally, a function consists of three parts:
  • A collection $A$ of possible inputs. We call this the domain of our function.
  • A collection $B$ describing the type of outputs that our function will generate. We call this the codomain of our function.
  • A rule $f$ that takes in inputs from $A$ and generates outputs in $B$.

Furthermore, in order for this all to be a function, we need it to satisfy the following property:

For every potential input $a$ from $A$, there should be exactly one $b$ in $B$ such that $f(a) = b$.

In other words, we never have a value $a$ in $A$ for which $f(a)$ is undefined, as that would cause our programs to crash! As well, we do not allow a value $a \in A$ to generate "multiple" outputs; i.e. we want to be able to rely on $f(a)$ not changing on us without warning if we keep $a$ the same.

Example 4.1. Typically, to define a function we'll write something like "Consider the function $f: \mathbb{Z} \to \mathbb{Q}$, defined by the rule $f(n) = \frac{1}{n^2+1}$." This definition tells you three things: what the domain is (the set the arrow starts from, which is the integers in this case), what the codomain is (the set the arrow points to, which is the rational numbers in this case), and the rule used to define $f$.

Example 4.2. $f: \mathbb{Z} \to \mathbb{Z}$ defined by the rule "$f(x) = y$ if and only if $x = y^2$" is not a function. There are many reasons for this: for example, $f(3)$ is undefined, as no integer squares to 3, while $f(4)$ would have to be both $2$ and $-2$ at once, as both square to 4.

Example 4.3. Let $A$ be the set of all students at the University of Auckland, and $B$ be the set of all integers. We can define a function $f: A \to B$ by defining $f(a)$ to be equal to that student's current ID number. This is a function, because each student has a unique ID number!

However, if we tried to define a function $g: B \to A$ by the rule $g(b) = $ the student whose ID number is $b$, we would fail! This is because there are many integers that are not ID numbers: for example, no student has ID number $-1$, or $10^{100}$.

While the objects above have had relatively “nice” rules, not all functions can be described quite so cleanly! Consider the following example:

Example 4.4. Let $A$ be the collection {🇨🇦, 🐈, 🐥, 🇳🇿, 🕸️} of the Canada, cat, bird, New Zealand and web emojis, and let $B$ be the collection {🐙, ☢️, 🐝, 👨🏾} of the octopus, radiation, bee, and man emojis. Consider $f: A \to B$, defined by the rules:

f(🇨🇦) = 🐙, f(🐈) = 🐝, f(🐥) = 🐙, f(🇳🇿) = 🐝, f(🕸️) = 👨🏾.

This is a function, because we have given an output for every possible input, and we never send an input to multiple different outputs. It's not a function with a simple algebraic rule like "$x^2+2x-1$", but that's OK!
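In code, a function defined piece-by-piece like this is just a case analysis. Here's a minimal C sketch (the enum names are ours):

/* The emoji function f, defined case by case on a five-element domain. */
enum Domain   { CANADA, CAT, BIRD, NZ, WEB };
enum Codomain { OCTOPUS, RADIATION, BEE, MAN };

enum Codomain f(enum Domain a)
{
    switch (a) {
    case CANADA: return OCTOPUS;
    case CAT:    return BEE;
    case BIRD:   return OCTOPUS;
    case NZ:     return BEE;
    default:     return MAN;     /* WEB */
    }
}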

A useful way to visualize functions defined in this piece-by-piece fashion is with a diagram: draw the domain at left, the codomain at right, and draw an arrow from each $x$ in the domain to its corresponding element $f(x)$ in the codomain.

[Diagram: the domain {🇨🇦, 🐥, 🐈, 🇳🇿, 🕸️} at left and the codomain {🐙, ☢️, 🐝, 👨🏾} at right, with an arrow under $f$ from each input to its output.]

Alongside the domain/codomain ideas above, another useful idea here is the concept of range:

Definition 4.2. Take any function $f: A \to B$. We define the range of $f$ as the set of all values in the codomain that our function actually sends values in the domain to. In other words, the range of $f$ is the following set:
 
$$\{ b \in B ~|~ \textrm{there is some } a \in A \textrm{ such that } f(a) = b\}$$

Note that the range is usually different to the codomain! For instance, in Example 4.3 the codomain was all of $\mathbb{Z}$, even though the range was just the set of ID numbers actually assigned to students.

Intuitively: we think of the codomain as letting us know what type of outputs we should expect. That is: in both mathematics and computer science, often just knowing that the output is "an integer" or "a binary string of length at most 20" or "a Unicode character" is enough for our compiler to work. As such, it's often much faster to just describe the type of possible outputs, instead of laboriously finding out exactly what outputs are possible!

However, in some cases we will want to know precisely what values we get as outputs, and in that situation we will want to find the actual outputs: i.e. the range. To illustrate this, let’s consider a few examples:

Example 4.5. Consider the function $f: \mathbb{Z} \to \mathbb{Z}$ given by the rule $f(x) = 2|x| + 2$. This function has range equal to all even numbers that are at least 2.

To see why, simply notice that for any integer $x$, $|x|$ is the "absolute value" of $x$: i.e. it's $x$ with its sign removed, so that it's always nonnegative. As a result, $2|x|$ is always a nonnegative even number, and thus $2|x| + 2$ must be an even number that's at least 2.

That tells us that the only possible outputs are even numbers that are at least 2! However, we still don’t know that all of those outputs are ones that we actually can get as outputs.

To see why this is true: take any even number that is at least 2. By definition, we can write it as $2k$, for some $k \geq 1$. Rewrite this as $2(k-1) + 2$; if we do so, then we can see that $f(k-1) = 2|k-1| + 2 = 2(k-1) + 2$ (because if $k \geq 1$, then $k-1 \geq 0$ and so $|k-1| = k-1$). As a result, we've shown that $f(k-1) = 2k$ for any $k \geq 1$, and thus that we can actually get any even number that's at least 2 as an output.
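As a quick sanity check of this argument, here's a tiny C sketch (names ours) that prints $f(k-1) = 2k$ for the first few values of $k$:

#include <stdio.h>
#include <stdlib.h>

/* f(x) = 2|x| + 2, from Example 4.5. */
int f(int x) { return 2 * abs(x) + 2; }

int main(void)
{
    /* each even target 2k, for k >= 1, is hit by the input k-1 */
    for (int k = 1; k <= 5; k++)
        printf("f(%d) = %d\n", k - 1, f(k - 1));   /* prints 2, 4, 6, 8, 10 */
    return 0;
}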

Example 4.6. Consider the emoji function from Example 4.4. If we look at the diagram we drew before, we can see that our function generates three possible outputs: 🐙, 🐝 and 👨🏾. Therefore, the collection of these three emojis is our range!

[Diagram: the same arrow picture as before; the three emojis 🐙, 🐝 and 👨🏾 are exactly the codomain elements with arrows pointing to them.]

Example 4.7. Let $A$ be the collection of all pairs of words in the English language, and $B$ be the two values $\{$true, false$\}$. Define the function $f: A \to B$ by saying that $f(w_1, w_2) = $ true if the words $w_1, w_2$ rhyme, and false otherwise. For example, $f($cat, bat$) = $ true, while $f($cat, cataclysm$) = $ false.

The range of this function is $\{$true, false$\}$, i.e. the same as its codomain! It is possible for the range and codomain to agree. (If this happens, we call such a function a surjective function. We're not going to focus on these functions here, but you'll see more about them in courses like Compsci 225 and Maths 120/130!)

Example 4.8. Let $\mathbb{R}$ denote the set of all real numbers (i.e. all numbers regardless of whether they're rational or irrational; alternately, anything you can describe with a decimal expansion). Define the function $f: \mathbb{R} \to \mathbb{R}$ by the rule $f(x) = 2^x$.

This function has range equal to the set of all positive numbers! This takes more math to see than we currently have: again, take things like Maths 130 to see the "why" behind this. However, if you draw a graph of $2^x$ you'll see that the outputs (i.e. $y$-values) range over all of the possible positive numbers, as claimed.

To close this section, we give a useful bit of notation for talking about functions defined in terms of other functions: function composition.

Fact 4.2. Given any two functions $f: B \to C$, $g: A \to B$, we can combine these functions via function composition: that is, we can define the function $f \circ g: A \to C$, defined by the rule $f \circ g(x) = f(g(x))$. We pronounce the small open circle symbol $\circ$ as "composed with."

Example 4.9.

Handy!

Notice that in the definition above, we required that the domain of $f$ was the codomain of $g$. That is: if we wanted to study $f \circ g(x) = f(g(x))$, we needed to ensure that every output of $g$ is a valid input to $f$.

This makes sense! If you tried to compose functions whose domains and codomains did not match up in this fashion, you'd get nonsense / crashes when the inner function $g$ returns an output at which the outer function is undefined. For example:

Example 4.10.
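The type-matching requirement is easy to see in code: here's a minimal C sketch (function names ours), reusing the rules $g(n) = \frac{1}{n^2+1}$ and $f(x) = 2x+3$ from earlier. The compiler itself checks that $g$'s codomain matches $f$'s domain before accepting the composition.

#include <stdio.h>

/* g : int -> double */
double g(int n) { return 1.0 / (n * n + 1); }

/* f : double -> double; f's domain matches g's codomain */
double f(double x) { return 2.0 * x + 3.0; }

/* f o g : int -> double */
double f_compose_g(int n) { return f(g(n)); }

int main(void)
{
    printf("%f\n", f_compose_g(2));   /* f(g(2)) = 2*(1/5) + 3 = 3.4 */
    return 0;
}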

Algorithms

In the previous section, we came up with a “general” concept for function that we claimed would be better for our needs in computer science, as it would let us think of things like

/* function returning the max between two numbers */
int max(int num1, int num2) 
{
    /* local variable declaration */
    int result;
 
    if (num1 > num2)
        result = num1;
    else
        result = num2;
 
    return result; 
}

as a function. However, most of the examples we studied in this chapter didn't feel too much like the code above: they were either fairly mathematical in nature (i.e. $f(n) = \frac{1}{n^2+1}$) or word-problem-oriented (i.e. the function that sent UoA students to their ID numbers).

To fix this issue, this section will focus on the idea of an algorithm: that is, a way of describing in general a step-by-step problem-solving process that we can easily turn into code.

We start by describing what an algorithm is:

Definition 4.3. An algorithm is a precise and unambiguous set of instructions.

Typically, people think of algorithms as a set of instructions for solving some problem; when they do so, they typically have some restrictions in mind for the kinds of instructions they consider to be valid. For example, consider the following algorithm for proving the Riemann hypothesis:

  1. Prove the Riemann hypothesis.
  2. Rejoice!

On one hand, this is a fairly precise and unambiguous set of instructions: step 1 has us come up with a proof of the Riemann hypothesis, and step 2 tells us to rejoice.

On the other hand: this is not a terribly useful algorithm. In particular, its steps are in some sense “too big” to be of any use: they reduce the problem of proving the Riemann hypothesis to … proving the Riemann hypothesis. Typically, we’ll want to limit the steps in our algorithms to simple, mechanically-reproducible steps: i.e. operations that a computer could easily perform, or operations that a person could do with relatively minimal training.

In practice, the definition of "simple" depends on the context in which you are creating your algorithm. Consider the algorithm for making delicious pancakes, given below.

An algorithm for pancakes!

  1. Acquire and measure out the following ingredients:
  • 2 cups of buttermilk, or 1.5 cups milk + .5 cups yoghurt whisked together.
  • 2 cups of flour.
  • 2 tablespoons of sugar.
  • 2 teaspoons of baking powder.
  • 1/2 teaspoon of baking soda.
  • 1/2 teaspoon of salt.
  • 1 large egg.
  • 3 tablespoons butter.
  • Additional butter.
  • Maple syrup.
  2. Whisk the flour, sugar, baking powder, baking soda, and salt in a medium bowl.
  3. Melt the 3 tablespoons of butter.
  4. Whisk the egg and melted butter into the milk until combined.
  5. Pour the milk mixture into the dry ingredients, and whisk until just combined (a few lumps should remain).
  6. Heat a nonstick griddle/frypan on medium heat until hot; grease with a teaspoon or two of butter.
  7. Pour 1/4 cup of batter onto the skillet. Repeat in various disjoint places until there is no spare room on the skillet, leaving gaps of 1cm between pancakes.
  8. Cook until large bubbles form and the edges set (i.e. turn a slightly darker color and are no longer liquid), about 2 minutes.
  9. Using a spatula, flip pancakes, and cook a little less than 2 minutes longer, until golden brown.
  10. If there is still unused batter, go to step 6; else, top pancakes with maple syrup and butter, and eat.

This is a good recipe. Use it!

This algorithm's notion of "simple" is pitched at someone who is (1) able to measure out quantities of various foods, and (2) knows the meaning of various culinary operations like "whisk" and "flip." If we wanted, we could make an algorithm that includes additional steps that define "whisking" and "flipping". That is: at each step where we told someone to whisk the flour, we could instead have given them the following set of instructions:

(a) Grab a whisk. If you do not know what a whisk is, go to this Wikipedia article and grab the closest thing to a whisk that you can find. A fork will work if it is all that you can find.
(b) Insert the whisk into the object you are whisking.
(c) Move the whisk around through the object you are whisking in counterclockwise circles of varying diameter, in such a fashion to mix together the contents of the whisked object.

In this sense, we can extend our earlier algorithm to reflect a different notion of “simple,” where we no longer assume that our person knows how to whisk things. It still describes the same sets of steps, and in this sense is still the “same” algorithm — it just has more detail now!

This concept of “adding” or “removing” detail from an algorithm isn’t something that will always work; some algorithms will simply demand steps that cannot be implemented on some systems. For example, no matter how many times you type “sudo apt-get 2 cups of flour,” your laptop isn’t going to be able to implement our above pancake algorithm. As well, there may be times where a step that was previously considered “simple” becomes hideously complex on the system you’re trying to implement it on!

We’re not going to worry too much about the precise definition of “simple” in this class, because we’re not writing any code here (and so our notion of “simple” isn’t one we can precisely nail down) --- these are the details we’ll leave for your more coding-intensive courses.

Instead, let's just look at a few examples of algorithms! We've already seen one in this class, when we defined the $\%$ operation:

Algorithm 4.3. This algorithm takes in any two integers $a, n$, where $n > 0$. It then calculates $a \% n$ as follows:
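One simple way to compute $a \% n$ is by repeated subtraction; here's a minimal C sketch of that idea (we're assuming this matches Algorithm 4.3's intent, and the function name is ours):

/* a % n via repeated subtraction/addition: returns the remainder in [0, n). */
int mod(int a, int n)
{
    while (a >= n) a -= n;   /* too big: take away another copy of n */
    while (a < 0)  a += n;   /* negative: add copies of n until we reach [0, n) */
    return a;
}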

Second, we can turn Claim 1.6 into an algorithm for how to tell if a number is prime:

Algorithm 4.4. This is an algorithm that takes in a positive integer $n$, and determines whether or not $n$ is prime. It proceeds as follows:

This is a step-by-step process that tells us if a number is a prime or not! Notice that the algorithm itself didn’t need to contain a proof of Claim 1.6; it just has to give us instructions for how to complete a task, and not justify why those instructions will work. It is good form to provide such a justification where possible, as it will help others understand your code! However, it is worth noting that such a justification is separate from the algorithm itself: it is quite possible (and indeed, all too easy) to write something that works even though you don’t necessarily understand why.
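For a concrete sketch: here's a minimal C primality check by trial division (we're assuming this matches the criterion from Claim 1.6; the function name is ours):

#include <stdbool.h>

/* Returns true exactly when n is prime, by trial division. */
bool is_prime(int n)
{
    if (n < 2) return false;          /* 0 and 1 are not prime */
    for (int d = 2; d * d <= n; d++)  /* try each candidate divisor up to sqrt(n) */
        if (n % d == 0) return false; /* found a factor, so n is not prime */
    return true;
}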

For a third example, let’s consider an algorithm to sort a list:

Algorithm 4.5. The following algorithm, $\texttt{SelectionSort}(L)$, takes in a list $L = (l_1, l_2, \ldots, l_n)$ of $n$ numbers and orders it from least to greatest. For example, $\texttt{SelectionSort}(1,7,1,0)$ is $(0,1,1,7)$. It proceeds as follows:

  1. If $L$ contains at most one number, $L$ is trivially sorted! In this situation, stop.
  2. Otherwise, $L$ contains at least two numbers. Let $L = (l_1, l_2, \ldots, l_n)$, where $n \geq 2$. Define a pair of values $\texttt{val}_\texttt{min}$, $\texttt{loc}_\texttt{min}$, and set them equal to the value and location of the first element in our list.
  3. One by one, starting with the second entry in our list and working our way through the entire list $L$, compare the value stored in $\texttt{val}_\texttt{min}$ to the current value $l_k$ that we're examining.
     (a) If the current value $l_k$ is smaller than $\texttt{val}_\texttt{min}$, update $\texttt{val}_\texttt{min}$ and $\texttt{loc}_\texttt{min}$ to be $l_k$ and $k$, respectively.
  4. At the end of this process, $\texttt{val}_\texttt{min}$ and $\texttt{loc}_\texttt{min}$ describe the value and the location of the smallest element in our list. Swap the first value in our list with $l_{\texttt{loc}_\texttt{min}}$: this makes the first value in our list the smallest element in our list.
  5. To finish, set the first element of our list aside and run $\texttt{SelectionSort}$ on the rest of our list.
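Here's a minimal C sketch of Algorithm 4.5 (the function name is ours). Step 5's "set the first element aside and recurse" becomes advancing a start index:

#include <stdio.h>

void selection_sort(int l[], int n)
{
    for (int start = 0; start < n - 1; start++) {   /* steps 1 and 5 */
        int loc_min = start;                        /* step 2 */
        int val_min = l[start];
        for (int k = start + 1; k < n; k++) {       /* step 3 */
            if (l[k] < val_min) {                   /* step 3(a) */
                val_min = l[k];
                loc_min = k;
            }
        }
        l[loc_min] = l[start];                      /* step 4: swap */
        l[start] = val_min;
    }
}

int main(void)
{
    int l[] = {1, 7, 1, 0};
    selection_sort(l, 4);
    for (int i = 0; i < 4; i++) printf("%d ", l[i]);  /* prints 0 1 1 7 */
    return 0;
}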

Back in our first chapter, to understand our two previous algorithms (4.3 and 4.4) we started by running them on a few example inputs! In general, this is a good tactic to use when studying algorithms; actually plugging in some concrete inputs can make otherwise-obscure instructions much simpler to understand.

To do so here, let's run our algorithm on the list $(1,7,1,0)$, following each step as written:

$$\begin{array}{c|c|c|c|c|c} \textrm{list} & \textrm{step} & \texttt{loc}_{\texttt{min}} & \texttt{val}_{\texttt{min}} & \textrm{current }k & \textrm{current }l_k\\ \hline (1,7,1,0) & 1 & & & &\\ (1,7,1,0) & 2 & 1 & 1 & &\\ (1,7,1,0) & 3 & & & 2 & 7\\ (1,7,1,0) & 3 & & & 3 & 1\\ (1,7,1,0) & 3 & & & 4 & 0\\ (1,7,1,0) & 3(a) & 4 & 0 & & \\ (0,7,1,1) & 4 & & & & \\ (0,\boxed{7,1,1}) & 5 & & & & \\ (0,\boxed{7,1,1}) & 1 & & & & \\ (0,\boxed{7,1,1}) & 2 & 2 & 7 & & \\ (0,\boxed{7,1,1}) & 3 & & & 3 & 1\\ \end{array}$$

$$\begin{array}{c|c|c|c|c|c} \textrm{list} & \textrm{step} & \texttt{loc}_{\texttt{min}} & \texttt{val}_{\texttt{min}} & \textrm{current }k & \textrm{current }l_k\\ \hline (0,\boxed{7,1,1}) & 3(a) & 3 & 1 & & \\ (0,\boxed{7,1,1}) & 3 & & & 4 & 1\\ (0,\boxed{1,7,1}) & 4 & & & & \\ (0,1,\boxed{7,1}) & 5 & & & & \\ (0,1,\boxed{7,1}) & 1 & & & & \\ (0,1,\boxed{7,1}) & 2 & 3 & 7 & & \\ (0,1,\boxed{7,1}) & 3 & & & 4 & 1\\ (0,1,\boxed{7,1}) & 3(a) & 4 & 1 & & \\ (0,1,\boxed{1,7}) & 4 & & & & \\ (0,1,1,\boxed{7}) & 5 & & & & \\ (0,1,1,\boxed{7}) & 1 & & & & \\ \end{array}$$

Here, we use the $\boxed{7,1,1}$, $\boxed{7,1}$ and $\boxed{7}$ boxes to visualize the "rest of our list" part of step 5 in our algorithm.

This worked! Moreover, doing this by hand can help us see an argument for why this algorithm works:

Claim 4.1. $\texttt{SelectionSort}$ (i.e. Algorithm 4.5) works.

Proof. In general, there are three things we need to check to show that a given algorithm works:

So, in total, what does this mean? Well: by the above, we know that $l_1 \leq l_2 \leq l_3 \leq \ldots \leq l_n$, as by definition each element is smaller than all of the ones that come afterwards. In other words, this list is sorted! $\square$

Recursion, Composition, and Algorithms

Algorithm 4.5 had an interesting element to its structure: in its fifth step, it contained a reference to itself! This sort of thing might feel like circular reasoning: how can you define an object in terms of itself?

However, if you think about it for a bit, this sort of thing is entirely natural: many tasks and processes in life are "self-referential", i.e. defined in terms of themselves! We call such definitions recursive definitions, and give a few examples here:

Example 4.11. Fractals, and the many plants and living things that have fractal-like patterns built into themselves, are examples of recursively-defined objects! For example, consider the following recursive process:

  1. Start by drawing any shape $S_0$. For example, here's a very stylized drawing of a cluster of fern spores:

If you know some linear algebra, the explicit formulas we are using here are the following: for each point $(x,y)$ in $S$, draw the four points

  • $\begin{bmatrix} 0.85 & 0.04 \\ -0.04 & 0.85 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} + \begin{bmatrix} 0\\1.6 \end{bmatrix}$,

  • $\begin{bmatrix} 0.2 & -0.26 \\ 0.23 & 0.22 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} + \begin{bmatrix} 0\\1.6 \end{bmatrix}$,

  • $\begin{bmatrix} -0.15 & 0.28 \\ 0.26 & 0.24 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} + \begin{bmatrix} 0\\0.44 \end{bmatrix}$,

  • $\begin{bmatrix} 0 & 0 \\ 0 & 0.16 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} + \begin{bmatrix} 0\\0 \end{bmatrix}$.

This is the precise math-y way of describing the operations we’re doing to these rectangles!

  2. Now, given any shape $S$, we define $T(S)$ as the shape made by making four copies of $S$ and manipulating them as follows: if $S$ is a shape contained within the gray rectangle at left, we make four copies of $S$ appropriately scaled/stretched/etc. to match the four rectangles at right. (It can be hard to see the black rectangle, because it's so squished: it's the stem-looking bit at the bottom-middle.)

    For example, $T(S_0)$, i.e. $T$ applied to our fern spore shape, is the following:

  3. Using our "seed" shape $S_0$ and our function $T$, we then recursively define $S_n$ as $T(S_{n-1})$ for every positive integer $n$. That is: $S_1 = T(S_0), S_2 = T(S_1)$, etc. This is a recursive definition because we're defining our shapes in terms of previous shapes! Using the language of function composition, we can express this all at once by writing $S_n = \underbrace{T \circ T \circ \ldots \circ T}_{n \textrm{ times}}(S_0)$.

  4. Now, notice what these shapes look like as we draw several of them in a row:

    Our seed grows into a fern!

This is not a biology class, and there are many open questions about precisely how plants use DNA to grow from a seed. However, the idea that many plants can be formed by taking a simple instruction (given one copy of a thing, split it into appropriately stretched/placed copies of that thing) and repeatedly applying it is one that should seem reasonable, given the number of places you see it in the world!

Example 4.12. On the less beautiful but more practical side, recursion is baked into many fundamental mathematical operations! For example, think about how you'd calculate $11 \cdot 13$. By definition, because multiplication is just repeated addition, you could calculate this as follows:

$$11 \cdot 13 = \underbrace{13 + 13 + 13 + 13 + 13 + 13 + 13 + 13 + 13 + 13 + 13}_{11\textrm{ times}} = 143.$$

Now, however, suppose that someone asked you to calculate $12 \cdot 13$. While you could use the process above to find this product, you could also shortcut this process by noting that

$$12 \cdot 13 = (1+11)\cdot 13 = 13 + 11\cdot 13 = 13 + 143 = 156.$$

This sort of trick is essentially recursion! That is: if you want, you could define the product $n \cdot k$ recursively for any nonnegative integer $n$ by the following two-step process:

  • If $n = 0$, then $n \cdot k = 0$.
  • Otherwise, $n \cdot k = k + (n-1) \cdot k$; that is, relate $n \cdot k$ back to the smaller product $(n-1) \cdot k$.

The second bullet point is a recursive definition, because we defined multiplication in terms of itself! In other words, when we say that $12 \cdot 13 = 13 + 11 \cdot 13$, we're really thinking of multiplication recursively: we're not trying to calculate the whole thing out all at once, but instead are trying to just relate our problem to a previously-worked example.

Exponentiation does a similar thing! Because exponentiation is just repeated multiplication, we know that

$$2^{10} = \underbrace{2\cdot2\cdot2\cdot2\cdot2\cdot2\cdot2\cdot2\cdot2\cdot2}_{10 \textrm{ times}} = 1024.$$

Similarly, though, we can define exponentiation recursively by saying that for any nonnegative integer $n$ and nonzero number $a$:

  • If $n = 0$, then $a^n = 1$.
  • Otherwise, $a^n = a \cdot a^{n-1}$.

In other words, with this idea, we can just say that

$$2^{11} = 2 \cdot 2^{10} = 2 \cdot 1024 = 2048,$$

instead of calculating the entire thing from scratch!
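Here are both recursive definitions written as a minimal C sketch (function names ours):

/* n * k = 0 if n = 0, and k + (n-1)*k otherwise. */
unsigned long mult(unsigned long n, unsigned long k)
{
    if (n == 0) return 0;
    return k + mult(n - 1, k);
}

/* a^n = 1 if n = 0, and a * a^(n-1) otherwise. */
unsigned long power(unsigned long a, unsigned long n)
{
    if (n == 0) return 1;
    return a * power(a, n - 1);
}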

These ideas can be useful if you're working with code where you're calculating a bunch of fairly similar products/exponents/etc. With recursion, you can store some precalculated values and just do a few extra steps of work, rather than doing the whole thing from scratch each time. Efficiency!

In classes like Compsci 220/320, you’ll study the idea of efficiency in depth, and come up with more sophisticated ideas than this one! In most practical real-life situations there are better ways to implement multiplication and exponentiation than this recursive idea; however, it can be useful in some places, and the general principle of “storing commonly-calculated values and extrapolating other values from those recursively” is one that does come up in lots of places!

Recursion also comes up in essentially every dynamical system that models the world around us! For instance, consider the following population modelling problem:

Example 4.13. Suppose that we have a petri dish in which we’re growing a population of amoebae, each of which can be in two possible states (small and large).

Amoebae grow as follows: if an amoeba is small at some time $t$, then at time $t+1$ it becomes large, by eating food around it. If an amoeba is large at some time $t$, then at time $t+1$ it splits into one large amoeba and one small amoeba.

Suppose our petri dish starts out with one small amoeba at time $t=1$. How many amoebae in total will be in this dish at time $t=n$, for any natural number $n$?

Answer. To help find an answer, let’s make a chart of our amoeba populations over the first six time steps:

$$\begin{array}{c|cccccc} & 1 & 2 & 3 & 4 & 5 & 6\\\hline \textrm{Large} & 0 & 1 & 1 & 2 & 3 & 5\\ \textrm{Small} & 1 & 0 & 1 & 1 & 2 & 3\\ \textrm{Total} & 1 & 1 & 2 & 3 & 5 & 8\\ \end{array}$$

This chart lets us make the following observations:

  1. The number of large amoebae at time $n$ is precisely the total number of amoebae at time $n-1$. This is because every amoeba at time $n-1$ either grows into a large amoeba, or already was a large amoeba!
  2. The number of small amoebae at time $n$ is the number of large amoebae at time $n-1$. This is because the only source of small amoebae is the large amoebae from the earlier step, when they split!
  3. By combining 1 and 2 together, we can observe that the number of small amoebae at time $n$ is the total number of amoebae at time $n-2$!
  4. Consequently, because we can count the total number of amoebae by adding the large amoebae to the small amoebae, we can conclude that the total number of amoebae at time $n$ is the total number of amoebae at time $n-1$, plus the total number of amoebae at time $n-2$. In symbols,

$$A(n) = A(n-1) + A(n-2).$$

We call relations like the one above recurrence relations, and will study them in greater depth when we get to induction as a proof method.
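As a quick check of the recurrence, here's a minimal C sketch (names ours) that tabulates $A(n)$ from the starting values $A(1) = A(2) = 1$:

#include <stdio.h>

/* Total amoeba population, using A(n) = A(n-1) + A(n-2). */
unsigned long amoebae(int n)
{
    unsigned long prev = 1, curr = 1;   /* A(1) and A(2) */
    for (int t = 3; t <= n; t++) {
        unsigned long next = curr + prev;
        prev = curr;
        curr = next;
    }
    return curr;
}

int main(void)
{
    for (int n = 1; n <= 6; n++)
        printf("A(%d) = %lu\n", n, amoebae(n));   /* prints 1 1 2 3 5 8 */
    return 0;
}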

For now, though, notice that they are remarkably useful objects! By generalizing the model above (i.e. subtracting an $a_{n-3}$ term to allow for old age killing off amoebae, or adding a $-ca_n^2$ term to reflect that as the population grows too large, predation or starvation will cause it to die off, etc.) one can basically model an arbitrarily-complicated population over the long term.

These are particularly cool because they actually accurately model predator/prey relations in real life! See Isle Royale, amongst other examples. For more details on these sorts of population-modelling processes, see the logistic map, the Lotka-Volterra equations, and most of the applied mathematics major here at Auckland!

It bears noting that the amoeba recurrence relation is not the first recurrence relation you’ve seen! If you go back to Algorithm 4.5, we proved that

$$\texttt{SelectionSortSteps}(n) = 3(n+1) + \texttt{SelectionSortSteps}(n-1).$$

This is another recurrence relation: it describes the maximum number of operations needed to sort a list of size $n$ in terms of the maximum number of operations needed to sort a list of size $n-1$.

While recurrence relations are nice, they can be a little annoying to work with directly. For example, suppose that someone asked us what $\texttt{SelectionSortSteps}(12)$ is. Because we don't have a non-recursive formula, the only thing we could do here is just keep recursively applying our formula, to get

$$\begin{aligned} \texttt{SelectionSortSteps}(12) &= 3(12+1) + \texttt{SelectionSortSteps}(11)\\ &= 3(12+1) + 3(11+1) + \texttt{SelectionSortSteps}(10)\\ &= 3(12+1) + 3(11+1) + 3(10+1) + \texttt{SelectionSortSteps}(9)\\ &\;\;\vdots\\ &= 3(12+1) + 3(11+1) + 3(10+1) + \ldots + 3(3+1) + 3(2+1) + \texttt{SelectionSortSteps}(1)\\ &= 3(12+1) + 3(11+1) + 3(10+1) + \ldots + 3(3+1) + 3(2+1) + 1 = \boxed{265}. \end{aligned}$$

This is … kinda tedious. It would be nice if we had a direct formula for this: something like $\texttt{SelectionSortSteps}(n) = 2^n$ or $n^2-4n+17$ that we could just plug $n$ into and get an answer.
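This unrolling is exactly what a direct implementation of the recurrence does; here's a minimal C sketch (the function name is ours):

/* SelectionSortSteps(1) = 1; SelectionSortSteps(n) = 3(n+1) + SelectionSortSteps(n-1). */
unsigned long selection_sort_steps(int n)
{
    if (n == 1) return 1;
    return 3 * (n + 1) + selection_sort_steps(n - 1);
}

For instance, selection_sort_steps(12) returns 265, matching the computation above.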

At this point in time, we don't have enough mathematics to directly find such a "closed" form. However, we can still sometimes find an answer by just guessing. To be a bit more specific: if we use our definition, we can calculate the following values for $\texttt{SelectionSortSteps}(n)$:

$$\begin{array}{c||c|c|c|c|c|c|c} n & 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline \texttt{SelectionSortSteps}(n) & 1 & 10 & 22 & 37 & 55 & 76 & 100\\ \end{array}$$

To get these skills, either take Maths 120 and study linear algebra / eigenvalues, or take Compsci 220/320, or take Maths 326 and learn about generating functions!

If you plug this into the Online Encyclopedia of Integer Sequences, it will give you the following guess by comparing it to all of the sequences it knows:

This turns out to be correct!

Claim 4.2. For every positive integer $n$, we have $\texttt{SelectionSortSteps}(n) = \dfrac{3n^2+9n-10}{2}$.

We don’t have the techniques to prove this just yet. If you would like to see a proof, though, skip ahead to the induction section of our proofs chapter! We’ll tackle this problem there (along with another recurrence relation), in Section 7.8.

Runtime and Algorithms

In the above few sections, we studied $\texttt{SelectionSort}$ and analyzed $\texttt{SelectionSortSteps}(n)$, the maximum number of operations it needs to sort a list of length $n$. In general, this sort of run-time analysis of an algorithm - i.e. counting the number of elementary operations needed for that algorithm to run - is a very useful thing to be able to do!

To give a brief example of why this is useful, consider another, less well-known algorithm that we can use to sort a list:

Algorithm 4.6. The following algorithm, $\texttt{BogoSort}(L)$, takes in a list $L = (l_1, l_2, \ldots, l_n)$ of $n$ numbers and orders it from least to greatest. It proceeds as follows:

  1. One by one, starting with the first entry in our list and working our way through our list, compare the values stored in consecutive elements in our list.

If these elements never decrease - i.e. if, when we perform these comparisons, we see that $l_1 \leq l_2 \leq \ldots \leq l_n$ - then our list is already sorted! Stop.

  2. Otherwise, our list is not already sorted. In this case, randomly shuffle the elements of our list around, and loop back to step 1.
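Here's a minimal C sketch of Algorithm 4.6 (function names ours), with the standard library's rand() playing the role of Random.org:

#include <stdlib.h>

/* step 1: check whether consecutive elements never decrease */
static int is_sorted(const int l[], int n)
{
    for (int i = 0; i + 1 < n; i++)
        if (l[i] > l[i + 1]) return 0;
    return 1;
}

/* step 2: a random (Fisher-Yates) shuffle */
static void shuffle(int l[], int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = l[i]; l[i] = l[j]; l[j] = tmp;
    }
}

void bogo_sort(int l[], int n)
{
    while (!is_sorted(l, n))   /* loop back to step 1 */
        shuffle(l, n);
}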

Here's a sample run of this algorithm on the list $(1,7,1,0)$, where I've used Random.org to shuffle our list whenever step 2 requires it:

$$\begin{array}{c|c|c|c} \textrm{list} & \textrm{iteration count} & \textrm{step} & \textrm{sorted}\\\hline (1,7,1,0) & 1 & 1 & no \\ (1,1,0,7) & & 2 & \\ (1,1,0,7) & 2 & 1 & no \\ (7,0,1,1) & & 2 & \\ (7,0,1,1) & 3 & 1 & no \\ (1,0,1,7) & & 2 & \\ (1,0,1,7) & 4 & 1 & no \\ (7,1,1,0) & & 2 & \\ (7,1,1,0) & 5 & 1 & no \\ (7,0,1,1) & & 2 & \\ \end{array}$$

$$\begin{array}{c|c|c|c} \textrm{list} & \textrm{iteration count} & \textrm{step} & \textrm{sorted}\\\hline (7,0,1,1) & 6 & 1 & no \\ (1,1,0,7) & & 2 & \\ (1,1,0,7) & 7 & 1 & no \\ (1,1,7,0) & & 2 & \\ (1,1,7,0) & 8 & 1 & no \\ (0,1,7,1) & & 2 & \\ (0,1,7,1) & 9 & 1 & no \\ (7,1,0,1) & & 2 & \\ (7,1,0,1) & 10 & 1 & no \\ (1,7,0,1) & & 2 & \\ \end{array}$$

$$\begin{array}{c|c|c|c} \textrm{list} & \textrm{iteration count} & \textrm{step} & \textrm{sorted}\\\hline (1,7,0,1) & 11 & 1 & no \\ (1,7,1,0) & & 2 & \\ (1,7,1,0) & 12 & 1 & no \\ (1,7,1,0) & & 2 & \\ (1,7,1,0) & 13 & 1 & no \\ (0,1,7,1) & & 2 & \\ (0,1,7,1) & 14 & 1 & no \\ (0,1,1,7) & & 2 & \\ (0,1,1,7) & 15 & 1 & yes! \\ \end{array}$$

This … is not great. If you used BogoSort to sort a deck of cards, your process would look like the following:

By studying the running time of this algorithm, we can make “not great” into something rigorous:

Claim 4.3. The worst-case running time for $\texttt{BogoSort}$ to sort any list containing more than one element is $\infty$.

Proof. If $L$ is a list containing $n$ elements, there are $n!$ many different ways to order $L$'s elements (this is ordered choice without repetition, where we think of ordering our list as "choosing" elements one-by-one to put in order). If all of these elements are different, then exactly one of these orderings is sorted.

Therefore, on each iteration $\texttt{BogoSort}$ has a $\frac{1}{n!}$ chance of successfully sorting our list, and therefore a $\frac{n!-1}{n!}$ chance of failing to sort it. For any $n > 1$, $n! - 1$ is nonzero, and so the chance that our algorithm fails on any given iteration is nonzero.

Therefore, in the worst-case scenario, it is possible for our algorithm to just fail on each iteration, and thereby this algorithm could have infinite run-time.

\square

In terms of running time, then, we've shown that $\texttt{BogoSort}$ has a strictly worse worst-case runtime than $\texttt{SelectionSort}$, as $\infty > \dfrac{3n^2+9n-10}{2}$. Success!

This sort of comparison was particularly easy to perform, as one of the two things we were comparing was \infty . However, this comparison process can get trickier if we examine more interesting algorithms. Let’s consider a third sorting algorithm:

Algorithm 4.7. The following algorithm, $\texttt{MergeSort}(L)$, takes in a list $L = (l_1, l_2, \ldots, l_n)$ of $n$ numbers and orders it from least to greatest. It proceeds as follows:

  1. If $L$ contains at most one number, $L$ is trivially sorted! In this situation, stop.

  2. Otherwise, $L$ contains at least two numbers. In this case,

    (a) Split $L$ in half into two lists $L_1, L_2$.

    (b) Apply $\texttt{MergeSort}$ to each of $L_1, L_2$ to sort them.

  3. Now, we "merge" these two sorted lists:

    (a) Create a big list with $n$ entries in it, all of which are initially blank.

    (b) Compare the first element in $L_1$ to the first element in $L_2$. If $L_1, L_2$ are both sorted, these first elements are the smallest elements in $L_1, L_2$ respectively.

    (c) Therefore, the smaller of those two first elements is the smallest element in our entire list. Take it, remove it from the list it came from, and put it in the first blank location in our big list.

    (d) Repeat (b)+(c) until our big list is full!
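Here's a minimal C sketch of Algorithm 4.7 (the function name is ours), with step 3's "big list" as a scratch buffer:

void merge_sort(int l[], int n)
{
    if (n <= 1) return;                  /* step 1 */

    int mid = n / 2;
    merge_sort(l, mid);                  /* step 2: split in half and sort */
    merge_sort(l + mid, n - mid);        /*         each half recursively  */

    int big[n];                          /* step 3(a): the "big list" */
    int i = 0, j = mid, k = 0;
    while (i < mid && j < n)             /* steps 3(b)+(c): take the smaller head */
        big[k++] = (l[i] <= l[j]) ? l[i++] : l[j++];
    while (i < mid) big[k++] = l[i++];   /* step 3(d): drain whatever remains */
    while (j < n)   big[k++] = l[j++];

    for (k = 0; k < n; k++) l[k] = big[k];
}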

As before, to better understand this algorithm, let's run it on an example list like $(1,7,1,0,2)$:

$$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new list} \\\hline (1,7,1,0,2) & 1 & & &\\ & 2(a) & \fcolorbox{red}{transparent}{(1,7)} & \fcolorbox{blue}{transparent}{(1,0,2)} &\\ & 2(b) & \fcolorbox{red}{transparent}{(1,7)} & \fcolorbox{blue}{transparent}{(0,1,2)} &\\ & 3(a) & & & (\text{\textunderscore},\text{\textunderscore},\text{\textunderscore},\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{red}{transparent}{(1,7)} & \fcolorbox{blue}{transparent}{(1,2)} & (0,\text{\textunderscore},\text{\textunderscore},\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{red}{transparent}{(7)} & \fcolorbox{blue}{transparent}{(1,2)} & (0,1,\text{\textunderscore},\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{red}{transparent}{(7)} & \fcolorbox{blue}{transparent}{(2)} & (0,1,1,\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{red}{transparent}{(7)} & \fcolorbox{blue}{transparent}{()} & (0,1,1,2,\text{\textunderscore} )\\ & 3(b+c),(d) &\fcolorbox{red}{transparent}{()} & \fcolorbox{blue}{transparent}{()} & (0,1,1,2,7)\\ \end{array}$$

$$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (1,7) & 1 & & &\\ & 2(a) & \fcolorbox{green}{transparent}{(1)} & \fcolorbox{pink}{transparent}{(7)} &\\ & 2(b) & \fcolorbox{green}{transparent}{(1)} & \fcolorbox{pink}{transparent}{(7)} &\\ & 3(a) & & & (\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{green}{transparent}{()} & \fcolorbox{pink}{transparent}{(7)} & (1,\text{\textunderscore} )\\ & 3(b+c),(d) &\fcolorbox{green}{transparent}{()} & \fcolorbox{pink}{transparent}{()} & (1,7)\\ \end{array}$$ $$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (1, 0, 2) & 1 & & &\\ & 2(a) & \fcolorbox{teal}{transparent}{(1)} & \fcolorbox{purple}{transparent}{(0,2)} &\\ & 2(b) & \fcolorbox{teal}{transparent}{(1)} & \fcolorbox{purple}{transparent}{(0,2)} &\\ & 3(a) & & & (\text{\textunderscore},\text{\textunderscore}, \text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{teal}{transparent}{(1)} & \fcolorbox{purple}{transparent}{(2)} & (0,\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{teal}{transparent}{()} & \fcolorbox{purple}{transparent}{(2)} & (0,1,\text{\textunderscore} )\\ & 3(b+c),(d) &\fcolorbox{teal}{transparent}{()} & \fcolorbox{purple}{transparent}{()} & (0,1,2)\\ \end{array}$$

$$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (1) & 1 & & &\\ \end{array}$$ $$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (1) & 1 & & &\\ \end{array}$$ $$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (0,2) & 1 & & &\\ & 2(a) & \fcolorbox{orange}{transparent}{(0)} & \fcolorbox{pink}{transparent}{(2)} &\\ & 2(b) & \fcolorbox{orange}{transparent}{(0)} & \fcolorbox{pink}{transparent}{(2)} &\\ & 3(a) & & & (\text{\textunderscore},\text{\textunderscore} )\\ & 3(b+c) &\fcolorbox{orange}{transparent}{()} & \fcolorbox{pink}{transparent}{(2)} & (0,\text{\textunderscore} )\\ & 3(b+c),(d) &\fcolorbox{orange}{transparent}{()} & \fcolorbox{pink}{transparent}{()} & (0,2)\\ \end{array}$$

$$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (0) & 1 & & &\\ \end{array}$$ $$\begin{array}{c|c|c|c|c} \textrm{original L} & \textrm{step} & L_1 & L_2 & \textrm{new } \\\hline (2) & 1 & & &\\ \end{array}$$

Here, we use the colored boxes to help us visualize the recursive applications of $\texttt{MergeSort}$ to smaller and smaller lists.

As before, it is worth taking a moment to explain why this algorithm works:

Claim 4.4. $\texttt{MergeSort}$ (i.e. Algorithm 4.7) works.

Proof. We proceed in the same way as when we studied $\texttt{SelectionSort}$:

\square

As well, by using the same sort of “guess-a-pattern” idea from before, we can refine this to a closed-form solution!

Note that if we use our definition, we can calculate the following values for $\texttt{MergeSortSteps}(2^k)$. By definition, $\texttt{MergeSortSteps}(1) = 1$, as this algorithm stops when given any list of length 1. Therefore, by repeatedly using Claim 4.5, we can get the following:

which in table form is the following:

$$\begin{array}{c||c|c|c|c|c|c|c} k & 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline \texttt{MergeSortSteps}(2^k) & 11 & 39 & 111 & 287 & 703 & 1663 & 3839\\ \end{array}$$

(Note that we're using $2^k$ as the length of our lists. This is because our result only works for even numbers, so we want something that stays even when we keep dividing it by 2.)

Spotting the pattern here is a pain without more advanced mathematics; even the Online Encyclopedia of Integer Sequences doesn’t recognize it. WolframAlpha, however, does!

Again, this is a claim whose proof will have to wait for Section 7.8. For now, though, let’s take the following as given:

Claim 4.6. $\texttt{MergeSortSteps}(2^k) = k \cdot 2^{k+2} + 2^{k+1} - 1$, for every natural number $k$.

We can easily extend this to a list whose length is not a power of two by just “rounding its length up to the nearest power of 2” by adding in some blank cells: i.e. if we had a list of length 28, we’d add in 4 blank cells to get a list of length 32.

If we do this, then Claim 4.6 becomes the following:

Observation 4.10. $\texttt{MergeSortSteps}(n) \leq k \cdot 2^{k+2} + 2^{k+1} - 1$, where $2^k$ is $n$ rounded up to the nearest power of 2.

To simplify this a bit and write things just in terms of $n$, notice that if $2^k$ is $n$ rounded up to the nearest power of 2, then $k = \lceil\log_2(n)\rceil$. Plug this into Observation 4.10, and you'll get the following:

Recall that $\lceil x \rceil$ is "$x$ rounded up."

$$\begin{aligned} \texttt{MergeSortSteps}(n) &\leq \lceil\log_2(n)\rceil \cdot 2^{\lceil\log_2(n)\rceil + 2} + 2^{\lceil\log_2(n)\rceil+1} - 1\\ &\leq (\log_2(n) + 1) \cdot 2^{(\log_2(n) + 1) + 2} + 2^{(\log_2(n) + 1)+1} - 1\\ &= (\log_2(n) + 1) \cdot 2^{\log_2(n)} \cdot 2^3 + 2^{\log_2(n)} \cdot 4 - 1\\ &= 8n\log_2(n) + 8n + 4n - 1\\ &= \boxed{8n\log_2(n) + 12n - 1.} \end{aligned}$$

Two useful tricks we’re using in this calculation:

  • $x \leq \lceil x \rceil$. That is: rounding a number up only makes it larger.
  • $\lceil x \rceil < x+1$. That is: rounding up never increases a number by more than 1.

Nice! This is worth making into its own observation:

Observation 4.11. $\texttt{MergeSortSteps}(n) \leq 8n\log_2(n) + 12n - 1$, for every integer $n > 1$.
\square

Comparing Runtimes: Limits

Now, we have an interesting problem on our hands: between $\texttt{MergeSort}$ and $\texttt{SelectionSort}$, which of these two algorithms is more efficient?

That is: in terms of functions, which of the following is better?

$$\boxed{8n\log_2(n) + 12n - 1} \quad \textrm{vs.} \quad \boxed{\dfrac{3n^2+9n-10}{2}}$$

To start, we can make a table to compare values:

$$\begin{array}{c||c|c|c|c|c|c|c} n & 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline \dfrac{3n^2+9n-10}{2} & 1 & 10 & 22 & 37 & 55 & 76 & 100\\ 8n\log_2(n) + 12n - 1 & 11 & 39 & 73.0 & 111 & 151.8 & 195.0 & 240.2\\ \end{array}$$

It looks like the $\texttt{MergeSort}$ algorithm needs more steps than $\texttt{SelectionSort}$ so far. However, this table only calculated the first few values of $n$, and in real life we often find ourselves sorting huge lists! So: in the long run, which should we prefer?

To answer this, we need the idea of a limit:

Definition 4.4. Given any function $f$ that is defined on the natural numbers $\mathbb{N}$, we say that $\displaystyle \lim_{n \to \infty} f(n) = L$ if "as $n$ goes to infinity, $f(n)$ gets closer and closer to $L$." In particular, we say that $\displaystyle \lim_{n \to \infty} f(n) = +\infty$ if "as $n$ goes to infinity, $f(n)$ grows without bound."

If you would like a more rigorous definition here than “gets close”, look into classes like Maths 130 and Maths 250!

This is a tricky concept! To understand it, let’s look at a handful of examples:

Problem 4.1. What is $\displaystyle \lim_{n \to \infty} \frac{1}{2-\frac{1}{n}}$?

Answer. First, notice that as $n$ goes to infinity, $\frac{1}{n}$ goes to 0, because its denominator is going to infinity while the numerator stays fixed.

Therefore, $2-\frac{1}{n}$ goes to 2, and so $\displaystyle \lim_{n \to \infty} \frac{1}{2-\frac{1}{n}} = \frac{1}{2}.$

Problem 4.2. What is $\displaystyle \lim_{n \to \infty} \log_2(n)$?

Answer. $\displaystyle \lim_{n \to \infty} \log_2(n) = \infty$. This is because for any positive integer $k$, we can make $\log_2(n) \geq k$ by setting $n$ to be any value $\geq 2^k$.

In other words, as $n$ grows, it eventually gets larger than $2^k$ for any fixed $k$, and thus $\log_2(n)$ itself grows without bound.

Problem 4.3. What is $\displaystyle \lim_{n \to \infty} \frac{1}{\log_2\left( \frac{1}{n}\right)}$?

Answer. To find this limit, we simply break our function down into smaller pieces: as $n$ goes to infinity, $\frac{1}{n}$ goes to 0, and so $\log_2\left(\frac{1}{n}\right) = -\log_2(n)$ goes to $-\infty$. Dividing 1 by something whose magnitude grows without bound gives us values closer and closer to 0.

In total, then, $\displaystyle \lim_{n \to \infty} \frac{1}{\log_2\left( \frac{1}{n}\right)} = 0.$

Problem 4.4. What is $\displaystyle \lim_{n \to \infty} \frac{n^2 + 3n-4}{n^3+2}$?

Answer. It is tempting to just “plug in infinity” into the fraction above, and say that

"because $\frac{\infty^2 + 3\infty-4}{\infty^3+2} = \frac{\infty}{\infty} = 1$, our limit is 1."

However, you can't do manipulations like this with infinity! For example, because $\frac{1}{n} = \frac{n}{n^2}$, we have

$$\lim_{n \to \infty} \frac{n}{n^2} = \lim_{n \to \infty} \frac{1}{n} = 0,$$

even though the method above would say that

"because $\frac{\infty}{\infty^2} = \frac{\infty}{\infty} = 1$, our limit is 1."

The issue here is that different expressions can approach infinity at different growth rates: i.e. in our example above, $n^2$ approaches infinity considerably faster than $n$, and so the ratio $\frac{n}{n^2}$ approaches 0 even though the numerator and denominator individually approach infinity.

Instead, if we ever have both the numerator and denominator approaching $+\infty$, we need to first simplify our fraction before we can proceed further! In this problem, notice that if we divide both the numerator and the denominator by $n^3$, the highest power present in either of them, we get the following:

$$\frac{n^2 + 3n-4}{n^3+2}\cdot \frac{1/n^3}{1/n^3} = \frac{\frac{1}{n} + \frac{3}{n^2} - \frac{4}{n^3}}{1 + \frac{2}{n^3}}.$$

As noted above, each of $\frac{1}{n}, \frac{3}{n^2}, -\frac{4}{n^3}, \frac{2}{n^3}$ goes to 0 as $n$ goes to infinity, because their denominators go off to infinity while their numerators stay fixed.

Therefore, we have

$$\lim_{n \to \infty} \frac{n^2 + 3n-4}{n^3+2} = \lim_{n \to \infty} \frac{\frac{1}{n} + \frac{3}{n^2} - \frac{4}{n^3}}{1 + \frac{2}{n^3}} = \frac{0+0-0}{1+0} = \boxed{0}.$$

With the idea of limits in mind, we can now talk about how to compare functions:

Definition 4.5. Let $f(n)$ and $g(n)$ be two functions which depend on $n$. We say that the function $f(n)$ grows faster than the function $g(n)$ if $\displaystyle \lim_{n \to \infty} \dfrac{|f(n)|}{|g(n)|} = +\infty.$

Intuitively, this definition is saying that for huge values of $n$, the ratio of $f(n)$ to $g(n)$ goes to infinity: that is, $f(n)$ is eventually as many times larger than $g(n)$ as we could want.

We work an example of this idea here, to see how it works in practice:

Example 4.14. We claim that the function $f(n) = n^2+2n+1$ grows faster than $g(n) = 5n+5$. To see this, by our definition above, we want to look at the limit $\displaystyle \lim_{n \to \infty} \dfrac{n^2+2n+1}{5n+5}$.

Notice that we can factor the numerator into $(n+1)^2$. Plugging this into our fraction leaves us with $\dfrac{(n+1)^2}{5(n+1)}$, which we can simplify (as in Problem 4.4) to $\dfrac{n+1}{5}$.

This grows without bound: as $n$ goes to infinity, so does $\dfrac{n}{5} + \dfrac{1}{5}$! Therefore, we've shown that $n^2+2n+1$ grows faster than $5n+5$.

To deal with a comparison like the one we're trying to do between $\texttt{MergeSort}$ and $\texttt{SelectionSort}$, however, we need some more tricks!

Limit Techniques and Heuristics

Without calculus, our techniques for limits are a little hamstrung. As those of you who have seen NCEA Level 3 calculus will know, things like L'Hôpital's rule are extremely useful for quickly evaluating limits!

With that said, though, we do have a handful of useful techniques and heuristics that we can use to get by. Here’s an easy (if not very rigorous) technique:

Observation 4.12. Plugging In Values. Probably the simplest thing you can do, when given a limit, is just physically plug in numbers and figure out where the function is going.

For instance, suppose that we wanted to compare our two step-counting functions $\texttt{MergeSortSteps}$ and $\texttt{SelectionSortSteps}$. By definition, this means that we want to find the limit

$$\lim_{n \to \infty} \frac{\frac{3n^2+9n-10}{2}}{8n\log_2(n) + 12n - 1}.$$

To do this, we could just plug in various values of nn and see what happens!

$$\begin{array}{c|c} n & \textrm{ratio} \\\hline 1 & 0.09 \\ 10 & 0.49 \\ 100 & 2.27 \\ 1000 & 16.40 \\ 10000 & 126.83 \\ \end{array}$$

It certainly looks like this ratio is growing arbitrarily large, so we could quite reasonably believe that the number of steps required by $\texttt{SelectionSort}$ grows faster than the number of steps needed by $\texttt{MergeSort}$ (and thus that $\texttt{MergeSort}$ is the preferable algorithm).

However, this method has its limitations:

A second technique, that we used in several of our worked problems earlier, is the following:

Observation 4.13. Simplifying Fractions. In the special case where your limit has the form $\displaystyle \lim_{x \to \infty} \frac{f(x)}{g(x)}$, a useful technique you can try is simplification! Basically, take your fraction $\dfrac{f(x)}{g(x)}$, and try to simplify it by factoring the top and bottom and canceling terms.

This often gives you a new function where the top and bottom no longer both race off to infinity (or both collapse to zero), which is often much easier to work with.

Example 4.15. If we had the limit $\displaystyle\lim_{x \to +\infty} \frac{x^2-x}{x}$, we could factor an $x$ out of the top and bottom to get $\displaystyle\lim_{x \to \infty} (x-1)$. This is $+\infty$.

For a second example, if we had the limit $\displaystyle\lim_{x \to \infty} \frac{x^2-x}{x^2}$, we could factor an $x^2$ out of the top and bottom to get the simplified limit $\displaystyle\lim_{x \to +\infty} \frac{1-\frac{1}{x}}{1}$. The numerator goes to 1 and the denominator is just 1, so this limit is 1.

Sometimes, however, we don’t have something that we can easily express as a fraction:

Problem 4.5. What is the limit

$$\lim_{x \to +\infty} \frac{1}{1-2^{1/x}}?$$

To approach this limit, we need another technique:

Observation 4.14. Break It Down. Given a limit $\displaystyle \lim_{x \to +\infty} f(x)$, we can often figure out what's going on with it by breaking our function down into small pieces, looking at what those individual pieces do as $x$ goes to $+\infty$, and then slowly "zooming back out" to see what the whole function does.

To understand this method, let’s apply it to Problem 4.5.

Answer to Problem 4.5. Understanding the function in our limit all at once is hard! But, notice that for very large values of $x$, we know that

• $\frac{1}{x}$ goes to 0, and is always positive.
• Therefore, $2^{1/x}$ goes to $2^0 = 1$. Moreover, because $\frac{1}{x} > 0$, we have $2^{1/x} > 1$: so this expression approaches 1 from above.
• Therefore, the denominator $1-2^{1/x}$ goes to 0, and is always a tiny negative number.
• Therefore, $\frac{1}{1-2^{1/x}}$ is 1 divided by a tiny negative number, which is a hugely negative number.

Therefore, as $x$ goes to positive infinity, $\dfrac{1}{1-2^{1/x}}$ goes to $-\infty$, and we've found our limit by breaking our function down into smaller pieces!
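As a quick numerical sanity check (a sketch of our own, reusing the plug-in idea from Observation 4.12):

```c
#include <stdio.h>
#include <math.h>

/* Evaluate 1/(1 - 2^(1/x)) for increasingly large x. */
int main(void)
{
    for (double x = 10; x <= 1e6; x *= 100) {
        double value = 1.0 / (1.0 - pow(2.0, 1.0 / x));
        printf("x = %9.0f   1/(1 - 2^(1/x)) = %15.2f\n", x, value);
    }
    return 0;
}
```

Each hundredfold increase in $x$ makes the output about a hundred times more negative (roughly $-x/\ln 2$, in fact), matching our answer of $-\infty$.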

Observation 4.15. Heuristics. Our last limit idea is the following: with some calculus (take Maths 130!), you can prove that

$$\textrm{Constants} << \textrm{Logarithms} << \textrm{Polynomials} << \textrm{Exponentials} << \textrm{Factorials},$$

where "$<<$" means "grows slower than."

Within those groups, we sort these expressions by degrees and bases: i.e.

$$n << n^2 << n^3 << n^4 << \ldots,$$

and

$$2^n << 3^n << 4^n << 5^n << \ldots,$$

and so on, and so forth.

Finally, any sum of expressions grows as fast as its largest term: i.e. a factorial plus a log plus a constant grows at a factorial rate, an $n^4$ plus a log plus a square root grows as fast as $n^4$ grows, etc.

As a brief justification for this, let’s look at a table:

$$\begin{array}{c||c|c|c|c|c} Runtime~vs.~Input & 10 & 20 & 30 & 40 & 50\\ \hline 1 & 1 \cdot 10^{-6} sec. & 1 \cdot 10^{-6} sec. & 1 \cdot 10^{-6} sec. & 1 \cdot 10^{-6} sec. & 1 \cdot 10^{-6} sec.\\ \log_2(n) & 3.3 \cdot 10^{-6} sec. & 4.3 \cdot 10^{-6} sec. & 4.9 \cdot 10^{-6} sec.& 5.3 \cdot 10^{-6} sec. & 5.6 \cdot 10^{-6} sec.\\ n & 1 \cdot 10^{-5} sec. & 2 \cdot 10^{-5} sec. & 3 \cdot 10^{-5} sec. & 4 \cdot 10^{-5} sec. & 5 \cdot 10^{-5} sec. \\ n^2 & 1 \cdot 10^{-4} sec. & 4 \cdot 10^{-4} sec. & 9 \cdot 10^{-4} sec. & 1.6 \cdot 10^{-3} sec. & 2.5 \cdot 10^{-3} sec. \\ n^3 & 1 \cdot 10^{-3} sec. & 8 \cdot 10^{-3} sec. & 0.027 sec. & 0.064 sec. & 0.125 sec. \\ n^5 & 0.1 sec. & 3.2 sec. & 24.3 sec. & 1.7 min. & 5.2 min. \\ 2^n & 1 \cdot 10^{-3} sec. & 1 sec. & 17.9 min. & 12.7 days & 35.7 years \\ n! & 3.63 sec. & 77146 years & 8.4 \cdot 10^{18} years & 2.6 \cdot 10^{34} years & 9.6 \cdot 10^{50} years\\ \end{array}$$

Above, we’ve plotted five algorithms with runtimes 1,log2(n),n,n2,n3,n5,2n,n!1, \log_2(n), n, n^2, n^3, n^5, 2^n, n! versus input sizes for nn ranging from 1010 to 5050 , with the assumption that we can perform one step every 10610^{-6} seconds.

In this table, you can make the following observations:

• The constant, logarithmic, linear and polynomial runtimes all stay manageable: even $n^5$ only needs a few minutes at $n = 50$.
• The exponential runtime $2^n$ starts out innocently, but becomes unusable somewhere around $n = 40$.
• The factorial runtime $n!$ is hopeless for all but the smallest inputs: at $n = 20$ it already needs tens of thousands of years.

That is: each class of functions eventually overtakes the ones before it, in exactly the order our heuristic claims.
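If you'd like to regenerate these numbers yourself, here's a minimal C sketch (we use `lgamma` from `math.h` to compute $\ln(n!)$, since $50!$ overflows every integer type):

```c
#include <stdio.h>
#include <math.h>

/* Print the seconds needed for "steps" operations,
   assuming one operation per microsecond. */
static void cell(double steps)
{
    printf(" %12.3g s", steps * 1e-6);
}

int main(void)
{
    const char *names[] = {"1", "log2(n)", "n", "n^2", "n^3", "n^5", "2^n", "n!"};

    printf("%-8s", "f(n)");
    for (double n = 10; n <= 50; n += 10)
        printf(" %12.0f  ", n);
    printf("\n");

    for (int r = 0; r < 8; r++) {
        printf("%-8s", names[r]);
        for (double n = 10; n <= 50; n += 10) {
            double steps;
            switch (r) {
                case 0:  steps = 1;                    break;
                case 1:  steps = log2(n);              break;
                case 2:  steps = n;                    break;
                case 3:  steps = pow(n, 2);            break;
                case 4:  steps = pow(n, 3);            break;
                case 5:  steps = pow(n, 5);            break;
                case 6:  steps = pow(2, n);            break;
                default: steps = exp(lgamma(n + 1));   break;  /* n! */
            }
            cell(steps);
        }
        printf("\n");
    }
    return 0;
}
```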

We can use this heuristic to quickly answer our question about which of $\texttt{MergeSortSteps}$ and $\texttt{SelectionSortSteps}$ is preferable, without having to plug in any numbers at all!

Claim 4.7. $\texttt{SelectionSortSteps}$ grows faster than $\texttt{MergeSortSteps}$.

Proof. As noted before, by definition, we want to find the limit

$$\lim_{n \to \infty} \frac{\frac{3n^2+9n-10}{2}}{8n\log_2(n) + 12n -1}.$$

By dividing both the top and the bottom of this fraction by $n$, we can see that we're looking at the ratio

$$\lim_{n \to \infty} \frac{1.5n+4.5-\frac{5}{n}}{8\log_2(n) + 12- \frac{1}{n}}.$$

The top is a linear expression (i.e. a polynomial of degree 1), as the largest-growing object in the numerator is the $1.5n$ term. The bottom, conversely, is a logarithmic expression, as the fastest-growing object in the denominator is the $8\log_2(n)$ term.

Linear expressions grow much faster than logarithms, so our "plug things in" step didn't lie: $\texttt{SelectionSortSteps}$ grows faster than $\texttt{MergeSortSteps}$ (and thus $\texttt{MergeSort}$ is the preferable algorithm, as in the long run it will need to perform fewer operations to get the same answer.)

\square

To study one more example of this idea and finish our chapter, let’s use this principle to answer our last exercise:

Answer to 4.1. In this problem, we had two multiplication algorithms, and wanted to determine which is faster.

Algorithm 4.1. calculated $a \cdot b$ by basically just calculating $\overbrace{b+b+\ldots +b}^{a\textrm{ times}}$. As such, its runtime is pretty much just some constant times $a$: we need $a$ iterations of this process to calculate the answer here, and we can let our constant be the number of steps we perform in each iteration.

Conversely, Algorithm 4.2. ran by repeatedly subtracting 1 from $a$ (whenever $a$ was odd) and dividing $a$ by 2, until $a$ was 0 (while doing some bookkeeping with $b$ and $prod$ along the way.)

For the purpose of this problem, we don't have to think about why this process works, though look at practice problem 6 if you're curious! Instead, if we simply take it on faith that this algorithm does work, it's easy to calculate how long it takes to complete: it will need as many iterations as it takes to reduce $a$ to 0 by these operations.

In the worst-case scenario for the number of iterations, we only ever divide by 2; i.e. $a$ is a power of 2. In this case, it takes $k$ iterations if $a=2^k$; i.e. we need $\log_2(a)$ iterations, and thus need a constant times $\log_2(a)$ many operations.

As we saw before, logarithms grow much slower than linear functions! Therefore, Algorithm 4.2. is likely the better algorithm to work with.
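To make this analysis concrete, here's a minimal C sketch of both algorithms, instrumented with iteration counters (the function names `slow_mult` and `clever_mult` are our own):

```c
#include <stdio.h>

/* Algorithm 4.1: repeated addition. Takes about a iterations. */
unsigned long slow_mult(unsigned long a, unsigned long b, unsigned long *iters)
{
    unsigned long prod = 0;
    *iters = 0;
    while (a != 0) {        /* step 2: stop when a == 0  */
        prod += b;          /* step 3: add b to prod...  */
        a -= 1;             /* ...and subtract 1 from a  */
        (*iters)++;
    }
    return prod;
}

/* Algorithm 4.2: halve a, double b. Takes about log2(a) iterations. */
unsigned long clever_mult(unsigned long a, unsigned long b, unsigned long *iters)
{
    unsigned long prod = 0;
    *iters = 0;
    while (a != 0) {        /* step 2: stop when a == 0          */
        if (a % 2 == 1) {   /* step 3: if a is odd, shift b into */
            a -= 1;         /*         prod and make a even      */
            prod += b;
        }
        a /= 2;             /* step 4: halve a, double b         */
        b *= 2;
        (*iters)++;
    }
    return prod;
}

int main(void)
{
    unsigned long it1, it2;
    unsigned long p1 = slow_mult(1000, 12, &it1);
    unsigned long p2 = clever_mult(1000, 12, &it2);
    printf("slow:   %lu in %lu iterations\n", p1, it1);  /* 12000 in 1000 */
    printf("clever: %lu in %lu iterations\n", p2, it2);  /* 12000 in 10   */
    return 0;
}
```

Running this with $a = 1000, b = 12$ reports 1000 iterations for the first algorithm and only 10 for the second, matching the "constant times $a$" versus "constant times $\log_2(a)$" analysis above.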

Practice Problems

  1. (-) Your two friends, Jiawei and Francis, have written a pair of algorithms to solve a problem. On an input of size $n$, Jiawei's algorithm needs to perform $n^2 + 2n$ calculations to find the answer, while Francis's algorithm needs to perform $\log_2(n) + 2^n$ calculations to find the same answer.

    Jiawei says that their algorithm is faster, because their runtime is polynomial while Francis’s algorithm has an exponential in it. Francis says that their algorithm is faster, because their algorithm has a logarithm in it while Jiawei’s is polynomial.

    Who is right, and why?

  2. (-) Let $f(x) = 10^x + 1$, $g(x) = x-3$, $h(x) = \log_{10}(x^3+3)$ and $j(x) = \sqrt[3]{x}$. Calculate the following compositions:

    • $f \circ h (x)$.
    • $h \circ j (x)$.
    • $g \circ f(x)$.
    • $f \circ g(x)$.
  3. We say that a function $f: A \to B$ has an inverse if there is some function $f^{-1}: B \to A$ such that $f \circ f^{-1} (x) = x = f^{-1} \circ f(x)$ for all $x$: that is, $f^{-1}$ "undoes" $f$, and vice-versa. For example, $\log_2(x)$ and $2^x$ are inverses, as are $x^5$ and $\sqrt[5]{x}$.

    • Suppose that $f(x) = 3x+2$. Find $f(x)$'s inverse.
    • Now, suppose that $g(x) = 3$. Does this constant function have an inverse?
  4. Let $A$ be the collection of all students enrolled at the University of Auckland and $B$ be the collection of classes currently running at the University of Auckland. Which of the following are functions?

    • The rule $f: A \to B$, that given any student outputs the classes they're enrolled in.
    • The rule $f: B \to A$, that given any class outputs the student with the top mark in that class.
    • The rule $f: A \to A$, that given any student outputs that student.
    • The rule $f: B \to B$, that given any class outputs its prerequisites.
  5. You’re a programmer! You’ve found yourself dealing with a program puzzle(n)\texttt{puzzle(n)} that has no comments in its code, and you want to know what it does. After some experimentation, you’ve found that puzzle(n)\texttt{puzzle(n)} takes in as input an integer nn , and does the following:

 (i) Take $n$ and square it.

 (ii) If $n$ is 1, 2, 3 or 6, output $n$ and stop.

 (iii) Otherwise, replace $n$ with $(n^2)\,\%\,10$, i.e. the last digit of $n^2$, and go to (ii).

    What does $\texttt{puzzle(n)}$ do?

  6. (+) Show that after each step in Algorithm 4.2., the quantity $\boxed{a \cdot b + prod}$ is always the same. Using this, prove that Algorithm 4.2. works!

  7. (+) In our answer to 4.1., we used a sort of handwave-y "this takes about $\log_2(a)$ many operations" argument. Let's make this more rigorous!

    That is: let $\texttt{clevermultsteps}(a)$ denote the number of steps that this algorithm needs to multiply a given number $b$ by $a$.

    • Find a recurrence relation for $\texttt{clevermultsteps}(a)$ in terms of $\texttt{clevermultsteps}(a/2)$, that holds whenever $a$ is even.
    • Use this relation to calculate $\texttt{clevermultsteps}(a)$ for $a = 2,4,8,16,32$, and plug these values into either the Online Encyclopedia of Integer Sequences or WolframAlpha. What pattern do you see?
  8. Consider the following sorting algorithm, called $\texttt{BubbleSort}$ (a C sketch of this procedure appears after these practice problems):

    Algorithm 4.8. The following algorithm, $\texttt{BubbleSort}(L)$, takes in a list $L = (l_1, l_2, \ldots, l_n)$ of $n$ numbers and orders it from least to greatest, using the following process:

        (i) Compare $l_1$ to $l_2$. If $l_1 > l_2$, swap these two values; otherwise, leave them where they are.

        (ii) Now, move on, and compare the "next" pair of adjacent values $l_2, l_3$. Again, if these elements are out of order (i.e. $l_2 > l_3$) swap them.

        (iii) Keep doing this through the entire list!

        (iv) At the end of this process, if you made no swaps, your list is in order, by definition. Stop!

        (v) Otherwise, the list might be out of order! Return to (i).

        (a) Use this process to sort the list $(6,1,4,2,3,2)$.

        (b) (+) Let $\texttt{BubbleSortSteps}(n)$ denote the maximum number of steps that $\texttt{BubbleSort}$ needs to sort a list of length $n$. Find an expression for $\texttt{BubbleSortSteps}(n)$, and calculate its values for $n=1,5,10$ and $100$.

        (c) What is the smallest number of steps that $\texttt{BubbleSort}$ would need to sort a list of length $n$?

  9. Find the following limits:

  10. Which of the three algorithms $\texttt{BubbleSort}$, $\texttt{MergeSort}$ and $\texttt{SelectionSort}$ is the fastest (i.e. needs the fewest operations) to sort a list of 5 elements? Which needs the fewest to sort a list of 10 elements? How about 100? How about for huge values of $n$?
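As promised in problem 8, here's a minimal C sketch of Algorithm 4.8's procedure (the structure and names are our own; treat it as a starting point for experimenting with parts (a)–(c), not as the unique implementation):

```c
#include <stdio.h>
#include <stdbool.h>

/* Algorithm 4.8 (BubbleSort): repeatedly sweep the list,
   swapping adjacent out-of-order pairs, until a sweep makes no swaps. */
void bubble_sort(int l[], int n)
{
    bool swapped = true;
    while (swapped) {                        /* step (v): repeat until clean */
        swapped = false;
        for (int i = 0; i + 1 < n; i++) {    /* steps (i)-(iii): one sweep   */
            if (l[i] > l[i + 1]) {
                int tmp = l[i];              /* swap the out-of-order pair   */
                l[i] = l[i + 1];
                l[i + 1] = tmp;
                swapped = true;
            }
        }
        /* step (iv): if no swaps happened, the list is sorted and we stop. */
    }
}

int main(void)
{
    int l[] = {6, 1, 4, 2, 3, 2};            /* the list from part (a) */
    int n = 6;
    bubble_sort(l, n);
    for (int i = 0; i < n; i++)
        printf("%d ", l[i]);                 /* prints: 1 2 2 3 4 6 */
    printf("\n");
    return 0;
}
```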