Algorithms and Functions
Exercise 4.1. Two processes that you can use to multiply two nonnegative integers m and n together are listed below:
Algorithm 4.1.
- Define a new number prod, and initialize it (i.e. set it equal) to 0.
- If n = 0, stop, and return the number prod.
- Otherwise, add m to prod, and subtract 1 from n. Then go to 2.
An example run of Algorithm 4.1 when m = 3, n = 4: prod starts at 0, and each pass of step 3 gives (prod, n) = (3, 3), (6, 2), (9, 1), (12, 0); at that point n = 0, so we return prod = 12.
Algorithm 4.2.
- Define a new number prod, and initialize it (i.e. set it equal) to 0.
- If n = 0, stop, and return the number prod.
- Otherwise, if n is odd, subtract 1 from n and set prod = prod + m.
- Divide n by 2, and multiply m by 2.
- Go to step 2.
Similarly, an example run of Algorithm 4.2 when m = 3, n = 4: starting from (m, n, prod) = (3, 4, 0), n is even twice in a row, so halving n and doubling m gives (6, 2, 0) and then (12, 1, 0); now n = 1 is odd, so we subtract 1 from n and add m to prod to get (12, 0, 12), and on the next check n = 0, so we return prod = 12.
In general, which of these is the faster way to multiply two numbers? Why?
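If you’d like to experiment with these two processes yourself, here’s a minimal C sketch of both (the function names mult_slow and mult_fast, and the use of unsigned long, are our own choices):
/* Algorithm 4.1: multiply by repeated addition */
unsigned long mult_slow(unsigned long m, unsigned long n)
{
    unsigned long prod = 0;   /* step 1: initialize prod to 0 */
    while (n > 0) {           /* step 2: stop and return prod when n = 0 */
        prod += m;            /* step 3: add m to prod... */
        n -= 1;               /* ...subtract 1 from n, and repeat */
    }
    return prod;
}
/* Algorithm 4.2: halve n and double m, adding m to prod when n is odd */
unsigned long mult_fast(unsigned long m, unsigned long n)
{
    unsigned long prod = 0;   /* step 1 */
    while (n > 0) {           /* step 2 */
        if (n % 2 == 1) {     /* step 3: if n is odd... */
            n -= 1;
            prod += m;
        }
        n /= 2;               /* step 4: divide n by 2, multiply m by 2 */
        m *= 2;
    }
    return prod;
}
Calling either function with m = 3 and n = 4 returns 12; the difference is that mult_slow loops 4 times, while mult_fast loops only 3 times (and, in general, only about log₂(n) times).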
Functions in General
In your high-school mathematics classes, you’ve likely seen functions described as things like “f(x) = x²” or “f(x) = 2x + 1.” When we’re writing code, however, we don’t do this! That is: in most programming languages, you can’t just type in expressions like the ones before and trust that the computer will understand what you mean. Instead, you’ll often write something like the following:
/* function returning the max between two numbers */
int max(int num1, int num2)
{
/* local variable declaration */
int result;
if (num1 > num2)
result = num1;
else
result = num2;
return result;
}
Notice how in this example we didn’t just get to define the rules for our function: we also had to specify the kind of inputs the function should expect, and also the type of outputs the function will generate! On one hand, this creates more work for us at the start: we can’t just tell people our rules, and we’ll often find ourselves having to go back and edit our functions as we learn what sorts of inputs we actually want to accept.
On the other hand, though, this lets us describe a much broader class of functions than what we could do before! Under our old high-school definition for a function, we just assumed that functions took in numbers and returned numbers. With the above idea, though, we can have functions take in and output anything: multiple numbers, arrays, text files, whatever!
On the third(?) hand, enforcing certain restrictions on the types of inputs and outputs to a function is also a much more secure way to think about functions in computer science. If you’re writing code in a real-life situation, you should always expect malicious or just clueless users to try to input the worst possible data to your functions. As a result, you can’t just have a function defined by the rule f(x) = 1/x and trust that your users out of the goodness of their hearts will never input 0 just to see what happens! They’ll do it immediately (along with lots of other horrifying inputs, like 🐥) just to see what happens.
(from https://xkcd.com/327/)
This is why many programming languages enforce type systems: i.e. rules around their functions that specifically force you to declare the kinds of inputs and outputs ahead of time, like we’ve done above! Doing this is an important part of writing bug-free and secure code.
As this is a computer science class, we should have a definition of function that matches this concept. We provide this here: a function consists of three things.
- A collection A of possible inputs. We call this the domain of our function.
- A collection B describing the type of outputs that our function will generate. We call this the codomain of our function.
- A rule f that takes in inputs from A and generates outputs in B.
Furthermore, in order for this all to be a function, we need it to satisfy the following property:
For every potential input a from A, there should be exactly one b in B such that f(a) = b.
In other words, we never have a value a in A for which f(a) is undefined, as that would cause our programs to crash! As well, we also do not allow a value a to generate “multiple” outputs; i.e. we want to be able to rely on f(a) not changing on us without warning, if we keep a the same.
Example 4.1. Typically, to define a function we’ll write something like “Consider the function f : ℤ → ℚ, defined by the rule f(n) = 1/(1 + n²).” This definition tells you three things: what the domain is (the set the arrow starts from, which is the integers ℤ in this case), what the codomain is (the set the arrow points to, which is the rational numbers ℚ in this case), and the rule f(n) = 1/(1 + n²) used to define f.
Example 4.2. f : ℤ → ℤ defined by the rule “f(x) = y if and only if y² = x” is not a function. There are many reasons for this:
- There are values in the domain that do not get mapped to any values in the codomain by our rule. For instance, consider x = −1. There is no value y such that y² = −1, because no integer when squared is negative! Therefore, −1 is not mapped to any value in the codomain, and so we do not regard f as a function.
- There are also values in the domain that get mapped to multiple values in the codomain by our rule. For instance, consider x = 4. Because y² = 4 has the two solutions y = ±2, this rule maps 4 to the two values 2 and −2. This is another reason why f is not a function!
Example 4.3. Let S be the set of all students at the University of Auckland, and ℤ be the set of all integers. We can define a function f : S → ℤ by defining f(s) to be equal to that student’s current ID number. This is a function, because each student has a unique ID number!
However, if we tried to define a function g : ℤ → S by the rule “g(n) is the student whose ID number is n,” we would fail! This is because there are many integers that are not ID numbers: for example, no student has ID number −1, or −2.
While the objects above have had relatively “nice” rules, not all functions can be described quite so cleanly! Consider the following example:
Example 4.4. Let A be the collection {🇨🇦, 🐈, 🐥, 🇳🇿, 🕸️} of the Canada, cat, bird, New Zealand and web emojis, and let B be the collection {🐙, ☢️, 🐝, 👨🏾} of the octopus, radiation, bee, and man emojis. Consider f : A → B defined by the rules
f(🇨🇦) = 🐙, f(🐈) = 🐝, f(🐥) = 🐝, f(🇳🇿) = 👨🏾, f(🕸️) = 🐙.
This is a function, because we have given an output for every possible input, and also never sent an input to multiple different outputs. It’s not a function with a simple algebraic rule like “f(x) = x²,” but that’s OK!
A useful way to visualize functions defined in this piece-by-piece fashion is with a diagram: draw the domain at left, the codomain at right, and draw an arrow from each in the domain to its corresponding element in the codomain.
Alongside the domain/codomain ideas above, another useful idea here is the concept of range: the range of a function f : A → B is the collection of all outputs that f actually generates, i.e. all of the values b in B such that f(a) = b for some a in A.
Note that the range is usually different to the codomain! In the examples we studied earlier, we saw the following:
- f : ℤ → ℚ defined by the rule f(n) = 1/(1 + n²) does not output every rational number! Amongst other values, it will never output any number greater than 1 (as 1/(1 + n²) ≤ 1 for every integer n.) As such, its codomain (ℚ) is not equal to its range.
- The function from Example 4.3, that takes in any student at the University of Auckland and outputs their student ID, does not output every integer: amongst other values, it will never output a negative integer! As such, its codomain (ℤ) is not equal to its range.
- The emoji function in Example 4.4 never outputs ☢️, even though it’s in the codomain.
Intuitively: we think of the codomain as letting us know what type of outputs we should expect. That is: in both mathematics and computer science, often just knowing that the output is “an integer” or “a binary string of length at most 20” or “a Unicode character” is enough for our compiler to work. As such, it’s often much faster to just describe the type of possible outputs, instead of laboriously finding out exactly what outputs are possible!
However, in some cases we will want to know precisely what values we get as outputs, and in that situation we will want to find the actual outputs: i.e. the range. To illustrate this, let’s consider a few examples:
Example 4.5. Consider the function f : ℤ → ℤ given by the rule f(n) = 2|n| + 2. This function has range equal to all even numbers that are at least 2.
To see why, simply notice that for any integer n, |n| is the “absolute value” of n: i.e. it’s n with its sign removed, so that it’s always nonnegative. As a result, 2|n| is always a nonnegative even number, and so 2|n| + 2 must be an even number that’s at least 2.
That tells us that the only possible outputs are even numbers that are at least 2! However, we still don’t know that all of those values are ones that we actually can get as outputs.
To see why this is true: take any even number that is at least 2. By definition, we can write this number as 2k, for some k ≥ 1. Rewrite this as 2(k − 1) + 2; if we do so, then we can see that f(k − 1) = 2|k − 1| + 2 = 2(k − 1) + 2 = 2k (because if k ≥ 1, then k − 1 ≥ 0 and so |k − 1| = k − 1.) As a result, we’ve shown that f(k − 1) = 2k for any k ≥ 1, and thus that we can actually get any even number that’s at least 2 as an output.
Example 4.6. Consider the emoji function from Example 4.4. If we look at the diagram we drew before, we can see that our function generates three possible outputs: 🐙, 🐝 and 👨🏾. Therefore, the collection of these three emojis is our range!
Example 4.7. Let A be the collection of all pairs of words in the English language, and B be the two values {true, false}. Define the function f : A → B by saying that f(pair) = true if the two words rhyme, and false otherwise. For example, f(cat, bat) = true, while f(cat, cataclysm) = false.
The range of this function is {true, false}, i.e. the same as its codomain! It is possible for the range and codomain to agree. (If this happens, we call such a function a surjective function. We’re not going to focus on these functions here, but you’ll see more about them in courses like Compsci 225 and Maths 120/130!)
Example 4.8. Let ℝ denote the set of all real numbers (i.e. all numbers regardless of whether they’re rational or irrational; alternately, anything you can describe with a decimal expansion.) Define the function f : ℝ → ℝ by the rule f(x) = 2ˣ. This function has range equal to the set of all positive numbers! This takes more math to see than we currently have: again, take things like Maths 130 to see the “why” behind this. However, if you draw a graph of y = 2ˣ, you’ll see that the outputs (i.e. y-values) range over all of the possible positive numbers, as claimed.
To close this section, we give a useful bit of notation for talking about functions defined in terms of other functions: function composition.
Fact 4.2. Given any two functions f : A → B and g : B → C, we can combine these functions via function composition: that is, we can define the function g ∘ f : A → C, defined by the rule (g ∘ f)(a) = g(f(a)). We pronounce the small open circle symbol ∘ as “composed with.”
Example 4.9.
- If f, g : ℝ → ℝ are defined by the rules f(x) = x + 1 and g(x) = x², then we would have (g ∘ f)(x) = g(f(x)) = (x + 1)².
- Notice that this is different to (f ∘ g)(x) = f(g(x)) = x² + 1!
In general, g ∘ f and f ∘ g are usually different functions: make sure to be careful with the order in which you compose functions.
- If f, g : ℝ → ℝ are defined by the rules f(x) = 2x and g(x) = x + 3, then (g ∘ f)(x) = g(f(x)) = 2x + 3.
- If f : ℝ → ℝ⁺ and g : ℝ⁺ → ℝ are defined by the rules f(x) = 2ˣ and g(x) = log₂(x), then (g ∘ f)(x) = log₂(2ˣ) = x.
(Here, ℝ⁺ denotes the set of all positive real numbers.)
Handy!
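To see the first bullet point above as code, here’s a small C sketch (the names f, g, g_of_f and f_of_g are ours): composing is just “apply f, then feed its output to g.”
double f(double x) { return x + 1; }   /* f(x) = x + 1 */
double g(double x) { return x * x; }   /* g(x) = x^2 */
double g_of_f(double x)
{
    return g(f(x));   /* (g ∘ f)(x) = g(f(x)) = (x + 1)^2 */
}
double f_of_g(double x)
{
    return f(g(x));   /* (f ∘ g)(x) = f(g(x)) = x^2 + 1: a different function! */
}
Note how the C types make the domain/codomain requirement visible: g_of_f only compiles because f returns a double, which is exactly the kind of input g expects.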
Notice that in the definition above, we required that the domain of g was the codomain of f. That is: if we wanted to study g ∘ f, we needed to ensure that every output of f is a valid input to g.
This makes sense! If you tried to compose functions whose domains and codomains did not match up in this fashion, you’d get nonsense / crashes when the inner function returns an output at which the outer function is undefined. For example:
Example 4.10.
- If f : ℝ → ℝ is defined by the rule f(x) = x − 1, and g : ℝ∖{0} → ℝ is defined by the rule g(x) = 1/x, then you might think that (g ∘ f)(x) = 1/(x − 1).
However, this is not a function! When x = 1, for example, we have g(f(1)) = g(0) = 1/0, which is undefined. This is why we insist that the codomain of f is the domain of g; we need all of f’s outputs to be valid inputs to g.
- Let A be the set of all people in your tutorial room, B be the set of all ID numbers of UoA students, and C be the set of all ID numbers of Compsci 120 students. Then f, defined by taking any Compsci 120 student’s ID (from C) and outputting their grade on the mid-sem test, is a function; as well, g : A → B, defined by mapping each person in your tutorial room to their ID number, is a function.
However, f ∘ g, the function that tries to take each person in your tutorial room and output their mid-sem test score, is undefined! In particular, your tutor is someone in your tutorial room who, even though they do have an ID number, will not have a score on the mid-sem test. Another reason to insist that the codomain of g is the domain of f!
Algorithms
In the previous section, we came up with a “general” concept for function that we claimed would be better for our needs in computer science, as it would let us think of things like
/* function returning the max between two numbers */
int max(int num1, int num2)
{
/* local variable declaration */
int result;
if (num1 > num2)
result = num1;
else
result = num2;
return result;
}
as a function. However, most of the examples we studied in this chapter didn’t feel too much like the code above: they were either fairly mathematical in nature (i.e. rules like f(n) = 2|n| + 2) or word-problem-oriented (i.e. the function that sent UoA students to their ID numbers.)
To fix this issue, this section will focus on the idea of an algorithm: that is, a way of describing in general a step-by-step problem-solving process that we can easily turn into code.
We start by describing what an algorithm is: an algorithm is a finite list of precise, unambiguous instructions for performing some task.
Typically, people think of algorithms as a set of instructions for solving some problem; when they do so, they typically have some restrictions in mind for the kinds of instructions they consider to be valid. For example, consider the following algorithm for proving the Riemann hypothesis:
- Prove the Riemann hypothesis.
- Rejoice!
On one hand, this is a fairly precise and unambiguous set of instructions: step 1 has us come up with a proof of the Riemann hypothesis, and step 2 tells us to rejoice.
On the other hand: this is not a terribly useful algorithm. In particular, its steps are in some sense “too big” to be of any use: they reduce the problem of proving the Riemann hypothesis to … proving the Riemann hypothesis. Typically, we’ll want to limit the steps in our algorithms to simple, mechanically-reproducible steps: i.e. operations that a computer could easily perform, or operations that a person could do with relatively minimal training.
In practice, the definition of “simple” depends on the context in which you are creating your algorithm. Consider the algorithm for making delicious pancakes, given below.
An algorithm for pancakes!
- Acquire and measure out the following ingredients:
- 2 cups of buttermilk, or 1.5 cups milk + .5 cups yoghurt whisked together.
- 2 cups of flour.
- 2 tablespoons of sugar.
- 2 teaspoons of baking powder.
- 1/2 teaspoon of baking soda.
- 1/2 teaspoon of salt.
- 1 large egg.
- 3 tablespoons butter.
- Additional butter.
- Maple syrup.
- Whisk the flour, sugar, baking powder, baking soda, and salt in a medium bowl.
- Melt the 3 tablespoons of butter.
- Whisk the egg and melted butter into the milk until combined.
- Pour the milk mixture into the dry ingredients, and whisk until just combined (a few lumps should remain.)
- Heat a nonstick griddle/frypan on medium heat until hot; grease with a teaspoon or two of butter.
- Pour about 1/4 cup of batter onto the skillet. Repeat in various disjoint places until there is no spare room on the skillet, leaving gaps of 1cm between pancakes.
- Cook until large bubbles form and the edges set (i.e. turn a slightly darker color and are no longer liquid), about 2 minutes.
- Using a spatula, flip pancakes, and cook a little less than 2 minutes longer, until golden brown.
- If there is still unused batter, go to 5; else, top pancakes with maple syrup and butter, and eat.
This is a good recipe. Use it!
This algorithm’s notion of “simple” is someone who is (1) able to measure out quantities of various foods, and (2) knows the meaning of various culinary operations like “whisk” and “flip.” If we wanted, we could make an algorithm that includes additional steps that define “whisking” and “flipping”. That is: at each step where we told someone to whisk the flour, we could instead have given them the following set of instructions:
(a) Grab a whisk. If you do not know what a whisk is, go to this Wikipedia article and grab the closest thing to a whisk that you can find. A fork will work if it is all that you can find.
(b) Insert the whisk into the object you are whisking.
(c) Move the whisk around through the object you are whisking in counterclockwise circles of varying diameter, in such a fashion to mix together the contents of the whisked object.
In this sense, we can extend our earlier algorithm to reflect a different notion of “simple,” where we no longer assume that our person knows how to whisk things. It still describes the same sets of steps, and in this sense is still the “same” algorithm — it just has more detail now!
This concept of “adding” or “removing” detail from an algorithm isn’t something that will always work; some algorithms will simply demand steps that cannot be implemented on some systems. For example, no matter how many times you type “sudo apt-get 2 cups of flour,” your laptop isn’t going to be able to implement our above pancake algorithm. As well, there may be times where a step that was previously considered “simple” becomes hideously complex on the system you’re trying to implement it on!
We’re not going to worry too much about the precise definition of “simple” in this class, because we’re not writing any code here (and so our notion of “simple” isn’t one we can precisely nail down); these are the details we’ll leave for your more coding-intensive courses.
Instead, let’s just look at a few examples of algorithms! We’ve already seen one in this class, when we defined the % operation:
Algorithm 4.3. This algorithm takes in any two integers a and n, where n > 0. It then calculates a % n as follows:
- If a ≥ n, we repeatedly subtract n from a until a < n, and return the end result.
- If a < 0, we repeatedly add n to a until a ≥ 0, and return the end result.
- If neither of these cases apply, then 0 ≤ a < n, and we just return a.
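Here’s a minimal C sketch of Algorithm 4.3, assuming n > 0 (the function name mod is ours; note that C’s built-in % operator can return negative values for negative a, which is one reason it’s worth spelling the algorithm out):
int mod(int a, int n)
{
    while (a >= n)   /* case 1: repeatedly subtract n until a < n */
        a -= n;
    while (a < 0)    /* case 2: repeatedly add n until a >= 0 */
        a += n;
    return a;        /* case 3: 0 <= a < n already, so just return a */
}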
Second, we can turn Claim 1.6 into an algorithm for how to tell if a number is prime:
Algorithm 4.4. This is an algorithm that takes in a positive integer n, and determines whether or not n is prime. It proceeds as follows:
- If n = 1, stop: n is not prime.
- Otherwise, if n ≥ 2, find all of the numbers 2, 3, 4, …, ⌊√n⌋. Take each of these numbers, and test whether they divide n.
- If one of them does, then n is not prime!
- Otherwise, if none of them divide n, then by Claim 1.6, n is prime.
This is a step-by-step process that tells us if a number is a prime or not! Notice that the algorithm itself didn’t need to contain a proof of Claim 1.6; it just has to give us instructions for how to complete a task, and not justify why those instructions will work. It is good form to provide such a justification where possible, as it will help others understand your code! However, it is worth noting that such a justification is separate from the algorithm itself: it is quite possible (and indeed, all too easy) to write something that works even though you don’t necessarily understand why.
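As a C sketch (the name is_prime is ours, and we return 1 for “prime” and 0 for “not prime”), Algorithm 4.4 might look like this:
int is_prime(int n)
{
    int k;
    if (n == 1)
        return 0;                  /* step 1: 1 is not prime */
    for (k = 2; k * k <= n; k++)   /* step 2: each k from 2 up to sqrt(n) */
        if (n % k == 0)
            return 0;              /* some k divides n: not prime */
    return 1;                      /* no k divides n: prime, by Claim 1.6 */
}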
For a third example, let’s consider an algorithm to sort a list:
Algorithm 4.5. The following algorithm, SelectionSort(L), takes in a list L of numbers and orders it from least to greatest. For example, SelectionSort(3, 1, 2) is (1, 2, 3). It does this by using the following algorithm:
- If L contains at most one number, L is trivially sorted! In this situation, stop.
- Otherwise, L contains at least two numbers. Let L = (l₁, l₂, …, lₙ), where n ≥ 2. Define a pair of values min, loc, and set them equal to the value and location of the first element in our list: that is, min = l₁ and loc = 1.
- One by one, starting with the second entry in our list and working our way through our entire list L, compare the value stored in min to the current value lₖ that we’re examining.
(a) If lₖ < min, update min to be equal to lₖ, and update loc to be equal to k.
(b) Otherwise, just go on to the next value.
- At the end of this process, min and loc describe the value and the location of the smallest element in our list. Swap the first value in our list with the value in position loc in our list: this makes the first value in our list the smallest element in our list.
- To finish, set the first element of our list aside and run SelectionSort on the rest of our list.
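For readers who’d like to see Algorithm 4.5 as code, here’s a C sketch (the function name selection_sort is ours; we’ve unrolled step 5’s “run it again on the rest of the list” recursion into a loop over where the unsorted part starts, but each pass is exactly steps 2-4):
void selection_sort(int L[], int n)
{
    int start, k, min, loc, tmp;
    for (start = 0; start < n - 1; start++) {
        min = L[start];                    /* step 2: value of first element */
        loc = start;                       /*         and its location */
        for (k = start + 1; k < n; k++) {  /* step 3: scan the rest */
            if (L[k] < min) {              /* step 3(a): found a smaller value */
                min = L[k];
                loc = k;
            }
        }
        tmp = L[start];                    /* step 4: swap smallest to front */
        L[start] = L[loc];
        L[loc] = tmp;
    }
}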
Back in our first chapter, to understand our two previous algorithms 4.3 and 4.4, we started by running these algorithms on a few example inputs! In general, this is a good tactic to use when studying algorithms; actually plugging in some concrete inputs can make otherwise-obscure instructions much simpler to understand.
To do so here, let’s run our algorithm on the list (3, 1, 2), following each step as written: the first pass through steps 2-4 finds min = 1 and loc = 2, and swaps the 3 and the 1 to give (1, 3, 2); we then set the 1 aside and run SelectionSort on (3, 2), which swaps to give (2, 3); setting the 2 aside leaves the one-element list (3), which step 1 says is trivially sorted. Reassembling everything gives (1, 2, 3).
Here, we use boxes to visualize the “rest of our list” part of step 5 in our algorithm.
This worked! Moreover, doing this by hand can help us see an argument for why this algorithm works:
Proof. In general, there are three things we need to check to show that a given algorithm works:
- The algorithm doesn’t have any bugs: i.e. every step of the process is defined, you don’t have any division-by-zero issues or undefined cases, or stuff like that.
This is true here! The only steps we perform in this algorithm are comparisons and swaps, and the only cases we encounter are “lₖ < min is true” or “lₖ < min is false,” which clearly covers all possible situations. As such, there are no undefined cases or undefined operations.
- The algorithm doesn’t run forever: i.e. given a finite input, the algorithm will eventually stop and not enter an infinite loop.
This is also true here! To see why, let’s track the number of comparisons and write operations performed by this process, given a list of length n as input:
- Step 1: one comparison (we checked the size of the list.)
- Step 2: two write operations (we defined min and loc.)
- Step 3: n − 1 comparisons, and possibly 2(n − 1) write operations.
To see why: note that for each of the n − 1 entries in our list from l₂ onwards, we looked up the value lₖ and compared it to the value we have in min, which gives us n − 1 comparisons. If it was smaller, we rewrote the values in min and loc in 3(a), which is at most 2(n − 1) more writes.
- Step 4: 2 write operations (to swap these values.)
- Step 5: 1 write operation (to resize the list to set the first element aside), and however many operations we need to sort a list of n − 1 numbers with SelectionSort.
In total, then, we have the following formula: if T(n) denotes the maximum number of operations in total needed to sort a list of length n with our algorithm, then
T(n) = 1 + 2 + (n − 1) + 2(n − 1) + 2 + 1 + T(n − 1) = T(n − 1) + 3n + 3.
At first, this looks scary: our function is defined in terms of itself! In practice, though, this is fine. We know that T(1) = 1, because step 1 immediately ends our program if the list has size 1.
Therefore, our formula above tells us that if we set n = 2, we have
T(2) = T(1) + 3·2 + 3 = 1 + 9 = 10,
by using our previously-determined value for T(1). Still finite!
We can do this again for n = 3, and get
T(3) = T(2) + 3·3 + 3 = 10 + 12 = 22,
by using our previously-determined value for T(2). Again, still finite!
In general, if this process can sort a list of size n − 1 in finitely many steps, then it only takes us 3n + 3 more steps to sort a list of size n. In particular, there is no point at which our algorithm jumps to needing “infinitely many” steps to sort a list!
- The algorithm produces the desired output: in this case, it produces a list ordered from least to greatest.
This happens! To see why, notice that in step 4, we always make the first element of the list we’re currently sorting the smallest element in that list. Therefore, on our first application of step 4, we have ensured that l₁ is the smallest element of our list.
On the second application of step 4, we had set the first element aside and were sorting the list starting from its second element: doing so ensures that l₂ is smaller than all of the remaining elements.
On the third application of step 4, we had set the first two elements aside and were sorting the list starting from its third element: doing so ensures that l₃ is smaller than all of the remaining elements.
In general, on the k-th application of step 4, we had set the first k − 1 elements aside and were sorting the list starting from its k-th element! Again, doing so ensures that lₖ is smaller than all of the remaining elements.
So, in total, what does this mean? Well: by the above, we know that l₁ ≤ l₂ ≤ … ≤ lₙ, as by definition each element is smaller than or equal to all of the ones that come afterwards. In other words, this list is sorted!
Recursion, Composition, and Algorithms
Algorithm 4.5 had an interesting element to its structure: in its fifth step, it contained a reference to itself! This sort of thing might feel like circular reasoning: how can you define an object in terms of itself?
However, if you think about it for a bit, this sort of thing is entirely natural: many tasks and processes in life consist of “self-referential” steps that are defined in terms of themselves! We call such definitions recursive definitions, and give a few examples here:
Example 4.11. Fractals, and the many plants and living things that have fractal-like patterns built into themselves, are examples of recursively-defined objects! For example, consider the following recursive process:
- Start by drawing any shape S₁. For example, here’s a very stylized drawing of a cluster of fern spores:
If you know some linear algebra, the explicit formulas we are using here are affine maps: for each point (x, y) in S, we draw the four points
- f₁(x, y) = (0, 0.16y),
- f₂(x, y) = (0.85x + 0.04y, −0.04x + 0.85y + 1.6),
- f₃(x, y) = (0.2x − 0.26y, 0.23x + 0.22y + 1.6),
- f₄(x, y) = (−0.15x + 0.28y, 0.26x + 0.24y + 0.44).
(These particular constants are the classic “Barnsley fern” coefficients.) This is the precise math-y way of describing the operations we’re doing to these rectangles!
- Now, given any shape S, we define F(S) as the shape made by making four copies of S and manipulating them as follows: if S is a shape contained within the gray rectangle at left, we make four copies of S appropriately scaled/stretched/etc to match the four rectangles at right. (It can be hard to see the black rectangle, because it’s so squished: it’s the stem-looking bit at the bottom-middle.)
For example, F(S₁), i.e. F applied to our fern spore shape, is the following:
- Using our “seed” shape S₁ and our function F, we then recursively define Sₙ₊₁ as F(Sₙ) for every positive integer n. That is: S₂ = F(S₁), S₃ = F(S₂), etc. This is a recursive definition because we’re defining our shapes in terms of previous shapes! Using the language of function composition, we can express this all at once by writing Sₙ₊₁ = (F ∘ F ∘ ⋯ ∘ F)(S₁), with n copies of F composed together.
- Now, notice what these shapes look like as we draw several of them in a row:
Our seed grows into a fern!
This is not a biology class, and there are many open questions about precisely how plants use DNA to grow from a seed. However, the idea that many plants can be formed by taking a simple instruction (given one copy of a thing, split it into appropriately stretched/placed copies of that thing) and repeatedly applying it is one that should seem reasonable, given the number of places you see it in the world!
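If you’d like to grow this fern yourself, here’s a rough C sketch that uses the “chaos game” shortcut: rather than transforming an entire shape at each step, we follow a single point, applying one of the four maps above chosen at random each time, and print every point we visit (plot the output and the fern appears). The 1%/85%/7%/7% weights are the classic companions to the Barnsley coefficients, not something this chapter derives:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    double x = 0, y = 0, nx, ny;
    int i, r;
    for (i = 0; i < 100000; i++) {
        r = rand() % 100;                 /* pick one of the four maps */
        if (r < 1)       { nx = 0.0;                  ny = 0.16 * y; }
        else if (r < 86) { nx = 0.85 * x + 0.04 * y;  ny = -0.04 * x + 0.85 * y + 1.6; }
        else if (r < 93) { nx = 0.20 * x - 0.26 * y;  ny = 0.23 * x + 0.22 * y + 1.6; }
        else             { nx = -0.15 * x + 0.28 * y; ny = 0.26 * x + 0.24 * y + 0.44; }
        x = nx; y = ny;
        printf("%f %f\n", x, y);          /* one point of the fern */
    }
    return 0;
}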
Example 4.12. On the less beautiful but more practical side, recursion is baked into many fundamental mathematical operations! For example, think about how you’d calculate 3 · 4. By definition, because multiplication is just repeated addition, you could calculate this as follows: 3 · 4 = 3 + 3 + 3 + 3 = 12.
Now, however, suppose that someone asked you to calculate 3 · 5. While you could use the process above to find this product, you could also shortcut this process by noting that 3 · 5 = 3 · 4 + 3 = 12 + 3 = 15.
This sort of trick is essentially recursion! That is: if you want, you could define the product x · n recursively for any nonnegative integer n by the following two-step process:
- x · 0 = 0, for every x.
- For any n ≥ 1, x · n = x · (n − 1) + x.
The second bullet point is a recursive definition, because we defined multiplication in terms of itself! In other words, when we say that 3 · 5 = 3 · 4 + 3, we’re really thinking of multiplication recursively: we’re not trying to calculate the whole thing out all at once, but instead are trying to just relate our problem to a previously-worked example.
Exponentiation does a similar thing! Because exponentiation is just repeated multiplication, we know that 3⁴ = 3 · 3 · 3 · 3 = 81.
Similarly, though, we can define exponentiation recursively by saying that for any nonnegative integer n and nonzero number x,
- x⁰ = 1, and
- For any n ≥ 1, xⁿ = x · xⁿ⁻¹.
In other words, with this idea, we can just say that 3⁵ = 3 · 3⁴ = 3 · 81 = 243, instead of calculating the entire thing from scratch!
These ideas can be useful if you’re working with code where you’re calculating a bunch of fairly similar products/exponents/etc. With recursion, you can store some precalculated values and just do a few extra steps of working, rather than doing the whole thing from scratch each time. Efficiency!
In classes like Compsci 220/320, you’ll study the idea of efficiency in depth, and come up with more sophisticated ideas than this one! In most practical real-life situations there are better ways to implement multiplication and exponentiation than this recursive idea; however, it can be useful in some places, and the general principle of “storing commonly-calculated values and extrapolating other values from those recursively” is one that does come up in lots of places!
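Here’s how the two recursive definitions above look as C code (the names mult and power are ours; both assume n ≥ 0):
long mult(long x, long n)
{
    if (n == 0)
        return 0;                /* base case: x * 0 = 0 */
    return mult(x, n - 1) + x;   /* recursive case: x * n = x * (n-1) + x */
}
long power(long x, long n)
{
    if (n == 0)
        return 1;                /* base case: x^0 = 1 */
    return x * power(x, n - 1);  /* recursive case: x^n = x * x^(n-1) */
}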
Recursion also comes up in essentially every dynamical system that models the world around us! For instance, consider the following population modelling problem:
Example 4.13. Suppose that we have a petri dish in which we’re growing a population of amoebae, each of which can be in two possible states (small and large).
Amoebae grow as follows: if an amoeba is small at some time t, then at time t + 1 it becomes large, by eating food around it. If an amoeba is large at some time t, then at time t + 1 it splits into one large amoeba and one small amoeba.
Suppose our petri dish starts out with one small amoeba at time t = 0. How many amoebae in total will be in this dish at time t = n, for any natural number n?
Answer. To help find an answer, let’s make a chart of our amoeba populations over the first six time steps:
time t:  0  1  2  3  4  5
small:   1  0  1  1  2  3
large:   0  1  1  2  3  5
total:   1  1  2  3  5  8
This chart lets us make the following observations:
- The number of large amoebae at time t + 1 is precisely the total number of amoebae at time t. This is because every amoeba at time t either grows into a large amoeba, or already was a large amoeba!
- The number of small amoebae at time t + 1 is the number of large amoebae at time t. This is because the only source of small amoebae are the large amoebae from the earlier step, when they split!
- By combining 1 and 2 together, we can observe that the number of small amoebae at time t + 1 is the total number of amoebae at time t − 1!
- Consequently, because we can count the total number of amoebae by adding the large amoebae to the small amoebae, we can conclude that the total number of amoebae at time t + 1 is the total number of amoebae at time t, plus the total number of amoebae at time t − 1. In symbols, if A(t) denotes the total number of amoebae at time t,
A(t + 1) = A(t) + A(t − 1).
We call relations like the one above recurrence relations, and will study them in greater depth when we get to induction as a proof method.
For now, though, notice that they are remarkably useful objects! By generalizing the model above (i.e. subtracting a term to allow for old age killing off amoebae, or adding a term to reflect that as the population grows too large, predation or starvation will cause the population to die off, etc.) one can model a basically arbitrarily-complicated population over the long term.
These are particularly cool because they actually accurately model predator/prey relations in real life! See Isle Royale, amongst other examples. For more details on these sorts of population-modelling processes, see the logistic map, the Lotka-Volterra equations, and most of the applied mathematics major here at Auckland!
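Here’s a small C sketch of the amoeba model (variable names are ours), tracking the two states directly; printing the totals gives 1, 1, 2, 3, 5, 8, …:
#include <stdio.h>
int main(void)
{
    long small = 1, large = 0, new_small, t;
    for (t = 0; t <= 10; t++) {
        printf("t = %ld: total = %ld\n", t, small + large);
        new_small = large;      /* each large amoeba splits off one small one */
        large = small + large;  /* every amoeba is large at the next time step */
        small = new_small;
    }
    return 0;
}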
It bears noting that the amoeba recurrence relation is not the first recurrence relation you’ve seen! If you go back to Algorithm 4.5, we proved that T(n) = T(n − 1) + 3n + 3.
This is another recurrence relation: it describes the maximum number of operations needed to sort a list of size n in terms of the maximum number of operations needed to sort a list of size n − 1.
While recurrence relations are nice, they can be a little annoying to work with directly. For example, suppose that someone asked us what T(100) is. Because we don’t have a non-recursive formula, the only thing we could do here is just keep recursively applying our formula, to get
T(100) = T(99) + 303 = T(98) + 300 + 303 = T(97) + 297 + 300 + 303 = …
This is … kinda tedious. It would be nice if we had a direct formula for this: something like T(n) = n² or T(n) = 2ⁿ + n that we could just plug values into and get an answer.
At this point in time, we don’t have enough mathematics to directly find such a “closed” form. However, we can still sometimes find an answer by just guessing. To be a bit more specific: if we use our definition, we can calculate the following values for T(n):
n:    1  2   3   4   5   6   7
T(n): 1  10  22  37  55  76  100
To get these skills, either take Maths 120 and study linear algebra / eigenvalues, or take Compsci 220/320, or take Maths 326 and learn about generating functions!
If you plug this sequence (1, 10, 22, 37, 55, 76, 100, …) into the Online Encyclopedia of Integer Sequences, it will give you the following guess by comparing it to all of the sequences it knows: T(n) = (3n² + 9n)/2 − 5 = 1.5n² + 4.5n − 5.
This turns out to be correct!
We don’t have the techniques to prove this just yet. If you would like to see a proof, though, skip ahead to the induction section of our proofs chapter! We’ll tackle this problem there (along with another recurrence relation), in Section 7.8.
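In the meantime, here’s a quick C sketch that computes T(n) from the recurrence and prints it next to the guessed formula, so you can at least check that they agree for as many values as you like:
#include <stdio.h>
int main(void)
{
    long T = 1, n;                        /* T(1) = 1 */
    for (n = 2; n <= 20; n++) {
        T = T + 3 * n + 3;                /* T(n) = T(n-1) + 3n + 3 */
        printf("n = %2ld: recurrence = %4ld, guess = %4ld\n",
               n, T, (3 * n * n + 9 * n) / 2 - 5);
    }
    return 0;
}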
Runtime and Algorithms
In the above few sections, we studied SelectionSort, and analyzed the maximum number of operations T(n) it needs to sort a list of length n. In general, this sort of run-time analysis of an algorithm - i.e. counting the number of elementary operations needed for that algorithm to run - is a very useful thing to be able to do!
To give a brief example of why this is useful, consider another, less well-known algorithm that we can use to sort a list:
Algorithm 4.6. The following algorithm, BogoSort(L), takes in a list L = (l₁, …, lₙ) of numbers and orders it from least to greatest. It does this by using the following algorithm:
- One by one, starting with the first entry in our list and working our way through our list, compare the values stored in consecutive elements of our list.
If these elements never decrease - i.e. if when we perform these comparisons, we see that l₁ ≤ l₂ ≤ … ≤ lₙ - then our list is already sorted! Stop.
- Otherwise, our list is not already sorted. In this case, randomly shuffle the elements of our list around, and loop back to step 1.
Here’s a sample run of this algorithm on a small list, where I’ve used Random.org to shuffle our list whenever step 2 calls for it:
This … is not great. If you used BogoSort to sort a deck of cards, your process would look like the following:
- One-by-one, go through your deck of cards and see if they’re ordered.
- If during this process you spot any cards that are out of order, throw the whole deck in the air, collect the cards together, and start again.
By studying the running time of this algorithm, we can make “not great” into something rigorous:
Proof. If L is a list containing n different elements, there are n! many different ways to order L’s elements (this is ordered choice without repetition, where we think of ordering our list as “choosing” elements one-by-one to put in order.) Because all of these elements are different, exactly one of these n! orderings lists the elements in sorted order.
Therefore, on each iteration BogoSort has a 1/n! chance of successfully sorting our list, and therefore a 1 − 1/n! chance of failing to sort our list. For any n ≥ 2, 1 − 1/n! is nonzero, and so the chance that our algorithm fails on any given iteration is nonzero.
Therefore, in the worst-case scenario, it is possible for our algorithm to just fail on each iteration, and thereby this algorithm could have infinite run-time.
In terms of running time, then, we’ve shown that BogoSort has a strictly worse worst-case runtime than SelectionSort, as SelectionSort always stops after finitely many operations, while BogoSort may never stop at all. Success!
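For completeness, here’s a C sketch of BogoSort (rand() is a crude source of randomness, but it captures the idea; the shuffle loop is the standard Fisher-Yates method):
#include <stdlib.h>
static int is_sorted(int L[], int n)
{
    int k;
    for (k = 0; k + 1 < n; k++)  /* step 1: compare consecutive elements */
        if (L[k] > L[k + 1])
            return 0;            /* found a decrease: not sorted */
    return 1;
}
void bogo_sort(int L[], int n)
{
    int k, j, tmp;
    while (!is_sorted(L, n)) {         /* not sorted yet? */
        for (k = n - 1; k > 0; k--) {  /* step 2: shuffle randomly */
            j = rand() % (k + 1);
            tmp = L[k]; L[k] = L[j]; L[j] = tmp;
        }
    }
}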
This sort of comparison was particularly easy to perform, as one of the two things we were comparing was . However, this comparison process can get trickier if we examine more interesting algorithms. Let’s consider a third sorting algorithm:
Algorithm 4.7. The following algorithm, MergeSort(L), takes in a list L of numbers and orders it from least to greatest. It does this by using the following algorithm:
- If L contains at most one number, L is trivially sorted! In this situation, stop.
- Otherwise, L contains at least two numbers. In this case,
(a) Split L in half into two lists L₁, L₂.
(b) Apply MergeSort to each of L₁, L₂ to sort them.
- Now, we “merge” these two sorted lists:
(a) Create a big list with n entries in it, all of which are initially blank.
(b) Compare the first element in L₁ to the first element in L₂. If L₁ and L₂ are both sorted, these first elements are the smallest elements in L₁ and L₂ respectively.
(c) Therefore, the smaller of those two first elements is the smallest element in our entire list. Take it, remove it from the list it came from, and put it in the first blank location in our big list.
(d) Repeat (b)+(c) until our big list is full!
As before, to better understand this algorithm, let’s run it on an example list like (3, 1, 4, 2):
Here, we use the colored boxes to help us visualize the recursive applications of MergeSort to smaller and smaller lists.
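Here’s a C sketch of MergeSort on an array (for simplicity we assume n ≤ 1024, so that the “big list” of step 3(a) can be a fixed-size scratch array):
#include <string.h>
void merge_sort(int L[], int n)
{
    int big[1024];                    /* step 3(a): the scratch list */
    int half = n / 2, i = 0, j, k;
    if (n <= 1)
        return;                       /* step 1: short lists are sorted */
    j = half;
    merge_sort(L, half);              /* step 2: sort L1 = first half */
    merge_sort(L + half, n - half);   /*         sort L2 = second half */
    for (k = 0; k < n; k++) {         /* steps 3(b)-(d): merge */
        if (j >= n || (i < half && L[i] <= L[j]))
            big[k] = L[i++];          /* smaller first element is in L1 */
        else
            big[k] = L[j++];          /* smaller first element is in L2 */
    }
    memcpy(L, big, n * sizeof(int));  /* copy the merged result back */
}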
As before, it is worth taking a moment to explain why this algorithm works:
Proof. We proceed in the same way as when we studied SelectionSort:
- Does the algorithm have any bugs? Nope! The only things we do in our algorithm are compare elements, split our lists, and copy elements over. Those are all well-defined things that can be done without dividing by zero or other sorts of disallowed operations!
- Does the algorithm run forever? Nope!
To see why, let’s make a recurrence relation like we did before. To be precise:
Claim 4.5. If we let M(n) denote the maximum number of steps needed by MergeSort to sort a list of n elements, then M(n) = 2M(n/2) + 4n + 1 if n is even.
Proof. By definition, MergeSort performs the following operations:
- In step 1, it performs one operation (it looks up the size of the list.)
- In step 2, it splits the list in half. We can do this with at most n write operations, by just making two blank lists of size n/2 and copying elements over one-by-one to these new lists. It then runs MergeSort on each half. Doing this gives us an extra 2M(n/2) steps.
- In step 3, we repeatedly compare the first element in L₁ to the first element in L₂, remove the smaller of the two from its list, and put it into our big list. This is one comparison and two write operations, which we do as many times as we have elements in our original list; so we have at most 3n operations here in total.
In total, then, we have M(n) = 1 + n + 2M(n/2) + 3n = 2M(n/2) + 4n + 1.
If n is odd, this gets a bit more annoying: we’d have M(⌈n/2⌉) (“n/2 rounded up”) plus M(⌊n/2⌋) (“n/2 rounded down”) here instead. This doesn’t materially change things, but it is annoying enough to warrant just looking at the even cases.
If we just round any odd-length list up to an even-length list by adding a blank cell, this gives us a recurrence relation that lets us reduce the task of calculating M(n) for any n to the task of calculating smaller values of M. Therefore this process cannot run forever, by the same reasoning as with SelectionSort!
- Does the algorithm sort our list? Yes!
To see why, make the following observations:
- Our algorithm trivially succeeds at sorting any list with one element.
- Our algorithm also succeeds at sorting any list with 2 elements. To see why, note that by definition, it takes those two elements and splits them into two one-element lists. From there, it puts the smaller element from those two lists into the first position, and puts the larger in the second position, according to our rules. That’s a sorted list!
- Now, consider any list on 3 or 4 elements. By definition, our algorithm will do the following:
- It takes that list, and splits it into two lists of size 1 or 2.
- It sorts those lists (and succeeds, by our statements above!)
- It then repeatedly takes the smaller of the first elements in either of those two lists, removes it from that list, and puts it into our larger list. Because those small lists are sorted, each time we do this we’re removing the smallest unsorted element (as the first element in a sorted list is its smallest element.) Therefore, this process puts elements into our largest list in order by size, and thus generates a sorted list.
So our process works on lists of 3 or 4 elements!
- The same logic will tell us that our process works for lists on 5-8 elements: because any list on 5-8 elements will split into two lists of size at most 4, and our process works on lists of size at most 4, it will continue to succeed!
- In general, our process will always work! This is because our process works by splitting L in half, and thus it reduces the task of sorting a list of size n into the task of sorting two lists of size about n/2. Repeatedly performing this reduction eventually reduces our task to just sorting a bunch of small lists, which we’ve shown here that we can do.
As well, by using the same sort of “guess-a-pattern” idea from before, we can refine this to a closed-form solution!
Note that if we use our definition, we can calculate the following values for M(2ᵏ). By definition, M(1) = 1, as this algorithm stops when given any list of length 1. Therefore, by repeatedly using Claim 4.5, we can get the following:
M(2) = 2M(1) + 4·2 + 1 = 11, M(4) = 2M(2) + 4·4 + 1 = 39, M(8) = 2M(4) + 4·8 + 1 = 111, M(16) = 2M(8) + 4·16 + 1 = 287, M(32) = 2M(16) + 4·32 + 1 = 703,
which in table form is the following:
n:    2   4   8    16   32
M(n): 11  39  111  287  703
(Note that we’re using powers of 2 as the lengths of our lists. This is because our result only works for even numbers, so we want something that stays even when we keep dividing it by 2.)
Spotting the pattern here is a pain without more advanced mathematics; even the Online Encyclopedia of Integer Sequences doesn’t recognize it. WolframAlpha, however, does!
Again, this is a claim whose proof will have to wait for Section 7.8. For now, though, let’s take the following as given:
Claim 4.6. For any power of two n = 2ᵏ, we have M(n) = n(4 log₂(n) + 2) − 1.
We can easily extend this to a list whose length is not a power of two by just “rounding its length up to the nearest power of 2” by adding in some blank cells: i.e. if we had a list of length 28, we’d add in 4 blank cells to get a list of length 32.
If we do this, then Claim 4.6 becomes the following:
Observation 4.10. For any n, M(n) ≤ 2^⌈log₂(n)⌉ · (4⌈log₂(n)⌉ + 2) − 1.
To simplify this a bit and write things just in terms of n, notice that if n is rounded up to the nearest power of 2, then the result 2^⌈log₂(n)⌉ is at most 2n. Plug this into Observation 4.10, along with ⌈log₂(n)⌉ ≤ log₂(n) + 1, and you’ll get the following:
M(n) ≤ 2n · (4(log₂(n) + 1) + 2) − 1 = 8n log₂(n) + 12n − 1.
Recall that ⌈x⌉ denotes “x rounded up” to the nearest integer.
Two useful tricks we’re using in this calculation:
- x ≤ ⌈x⌉. That is: rounding a number up only makes it larger.
- ⌈x⌉ ≤ x + 1. That is: rounding up never increases a number by more than 1.
Nice! This is worth making into its own observation:
Observation. MergeSort needs at most 8n log₂(n) + 12n − 1 operations to sort any list of length n.
Comparing Runtimes: Limits
Now, we have an interesting problem on our hands: between SelectionSort and MergeSort, which of these two algorithms is the most efficient?
That is: in terms of runtime functions, which of the following is better?
1.5n² + 4.5n − 5 vs. 8n log₂(n) + 12n − 1
To start, we can make a table to compare values (rounding where needed):
n:              1   2   3    4    5     6     7
SelectionSort:  1   10  22   37   55    76    100
MergeSort:      11  39  ≈73  111  ≈152  ≈195  ≈240
It looks like the MergeSort algorithm needs more steps than SelectionSort so far. However, this table only calculated the first few values of n; in real life, however, we often find ourselves sorting huge lists! So: in the long run, which should we prefer?
To answer this, we need the idea of a limit: we say that f(n) approaches the limit L as n goes to infinity, and write lim_{n→∞} f(n) = L, if the values of f(n) get arbitrarily close to L as n gets arbitrarily large.
If you would like a more rigorous definition here than “gets close”, look into classes like Maths 130 and Maths 250!
This is a tricky concept! To understand it, let’s look at a handful of examples:
Problem 4.1. What is lim_{n→∞} (2 + 1/n)?
Answer. First, notice that as n goes to infinity, 1/n goes to 0, because its denominator is going to infinity while the numerator stays fixed.
Therefore, 2 + 1/n goes to 2, and so lim_{n→∞} (2 + 1/n) = 2.
Problem 4.2. What is lim_{n→∞} n²?
Answer. +∞. This is because for any positive integer M, we can make n² > M by setting n to be any value greater than √M.
In other words, as n grows, n² eventually gets larger than M for any fixed M, and thus n² itself grows without bound.
Problem 4.3. What is lim_{x→∞} 1/log₂(1/x)?
Answer. To find this limit, we simply break our function down into smaller pieces:
- First, notice that as x takes on increasingly large positive values, 1/x goes to 0 and stays positive.
- Therefore, log₂(1/x) goes to negative infinity, as taking logs of tiny positive numbers yields increasingly huge negative numbers.
- Therefore, 1/log₂(1/x) goes to 0, as dividing 1 by huge negative numbers yields tiny negative numbers.
In total, then, lim_{x→∞} 1/log₂(1/x) = 0.
Problem 4.4. What is lim_{n→∞} (n² + 2)/(2n² + n)?
Answer. It is tempting to just “plug in infinity” into the fraction above, and say that the answer is ∞/∞ = 1,
“because our limit is 1.”
However, you can’t do manipulations like this with infinity! For example, because lim_{n→∞} n = lim_{n→∞} n² = ∞, we have
lim_{n→∞} n/n² = lim_{n→∞} 1/n = 0,
even though the method above would say that n/n² goes to ∞/∞ = 1,
“because our limit is 1.”
The issue here is that there are different growth rates at which various expressions approach infinity: i.e. in our example above, n² approaches infinity considerably faster than n, and so the ratio n/n² approaches 0 even though the numerator and denominator individually approach infinity.
Instead, if we ever have both the numerator and denominator approaching ∞, we need to first simplify our fraction to proceed further! In this problem, notice that if we divide both the numerator and the denominator by n², the highest power present in either the numerator or denominator, we get the following:
(n² + 2)/(2n² + n) = (1 + 2/n²)/(2 + 1/n).
As noted above, each of 2/n² and 1/n go to 0 as n goes to infinity, because their denominators are going off to infinity while their numerators stay fixed.
Therefore, we have lim_{n→∞} (n² + 2)/(2n² + n) = (1 + 0)/(2 + 0) = 1/2.
With the idea of limits in mind, we can now talk about how to compare functions: we say that a function g grows faster than a function f if lim_{n→∞} g(n)/f(n) = +∞.
Intuitively, this definition is saying that for huge values of n, the ratio of g to f goes to infinity: that is, g is eventually as many times larger than f as we could want.
We work an example of this idea here, to see how it works in practice:
Example 4.14. We claim that the function x² − 1 grows faster than x + 1. To see this, by our definition above, we want to look at the limit lim_{x→∞} (x² − 1)/(x + 1).
Notice that we can factor the numerator into (x + 1)(x − 1). Plugging this into our fraction leaves us with (x + 1)(x − 1)/(x + 1), which we can simplify (as in Problem 4.4.) to x − 1.
As x goes to infinity, x − 1 goes to infinity.
This grows without bound: as x goes to infinity, so does our ratio! Therefore we’ve shown that x² − 1 grows faster than x + 1.
To deal with a comparison like the one we’re trying to do between 1.5n² + 4.5n − 5 and 8n log₂(n) + 12n − 1, however, we need some more tricks!
Limit Techniques and Heuristics
Without calculus, our techniques for limits are a little hamstrung. As those of you who have seen NCEA Level 3 calculus will know, things like L’Hôpital’s rule are extremely useful for quickly evaluating limits!
With that said, though, we do have a handful of useful techniques and heuristics that we can use to get by. Here’s an easy (if not very rigorous) technique: to guess lim_{n→∞} f(n), just plug increasingly large values of n into f, and see what the outputs look like they’re approaching.
For instance, suppose that we wanted to compare the runtimes of our two sorting algorithms, 1.5n² + 4.5n − 5 and 8n log₂(n) + 12n − 1. By definition, this means that we want to find the limit
lim_{n→∞} (1.5n² + 4.5n − 5)/(8n log₂(n) + 12n − 1).
To do this, we could just plug in various values of n and see what happens!
n:      10    100   10⁴   10⁶
ratio:  ≈0.5  ≈2.4  ≈130  ≈8700
It certainly looks like this ratio is growing arbitrarily large, so we could quite reasonably believe that the number of steps required by SelectionSort is growing faster than the number of steps needed by MergeSort (and thus that MergeSort is the preferable algorithm.)
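Rather than punching these numbers into a calculator, we can make the machine do the plugging-in; here’s a C sketch that prints the ratio at n = 10, 100, …, 10⁹ (compile with -lm for log2):
#include <stdio.h>
#include <math.h>
int main(void)
{
    double n, sel, mer;
    for (n = 10; n <= 1e9; n *= 10) {
        sel = 1.5 * n * n + 4.5 * n - 5;     /* SelectionSort's T(n) */
        mer = 8 * n * log2(n) + 12 * n - 1;  /* MergeSort's bound */
        printf("n = %.0e: ratio = %.2f\n", n, sel / mer);
    }
    return 0;
}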
However, this method has its limitations:
- One issue with the above is that it is fairly prone to human error if you’re manually doing this by plugging numbers into a calculator. That is: if you have a limit like the one above, and you’re trying to plug something like n = 10⁹ into that fraction, it’s going to be really easy to forget a zero in one of those expressions.
- Another is that it can be pretty hard to tell whether or not you’re actually plugging in enough values to figure out the pattern! For example, when we made our table to compare the runtimes of these two functions by listing values from 1 through 7, we thought that MergeSort’s runtime was growing faster. This larger table seems to be telling the opposite story: but maybe the situation will reverse itself again if we zoom out further!
To give a second example, suppose that someone asked you to calculate
lim_{n→∞} log₂(log₂(log₂(n))).
If you plugged in n = 10, 100, …, 10⁹, you’d get the values 0.79, 1.45, …, 2.29. This looks like it’s barely growing at all, so you’d be tempted to guess that this limit is some small finite number.
This is very false! As we saw before, as n goes to infinity, log₂(n) goes to infinity. Therefore, log₂(log₂(n)) also goes to infinity, and in general any composition of logs will eventually go off to infinity as well: i.e. +∞ is the correct limit here. So plugging things in can lead us to make mistakes!
A second technique, that we used in several of our worked problems earlier, is the following: factor the fastest-growing term out of both the numerator and the denominator, and cancel it.
This often gives you a new function where you no longer have the top and bottom both racing off to infinity, which is often much easier to work with.
Example 4.15. If we had the limit lim_{x→∞} 2x/(x² + x), we could factor an x out of the top and bottom to get lim_{x→∞} 2/(x + 1). This is 0.
For a second example, if we had the limit lim_{x→∞} (x² + x)/x², we could factor an x² out of the top and bottom to get the simplified limit lim_{x→∞} (1 + 1/x)/1. The numerator goes to 1 and the denominator just is 1, so this is 1.
Sometimes, however, we don’t have something that we can easily express as a fraction:
Problem 4.5. What is the limit lim_{x→∞} 1/(1 − 2^(1/x))?
To approach this limit, we need another technique: break the function down into its smallest pieces, work out how each piece behaves for large inputs, and then build back up, one piece at a time.
To understand this method, let’s apply it to Problem 4.5.
Answer to Problem 4.5. Understanding the function in our limit all at once is hard! But, notice that for very large values of x, we know that
- 1/x is a very small positive number, therefore
- 2^(1/x) is basically 2⁰ = 1, but slightly larger than 1, therefore
- 1 − 2^(1/x) is 1 minus something slightly larger than 1, and so in turn is a tiny negative number, therefore
- 1/(1 − 2^(1/x)) is 1 over a tiny negative number, and thus is an increasingly huge negative number.
Therefore, as x goes to positive infinity, 1/(1 − 2^(1/x)) goes to −∞, and we’ve found our limit by breaking our function down into smaller pieces!
To organize these kinds of comparisons, it’s useful to remember the following hierarchy of growth rates:
constants ≪ logarithms ≪ polynomials ≪ exponentials ≪ factorials,
where “≪” means “grows slower than.”
Within those groups, we sort these expressions by degrees and bases: i.e.
√n ≪ n ≪ n² ≪ n³ ≪ …
and
2ⁿ ≪ 3ⁿ ≪ 4ⁿ ≪ …,
and so on / so forth.
Finally, any sum of expressions grows as fast as its largest expression: i.e. a factorial plus a log plus a constant grows at a factorial rate, an n² plus a log plus a square root grows as fast as n² grows, etc.
As a brief justification for this, let’s look at a table:
runtime   n = 10    n = 50             n = 100
1         1 ns      1 ns               1 ns
log₂(n)   ≈3.3 ns   ≈5.6 ns            ≈6.6 ns
n         10 ns     50 ns              100 ns
n²        100 ns    2.5 μs             10 μs
2ⁿ        ≈1 μs     ≈13 days           ≈4 × 10¹³ years
n!        ≈3.6 ms   ≈9.6 × 10⁴⁷ years  ≈3 × 10¹⁴¹ years
Above, we’ve tabulated several algorithms’ runtimes versus input sizes for n ranging from 10 to 100, with the assumption that we can perform one step every 10⁻⁹ seconds.
In this table, you can make the following observations:
- The constant-runtime algorithm is great!
- The logarithmic-runtime algorithm is also pretty great!
- The polynomial-runtime algorithms all take longer than the constant or log algorithm, but are all at least reasonable.
- The exponential runtime algorithm starts off OK, but gets horrible fast…
- … but is somehow not as bad as the factorial-runtime algorithm, which is almost immediately unusable for any value of .
We can use this heuristic to quickly answer our question about which of SelectionSort and MergeSort is preferable, without having to plug in any numbers at all!
Proof. As noted before, by definition, we want to find the limit
lim_{n→∞} (1.5n² + 4.5n − 5)/(8n log₂(n) + 12n − 1).
By dividing the top and bottom through by n, this means that we’re looking at the ratio
(1.5n + 4.5 − 5/n)/(8 log₂(n) + 12 − 1/n).
The top is a linear expression (i.e. polynomial of degree 1), as the largest-growing object in the numerator is the 1.5n term. The bottom, conversely, is a logarithmic expression, as the fastest-growing object in the denominator is the 8 log₂(n) term.
Linear expressions grow much faster than logs, so our “plug things in” step didn’t lie: SelectionSort’s runtime is growing faster than MergeSort’s (and thus MergeSort is the preferable algorithm, as in the long run it will need to perform fewer operations to get to the same answer.)
To study one more example of this idea and finish our chapter, let’s use this principle to answer our last exercise:
Answer to 4.1. In this problem, we had two multiplication algorithms and wanted to determine which is best.
Algorithm 4.1 calculated m · n by basically just computing m + m + ⋯ + m, with n summands. As such, its runtime is pretty much just some constant times n: we need n iterations of this process to calculate the answer here, and we can let our constant be the number of steps we perform in each iteration.
Conversely, Algorithm 4.2 ran by taking n and repeatedly subtracting 1 from it and dividing it by 2 until it was 0 (while doing some other stuff to m and prod along the way.)
For the purpose of this problem, we don’t have to think about why this process works, though look at practice problem 6 if you’re curious! Instead, if we simply take it on faith that this algorithm does work, it’s easy to calculate how long it takes to complete: it will need as many iterations as it takes to reduce n to 0 by these operations.
In the worst-case scenario for the number of iterations, we only ever divide by 2; i.e. n is a power of 2. In this case, if n = 2ᵏ it takes about k iterations to reduce n to 0; i.e. we need roughly log₂(n) iterations, and thus a constant times log₂(n) many operations.
As we saw before, logarithms grow much slower than linear functions! Therefore, Algorithm 4.2. is likely the better algorithm to work with.
Practice Problems
- (-) Your two friends, Jiawei and Francis, have written a pair of algorithms to solve a problem. On an input of size n, Jiawei’s algorithm needs to perform n² calculations to find the answer, while Francis’s algorithm needs to perform log₂(4ⁿ) calculations to find the same answer.
Jiawei says that their algorithm is faster, because their runtime is polynomial while Francis’s algorithm has an exponential in it. Francis says that their algorithm is faster, because their algorithm has a logarithm in it while Jiawei’s is polynomial.
Who is right, and why?
- (-) Let f(x) = 2x + 1 and g(x) = x². Calculate the compositions (g ∘ f)(x), (f ∘ g)(x), (f ∘ f)(x), and (g ∘ g)(x).
- We say that a function f : A → B has an inverse if there is some function g : B → A such that g(f(a)) = a for all a in A: that is, g “undoes” f, and vice-versa. For example, f(x) = 2x and g(x) = x/2 are inverses, as are f(x) = x + 1 and g(x) = x − 1.
- Suppose that f(x) = 3x + 2. Find f’s inverse.
- Now, suppose that g(x) = 1 for every x. Does this constant function have an inverse?
- Let A be the collection of all students enrolled at the University of Auckland and B be the collection of classes currently running at the University of Auckland. Which of the following are functions?
- The rule f : A → B, that given any student outputs the classes they’re enrolled in.
- The rule g : B → A, that given any class outputs the student with the top mark in that class.
- The rule h : A → A, that given any student outputs that student.
- The rule k : B → B, that given any class outputs its prerequisites.
- You’re a programmer! You’ve found yourself dealing with a program A that has no comments in its code, and you want to know what it does. After some experimentation, you’ve found that A takes in as input an integer n, and does the following:
(i) Take n and square it.
(ii) If n is 1, 2, 3 or 6, output n and stop.
(iii) Otherwise, replace n with n % 10, i.e. the last digit of n, and go to (ii).
- Is A a function, if we think of its domain and codomain as ℤ?
- What is the range of A?
- (+) Show that after each step in Algorithm 4.2, the quantity m · n + prod is always the same. Using this, prove that Algorithm 4.2 works!
- (+) In our answer to 4.1, we used a sort of handwave-y “this takes about log₂(n) many operations” argument. Let’s make this more rigorous!
That is: let S(n) denote the number of steps that this algorithm needs to multiply a given number m by n.
- Find a recurrence relation for S(n) in terms of S(n/2), that holds whenever n is even.
- Use this relation to calculate S(2ᵏ) for k = 1, 2, …, 6, and plug these values into either the Online Encyclopedia of Integer Sequences or WolframAlpha. What pattern do you see?
- Consider the following sorting algorithm, called BubbleSort:
Algorithm 4.8. The following algorithm, BubbleSort(L), takes in a list L = (l₁, …, lₙ) of numbers and orders it from least to greatest. It does this by using the following algorithm:
(i) Compare l₁ to l₂. If l₁ > l₂, swap these two values; otherwise, leave them where they are.
(ii) Now, move on, and compare the “next” pair of adjacent values l₂, l₃. Again, if these elements are out of order (i.e. if l₂ > l₃), swap them.
(iii) Keep doing this through the entire list!
(iv) At the end of this process, if you made no swaps, stop: your list is in order, by definition.
(v) Otherwise, the list might be out of order! Return to (i).
(a) Use this process to sort the list (3, 1, 4, 2).
(b) (+) Let B(n) denote the maximum number of steps that BubbleSort needs to sort a list of length n. Find an expression for B(n), and calculate its values for n = 10 and n = 100.
(c) What is the smallest number of steps that BubbleSort would need to sort a list of length n?
- Find the following limits: lim_{n→∞} (3 + 1/n²), lim_{n→∞} n/(n + 1), and lim_{n→∞} (2n² + 1)/(n² + n).
- Which of the three algorithms SelectionSort, BubbleSort and MergeSort is the fastest (i.e. needs the fewest operations) to sort a list of 5 elements? Which needs the fewest to sort a list of 10 elements? How about 100? How about for huge values of n?