Sets and Strings

Exercise 2.1. You’re trying to break into a safe that has a PIN lock. The safe has two buttons: 0 and 1. The PIN you’re trying to guess is a three-digit binary sequence, and the safe checks the last three digits you’ve typed in without needing you to hit enter: i.e. if you typed in “00010”, the safe would open if the PIN was either “000”, “001” or “010”.

Sounds easy, right? There are only eight possible PINs to check (two possibilities per digit, three digits in the PIN $\Rightarrow 2^3 = 8$ possible PINs), so we should be able to brute-force the lock by checking all possibilities.

However, the safe is wired to call the cops if more than ten buttons are pressed without the correct PIN being entered. As such, we can’t use our brute-force approach: typing in each of the eight PINs separately could take $8\cdot 3 = 24$ button presses!

Is there an approach that is guaranteed to break us into the safe?

Exercise 2.2. You’re a geneticist! As such, you’re working with DNA strands, which we can think of as long strings over the alphabet $\{A, C,G,T\}$ , if we let these letters represent the nucleotides adenine, cytosine, guanine and thymine.

You’ve designed a clever little combination of DNA restriction+polymerase enzymes that do the following: given any DNA strand $s$, every time the substring “AC” appears in $s$, that substring gets cut out and replaced with “CCA”.

So, for example, if your DNA strand was “ACGT,” it would get turned into “CCAGT” and then would stay stable from there. If your strand was “ACCT”, however, it would first turn into “CCACT”, and then “CCCCAT.”

Suppose you’re originally working with strings of DNA all of the form “AAAAC,” and you dump them into a bath with your enzymes in it. What would you expect to see at the end of this process?

Earlier in this coursebook, we discussed various properties of numbers (divisibility, modular arithmetic, etc.) that are very useful in computer science!

However, numbers are not the only things that we work with in computing systems. We also work heavily with things like passwords, user IDs, databases full of names: i.e. strings! We study these objects in the following section:

Strings

First, let’s define what an alphabet is:

Definition 2.1. An alphabet is any collection of symbols.

Example 2.1. You’re already familiar with the Roman alphabet:

\fbox{A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z}

Another alphabet that comes in handy is the collection of decimal digits! We use this to describe numbers:

\fbox{0,1,2,3,4,5,6,7,8,9}

If we’re working in binary, we use a much smaller alphabet:

\fbox{0,1}

If we’re working with DNA, we’d use

\fbox{A,C,G,T}

as our alphabet, where these represent the four nucleotide bases adenine [A], cytosine [C], guanine [G] and thymine [T].

There are other alphabets that are too big to write down here: for example, the set of all Unicode symbols, or the set of all emojis!

Given an alphabet, it’s often useful to be able to refer to the whole thing with a symbol. We’ll do this by writing something like $\Sigma = \{0,1,2,3,4,5,6,7,8,9\}$. This notation, where we list our symbols between a pair of curly braces and separate them with commas, tells us that $\Sigma$ is an alphabet containing the ten symbols $0, 1, \ldots, 9$.

With the definition of an alphabet in hand, we can define strings:

Definition 2.2. Take any alphabet $\Sigma$. A string over the alphabet $\Sigma$ is any finite sequence of symbols from $\Sigma$.

Some people refer to strings as “words:” if you see an author referring to a collection of words over a given alphabet, this is just a synonym for strings!

Example 2.2. If we let $\Sigma$ be the Roman alphabet described earlier, then “cat,” “mongoose,” and “ssssssssssss” are all strings over this alphabet. Note that these strings don’t have to correspond to any particular meaning; they’re just sequences of symbols!

If we let $\Sigma$ be the decimal alphabet, then “123,” “00012,” and “999” are each possible strings over this alphabet. Again, these don’t always have to correspond to numbers! In particular, notice that as strings, “00012” and “12” are different things. Even though they’re equal as numbers, as strings they’re quite different: “00012” has zeroes in it, while “12” does not. (Think about entering a password on your phone: if your password is “00012,” entering “12” shouldn’t unlock it!)

We will sometimes not specify an alphabet, and instead just refer to strings by listing their entries. If so, we assume that their alphabet is the most reasonable one to work with that string in (usually either the Roman alphabet, decimal, or binary.)

A particularly useful string to refer to is the empty string "", i.e. the string containing no symbols. We denote this string by writing $\lambda$ .

Strings are incredibly useful in computer science! Essentially every program works with data in the form of strings: ID numbers, names, IP addresses, and, most fundamentally, the binary strings that encode literally everything a computer does.

Perhaps the simplest operation to define on strings is length:

Definition 2.3. The length of any string is the number of characters in that string.

Example 2.3. The string “abcdef” has length 6, the string “00000” has length 5, and the string “0123” has length 4.

The idea of length is useful when we’re trying to describe a general string! Many arguments involving strings will start with the sentence “Take a string $s$ over the alphabet $\Sigma$ . Let $n$ be the length of $s$ , and write $s$ as $s_1s_2s_3\ldots s_n$ .”
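As a quick illustration (a sketch in Python, which is our choice of language here rather than anything this coursebook prescribes), `len` computes the length of a string, and we can split a string into its symbols $s_1, s_2, \ldots, s_n$. One trap to watch for: Python numbers positions from 0, so $s_1$ is `s[0]` and $s_n$ is `s[n - 1]`. The helper name `symbols` is hypothetical.

```python
def symbols(s: str) -> list[str]:
    """Split s into the list [s_1, s_2, ..., s_n] of its symbols."""
    return list(s)

s = "abcdef"
n = len(s)               # the length of s
print(n)                 # 6
print(symbols(s))        # ['a', 'b', 'c', 'd', 'e', 'f']
print(s[0], s[n - 1])    # s_1 and s_n (Python counts from 0): a f
print(len(""))           # the empty string lambda has length 0
```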

Note that when we’re working with strings, writing something like $s_1s_2s_3\ldots s_n$ does not mean that we’re multiplying these things together as if they were numbers! That is: the string “0123” is not the same thing as the product $0\cdot1\cdot2\cdot3 = 0$.

This is why it’s important to keep track of the type of object you’re working with and, in general, to define your variables and notation at the start of a problem.

In particular, we can use this to define what it means for two strings to be equal:

Definition 2.4. Take any two strings $s = s_1s_2\ldots s_n$ and $t = t_1t_2 \ldots t_n$ of the same length. We say that $s$ and $t$ are equal if $s_1 = t_1, s_2 = t_2$ , $\ldots$ and $s_n = t_n$ .

In other words, two strings are equal if and only if they are literally character-for-character identical! Note that two strings of different lengths are never equal.

Example 2.4.

The strings “00001” and “1” are different. Even though the underlying numbers they represent are the same, these are different-length strings.
If we take the alphabet given by all characters on a keyboard, the strings “12+23” and “10+25” are different. Even though these are the same length and represent the same underlying integer, the characters are different in some places: for instance, the second character of the first string is 2, while the second character of the second string is 0.
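Definition 2.4 translates almost word for word into code. The Python sketch below (`strings_equal` is a hypothetical helper name) checks the lengths first and then compares the strings position by position. Python’s built-in `==` on strings already behaves this way, so in practice you would just write `s == t`.

```python
def strings_equal(s: str, t: str) -> bool:
    """Definition 2.4: same length, and s_i == t_i at every position i."""
    if len(s) != len(t):
        return False  # strings of different lengths are never equal
    return all(s[i] == t[i] for i in range(len(s)))

print(strings_equal("00001", "1"))      # False: different lengths
print(strings_equal("12+23", "10+25"))  # False: the second characters differ
print(strings_equal("0123", "0123"))    # True
print(int("00001") == int("1"))         # True: equal as numbers, though not as strings
```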

A particularly useful operation on strings in computer science is concatenation:

Definition 2.5. Take any two strings $s = s_1s_2 \ldots s_n$ and $t = t_1t_2\ldots t_m$ . The concatenation of $s$ and $t$ , written $st$ , is the string $s_1s_2\ldots s_nt_1t_2\ldots t_m$ .

Example 2.5.

Let $s =$ “song” and $t=$ “bird”. Then $st$ is the string “songbird”.
Let $s=$ “12” and $t=$ “0”. Then $st$ is equal to “120”. Notice that this is very different to what we would mean by writing $st$ if we thought of $s,t$ as integers; there, $st$ would denote $12 \cdot 0 = 0$ !

In general, if you’re using string concatenation on strings of digits, make this clear to your reader throughout your working so that they know what you’re doing. Quotation marks help keep things clear: because we wrote $s=$ “12” and $t=$ “0”, we’ve told you that we are thinking of $s,t$ as strings, and thus that concatenation is the appropriate way to combine them.
You can concatenate multiple strings at once: i.e. if $s=$ “3”, $t=$ “.”, and $u=$ “14159265…”, then $stu$ is just “3.14159265…”.

Notice that if $s$ has length $n$ and $t$ has length $m$ , then $st$ has length $n+m$ .
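In a Python sketch, `+` concatenates strings, so the examples above, and the fact that lengths add, can be checked directly. Note how different this is from multiplying the corresponding integers. (The string “14159265” below is just a finite stand-in for the digits of $\pi$.)

```python
s, t = "12", "0"
st = s + t                   # concatenation: "120"
print(st)                    # 120
print(int(s) * int(t))       # 0: multiplying the numbers is a different operation

print("song" + "bird")       # songbird
stu = "3" + "." + "14159265"
print(stu)                   # 3.14159265

# If s has length n and t has length m, then st has length n + m.
assert len(s + t) == len(s) + len(t)
```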

Concatenation is used in tons of practical applications:

Every bank account number is a concatenation of a bank code (telling you what company you bank with,) an account number (which tells the bank who owns this account,) and an account type code (telling you what kind of account that number is attached to.)
We saw that many ID numbers have “check digits” when we worked with modular arithmetic! As such, your full ID number is usually created by concatenating your account number with the check digit.

Related to the idea of concatenation, we have the concepts of prefixes, substrings and suffixes:

Definition 2.6. Let $s$ and $t$ be strings. We say that $s$ is a prefix of $t$ if $t$ is just $s$ with some additional stuff possibly tacked on the end: i.e. if we can find a third string $u$ such that $su = t$ .
Similarly, we say that $s$ is a suffix of $t$ if $t$ is just $s$ with some additional stuff possibly tacked on the front: i.e. if we can find a third string $u$ such that $us = t$ .
Finally, we say that $s$ is a substring (alternately, an “infix”) of $t$ if $t$ is just $s$ with some stuff possibly tacked on both the front and end: i.e. if we can find strings $u,v$ such that $usv = t$ .

Example 2.6.

If $t=$ “snowball,” then “snow” is a prefix of $t$ , “ball” is a suffix of $t$ , and “now” is a substring of $t$ .
If $t=$ “112323411,” then “112” is a prefix of $t$ , “323411” is a suffix of $t$ , and “2” is a substring of $t$ .
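Definition 2.6 can also be read as a recipe: search for the extra strings $u$ (and $v$) that it asks for. The Python sketch below does exactly that; the function names are hypothetical. For everyday use, Python’s built-in `t.startswith(s)`, `t.endswith(s)` and `s in t` perform the same three tests.

```python
def is_prefix(s: str, t: str) -> bool:
    """True if there is a string u with s + u == t."""
    u = t[len(s):]  # the only candidate for u: whatever follows the first len(s) symbols
    return s + u == t

def is_suffix(s: str, t: str) -> bool:
    """True if there is a string u with u + s == t."""
    if len(s) > len(t):
        return False
    u = t[:len(t) - len(s)]  # the only candidate for u
    return u + s == t

def is_substring(s: str, t: str) -> bool:
    """True if there are strings u, v with u + s + v == t."""
    for i in range(len(t) - len(s) + 1):  # i is the length of u
        u, v = t[:i], t[i + len(s):]
        if u + s + v == t:
            return True
    return False

t = "snowball"
print(is_prefix("snow", t), is_suffix("ball", t), is_substring("now", t))  # True True True
print(is_prefix("ball", t))  # False
```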

As a bit of practice with writing arguments, we study a few claims:

Claim 2.1. The empty string $\lambda$ is a prefix, suffix, and substring of every string $t$ .

Proof. Take any string $t$. Attaching the empty string to the start or end of a string doesn’t change it, so $t = \lambda t$, $t = t\lambda$ and $t = \lambda \lambda t$. The first equation (with $u = t$) shows that $\lambda$ is a prefix of $t$; the second (again with $u = t$) shows that $\lambda$ is a suffix of $t$; and the third (with $u = \lambda$ and $v = t$) shows that $\lambda$ is a substring of $t$. $\square$

Claim 2.2. If $s$ is a prefix of $t$ , then $s$ is a substring of $t$ .

Proof. If $s$ is a prefix of $t$, then there is some string $u$ such that $su = t$. Therefore, we also have $\lambda s u = t$, because concatenating the empty string $\lambda$ onto any string doesn’t change it! Taking $\lambda$ as the string in front and $u$ as the string behind, this shows that $s$ satisfies the definition of a substring, as claimed. $\square$

We can use this idea of a “substring” to answer our safe-cracking problem:

Answer to Exercise 2.1. Think of the sequence of keys we’re entering into the safe as a string $s$ . If we do this, then the properties we want $s$ to have are the following: we want every three-digit binary string to occur as a substring of $s$ , and we want $s$ to have length at most 10.

As it turns out, we can do this! Enter the following string: “0001011100.” This string has length 10, and contains all eight possible three-digit PINs as substrings, as shown below. Success!

$\underline{000}1011100$ contains “000”
$0\underline{001}011100$ contains “001”
$00\underline{010}11100$ contains “010”
$000\underline{101}1100$ contains “101”
$0001\underline{011}100$ contains “011”
$00010\underline{111}00$ contains “111”
$000101\underline{110}0$ contains “110”
$0001011\underline{100}$ contains “100”
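We can double-check this answer by computer. The Python sketch below (the function name `contains_all_pins` is hypothetical) collects every length-3 window of the typed string and checks that all $2^3 = 8$ binary PINs appear among them.

```python
from itertools import product

def contains_all_pins(s: str, length: int = 3) -> bool:
    """True if every binary string of the given length occurs as a substring of s."""
    windows = {s[i:i + length] for i in range(len(s) - length + 1)}
    pins = {"".join(bits) for bits in product("01", repeat=length)}
    return pins <= windows  # every possible PIN is one of the windows

s = "0001011100"
print(len(s), contains_all_pins(s))  # 10 True
```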

We can also use it to answer our DNA puzzle:

Answer to Exercise 2.2.

On one hand, we could just brute-force the answer, by repeatedly looking for “AC” substrings and replacing them with “CCA” substrings:

AAAAC	$\to$	AAACCA
AAACCA	$\to$	AACCACA
AACCACA	$\to$	ACCACCCAA
ACCACCCAA	$\to$	CCACCCACCAA
CCACCCACCAA	$\to$	CCCCACCCCACAA
CCCCACCCCACAA	$\to$	CCCCCCACCCCCAAA
CCCCCCACCCCCAAA	$\to$	CCCCCCCCACCCCAAA
CCCCCCCCACCCCAAA	$\to$	CCCCCCCCCCACCCAAA
CCCCCCCCCCACCCAAA	$\to$	CCCCCCCCCCCCACCAAA
CCCCCCCCCCCCACCAAA	$\to$	CCCCCCCCCCCCCCACAAA
CCCCCCCCCCCCCCACAAA	$\to$	CCCCCCCCCCCCCCCCAAAA

In other words: the result is a string of 16 C’s, followed by 4 A’s.

Alternately, we could notice the following: every time a $C$ moves past an $A$, that $C$ is replaced with two $C$’s. Therefore, if we move a $C$ past two $A$’s in a row, we’d expect to repeat this “doubling” process twice, and end up with four $C$’s; in general, if we move a $C$ past $n$ $A$’s in a row, we’d expect to see $2^n$ $C$’s at the end, as we’ve doubled our $C$’s $n$ times in this process! This matches our results, as $2^4 = 16$.
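The brute-force approach is easy to automate. The Python sketch below (the name `run_enzymes` is hypothetical) applies the rule to every “AC” at once, step after step, until no “AC” remains. Because two occurrences of “AC” can never overlap, `str.replace` performs all of one step’s replacements simultaneously, just as in the table. For strands of the form A…AC, the doubling argument above tells us the loop stops.

```python
def run_enzymes(strand: str) -> list[str]:
    """Replace every "AC" with "CCA" until none remain; return every strand seen."""
    history = [strand]
    while "AC" in strand:
        # Occurrences of "AC" never overlap, so this replaces them all at once.
        strand = strand.replace("AC", "CCA")
        history.append(strand)
    return history

steps = run_enzymes("AAAAC")
for before, after in zip(steps, steps[1:]):
    print(before, "->", after)
print(steps[-1] == "C" * 16 + "A" * 4)  # True

# The doubling argument: n A's followed by one C end up as 2^n C's then n A's.
for n in range(6):
    assert run_enzymes("A" * n + "C")[-1] == "C" * 2**n + "A" * n
```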

Sets

Another useful object, which we will often study alongside strings, is the concept of a set:

Definition 2.7. A set $A$ is just a collection of things. We call those things the elements of $A$, and write $x \in A$ to denote with symbols the statement “$x$ is an element of $A$”.
To describe a set, we just list its elements between a pair of curly braces: for example, $\{1,2,3\}$ would be how we would describe the set consisting of the three numbers 1, 2 and 3.

Basically every collection of things in real life can be thought of as a set:

Example 2.7.

The collection of all strings in the Oxford English Dictionary is a set. It contains elements like “heart” and “number,” but not things like “arbleorble.”
The collection of all words in Māori is a set. This set contains elements like “tapawhā” and “tau” (the Māori words for rectangle and number,) but does not contain strings like “123abc.”
The collection of all keywords in C is a set.
The collection of all binary strings of length at most 2 is a set. We could write this set out by listing its elements: $\{\lambda, 0, 1, 00, 01,10,11\}$ .
The “empty” set containing no elements $\{\}$ is a set! We call this the empty set, and refer to this by drawing the symbol $\emptyset$ . This is a fairly useful set to be able to refer to, for the same reasons that 0 is a useful number; it can be handy to talk about “nothing” in a concrete way!
The set of all prime numbers is a set: $\{2,3,5,7,11,13,\ldots\}$
The set of integers $\mathbb{Z}$ , the set of rational numbers $\mathbb{Q}$ , the set of natural numbers $\mathbb{N}$ , and the set of real numbers $\mathbb{R}$ are all sets.
The set of all polynomials with degree at most 3 is a set: it contains things like $2x-4$ and $x^3-3x^2+\pi$ .
The set of all irrational numbers is a set.
The set of all numbers that are solutions to the equation $x^3-3x^2+3x-1 = 0$ is a set. (Specifically, because $x^3-3x^2+3x-1 = (x-1)^3$ , this set is just $\{1\}$ , the set containing only one object, namely 1).
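Finite sets like the binary strings of length at most 2 can be built directly with Python’s built-in `set` type (a sketch; `binary_strings_up_to` is a hypothetical helper). Here the empty string `""` plays the role of $\lambda$.

```python
from itertools import product

def binary_strings_up_to(max_len: int) -> set[str]:
    """All binary strings of length at most max_len, including the empty string."""
    return {"".join(bits) for n in range(max_len + 1) for bits in product("01", repeat=n)}

S = binary_strings_up_to(2)
print(sorted(S, key=lambda x: (len(x), x)))  # ['', '0', '1', '00', '01', '10', '11']
print(len(S))                                # 7
```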

Notice that sets can be finite (in the case of things like “the collection of all English words”) or infinite (in the case of the set of all prime numbers!)

To make our lives easier when working with sets, let’s make a few notational conventions about how we should treat them:

When we’re describing a set, we don’t care about the order in which we list our elements: i.e. $\{$ cat, tag, tact $\}$ and $\{$ tag, cat, tact $\}$ are both the “same” to us! This is because we only care about what things are contained within a set; the order is something that we’ll wind up changing a lot depending on the context (i.e. sometimes alphabetical, sometimes by length…) and isn’t itself something we want to care about.
Similarly, when we’re describing a set, we only want to list each element once. This is because otherwise it would be quite irritating to try to look things up in our set: imagine a dictionary that just listed the word “mongoose” forty times in a row!

As such, if someone gives you a set in which an element is repeated, we just remove the duplicates: i.e. we say $\{$ cat, tag, tact, tact, tact $\}$ and $\{$ cat, tag, tact $\}$ are the same, and we avoid writing the first form whenever we can.
In the case of $\{$ cat, tag, tact $\}$, we were able to describe our set by just listing its elements. This works for small cases, but becomes quite unwieldy for larger sets: imagine having to write out all of the words in French before discussing the French language!

To deal with this, we have an alternate way of writing sets: you can describe them by giving a property. For instance, when we say “the set of all words in Māori” above, we’re giving you a property that a given string of letters may or may not satisfy (i.e. “is it a word in Māori?”), and then taking the set of all strings that satisfy that property.

While the sentences we used in our examples above do work as definitions for sets, you can also use the following more “math-y” construction: to describe the set of all strings $s$ with property blah, you can just write $\{ s ~|~ s \textrm{ has property }blah\}$ .

For instance, the set of all odd-length binary strings could be described as the following:
$\{s ~|~ \textrm{length}(s) = 2k+1 \textrm{ for some }k \in \mathbb{Z}, \textrm{ and } s \textrm{ is a binary string}\}$.

We use the notation $\in$ as shorthand for the word “in.”

The $s$ on the left tells you the variable name, the divider $|$ just separates the variable from its property, and the text on the right gives the required property.

You can also use the left-hand part to describe the structure of your set’s elements: i.e. something like
$\{\textrm{concatenate}(001, s) ~|~ s \textrm{ is a binary string}\}$
gives you all binary strings that start with the prefix “001”.
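Set-builder notation has a close cousin in Python’s set comprehensions, `{expression for x in collection if condition}`. A computer cannot store an infinite set, so the sketch below assumes a cutoff `MAX_LEN` (a hypothetical parameter) and builds only the strings up to that length.

```python
from itertools import product

MAX_LEN = 5  # assumption: we can only build a finite slice of these infinite sets

binary_strings = {"".join(bits) for n in range(MAX_LEN + 1)
                  for bits in product("01", repeat=n)}

# { s | length(s) = 2k + 1 for some integer k, and s is a binary string }
odd_length = {s for s in binary_strings if len(s) % 2 == 1}

# { concatenate(001, s) | s is a binary string }, keeping results of length <= MAX_LEN
starts_001 = {"001" + s for s in binary_strings if len(s) <= MAX_LEN - 3}

print(len(odd_length))  # 42, i.e. 2 + 8 + 32 strings of lengths 1, 3 and 5
print(sorted(starts_001))
```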

One useful concept when working with sets is a notion of “size:”

Definition 2.8. A set $A$ has size $n$ if it contains precisely $n$ different elements. If $A$ contains infinitely many different elements, we say that $A$ has “infinite” size. We denote the size of $A$ by writing $|A|$ .

In maths, the word cardinality is used to refer to the size of a set. If you take papers like Maths 190 or Compsci 225, you can learn to study the idea of “different sizes of infinity” by working with cardinality! In particular, using the idea of a bijection in those courses, you can show that the integers, rationals, and natural numbers somehow all have the same “countable” size of infinity, while the real numbers somehow have a larger and “uncountable” size of infinity.

Example 2.8.

The set $\{1,2,3, \pi, 7\}$ has size 5.
The set of all binary strings of length 2, i.e. $\{00, 01, 10, 11\}$ , has size 4.
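For finite sets, Python’s `len` returns the size $|A|$ (a sketch; infinite sets can’t be stored at all). Since a Python set keeps only one copy of each element, `len` counts different elements, just as Definition 2.8 asks.

```python
import math
from itertools import product

A = {1, 2, 3, math.pi, 7}
print(len(A))  # 5

length_2 = {"".join(bits) for bits in product("01", repeat=2)}
print(sorted(length_2), len(length_2))  # ['00', '01', '10', '11'] 4

# Repeated elements are kept only once, so they don't add to the size.
print(len({"cat", "tag", "tact", "tact", "tact"}))  # 3
```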

Another useful concept when working with sets is the idea of a “subset:”

Definition 2.9. Take two sets $A, B$. We say that $B$ is a subset of $A$, and write $B \subseteq A$, if every object in $B$ is also an object in $A$.

Example 2.9.

Let $A$ be the collection of all University of Auckland ID numbers, and let $B$ be the collection of all University of Auckland ID numbers corresponding to active Compsci 120 students. Then $B$ is a subset of $A$ !
Let $A$ be the set of all binary strings of length 3, and let $B$ be the set of all binary strings with exactly two 1’s. Then $B$ is not a subset of $A$: this is because $B$ contains things like “11000”, which are not in $A$. Similarly, $A$ is not a subset of $B$, because $A$ contains things like “000” that are not in $B$!
Let $A$ be the English language, and $B$ be the collection of all English words that rhyme with “avocado.” Then $B$ is a subset of $A$ , as every word in $B$ is by definition a word in $A$ !
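Python writes $B \subseteq A$ as `B <= A` (or `B.issubset(A)`). Since the set of binary strings with exactly two 1’s is infinite, the sketch below assumes a length cutoff `MAX_LEN`, a hypothetical parameter; the witnesses “11000” and “000” from the binary-string example still show up.

```python
from itertools import product

MAX_LEN = 5  # assumption: B is infinite, so we only keep its strings up to this length

def binary_strings(n: int) -> set[str]:
    """All binary strings of length exactly n."""
    return {"".join(bits) for bits in product("01", repeat=n)}

A = binary_strings(3)
B = {s for n in range(MAX_LEN + 1) for s in binary_strings(n) if s.count("1") == 2}

print(B <= A, A <= B)                      # False False
print("11000" in B and "11000" not in A)   # True: so B is not a subset of A
print("000" in A and "000" not in B)       # True: so A is not a subset of B
```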

We also have a number of useful operations that we perform on sets:

Definition 2.10. Let $A, B$ be a pair of sets. We define the union of these two sets, $A \cup B$ , to be the collection of all elements that are in either $A$ or $B$ or both.

Example 2.10.

Let $A$ be the collection of all English words with even length and $B$ be the collection of all English words with odd length. In this case, $A \cup B$ is the collection of all English words.
Let $A$ be the collection of all Compsci 120 students that turned in assignment 1, and $B$ be the collection of all Compsci 120 students that attended tutorial 1. Then $A \cup B$ is the collection of all Compsci 120 students who either attended tutorial 1 or turned in assignment 1, or both.

In general, unions work like “or” operations: the union of a set defined by property $A$ with a set defined by property $B$ is just the collection of all elements that satisfy property $A$ or $B$ .
Let $A$ be the collection of the 1000 most common phrases used in spam emails (things like “You be a Winner!!!1!!”) and $B$ be a collection of dodgy email addresses (e.g. "bi11.gates@micr0soft.ie"). Then, the union $A \cup B$ is a good start for a “block list,” i.e. something that an email filter can use to automatically trash certain emails.
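In Python, union is the `|` operator (or the `.union` method). The sketch below uses a tiny, made-up stand-in for “all English words”, since the real set is far too large to type out.

```python
# A tiny hypothetical stand-in for "all English words", for illustration only.
words = {"cat", "tact", "mongoose", "heart", "number", "avocado"}

A = {w for w in words if len(w) % 2 == 0}  # even-length words
B = {w for w in words if len(w) % 2 == 1}  # odd-length words

print((A | B) == words)          # True: every word has either even or odd length
print(A.union(B) == (A | B))     # True: .union is the method form of |
```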

Definition 2.11. Let $A, B$ be a pair of sets. We define the intersection of these two sets, $A \cap B$ , to be the collection of all elements that are in both $A$ and $B$ at the same time.

Example 2.11.

Let $A$ be the English language and $B$ be the German language. Then $A \cap B$ is the set of words that are both in English and German at the same time: i.e. words like “alphabet,” “computer” and “tag” would be in $A \cap B$, as they are all both English and German words.
Let $A$ be the set of numbers that are multiples of 3, and $B$ be the set of numbers that are multiples of 2. Then $A \cap B$ is the set of numbers that are multiples of both 2 and 3; i.e. it’s the set of all numbers that are multiples of 6!

Like how union was an “or,” intersection works like an “and” operation: that is, the intersection of a set defined by property $A$ with a set defined by property $B$ is just the collection of all elements that satisfy property $A$ and $B$ .
If $A$ is the set consisting of ID numbers of current Compsci 120 students, and $B$ is the set consisting of ID numbers of current Compsci 720 students, then $A \cap B = \emptyset$ , the empty set. (This is because there are no students simultaneously taking 120 and 720!)
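Intersection is Python’s `&` operator (or `.intersection`). The sketch below checks the multiples-of-6 example on the integers from 0 to 99 only, an assumed cutoff since the full set is infinite. One Python quirk to watch: `{}` is an empty dictionary, so the empty set $\emptyset$ has to be written `set()`.

```python
N = 100  # assumption: we only look at the integers 0, 1, ..., N - 1

A = {n for n in range(N) if n % 3 == 0}  # multiples of 3
B = {n for n in range(N) if n % 2 == 0}  # multiples of 2
multiples_of_6 = {n for n in range(N) if n % 6 == 0}

print((A & B) == multiples_of_6)   # True
print(A & {1, 5, 7})               # set(): no element is in both sets
print(type({}), type(set()))       # <class 'dict'> <class 'set'>
```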

Definition 2.12. Let $A, B$ be a pair of sets. We define the difference of these two sets, written $A \setminus B$ or alternately $A - B$ , to be the collection of all elements that are both in $A$ and not in $B$ at the same time.

Example 2.12.

If $A$ was the set of ID numbers for all current Compsci 120 students, and $B$ was the set of ID numbers for Compsci 120 students who attended at least eight tutorials, then $A \setminus B$ is the set of ID numbers for students who attended seven or fewer tutorials (i.e. the ID numbers of students who will not have perfect marks for tutorials. Don’t be in this set!)
If $A$ is the set of prime numbers, and $B$ is the set of odd integers, then $A \setminus B$ is the collection of all primes that are not odd: that is, $A \setminus B = \{2\}$.
Let $A$ denote the set of all ASCII strings of length at least 10, $B$ be the set of all English words, and $L_3$ be a list of the 10,000 most common passwords. The set $A \setminus (B \cup L_3)$ is a good start to a list of “acceptable” passwords: i.e. if you were making a login system, you could require all of your users to pick passwords in $A \setminus (B \cup L_3)$. Doing this would mean that they have to pick passwords that
- Have length at least 10 (i.e. are in $A$ ),
- Aren’t in a dictionary (i.e. not in $B$ ), and
- Aren’t commonly used (i.e. not in $L_3$ ).
Useful!
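Set difference is Python’s `-` operator (or `.difference`). The sketch below checks the primes example on the integers below 50 (an assumed cutoff) and then sketches the password rule. The word lists are tiny hypothetical stand-ins for $B$ and $L_3$, and `acceptable` is a hypothetical name, not a real library function.

```python
import math

N = 50  # assumption: we only look at the integers 0, 1, ..., N - 1

def is_prime(n: int) -> bool:
    """Trial division: n >= 2 with no divisor between 2 and sqrt(n)."""
    return n >= 2 and all(n % d != 0 for d in range(2, math.isqrt(n) + 1))

primes = {n for n in range(N) if is_prime(n)}
odds = {n for n in range(N) if n % 2 == 1}
print(primes - odds)  # {2}

# Hypothetical stand-ins; real lists would be far larger.
english_words = {"password", "mongoose", "avocado", "songbird"}  # plays the role of B
common_passwords = {"123456", "password", "qwerty123456"}         # plays the role of L_3

def acceptable(pw: str) -> bool:
    """Is pw in A minus (B union L_3)?"""
    in_A = pw.isascii() and len(pw) >= 10  # A: ASCII strings of length at least 10
    return in_A and pw not in (english_words | common_passwords)

print(acceptable("songbird"))          # False: too short
print(acceptable("qwerty123456"))      # False: a common password
print(acceptable("tact-mongoose-42"))  # True
```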

Finally, we describe what it means for two sets to be equal:

Definition 2.13. We say that two sets $A, B$ are equal if they both consist of the same elements; that is, if

Every element in $A$ is an element in $B$, and

Every element in $B$ is also an element in $A$.

If you go back to our remarks earlier, this should make sense. We said that the only thing we cared about for a set was the elements it contained; i.e. we didn’t care about the order, and we ignored repeats/etc. Therefore, two sets should be the same if they contain the same elements!

A useful proof technique, which we’ll often use to show that two sets are equal, is the following. Take two sets $A, B$ that you want to show are equal. Suppose you showed that

every element in $A$ is an element in $B$, and also
every element in $B$ is an element in $A$.

Then, by the definition above, we would know that $A$ and $B$ are equal! As such, we can use this two-part approach to prove that many pairs of objects are equal. We study a few examples here, to get the hang of this:
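For finite sets, the two-part technique becomes a direct computation: check $A \subseteq B$ and $B \subseteq A$. In a Python sketch this is `A <= B and B <= A`, which agrees with the built-in `A == B`; the helper name `sets_equal` is hypothetical. Note that order and repeated elements make no difference, exactly as in our conventions for sets.

```python
def sets_equal(A: set, B: set) -> bool:
    """Definition 2.13: every element of A is in B, and every element of B is in A."""
    return A <= B and B <= A

print(sets_equal({"cat", "tag", "tact"}, {"tag", "cat", "tact"}))                  # True
print(sets_equal({"cat", "tag", "tact", "tact", "tact"}, {"cat", "tag", "tact"}))  # True
print(sets_equal({1, 2}, {1, 2, 3}))                                              # False
```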

Claim 2.3. Let $A, B$ be any two sets such that $A \subseteq B$ . Then $A \cup B = B$ .

Proof.

We proceed as suggested above:

First, we show that every element in $A \cup B$ is in $B$ . To do this, we note that by definition $A \cup B$ is the set of all elements that are in either $A$ or $B$ . Therefore, if we take any element $s \in A \cup B$ , we either have $s \in A$ or $s \in B$ . This lets us work in cases:

If $s \in A$ , then recall that we’ve assumed that $A \subseteq B$ . By definition, this means that every element in $A$ is also in $B$ . Therefore, we have $s \in B$ .
If $s \in B$ , then we trivially have $s \in B$ .

Therefore, we’ve shown that for any $s\in A \cup B$ , we have $s \in B$ , as desired.

Second, we show that every element in $B$ is in $A \cup B$ . This is not too challenging.

Just notice that $A \cup B$ , by definition, contains all elements in either $A$ or $B$ . Therefore, for any $s \in B$ , we have $s \in A \cup B$ by definition.

This completes the two-way argument, as desired! $\square$

This is not the only way to prove that two sets are equal! As always, simply expanding the definitions of both sets can often do the same trick:

Claim 2.4. Let $A, B$ be any two sets. Then $(A \setminus B) \setminus A = \emptyset$ .

Proof.

We proceed by expanding our definitions:

First, notice that $A \setminus B$ , by definition, is the collection of all elements in $A$ that are not in $B$ .
By definition again, $(A \setminus B) \setminus A$ is “(the collection of all elements in $A$ that are not in $B$ ) that are also not in $A$ .”
We can simplify this to “the collection of all elements in $A$ that are not in $B$ or $A$ .”
Every element of $A$ is, um, in $A$ .
Therefore, “the collection of all elements in $A$ that are not in $B$ or $A$ ” is the empty set, as the “not in $A$ ” condition eliminates all of the elements in $A$ .

So we’ve proven our claim! $\square$

To close our chapter, we study a trickier example of this process:

Claim 2.5. If $A, B, C$ are three sets, then $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$ .

Proof.

We again proceed by expanding our definitions:

On the left-hand-side, we have $A \setminus (B \cup C)$ . By definition, $A \setminus (B \cup C)$ consists of all of the elements that are in $A$ , but not in $B \cup C$ .

As well, by definition we know that $B \cup C$ is the set of all elements that are in either $B$ or $C$ , or both.

Therefore, $A \setminus (B\cup C)$ can just be thought of as all of the elements that are in $A$ , but not in either $B$ or $C$ .

On the right-hand-side, we can similarly use our definitions to notice that $A \setminus B$ is the set of all elements that are in $A$ but not $B$ , and that $A \setminus C$ is the set of all elements that are in $A$ but not $C$ .

As a consequence, we have that $(A \setminus B) \cap (A \setminus C)$ is the set of all elements that are both (in $A$ but not $B$ ) and (in $A$ but not $C$ ). Logically, we can simplify this sentence to the condition “in $A$ , but not in either $B$ or $C$ .”

These are the same statements; therefore we’ve shown that these sets are equal! $\square$
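Claims like 2.3, 2.4 and 2.5 can also be sanity-checked by computer by trying every combination of subsets of a small universe. This is evidence, not a proof: a proof covers all sets, while the Python sketch below, with an assumed universe $\{1, 2, 3\}$, covers only its $2^3 = 8$ subsets. Still, it is a cheap way to catch a false conjecture before trying to prove it.

```python
from itertools import chain, combinations

UNIVERSE = [1, 2, 3]  # assumption: a small universe; checking it is evidence, not a proof

def all_subsets(items):
    """Every subset of items, as Python sets."""
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

subsets = all_subsets(UNIVERSE)
for A in subsets:
    for B in subsets:
        if A <= B:
            assert (A | B) == B                          # Claim 2.3
        assert (A - B) - A == set()                      # Claim 2.4
        for C in subsets:
            assert A - (B | C) == (A - B) & (A - C)      # Claim 2.5
print(len(subsets), "subsets checked")  # 8 subsets checked
```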

Practice Problems

(-) Show that any suffix of a word $t$ is a substring of $t$ .
If $s$ is both a prefix and a suffix of $t$ , then must $s=t$ ? Either show that this is true, or find a counterexample.
Show that if $s$ is a substring of $t$ and $t$ is a substring of $s$ , then $s=t$ .
(+) Suppose our safe in Exercise 2.1 had PIN numbers with decimal digits, not binary (i.e. they could be any digit from 0-9, instead of just 0 and 1). What is the smallest number of buttons we would have to press to guarantee that the safe would open in this situation?
Suppose that $A$ and $B$ are two sets with the following property: every string in $A$ is a prefix of a string in $B$ . Is it possible for $A$ to contain more elements than $B$ ? Either find such an example, or explain why this is impossible.
(-) Explain why $\emptyset \cup L = L$ for any set $L$ .
Show that for any three sets $A, B, C$, we have $A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$. (Try doing so using the method from Claim 2.3, and then try with the method from Claim 2.4! Which do you prefer?)
Take two binary strings $s, t$ of the same length. We say that $s$ and $t$ are orthogonal if they disagree at precisely half of their locations: for example, $s=$ “1111” and $t=$ “1100” are orthogonal.

(-) Show that if $s, t$ are odd-length strings, then $s$ and $t$ cannot be orthogonal.
Find a set consisting of four length-4 strings that are all orthogonal to each other (i.e. every possible pair of strings in your set should be orthogonal).
(+) Find a set consisting of $2^n$ length- $2^n$ strings that are all orthogonal to each other.
(++) What is the largest set of orthogonal length-668 strings that you can make?
