The Art of Computer Programming, I.B.Profin Edition: Chapter 1.1 - Algorithms

About four years and one detour through some self-teaching in control system theory later, I have returned to Donald Knuth’s famous tome, The Art of Computer Programming (volume 1). My previous attempt to read it was extremely discouraging: I stumbled on the first part of the first chapter. On a second read-through, it made sense this time. When I first got stuck, I went looking for discussion of the topic online and couldn’t find any; I’m making this post in the hopes that it’ll help some other reader in the future succeed where I initially failed.

So, let’s back-fill some pieces of chapter 1.1 of The Art of Computer Programming.

The cover of The Art of Computer Programming, and a bottle of Advil

Knuth’s book and recommended study aids. Do not mix with alcohol; substitute one for the other.

Just a quick side-bar about math books in general: I want to yeet the phrases “It should be obvious that” and “is evident” into the sun. If you’re struggling with this material, make no mistake: it’s high-level stuff. But also, Knuth has a tendency to bury a concept in math and trust the audience to figure it out, which is not an approach I’m fond of. In a lecture at Olin, Professor Woodie Flowers once described mathematical notation as a great way to summarize a concept for people who already understand it, and a bad way to communicate one. I’m inclined to agree.

Okay, let’s dig in.

The algorithm representation

Knuth represents a computational method (note: not an algorithm) as a four-part construct (a “quadruple”) of (Q, I, Ω, f). These have the following meaning:

  • I is the set of all inputs. For the example Knuth provides of Euclid’s algorithm, which takes two inputs, it’s the set of all pairs of positive integers m and n, such as (24, 12) or (30, 10). Applying f to one of these inputs gets you some value in Q (which might be an output).
  • Ω is the set of all outputs. Knuth defines Ω to be “pointwise fixed” with respect to f; that is, feeding a member of Ω to f gives you that member right back. Why do this? I think because it makes the definition of f a little simpler, which we’ll see soon.
  • Q is actually a huge, heterogeneous set. It includes every member of the following (non-overlapping) subsets:
    • All the inputs I
    • All the outputs Ω
    • All the state vectors, which we’ll get to in a minute. Knuth didn’t give this subset its own name; I’m going to call it Σ for convenience. Keep in mind, Q is all those things.
  • f is a function that is defined on every member of Q and emits a member of Q, which we can describe using the notation f: Q -> Q. I think Knuth made Q so huge and f on Ω so strange to make this part of the definition simple: f doesn’t have any special-case undefined inputs and can just be applied to something from Q over and over again. That does lead to some weird consequences: f(x) for x in Ω is defined, but it just returns x, which… okay, but why? We could also have just left Ω out of Q and said “f can return something from Q or something from Ω.” I think Knuth doesn’t do that because it’s a harder thing to say in math.
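To make the shape of this concrete, here’s a minimal Python sketch (the names are mine, not Knuth’s notation): a driver that just applies f over and over until it lands in Ω, plus a toy computational method to run it on.

```python
def run(f, x, is_output, max_steps=10_000):
    """Apply f repeatedly until the result lands in the output set Ω.

    f         -- the transition function, f: Q -> Q
    is_output -- predicate for membership in Ω
    """
    for _ in range(max_steps):
        if is_output(x):
            return x  # members of Ω are fixed points of f anyway
        x = f(x)
    raise RuntimeError("never reached Ω; maybe not an algorithm on this input")

# Toy computational method: Q = the non-negative integers, Ω = {0},
# f subtracts 1 (and leaves 0 alone, keeping Ω pointwise fixed).
toy_f = lambda x: x - 1 if x > 0 else x
result = run(toy_f, 5, is_output=lambda x: x == 0)  # result is 0
```

Note that the driver never needs a special case for outputs: because Ω is pointwise fixed, stopping when we see one is an optimization, not a necessity.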

Worth noting above is that f is not “the program.” f includes more stuff than you’d have in a regular computer program; it not only accepts inputs I, it also accepts outputs (and gives them back) and it accepts state vectors from Σ (and gives back either another state vector or an output). There’s also a hiccup in this definition; we’ll come back to it. But let’s drill down on state vectors.

The state vector

Knuth never names the state vector explicitly in this section (he doesn’t even give it a symbol; it’s represented instead as “whatever is left in the set Q if you remove the sets I and Ω from it”). But it’s both kind of core to understanding his definition and a powerful little math tool, so let’s pop it open.

Basically, the state vector is all the info you need to know what the program is doing at any given time. It’s represented as a vector of values, and those values can be of heterogeneous types. For Euclid’s algorithm, Knuth overloads it a bit (he swaps out the third element in his 4-element state vector, because one variable in his program ranges over all integers x > 0 in step 3 and all integers x >= 0 in steps 1 and 2). Knuth’s state vector’s four elements are

  1. The input m (which gets changed as we go, because Euclid’s algorithm is recursive)
  2. The input n (same; changed because Euclid’s algorithm is recursive)
  3. In steps 1 and 2, the remainder when m is divided by n, which might be 0; in step 3, that same remainder, which we know is not zero (because we got to step 3 instead of stopping at step 2).
  4. A number 1, 2, or 3, denoting what step of Euclid’s algorithm we’re on. This can basically be thought of as a program counter. We could have used any symbols here; (a), (b), and (c) would work just as well and may make it easier to wrap your head around if you’re tired of staring at numbers.
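In code, a state vector is just a tuple; a namedtuple makes the four fields legible (the field names are mine, since Knuth only uses positions):

```python
from collections import namedtuple

# Knuth's 4-element state vector for Euclid's algorithm, with named fields.
# r is the remainder slot; step plays the role of a program counter.
State = namedtuple("State", ["m", "n", "r", "step"])

# The state vector right after feeding in the input pair (6, 4):
s = State(m=6, n=4, r=0, step=1)
```

A namedtuple still compares equal to a plain tuple, so nothing about Knuth’s positional definition is lost by naming the slots.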

This idea—that we can represent a moment in the evaluation of a program by the current state of the machine—is extremely powerful and underpins all kinds of behaviors in programming, including undo, save states in videogames and emulators, and interpreters (programs that run programs). There is another very powerful idea here: computational determinism. At least at this point in the story, Knuth is talking about deterministic programs: if you have two instances of a program, then no matter their history up to this point, handing them the exact same state vector will get you the exact same next step! This follows from how he’s defined his computational method: f always gives exactly one answer, and gives the same answer for the same input. The state vector encodes everything f could possibly care about when deciding what answer to give. Pick any valid point in the four-dimensional grid of values in Σ, and f will give you the next value (either another member of Σ or an output value from Ω).
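Determinism plus a complete state vector is exactly what makes save states work. A small sketch with a made-up step function (not from the book): snapshot the state mid-run, resume from it twice, and the two futures agree.

```python
# A deterministic step function for a hypothetical countdown "machine"
# whose state vector is (value, label). Same state in, same state out.
def step(state):
    value, label = state
    if label == "done":          # outputs act as fixed points, like Knuth's Ω
        return state
    if value == 0:
        return (value, "done")
    return (value - 1, "run")

# Run a few steps, then snapshot the state vector ("save state"):
state = (5, "run")
for _ in range(3):
    state = step(state)
snapshot = state                 # (2, "run")

# Resume twice from the same snapshot: identical futures.
run_a = step(step(snapshot))
run_b = step(step(snapshot))
assert run_a == run_b            # determinism in action
```

Nothing outside the tuple matters, so restoring the tuple restores the computation.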

Now that we have all those pieces in place, we can consider what Knuth is asserting with them. First, Knuth defines f by giving the important transforms among members of Q:

  • For an input in I (inputs are all shaped like (m, n)), go to the state vector (m, n, 0, 1) in Σ.
  • For an output in Ω (outputs are all shaped like (n)), just give it right back: f(n) -> n.
  • For a state vector in Σ, there’s a set of rules that step through the states. From step 1, we compute the remainder of m divided by n and go to step 2. From step 2, we either give an output (if the remainder was 0) or go to step 3. From step 3, we transfer n to m and the remainder p to n, so (m, n, p, 3) moves to (n, p, p, 1) (it would have been the same to write (n, p, 0, 1), because step 1 doesn’t care about the third value in the state vector).
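Written out in Python, f becomes a case analysis on the shape of its argument. This is my transcription of the rules above, telling I, Ω, and Σ apart by tuple length, just as Knuth implicitly does:

```python
# Knuth's f for Euclid's algorithm, case by case.
# Inputs are pairs (m, n); outputs are singletons (n,);
# state vectors are 4-tuples (m, n, r, step).
def f(x):
    if len(x) == 2:                      # x in I: jump to the first state vector
        m, n = x
        return (m, n, 0, 1)
    if len(x) == 1:                      # x in Ω: pointwise fixed
        return x
    m, n, r, step = x                    # x in Σ
    if step == 1:                        # compute the remainder
        return (m, n, m % n, 2)
    if step == 2:                        # done if the remainder is zero
        return (n,) if r == 0 else (m, n, r, 3)
    if step == 3:                        # shuffle n -> m, r -> n, back to step 1
        return (n, r, r, 1)

# Iterate f until we land in Ω (a 1-tuple):
x = (30, 10)
while len(x) != 1:
    x = f(x)
# x is now (10,), the gcd of 30 and 10
```

Feeding it the other example pair from earlier, (24, 12), lands on (12,) the same way.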

Armed with the definition of a computational method, Knuth then refines it into a definition of an algorithm by restricting our consideration to just the computational methods where we can get from I to Ω in finitely many steps for every member of I. Knuth also talks about refining the definition further to capture his notion of effectiveness; I won’t go into that today (feel free to send me an email if you’d like a deeper dive on this part!).

One small bug

While this covers the ideas in this section, there is one small issue with Knuth’s definition that may trip up some readers (it got me the first time I tried to understand this part). It happens that the lengths of the tuples in Q differ for inputs, outputs, and state vectors (2, 1, and 4, respectively). What if the algorithm we were considering were the simple successor algorithm: “for any integer N, give me N + 1”? We run into a problem with Knuth’s representation, because now the elements of I and Ω have the same shape and are therefore indistinguishable; if I try to compute f(3), I want to give back 4, but I can’t, because 3 is actually in the output set Ω (it’s f(2)), so by Knuth’s definition I must satisfy f(3) = 3. The length of the tuples in Q is a small piece of implicit detail that Knuth uses here to distinguish inputs, outputs, and state vectors, and he never explicitly states “I and Ω must be disjoint sets.” This is fixed easily enough; in our example we can just augment all the elements of I with a “1” to denote they’re inputs and all the elements of Ω with a “2” to denote they’re outputs. So now f((3, 1)) gives (4, 2), and we’re good.
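Here’s that tag fix in runnable form for the hypothetical successor algorithm, using tags 1 and 2 as suggested above:

```python
# Successor "algorithm" with tagged tuples so I and Ω can't collide.
# (n, 1) is n-as-input; (n, 2) is n-as-output.
def f(x):
    value, tag = x
    if tag == 2:               # outputs stay fixed, per Knuth's definition
        return x
    return (value + 1, 2)      # an input maps straight to its output

assert f((3, 1)) == (4, 2)     # 3-as-input becomes 4-as-output...
assert f((3, 2)) == (3, 2)     # ...while 3-as-output is left alone
```

The tag plays the same role the step number plays in Euclid’s state vector: it’s bookkeeping that keeps the different regions of Q from overlapping.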

Moving forward

I hope this was useful. As I chew through the book, I’ll plan to do more of these for sections that trip me up. If there’s a section you encounter that you want expanded, let me know and I’ll take a look!