Dakotah Lambert

collected musings

Foundations: The λ-Calculus

2024 Nov 12 at 01:05

Many are familiar with the concept of the Turing machine, the mathematical model of computation that gives us the phrase “Turing Complete”. A Turing machine is built of several parts: a finite alphabet of symbols, an infinite tape divided into cells marked by these symbols, a read/write head that can modify the tape, and a finite set of control states that determine what should be written and how the tape should be moved depending on what is read. Less familiar but much simpler is the λ-calculus of Alonzo Church. Equivalent in expressive power to the Turing machine, the λ-calculus has only the creation and application of single-argument functions.

  • An identifier x is an expression representing a parameter.
  • If E is an expression and x an identifier, then ( λx. E ) is an expression representing a function.
  • If E and F are expressions, then (EF) is an expression representing the application of E to F.
  • Nothing else is an expression.

To improve readability by avoiding excessive parentheses, function application associates to the left: wxyz ≡ ( ( (wx) y ) z ). Further, parentheses are omitted from functions that are created but not applied: λf. λx. λy. fyx ≡ ( λf. ( λx. ( λy. ( fy ) x ) ) ), distinct from λf. λx. ( λy. fy ) x ≡ ( λf. ( λx. ( ( λy. fy ) x ) ) ).
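
As an aside, this grammar transcribes directly into a data type in most programming languages. Here is one possible rendering in Python, with one constructor per clause of the definition above (the names Var, Lam, App and Expr are arbitrary choices, not part of the calculus):

    # One clause of the grammar per class; a minimal sketch, not a full parser.
    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Var:            # an identifier x
        name: str

    @dataclass
    class Lam:            # ( λx. E ), a function with parameter x and body E
        param: str
        body: "Expr"

    @dataclass
    class App:            # ( E F ), the application of E to F
        func: "Expr"
        arg: "Expr"

    Expr = Union[Var, Lam, App]

    # λf. λx. λy. fyx with every application and abstraction made explicit,
    # exactly as in the fully parenthesized form given above.
    example = Lam("f", Lam("x", Lam("y",
        App(App(Var("f"), Var("y")), Var("x")))))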

Programs are built from data and transformations. In a high-level language, the job of a programmer is to define the structure of relevant objects and some sequence of valid transformations to convert input states into desired output states. This structure applies to all computational formalisms. For a Turing machine, objects must be encoded as some linear sequence of symbols on the tape. For the λ-calculus, objects must be encoded as some sort of single-argument function. Here, we’ll explore representations of some basic objects.

Propositional Logic

Arguably the simplest data structure is the humble Boolean, with two distinct values: true and false. How might one represent a Boolean value as a single-argument function? To answer this, consider how Boolean values are used. Typically, they are used in conditional expressions: if some condition C holds, then X, else Y. Instead of defining true and false as atomic values, we can define them in terms of this conditional structure. If C is a Boolean value, we can have CXY produce X when C is true, and produce Y when C is false.

This looks like we are considering two-argument functions, but we need not take more than one at a time. Let us define the Boolean value “true” as 𝐓. We want 𝐓XY to be X. In other words, we want (𝐓X) to be a function of one argument which returns X no matter its input: 𝐓 ≡ λx. λy. x. In action: 𝐓XY ≡ ( λx. λy. x ) XY → ( λy. X ) Y → X. (This process of selecting an argument and instantiating it inside the function body is called β-reduction.) Similarly, 𝐅 selects the second of the two arguments: 𝐅 ≡ λx. λy. y. In action: 𝐅XY ≡ ( λx. λy. y ) XY → ( λy. y ) Y → Y.
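
These definitions can be played with in any language that has first-class functions. Here and in the sketches below we use Python lambdas, writing TRUE and FALSE as stand-in names for 𝐓 and 𝐅:

    TRUE  = lambda x: lambda y: x   # 𝐓 ≡ λx. λy. x
    FALSE = lambda x: lambda y: y   # 𝐅 ≡ λx. λy. y

    # 𝐓XY reduces to X, and 𝐅XY reduces to Y.
    print(TRUE("X")("Y"))    # X
    print(FALSE("X")("Y"))   # Y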

Boolean combinators such as AND and OR are naturally two-argument functions, while NOT takes just one. First consider the OR combinator, which we will denote ∨. Recall that Booleans are essentially two-argument functions that represent if-then-else constructions. It should be that ∨PQ is true if P is true, else it is true if Q is true and false otherwise. In fully expanded form: ∨PQ ≡ P 𝐓 ( Q𝐓𝐅 ). Notice, however, that ( Q𝐓𝐅 ) is equivalent to Q: if Q ≡ 𝐓 then 𝐓 is selected, and if Q ≡ 𝐅 then 𝐅 is selected, so assuming Q is a Boolean value, we can simplify this definition to ∨PQ ≡ P𝐓Q. Notice also that, if P is a Boolean value and selects the 𝐓 here, then P must have been true; in other words, this simplifies further to ∨PQ ≡ PPQ.

Doing β-reduction in reverse, we can “unapply” the function one argument at a time. This is called β-abstraction: if ∨PQ ≡ PPQ, then ∨PQ ≡ ( λq. PPq ) Q. In other words, ∨P ≡ λq. PPq. After another round of β-abstraction, given ∨P ≡ λq. PPq we derive ∨P ≡ ( λp. λq. ppq ) P. In other words, ∨ ≡ λp. λq. ppq. Whenever we see a wrapped application λx. Ex, where this x does not appear in E, we can apply a rule called η-reduction to transform this into just E. So ∨ ≡ λp. pp.

Similarly, AND (denoted ∧) can be defined as ∧ ≡ λp. λq. pqp. If the first argument is false, it should select something falsey, which it itself is. So it can select itself! But if it is true, then the outcome depends on the second operand: the outcome is true if and only if that second operand is true, so the second operand itself is the right thing to return.

Finally, NOT (denoted ¬) returns false if the operand is true and returns true if the operand is false: ¬ ≡ λp. p𝐅𝐓.

Just like with circuits, more complex operations can be built by composing these simple pieces. For instance, the exclusive-or operation (XOR, denoted ⊕) is defined such that ⊕ ≡ λp. λq. ∧ ( ∨pq ) ( ¬ ( ∧pq ) ); the first operand is true or the second is true, but not both. Alternatively: ⊕ ≡ λp. λq. p ( ¬q ) q.

Operation    Expression
𝐓            λx. λy. x
𝐅            λx. λy. y
¬            λp. p𝐅𝐓
∨            λp. pp
∧            λp. λq. pqp
⊕            λp. λq. p ( ¬q ) q
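
The table transcribes directly into the Python sketch, with NOT_, OR_, AND_ and XOR_ standing in for ¬, ∨, ∧ and ⊕, and a small helper to convert back to a native Boolean for inspection:

    NOT_ = lambda p: p(FALSE)(TRUE)           # ¬ ≡ λp. p𝐅𝐓
    OR_  = lambda p: p(p)                     # ∨ ≡ λp. pp
    AND_ = lambda p: lambda q: p(q)(p)        # ∧ ≡ λp. λq. pqp
    XOR_ = lambda p: lambda q: p(NOT_(q))(q)  # ⊕ ≡ λp. λq. p(¬q)q

    to_bool = lambda b: b(True)(False)        # Church Boolean to Python bool
    print(to_bool(OR_(FALSE)(TRUE)))          # True
    print(to_bool(XOR_(TRUE)(TRUE)))          # False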

Natural Numbers

Just as Boolean values were represented as functions based on their use in if-then-else constructions, so too are natural numbers represented based on their use. What is, say, three? In the λ-calculus, threeness is doing something thrice. A natural number n is a function which takes as its argument a function f and returns f composed with itself for a total of n applications. So 𝟑 ≡ λf. λx. f ( f ( fx ) ).

Those familiar with Peano numbers know that it suffices to define a zero and a successor function. The result of applying a function zero times to an argument x is just x itself, so 𝟎 ≡ λf. λx. x. This should look familiar: 𝟎 takes two arguments and returns the second. In other words, 𝟎 ≡ 𝐅. The successor function takes a number and adds one to it, applying the function one more time: 𝐒 ≡ λn. λf. λx. f ( nfx ).

We want 𝟏 to be the identity function such that 𝟏fx → fx. Also, 𝟏 ≡ 𝐒𝟎. Check: 𝐒𝟎 ≡ ( λn. λf. λx. f ( nfx ) ) 𝟎 → λf. λx. f ( 𝟎fx ) → λf. λx. fx → λf. f, the identity function, as desired.
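
The numerals work the same way in the Python sketch; ZERO, SUCC and THREE stand in for 𝟎, 𝐒 and 𝟑, and to_int counts applications to recover a native integer:

    ZERO  = lambda f: lambda x: x                     # 𝟎 ≡ λf. λx. x
    SUCC  = lambda n: lambda f: lambda x: f(n(f)(x))  # 𝐒 ≡ λn. λf. λx. f(nfx)
    THREE = lambda f: lambda x: f(f(f(x)))            # 𝟑 ≡ λf. λx. f(f(fx))

    to_int = lambda n: n(lambda k: k + 1)(0)          # count the applications of f
    print(to_int(SUCC(ZERO)))   # 1, so 𝐒𝟎 indeed applies f exactly once
    print(to_int(THREE))        # 3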

The successor function 𝐒 adds one to its argument. To add m is to apply the successor function m times. But repeated application is exactly what these numbers represent! The sum of m and n is simply m𝐒n. Define addition in exactly that way: + ≡ λm. λn. m𝐒n. By η-reduction, + ≡ λm. m𝐒. And multiplication of m by n is simply to add m copies of n: 5×3 = 3 + 3 + 3 + 3 + 3. That is, × ≡ λm. λn. m ( n𝐒 ) 𝟎. As a simpler alternative, × ≡ λm. λn. λf. m ( nf ).
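
Continuing the sketch, with PLUS and TIMES standing in for + and ×:

    PLUS  = lambda m: m(SUCC)                      # + ≡ λm. m𝐒
    TIMES = lambda m: lambda n: lambda f: m(n(f))  # × ≡ λm. λn. λf. m(nf)

    print(to_int(PLUS(THREE)(SUCC(THREE))))   # 3 + 4 = 7
    print(to_int(TIMES(THREE)(THREE)))        # 3 × 3 = 9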

Tying in to the previous section, sometimes one wants to query whether a number satisfies some property. One simple property is whether the number is zero. In order to do this, we need a function that takes an argument and returns false: λx. 𝐅. If this function is applied one or more times to an argument, the result is false. But if it is applied zero times to an argument x, then the result is x itself. Define 𝐙 ≡ λn. n ( λx. 𝐅 ) 𝐓. Then 𝐙𝟎 ≡ 𝟎 ( λx. 𝐅 ) 𝐓 → 𝐓, but 𝐙𝟐 ≡ 𝟐 ( λx. 𝐅 ) 𝐓 ≡ ( λf. λx. f(fx) ) ( λx. 𝐅 ) 𝐓 → ( λx. ( λx. 𝐅 ) ( ( λx. 𝐅 ) x ) ) 𝐓 → ( λx. 𝐅 ) ( ( λx. 𝐅 ) 𝐓 ) → 𝐅. Recall how the Boolean values are defined; λx. 𝐅 is simply 𝐓𝐅. So, 𝐙 ≡ λn. n (𝐓𝐅) 𝐓.
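
In the sketch, with IS_ZERO standing in for 𝐙:

    IS_ZERO = lambda n: n(TRUE(FALSE))(TRUE)   # 𝐙 ≡ λn. n(𝐓𝐅)𝐓
    print(to_bool(IS_ZERO(ZERO)))    # True
    print(to_bool(IS_ZERO(THREE)))   # False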

To determine whether two arbitrary numbers are equal is more difficult. There is no built-in comparison function, no equality operator that we can leverage. Instead of directly defining equality, it is easier to first define inequality, ≤, and to derive equality from that. We begin by defining a clamped predecessor function 𝐏. If x is zero then 𝐏x is also zero, otherwise it is x−1. In order to do this, we introduce a representation of pairs. Then we can define a function f(a,b) = ( a+1, a ) such that f(0,0) = (1,0), f( f(0,0) ) = f(1,0) = (2,1), and so on. Then the result of n applications of f to (0,0) will contain n−1 in its second component.

The pair (a,b) can be represented by the function λf. fab. Applying a select-first function yields a and applying a select-second function yields b. We have those; they are 𝐓 and 𝐅, respectively. Our partial-predecessor function is then λp. λf. f ( 𝐒 (p𝐓) ) (p𝐓): take a pair p = (a,b) and return a new pair whose first component is a+1 and whose second component is just a. Then 𝐏 ≡ λn. n ( λp. λf. f ( 𝐒 (p𝐓) ) (p𝐓) ) ( λf. f𝟎𝟎 ) 𝐅.
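
In the sketch, PAIR builds λf. fab, STEP is the partial-predecessor step, and PRED stands in for 𝐏:

    PAIR = lambda a: lambda b: lambda f: f(a)(b)          # (a,b) ≡ λf. fab
    STEP = lambda p: lambda f: f(SUCC(p(TRUE)))(p(TRUE))  # (a,b) becomes (a+1, a)
    PRED = lambda n: n(STEP)(PAIR(ZERO)(ZERO))(FALSE)     # step n times from (0,0), take the second slot

    print(to_int(PRED(THREE)))   # 2
    print(to_int(PRED(ZERO)))    # 0, clamped at zero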

With that in place, we have clamped subtraction, sometimes called “monus”. The difference x∸y is y iterations of the predecessor function applied to x: ∸ ≡ λx. λy. y𝐏x. As 𝐏𝟎 → 𝟎, we have that x∸y is zero if and only if x ≤ y. Define ≤ ≡ λx. λy. 𝐙 ( ∸xy ). Two natural numbers are equal if each is less than or equal to the other; define equality, =, as λx. λy. ∧ ( ≤xy ) ( ≤yx ). The other possibilities can easily be defined in terms of these.
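
And the comparisons, reusing AND_ from the Boolean sketch; MONUS, LEQ and EQ stand in for ∸, ≤ and =:

    MONUS = lambda m: lambda n: n(PRED)(m)            # ∸ ≡ λm. λn. n𝐏m
    LEQ   = lambda m: lambda n: IS_ZERO(MONUS(m)(n))  # ≤ ≡ λm. λn. 𝐙(∸mn)
    EQ    = lambda m: lambda n: AND_(LEQ(m)(n))(LEQ(n)(m))

    print(to_int(MONUS(THREE)(SUCC(THREE))))   # 3 ∸ 4 = 0
    print(to_bool(LEQ(THREE)(THREE)))          # True
    print(to_bool(EQ(THREE)(SUCC(THREE))))     # False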

Object                     Expression
𝟎                          λx. λy. y
𝐒 (successor)              λn. λf. λx. f ( nfx )
𝐏 (clamped predecessor)    λn. n ( λp. λf. f ( 𝐒 (p𝐓) ) (p𝐓) ) ( λf. f𝟎𝟎 ) 𝐅
𝐙 (zero?)                  λn. n (𝐓𝐅) 𝐓
+                          λm. m𝐒
∸ (“monus”)                λm. λn. n𝐏m
×                          λm. λn. λf. m (nf)
≤                          λm. λn. 𝐙 ( ∸mn )
=                          λx. λy. ∧ ( ≤xy ) ( ≤yx )

Integers

So far, we have the natural numbers, the nonnegative numbers. We also have pairs and Boolean values, so one way to represent all integers would be to encode a pair containing a sign bit and a magnitude. Those familiar with foundational set theory, however, may be aware of a different representation, which does not use Boolean values. Instead, integers are represented by pairs, where (x,y) represents the integer x−y. Each possible integer has infinitely many representative pairs, but this representation is easier to work with.

For instance, addition is pointwise. The pair (x,y) represents x−y, and (a,b) represents a−b, and ( x+a, y+b ) represents (x+a) − (y+b) = x + a − y − b = (x−y) + (a−b). Recall that a pair takes a function and applies it to its two components as arguments, in order, and that 𝐓 selects the first and 𝐅 selects the second. Define +Z ≡ λp. λr. λf. f ( + (p𝐓) (r𝐓) ) ( + (p𝐅) (r𝐅) ); take two pairs p and r representing integers, then return the pair obtained by adding each component separately.

If (x,y) represents x−y, then −(x−y) = y−x is represented by (y,x). Negation is to flip the pair: 𝐍 ≡ λp. λf. f (p𝐅) (p𝐓). To subtract two integers, we can define −Z ≡ λm. λn. +Z m (𝐍n). After expansion and simplification, −Z ≡ λp. λr. λf. f ( + (p𝐓) (r𝐅) ) ( + (p𝐅) (r𝐓) ).
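
In the Python sketch, an integer is just a PAIR of numerals; ADD_Z, NEG and SUB_Z stand in for +Z, 𝐍 and −Z, and to_signed converts back to a native integer for inspection:

    INT   = lambda a: lambda b: PAIR(a)(b)            # (a, b) represents a − b
    ADD_Z = lambda p: lambda r: lambda f: f(PLUS(p(TRUE))(r(TRUE)))(PLUS(p(FALSE))(r(FALSE)))
    NEG   = lambda p: lambda f: f(p(FALSE))(p(TRUE))  # 𝐍 flips the pair
    SUB_Z = lambda m: lambda n: ADD_Z(m)(NEG(n))      # −Z ≡ λm. λn. +Z m (𝐍n)

    to_signed = lambda p: to_int(p(TRUE)) - to_int(p(FALSE))
    print(to_signed(SUB_Z(INT(ZERO)(ZERO))(INT(THREE)(ZERO))))   # 0 − 3 = -3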

Recall that ( x∸y ) represents max( x−y, 0 ). If x ≤ y, this is 0, but if x ≥ y, this is x−y. By trichotomy, + ( x∸y ) ( y∸x ) is the absolute value of x−y. Thus, to obtain the absolute value of an integer as a natural number, we define 𝐀 ≡ λx. + ( x∸ ) ( x ( λa. λb. a𝐏b ) ) ≡ λx. + ( x∸ ) ( x ( λa. a𝐏 ) ). (Here x is a pair: x∸ computes (x𝐓) ∸ (x𝐅), x ( λa. λb. a𝐏b ) computes (x𝐅) ∸ (x𝐓), and the second form is an η-reduction of the first.) To test whether an integer is zero is to test whether its absolute value is zero, so 𝐙Z ≡ λn. 𝐙 (𝐀n).

Finally, to multiply integers, note that the distributive property holds: k(x−y) = kx − ky. So we want to multiply each component of one pair by the absolute value of the other pair, and negate the result if necessary. If m is a pair, then m≤, that is, ≤ applied to its components, is true exactly when m is not positive; in that case we negate, and otherwise we apply the identity function 𝟎𝐍. Define ×Z ≡ λm. λn. m≤ 𝐍 ( 𝟎𝐍 ) ( λf. f ( × (𝐀m) (n𝐓) ) ( × (𝐀m) (n𝐅) ) ).
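
In the sketch, with ABS, IS_ZERO_Z and MUL_Z standing in for 𝐀, 𝐙Z and ×Z:

    # m applied to ∸ gives (m𝐓)∸(m𝐅); applied to λa. a𝐏 it gives (m𝐅)∸(m𝐓).
    ABS       = lambda m: PLUS(m(MONUS))(m(lambda a: a(PRED)))
    IS_ZERO_Z = lambda n: IS_ZERO(ABS(n))             # 𝐙Z ≡ λn. 𝐙(𝐀n)
    # m(LEQ) is true when m is not positive, selecting 𝐍; otherwise the
    # identity 𝟎𝐍 leaves the componentwise product |m|·n alone.
    MUL_Z = lambda m: lambda n: m(LEQ)(NEG)(ZERO(NEG))(
        PAIR(TIMES(ABS(m))(n(TRUE)))(TIMES(ABS(m))(n(FALSE))))

    minus_two = INT(ZERO)(SUCC(SUCC(ZERO)))
    print(to_signed(MUL_Z(minus_two)(INT(THREE)(ZERO))))   # (-2) × 3 = -6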

Operation             Expression
𝐀 (absolute value)    λn. + ( n∸ ) ( n ( λa. a𝐏 ) )
𝐙Z                    λn. 𝐙 (𝐀n)
+Z                    λm. λn. λf. f ( + (m𝐓) (n𝐓) ) ( + (m𝐅) (n𝐅) )
−Z                    λm. λn. λf. f ( + (m𝐓) (n𝐅) ) ( + (m𝐅) (n𝐓) )
×Z                    λm. λn. m≤ 𝐍 ( 𝟎𝐍 ) ( λf. f ( × (𝐀m) (n𝐓) ) ( × (𝐀m) (n𝐅) ) )

Lists

With numbers available, one might naturally want to represent a list of numbers. Recall that in defining integers we used pairs, represented by functions that take (essentially) two-argument functions as input. Given a pair p of natural numbers, p𝐓 is the first, p𝐅 is the second, p+ is their sum, and so on. To represent a list, we might consider a similar sort of configuration: a list L is a function which takes as its arguments a binary operation f and a default value x. If the list is empty, then application yields the default value, Lfx → x. Otherwise, the list has a head h and a tail t, and the application yields Lfx → fh (tfx).

This gives us exactly the information that we need in order to construct something akin to a linked list. Like zero and falsity, the empty list ignores its first argument and returns its second: ∅ ≡ 𝟎. Then to cons a head h onto a tail t, we can use 𝐂 ≡ λh. λt. λf. λx. fh (tfx).

A function which takes a list of integers and returns their sum is Σ ≡ λa. a +Z ( λf. f𝟎𝟎 ). The product is Π ≡ λa. a ×Z ( λf. f ( 𝐒𝟎 ) 𝟎 ). To get the length, the function should ignore the head and return the successor of the result, and the default value should be the natural number zero: 𝐋 ≡ λa. a ( λh. 𝐒 ) 𝟎. To extract the head, or return 𝐅 in the case of an empty list, define 𝐇 ≡ λa. a𝐓𝐅. Finally, to extract the tail, we use a technique similar to that used for finding the predecessor of a natural number. The initial value is a pair consisting of two copies of the empty list. The binary operation takes the head h of the list and such a pair (x,y), and returns ( 𝐂hx, x ). The tail in the end is the second component: ↓ ≡ λa. a ( λh. λp. λf. f ( 𝐂h (p𝐓) ) (p𝐓) ) ( λf. f∅∅ ) 𝐅. This, with repeated application, gives an indexing function: drop n elements and take the head: @ ≡ λa. λn. 𝐇 ( n↓a ).
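
These, too, fit the running sketch. For brevity the sum below folds a list of naturals with + rather than a list of integers with +Z, but the shape is identical; NIL, CONS, SUM, LENGTH, HEAD, TAIL and INDEX stand in for the operators above:

    NIL    = ZERO                                  # the empty list is 𝟎
    CONS   = lambda h: lambda t: lambda f: lambda x: f(h)(t(f)(x))
    SUM    = lambda a: a(PLUS)(ZERO)               # fold with + and 𝟎
    LENGTH = lambda a: a(lambda h: SUCC)(ZERO)     # ignore each head, count
    HEAD   = lambda a: a(TRUE)(FALSE)              # 𝐇 ≡ λa. a𝐓𝐅
    # Tail, by the same pair trick as the predecessor: build (list so far, previous list).
    TAIL   = lambda a: a(lambda h: lambda p: lambda f: f(CONS(h)(p(TRUE)))(p(TRUE)))(
        PAIR(NIL)(NIL))(FALSE)
    INDEX  = lambda a: lambda n: HEAD(n(TAIL)(a))  # drop n elements, take the head

    one, two = SUCC(ZERO), SUCC(SUCC(ZERO))
    xs = CONS(one)(CONS(two)(CONS(THREE)(NIL)))    # the list 1, 2, 3
    print(to_int(SUM(xs)))          # 6
    print(to_int(LENGTH(xs)))       # 3
    print(to_int(INDEX(xs)(two)))   # 3, the element at index 2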

Operation      Expression
∅ (empty)      𝟎
𝐂 (cons)       λh. λt. λf. λx. fh (tfx)
Σ (sum)        λa. a +Z ( λf. f𝟎𝟎 )
Π (product)    λa. a ×Z ( λf. f ( 𝐒𝟎 ) 𝟎 )
𝐋 (length)     λa. a ( λh. 𝐒 ) 𝟎
𝐇 (head)       λa. a𝐓𝐅
↓ (tail)       λa. a ( λh. λp. λf. f ( 𝐂h (p𝐓) ) (p𝐓) ) ( λf. f∅∅ ) 𝐅
@ (index)      λa. λn. 𝐇 ( n↓a )

Recursion

To this point, none of our functions have been recursive. This is both because we have not needed such power and because the λ-calculus does not directly support recursive definitions. Suppose we want to find the sum of the natural numbers up to and including n. Now, of course, we know that the result should be n(n+1)/2, but, alas, we have yet to define a division function. Rather than incorporate division, let us consider how we might compute this result.

One method is to create a function φ that will map a pair (x,a) to a new pair ( x+1, a+x+1 ). Applying this function thrice to (0,0) proceeds as follows: (0,0) becomes (1,1), which becomes (2,3), which finally becomes (3,6). In general, upon n applications, the first component is n and the second component is the sum of the natural numbers up to and including n. Because numbers in the λ-calculus represent repeated application, this satisfies the requirement. Define φ ≡ λx. λf. f ( 𝐒(x𝐓) ) ( + ( 𝐒(x𝐓) ) (x𝐅) ). Then define Δ ≡ λn. nφ ( λf. f𝟎𝟎 ) 𝐅 to compute the desired sum. This is essentially a functionalization of the iterative solution to the problem.
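
In the sketch, with PHI and TRIANGLE standing in for φ and Δ:

    PHI      = lambda x: lambda f: f(SUCC(x(TRUE)))(PLUS(SUCC(x(TRUE)))(x(FALSE)))
    TRIANGLE = lambda n: n(PHI)(PAIR(ZERO)(ZERO))(FALSE)
    print(to_int(TRIANGLE(THREE)))   # 0 + 1 + 2 + 3 = 6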

Another high-level approach would be to use a recursive function: Δ(n) is 0 if n is 0, else it is n + Δ( n−1 ). But we cannot say [invalid: Δ ≡ λn. 𝐙n𝟎 ( + n ( Δ(𝐏n) ) ) ], because this kind of self-reference creates an infinitely long formula. What we can do is supply as an argument the function to be used as the “recursive” call: Δ ≡ λf. λn. 𝐙 n 𝟎 ( + n ( f(𝐏n) ) ). Then, as long as we provide a sufficiently large stack of Δ, we can recursively compute the solution: Δ( Δ( Δ( ΔΔ ) ) )𝟑 → ( + 𝟑 ( Δ( Δ( ΔΔ ) )𝟐 ) ) → ( + 𝟑 ( + 𝟐 ( Δ( ΔΔ )𝟏 ) ) ) → ( + 𝟑 ( + 𝟐 ( + 𝟏 ( ΔΔ𝟎 ) ) ) ) → ( + 𝟑 ( + 𝟐 ( + 𝟏 𝟎 ) ) ) → 𝟔. Then what we want is some way to get a sufficiently tall stack of Δ, no matter the parameter.

Enter the 𝐘 combinator, defined such that 𝐘f ≡ f(𝐘f). We cannot define it this way, because this uses recursion. Instead, we define 𝐘 such that it accepts a function as an argument, then reconstructs itself as an argument to that function: 𝐘 ≡ λf. ( λx. f(xx) ) ( λx. f(xx) ). Then applying this to a function g yields 𝐘g ≡ ( λf. ( λx. f(xx) ) ( λx. f(xx) ) ) g → ( λx. g(xx) ) ( λx. g(xx) ) → g ( ( λx. g(xx) ) ( λx. g(xx) ) ) ≡ g(𝐘g). Now, we have 𝐘Δ𝟑 → Δ(𝐘Δ)𝟑 → ( + 𝟑 ( 𝐘Δ 𝟐 ) ) → ( + 𝟑 ( Δ(𝐘Δ) 𝟐 ) ) → ( + 𝟑 ( + 𝟐 ( 𝐘Δ 𝟏 ) ) ) → ( + 𝟑 ( + 𝟐 ( Δ(𝐘Δ) 𝟏 ) ) ) → ( + 𝟑 ( + 𝟐 ( + 𝟏 ( 𝐘Δ𝟎 ) ) ) ) → ( + 𝟑 ( + 𝟐 ( + 𝟏 ( Δ(𝐘Δ) 𝟎 ) ) ) ) → ( + 𝟑 ( + 𝟐 ( + 𝟏 𝟎 ) ) ) → 𝟔, as desired. We need not concern ourselves with providing an appropriate number of Δ, because 𝐘Δ always produces another.
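
The Python sketch needs two small adjustments here, because Python evaluates arguments eagerly: the two branches of the conditional are wrapped in thunks so that only the selected branch is ever evaluated, and 𝐘 itself is replaced by its η-expanded cousin (often called the Z combinator), which delays the self-application. Z_COMB and DELTA are stand-in names:

    # The η-expanded fixed-point combinator, suitable for a strict language.
    Z_COMB = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

    # Δ ≡ λf. λn. 𝐙 n 𝟎 ( + n (f(𝐏n)) ), with each branch delayed behind a thunk.
    DELTA = lambda f: lambda n: IS_ZERO(n)(lambda _: ZERO)(
        lambda _: PLUS(n)(f(PRED(n))))(None)

    print(to_int(Z_COMB(DELTA)(THREE)))   # 0 + 1 + 2 + 3 = 6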

Directions

This post sampled a small collection of common objects and data structures to serve as an introduction to the λ-calculus and the functional approach to computation. Unlike a Turing machine or a personal computer, there is no notion of a memory layout, no restriction to a linear pattern of cells. Even so, structuring data requires careful thought. Using the provided structures and applications as a guide, consider trying these additional challenges. For how many of them is 𝐘 necessary?

  • Define the missing inequalities on natural numbers.
  • Define equality and inequalities on integers.
  • Define the factorial function on natural numbers, where the factorial of zero is one, and the factorial of any n greater than zero is the product of n and the factorial of n−1.
  • Define the integer-division of natural numbers or integers.
  • Construct the rational numbers along with appropriate addition, subtraction, multiplication and division.
  • Implement natural number exponents on natural numbers or integers.
  • Implement integer exponents on rational numbers.
  • Implement some representation of a binary tree.