blog

Dependent Types

Exploring Dependent Types in Idris

by

When I’m not coding the "impossible" at Art+Logic, I take a lot of interest in new programming technologies and paradigms; even if they’re not yet viable for use in production, there can often be takeaways for improving your everyday code.

My current fascination is the Idris programming language, a research language built around making dependent types practical. This is a quick primer on what dependent types are, how they work in Idris, and how they can change the way you think about types in other languages; we’ll assume no prior knowledge of Idris or of purely functional languages in general, but a basic familiarity with functional programming will make things easier to follow.

What is a Dependent Type?

A dependent type is a type whose definition depends on a value. More generally, Idris has first-class types, meaning that types can be computed and manipulated like any other language construct (e.g. functions or values).

In Idris for example, it’s no problem to define a function with a type like this:

stringOrInt : Bool -> Type

The stringOrInt : notation begins a type definition, and Bool -> Type defines a function of one parameter (Bool) with a return type of Type. It could be implemented like so:

stringOrInt : Bool -> Type
stringOrInt False = String
stringOrInt True = Int

Idris (and Haskell, the language its syntax is based on) lets you implement a function by pattern-matching on its arguments. So the full definition above can be understood as

  1. Define a function stringOrInt which takes a Bool and returns a Type
  2. If the argument matches False, then return String
  3. If the argument matches True, then return Int

The compiler is smart enough to know that you’ve covered all possible cases in the pattern match. This is how functions are defined in a purely functional language: declaratively, by describing the expression that calculates the result (rather than by describing a sequence of imperative steps directly).

So what we’ve got is a function which returns not a value but a type, something which isn’t supposed to even exist at runtime. What good is that? Well, we can use it in the type of another definition to do something that would be totally crazy in almost any other language:

dependent : (x : Bool) -> stringOrInt x
dependent False = "a string"
dependent True = 123

The (x : Bool) syntax here gives the name x to the parameter, so that it can be referred to in the return type. This dependent function returns either a string or a number, depending on the input.

In this example, stringOrInt is a dependent type: what it ends up being depends on the Bool you provide to it. While stringOrInt might not be very useful, this concept enables some powerful capabilities.

Why Dependent Types?

A practical example of the power of dependent types is for defining a data type for lists of a specific length. In Idris, a Vect type can be defined like so:

data Vect : Nat -> Type -> Type where
   Nil  : Vect 0 t
   Cons : t -> Vect k t -> Vect (S k) t

We’ll gloss over the syntax here, so this can be understood as

  1. Define a new data type Vect k a, where k is a Nat (a natural number) and a is the type of the elements
  2. Define a constructor Nil to create an empty Vect for any element type, of length 0
  3. Define a constructor Cons x xs to prepend the element x to another Vect xs, adding 1 to the length of the Vect ((S k) here meaning "add 1 to k")

This Vect n t type has something in common with generic array types like TypeScript’s Array<T>: they each have a t or T parameter to define the type of the array elements (although since TypeScript’s types aren’t first-class, it appears in the special <> syntax rather than being a regular old function parameter). What Vect adds is a value which encodes the length of the list right in the type itself.

With this definition in hand, we can define Vect values using the Nil and Cons constructors:

-- An Int Vect of length 0
noInts : Vect Z Int
noInts = Nil
-- A Bool Vect of length 2
twoBools : Vect 2 Bool
twoBools = Cons True (Cons False Nil)
-- A Bool Vect of length 3, built by adding a third element to twoBools
threeBools : Vect 3 Bool
threeBools = Cons True twoBools

This length allows you to be much more specific in the type of, for instance, an append function:

append : Vect n t -> Vect m t -> Vect (n + m) t
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

Let’s break down what’s going on here:

  1. The type of append says it takes two parameters, a Vect of length n and a Vect of length m, each of which have elements of the same type t.
  2. The type says that the return type is Vect (n + m) t, meaning it’s a Vect with the combined lengths of both inputs. We’ve got math going on at the type level, and Idris is OK with this: when we implement or use append, the compiler will evaluate this to make sure the code conforms. Crazy!
  3. If the first Vect is Nil (i.e. empty), then just return the second Vect.
  4. Otherwise, the first Vect must be an element x prepended onto some other Vect, xs. We can recursively append the remaining Vects together and prepend x onto the result.

Using dependent types, we’re able to have append make a much stronger guarantee than it otherwise could: not only will it return a Vect with the same type of elements as the inputs, but it will guarantee that the result has the correct length, or else the code won’t even compile.

Consider what would have happened if we had made a simple mistake and accidentally wrote a second xs instead of a ys:

append : Vect n t -> Vect m t -> Vect (n + m) t
append Nil ys = ys
append (Cons x xs) = Cons x (append xs xs)
                                    -- ^ Oops, this should be ys

This code will not compile, producing an error something like this:

Error: While processing right hand side of append. When unifying n + n and n + m.
Mismatch between: n and m.
Example:28:25--28:46
 26 | append : Vect n t -> Vect m t -> Vect (n + m) t
 27 | append Nil ys = ys
 28 | append (Cons x xs) ys = Cons x (append xs xs)
                              ^^^^^^^^^^^^^^^^^^^^^

The compiler has identified that the type of the code we wrote is Vect (n + n) t, but the definition of append expects Vect (n + m) t, which doesn’t match.

This is a sample of the power of dependent types: making stronger guarantees at the type level, so that the compiler can catch and prevent many more errors and bugs. For the purposes of this overview we’ll stop here, but there are a ton of other interesting applications, such as:

  • A type-safe printf function
  • Defining state machines as data types, so that only valid state transitions are possible
  • Allowing functions to make formal assertions about their inputs or outputs, such as "the input array is sorted"
  • Using interactive editing to get the compiler to automatically fill in parts of your solution
  • Implementing your unit tests as proofs, e.g.:

    -- The type is an assertion that length ['a', 'b'] evaluates to 2.
    -- Refl asks the compiler to check that this is trivially true, by
    -- evaluating it down to 2 = 2. The code gets executed as part of the
    -- compilation process, so if it compiles, then this "unit test" passes!
    testLength2 : length ['a', 'b'] = 2
    testLength2 = Refl

Why Not Dependent Types?

With great power comes great responsibility, and first-class/dependent types introduce a lot of power to a language. The catch with leaning on the compiler to make much stronger guarantees about your programs is that you need to convince the compiler that your code meets these guarantees, in rigorous, mathematical detail.

For example, I’ve implemented a Binary Search Tree in Idris (see here for the main data types, and here for the insertion/querying implementations), one which guarantees that any instance of a tree satisfies the ordering constraints. It works, but the implementation is definitely noisier and it took much, much longer to implement than it would have had I not given it a type with so many guarantees. As the code that you’re formally verifying becomes more complex, implementation quickly goes from easy, to time-consuming, to an active research area.

Finding language features and development practices which make dependent types easier to use is an exciting research area; you won’t use Idris in production tomorrow, but… maybe someday.

Practical Applications

You may not be using Idris in production today, but taking some time to learn it might make you see your current code in a new light:

The Power of Types

Early on in a new JavaScript, Python or Ruby project, the sheer speed of development might make the language feel far and away superior to a strictly typed language like C++, Java, or Rust. As the months and years go by however, and runtime errors keep cropping up, you may be longing for the guard rails that those types provide. Depending on the project’s priorities, a strictly typed language may be a better investment in the long term. And keep in mind that with the backwards-compatible TypeScript and with Python’s type annotations, sometimes a gradual transition is possible.

The Value of Descriptive Data Types

We might not be able to encode "an array of length n" in any popular programming language, or to get our compiler to formally guarantee that an array is properly sorted, but there can still be a lot of value in defining new types to encode these properties anyway.

Take email addresses for example, which typically would be stored as a string. If your sendReminder function is implemented like this,

void sendReminder(Appointment appointment, string email) { ... }

then any number of bugs could cause the value of email to be something which is not an email address. This is unfortunate, because the error is going to occur somewhere inside sendReminder(), leaving you to go on a hunt for the true source of the bug. Consider however if you were to define an EmailAddress data type:

class EmailAddress {
    public readonly string value; // Exposes the validated email as a property
    public Email(string value) { /* validates value via regex */ }
}
...
void sendReminder(Appointment appointment, EmailAddress email) { ... }

Now the precondition "email must be a valid email address" is encoded in the type of sendReminder itself. This precondition is not formally verified, so it depends on the implementation of EmailAddress being bug-free, but that’s an easy one to cover with a couple unit tests. And now, if there’s a bug somewhere, the stack trace will be right at the source, i.e. where someone tried to call new Email() with something that isn’t an email.

Object-oriented languages unfortunately tend to require a lot of boilerplate to define data types like these, but it’s still an easy way to trade a little bit more verboseness in exchange for a lot of robustness.

Conclusion

There are many more interesting aspects of Idris to explore (like its Quantitative Type Theory which lets you express whether a variable can be used zero, one, or many times), but this has already been a lot! If you’re interested in giving Idris a go, I would recommend this introductory video, and the official "Crash Course" documentation.

+ more

Accurate Timing

Accurate Timing

In many tasks we need to do something at given intervals of time. The most obvious ways may not give you the best results. Time? Meh. The most basic tasks that don't have what you might call CPU-scale time requirements can be handled with the usual language and...

read more