When I’m not coding the "impossible" at Art+Logic, I take a lot of interest in new programming technologies and paradigms; even if they’re not yet viable for use in production, there can often be takeaways for improving your everyday code.
My current fascination is the Idris programming language, a research language built around making dependent types practical. This is a quick primer on what dependent types are, how they work in Idris, and how they can change the way you think about types in other languages; we’ll assume no prior knowledge of Idris or of purely functional languages in general, but a basic familiarity with functional programming will make things easier to follow.
What is a Dependent Type?
A dependent type is a type whose definition depends on a value. More generally, Idris has first-class types, meaning that types can be computed and manipulated like any other language construct (e.g. functions or values).
In Idris for example, it’s no problem to define a function with a type like this:
stringOrInt : Bool -> Type
The stringOrInt :
notation begins a type definition, and Bool -> Type
defines a function of one parameter (Bool
) with a return type of Type
. It could be implemented like so:
stringOrInt : Bool -> Type
stringOrInt False = String
stringOrInt True = Int
Idris (and Haskell, the language its syntax is based on) lets you implement a function by pattern-matching on its arguments. So the full definition above can be understood as
- Define a function
stringOrInt
which takes aBool
and returns aType
- If the argument matches
False
, then returnString
- If the argument matches
True
, then returnInt
The compiler is smart enough to know that you’ve covered all possible cases in the pattern match. This is how functions are defined in a purely functional language: declaratively, by describing the expression that calculates the result (rather than by describing a sequence of imperative steps directly).
So what we’ve got is a function which returns not a value but a type, something which isn’t supposed to even exist at runtime. What good is that? Well, we can use it in the type of another definition to do something that would be totally crazy in almost any other language:
dependent : (x : Bool) -> stringOrInt x
dependent False = "a string"
dependent True = 123
The (x : Bool)
syntax here gives the name x
to the parameter, so that it can be referred to in the return type. This dependent
function returns either a string or a number, depending on the input.
In this example, stringOrInt
is a dependent type: what it ends up being depends on the Bool
you provide to it. While stringOrInt
might not be very useful, this concept enables some powerful capabilities.
Why Dependent Types?
A practical example of the power of dependent types is for defining a data type for lists of a specific length. In Idris, a Vect
type can be defined like so:
data Vect : Nat -> Type -> Type where
Nil : Vect 0 t
Cons : t -> Vect k t -> Vect (S k) t
We’ll gloss over the syntax here, so this can be understood as
- Define a new data type
Vect k a
, wherek
is aNat
(a natural number) anda
is the type of the elements - Define a constructor
Nil
to create an emptyVect
for any element type, of length 0 - Define a constructor
Cons x xs
to prepend the elementx
to anotherVect
xs
, adding 1 to the length of theVect
((S k)
here meaning "add 1 tok
")
This Vect n t
type has something in common with generic array types like TypeScript’s Array<T>
: they each have a t
or T
parameter to define the type of the array elements (although since TypeScript’s types aren’t first-class, it appears in the special <>
syntax rather than being a regular old function parameter). What Vect
adds is a value which encodes the length of the list right in the type itself.
With this definition in hand, we can define Vect
values using the Nil
and Cons
constructors:
-- An Int Vect of length 0
noInts : Vect Z Int
noInts = Nil
-- A Bool Vect of length 2
twoBools : Vect 2 Bool
twoBools = Cons True (Cons False Nil)
-- A Bool Vect of length 3, built by adding a third element to twoBools
threeBools : Vect 3 Bool
threeBools = Cons True twoBools
This length allows you to be much more specific in the type of, for instance, an append
function:
append : Vect n t -> Vect m t -> Vect (n + m) t
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)
Let’s break down what’s going on here:
- The type of
append
says it takes two parameters, aVect
of lengthn
and aVect
of lengthm
, each of which have elements of the same typet
. - The type says that the return type is
Vect (n + m) t
, meaning it’s aVect
with the combined lengths of both inputs. We’ve got math going on at the type level, and Idris is OK with this: when we implement or useappend
, the compiler will evaluate this to make sure the code conforms. Crazy! - If the first
Vect
isNil
(i.e. empty), then just return the secondVect
. - Otherwise, the first
Vect
must be an elementx
prepended onto some otherVect
,xs
. We can recursivelyappend
the remainingVect
s together and prependx
onto the result.
Using dependent types, we’re able to have append
make a much stronger guarantee than it otherwise could: not only will it return a Vect
with the same type of elements as the inputs, but it will guarantee that the result has the correct length, or else the code won’t even compile.
Consider what would have happened if we had made a simple mistake and accidentally wrote a second xs
instead of a ys
:
append : Vect n t -> Vect m t -> Vect (n + m) t
append Nil ys = ys
append (Cons x xs) = Cons x (append xs xs)
-- ^ Oops, this should be ys
This code will not compile, producing an error something like this:
Error: While processing right hand side of append. When unifying n + n
and n + m
.
Mismatch between: n
and m
.
Example:28:25--28:46
26 | append : Vect n t -> Vect m t -> Vect (n + m) t
27 | append Nil ys = ys
28 | append (Cons x xs) ys = Cons x (append xs xs)
^^^^^^^^^^^^^^^^^^^^^
The compiler has identified that the type of the code we wrote is Vect (n + n) t
, but the definition of append
expects Vect (n + m) t
, which doesn’t match.
This is a sample of the power of dependent types: making stronger guarantees at the type level, so that the compiler can catch and prevent many more errors and bugs. For the purposes of this overview we’ll stop here, but there are a ton of other interesting applications, such as:
- A type-safe
printf
function - Defining state machines as data types, so that only valid state transitions are possible
- Allowing functions to make formal assertions about their inputs or outputs, such as "the input array is sorted"
- Using interactive editing to get the compiler to automatically fill in parts of your solution
-
Implementing your unit tests as proofs, e.g.:
-- The type is an assertion that
length ['a', 'b']
evaluates to2
. --Refl
asks the compiler to check that this is trivially true, by -- evaluating it down to2 = 2
. The code gets executed as part of the -- compilation process, so if it compiles, then this "unit test" passes! testLength2 : length ['a', 'b'] = 2 testLength2 = Refl
Why Not Dependent Types?
With great power comes great responsibility, and first-class/dependent types introduce a lot of power to a language. The catch with leaning on the compiler to make much stronger guarantees about your programs is that you need to convince the compiler that your code meets these guarantees, in rigorous, mathematical detail.
For example, I’ve implemented a Binary Search Tree in Idris (see here for the main data types, and here for the insertion/querying implementations), one which guarantees that any instance of a tree satisfies the ordering constraints. It works, but the implementation is definitely noisier and it took much, much longer to implement than it would have had I not given it a type with so many guarantees. As the code that you’re formally verifying becomes more complex, implementation quickly goes from easy, to time-consuming, to an active research area.
Finding language features and development practices which make dependent types easier to use is an exciting research area; you won’t use Idris in production tomorrow, but… maybe someday.
Practical Applications
You may not be using Idris in production today, but taking some time to learn it might make you see your current code in a new light:
The Power of Types
Early on in a new JavaScript, Python or Ruby project, the sheer speed of development might make the language feel far and away superior to a strictly typed language like C++, Java, or Rust. As the months and years go by however, and runtime errors keep cropping up, you may be longing for the guard rails that those types provide. Depending on the project’s priorities, a strictly typed language may be a better investment in the long term. And keep in mind that with the backwards-compatible TypeScript and with Python’s type annotations, sometimes a gradual transition is possible.
The Value of Descriptive Data Types
We might not be able to encode "an array of length n" in any popular programming language, or to get our compiler to formally guarantee that an array is properly sorted, but there can still be a lot of value in defining new types to encode these properties anyway.
Take email addresses for example, which typically would be stored as a string
. If your sendReminder
function is implemented like this,
void sendReminder(Appointment appointment, string email) { ... }
then any number of bugs could cause the value of email
to be something which is not an email address. This is unfortunate, because the error is going to occur somewhere inside sendReminder()
, leaving you to go on a hunt for the true source of the bug. Consider however if you were to define an EmailAddress
data type:
class EmailAddress {
public readonly string value; // Exposes the validated email as a property
public Email(string value) { /* validates value
via regex */ }
}
...
void sendReminder(Appointment appointment, EmailAddress email) { ... }
Now the precondition "email
must be a valid email address" is encoded in the type of sendReminder
itself. This precondition is not formally verified, so it depends on the implementation of EmailAddress
being bug-free, but that’s an easy one to cover with a couple unit tests. And now, if there’s a bug somewhere, the stack trace will be right at the source, i.e. where someone tried to call new Email()
with something that isn’t an email.
Object-oriented languages unfortunately tend to require a lot of boilerplate to define data types like these, but it’s still an easy way to trade a little bit more verboseness in exchange for a lot of robustness.
Conclusion
There are many more interesting aspects of Idris to explore (like its Quantitative Type Theory which lets you express whether a variable can be used zero, one, or many times), but this has already been a lot! If you’re interested in giving Idris a go, I would recommend this introductory video, and the official "Crash Course" documentation.