
Photo of Spitfire manufacture. WWII World War 2 Castle Bromwich Aeroplane Factory, Birmingham 1940-46. Manufacturers: Vickers Armstrong by Birmingham Museums Trust on Unsplash

Factories, not Fixtures

For years, the most common way to supply data to automated tests has been fixtures: hard-coded values, usually stored in text files. For example, here’s some YAML-formatted data for a city model object:

- model: city
  fields:
    id: 1
    name: Los Angeles

In an environment like a Django app, this fixture would typically be loaded into a test database, and then accessed like: la = City.objects.get(id=1).

But fixtures and the frameworks that rely on them have several drawbacks:

Their data is brittle, especially when it includes references such as unique IDs. Changing or adding a property later may break tests. Fixtures are also hard to modify, which tends to lead to duplicate fixtures with unwieldy names like los-angeles-with-an-extra-long-name.yaml.

They are typically loaded en masse by test frameworks like Django’s. This can be slow if many unnecessary fixtures are being loaded for each unit test. It also creates brittle sets of data. For example, if an automated test is searching for objects with a matching city name and expects to find one instance, but later a new fixture is added that also matches, the test will fail.

Because fixtures are typically loaded into a database automatically by the test framework, it’s neither easy nor fast to change the properties of an object for a single test case, which also tends to lead to an overabundance of fixture files.
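
The second drawback is easy to reproduce in a few lines. The sketch below is only an illustration: the shared fixture list and the findByName helper are hypothetical stand-ins for whatever a test framework loads en masse, not part of any real library.

```javascript
// Hypothetical shared fixture set, standing in for files loaded en masse.
const fixtures = [
  { id: 1, name: 'Los Angeles' },
  { id: 2, name: 'New York' },
];

// Hypothetical lookup that a test might exercise.
function findByName(cities, fragment) {
  return cities.filter((city) => city.name.includes(fragment));
}

// A test written today expects exactly one match for "Los".
const before = findByName(fixtures, 'Los');

// Later, someone adds a fixture for an unrelated test...
const extended = fixtures.concat([{ id: 3, name: 'Los Alamos' }]);

// ...and the original expectation of a single match no longer holds.
const after = findByName(extended, 'Los');

console.log(before.length, after.length); // 1 2
```

Nothing in the original test changed, yet its assumption about the data set was silently invalidated by an unrelated addition.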

Factories, not fixtures

Test data factories solve these problems by making data and data loading more dynamic and programmable. A factory is written in code, and while it can simply mimic a fixture by supplying default values for an object, factory libraries offer many other useful features. Here’s a simple example factory:

Factory.define('city', City)
  .sequence('id')
  .attr('name', 'Los Angeles');

An object could then be built from this factory on the fly like: la = Factory.build('city').

Following the builder pattern, a factory generates data for all the attributes in its definition, constructs any necessary associated objects, and allows a test case to override these values. Here’s a slightly more complex example:

Factory.define('city', City)
  .sequence('id')
  .attr('name', 'Los Angeles')
  // Define 'ref' as dependent on the id and name properties
  .attr('ref', ['id', 'name'], function(id, name) {
    return id + '-' + name;
  });

// Override the default name for a single test case
const nyc = Factory.build('city', {name: 'NYC'});
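
To show what such a factory does under the hood, here is a minimal, self-contained sketch of the same idea in plain JavaScript. MiniFactory and its methods are hypothetical names invented for this illustration; they mimic the shape of the examples above but are not the implementation of any particular library.

```javascript
// Minimal, hypothetical factory; not the API of any real library.
class MiniFactory {
  constructor() {
    this.steps = [];    // attribute definitions, in declaration order
    this.counters = {}; // one counter per sequence attribute
  }

  // Auto-incrementing attribute, starting at 1.
  sequence(name) {
    this.counters[name] = 0;
    this.steps.push({ name, deps: [], make: () => ++this.counters[name] });
    return this;
  }

  // Static value, generator function, or function of earlier attributes.
  attr(name, depsOrValue, fn) {
    if (typeof fn === 'function') {
      this.steps.push({ name, deps: depsOrValue, make: fn });
    } else if (typeof depsOrValue === 'function') {
      this.steps.push({ name, deps: [], make: depsOrValue });
    } else {
      this.steps.push({ name, deps: [], make: () => depsOrValue });
    }
    return this;
  }

  // Resolve each attribute in order; overrides win over defaults.
  build(overrides = {}) {
    const result = {};
    for (const { name, deps, make } of this.steps) {
      if (Object.prototype.hasOwnProperty.call(overrides, name)) {
        result[name] = overrides[name];
      } else {
        result[name] = make(...deps.map((dep) => result[dep]));
      }
    }
    return result;
  }
}

const cityFactory = new MiniFactory()
  .sequence('id')
  .attr('name', 'Los Angeles')
  .attr('ref', ['id', 'name'], (id, name) => `${id}-${name}`);

const la = cityFactory.build();
const nyc = cityFactory.build({ name: 'NYC' });

console.log(la.ref);  // 1-Los Angeles
console.log(nyc.ref); // 2-NYC
```

Because attributes are resolved in declaration order, the overridden name is already in place when the dependent ref attribute is computed.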

Some typical features in factory libraries are:

integration with common ORMs; Factory.create(...) will typically build and save the object to a database
factory inheritance, allowing similar factories to share properties; e.g. Factory.define('olympicCity').extend('city').attr('year', null)
lazy attributes; e.g. .attr('created_at', function() { return new Date(); })
associations to other factories
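
These features can be approximated in plain JavaScript to see how they fit together. The sketch below is an assumption-laden illustration: build, create, savedRows, and the defaults objects are hypothetical names, and the in-memory array merely stands in for an ORM-backed database.

```javascript
// Hypothetical in-memory "database" standing in for an ORM.
const savedRows = [];

let nextCityId = 1;
let nextCountryId = 1;

// Every default is a function, so values are computed lazily per build.
const countryDefaults = {
  id: () => nextCountryId++,
  name: () => 'USA',
};

const cityDefaults = {
  id: () => nextCityId++,
  name: () => 'Los Angeles',
  createdAt: () => new Date(),            // lazy attribute
  country: () => build(countryDefaults),  // association to another factory
};

// Inheritance: reuse the parent's defaults and add one more attribute.
const olympicCityDefaults = { ...cityDefaults, year: () => null };

// Build a plain object; overrides replace defaults without evaluating them.
function build(defaults, overrides = {}) {
  const result = {};
  for (const key of Object.keys(defaults)) {
    result[key] = key in overrides ? overrides[key] : defaults[key]();
  }
  return result;
}

// Build and "save", roughly what a create-style call does with a real ORM.
function create(defaults, overrides = {}) {
  const row = build(defaults, overrides);
  savedRows.push(row);
  return row;
}

const la = build(cityDefaults);
const olympicLa = create(olympicCityDefaults, { year: 1984 });

console.log(la.country.name);  // USA
console.log(olympicLa.year);   // 1984
console.log(savedRows.length); // 1
```

Only the create call touches the store; build leaves it untouched, which is the distinction that matters when choosing between integration and unit tests.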

Factories across languages

Factory libraries have been springing up over the past handful of years. Ruby’s factory_girl, which has been cloned to many other languages, was first released in 2008. Several new ones for JavaScript and Objective-C have just appeared this year.

Here’s a list of factory libraries for a variety of common languages:

Ruby: factory_girl
Python: factory_boy
JavaScript: Rosie, nodejs factory_girl
Objective-C: CMFactory, Foundry
Java: Model Citizen, PoJoBuilder, make-it-easy
PHP: Phactory, factory-girl-php
.NET: FactoryGirl.NET

Test data for unit tests

A note of caution: a one-line factory invocation may hide a great deal of complexity and database integration. That may be fine for integration tests, but it should be avoided in unit tests (see the blog post Factories breed complexity for a lengthier discussion). Prefer simpler, non-persisted objects in unit tests. Factory libraries may help here too by returning just the attributes as a hash or dictionary; e.g. factory_girl’s attributes_for method.
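
As a rough illustration of the attributes-only approach, the sketch below feeds plain data to a pure function with no database involved. Both cityAttributes and cityLabel are hypothetical names made up for this example; cityAttributes is only loosely analogous in spirit to an attributes_for-style helper.

```javascript
// Hypothetical attributes-only helper: returns plain data, touches no database.
function cityAttributes(overrides = {}) {
  return { id: 1, name: 'Los Angeles', ...overrides };
}

// Code under test: a pure function that needs only the attributes.
function cityLabel(city) {
  return `${city.name} (#${city.id})`;
}

// The unit test supplies exactly the data it cares about.
const label = cityLabel(cityAttributes({ name: 'NYC' }));

console.log(label); // NYC (#1)
```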