blog

Map of the Barbary Coast by the French cartographers Pierre and Alexander Lapie

Geopolitical Consequences of SQL Design

by

Or: the consequences of not using many-to-many relationships

Is Moroccan food African or Mediterranean?  We designed a simple hierarchical model of cuisines that grouped them by broad geographic area.  The purpose of the model was solely to populate a configurable, organized list for a select menu, where smaller areas were visually grouped.

Keeping things simple, we created a cuisine table with id, name, and parentcolumns:

Column     | Type     | Modifiers
-----------+----------+--------------------------
 id        | integer  | primary key
 name      | string   | not null
 parent_id | integer  | foreign key to cuisine(id)
Indexes:
 "cuisine_pkey" PRIMARY KEY, btree (id)
 "cuisine_name" UNIQUE CONSTRAINT, btree (name)
Foreign-key constraints:
"cuisine_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES cuisine(id)

Software screen capture

Cuisine names could not be duplicated, and each cuisine had zero or one parents.  Which worked great, until we realized that Morocco was in the data list twice.

We had a couple options: create a cuisine join table that allowed for many parents of a single cuisine, or allow duplicates in the name field.  The latter option seemed far more expeditious, so we changed the unique index to:

"cuisine_name_parent_id_key" UNIQUE CONSTRAINT, btree (name, parent_id)

Software screen capture

Which again worked, until we realized that SQL doesn’t apply the unique constraint if parent_id is NULL, and so duplicate names that didn’t have a parent were being allowed.

Digging in deeper, we discovered partial indexes in PostgeSQL, which can apply a unique constraint under certain conditions determined by a WHERE clause.  We now have two unique indexes applied to the name column:

"cuisine_name_parent_id_key" UNIQUE CONSTRAINT, btree (name, parent_id) "cuisine_name_key" UNIQUE, btree (value) WHERE parent_id IS NULL

And thus we avoid an internecine food fight (at the cost of reduced SQL transportability).

+ more