What approach do you take when adding dynamic attributes to model objects stored in a traditional database? Let’s say a bug model with static fields like name and species also needs to support any number of additional dynamic fields specified by different bug collectors. There are a number of possible solutions to this common problem, so let’s consider the options.
- Serialize a hash (
{ color: "brown", arms: 2 }
) into a column at save, and de-serialize at load. This serialization would be easy from a configuration standpoint, but would scale poorly and would be rubbish for searching on key values. Rejected. - Integrate a NoSQL datastore like redis. A NoSQL datastore would offer speed and a task-appropriate API at the expense of integrating, configuring, and maintaining a new storage component for a minor feature. Reasonable, but not appropriate in the context.
- Add a series of “CustomField#{n}” columns to the
bugs
table. No, just no. Don’t make me cry. - Dynamically add columns to the
bugs
table (or to per-collector dynamic tables) when new custom fields are defined. These columns would allow strict type validation at the DB layer, but would limit the total number of columns and would be difficult to maintain with standard ORMs. Decidedly too much trouble. - Design a standard relational mapping from the
bugs
table to a custom field table with key and value columns. This approach is widely used as it fits a standard RDBMS schema. Its only obvious downside is increasing the number of table rows and objects that need to be managed and read. - Use PostgreSQL’s hstore extension, which provides a key/value hash column type. If PostgreSQL is already your RDBMS or is an easy switch, using hstore essentially mixes in a NoSQL approach with minimal extra configuration.
Let’s consider the custom field table (cftable) and PostgreSQL hstore (hstore) approaches in more detail, in the context of an application running on Ruby on Rails.
Configuration
Assume that PostgreSQL is already configured (or watch the RailsCast on migrating to PostgreSQL) with the contrib extensions downloaded (part of some postgres homebrews, or sudo apt-get install postgresql-contrib-9.0
on Ubuntu).
For cftable, there’s no additional configuration. We’re just using standard database tables.
For hstore, we need the activerecord-postgres-hstore gem added to the Gemfile:
gem 'activerecord-postgres-hstore', github: 'engageis/activerecord-postgres-hstore'
after which we run bundler to install the gem:
bundle install
We then create and run a migration that adds hstore support to the PostgreSQL server:
rails g hstore:setup
rake db:migrate
which creates a simple migration that looks like this:
class SetupHstore < ActiveRecord::Migration
def self.up
execute "CREATE EXTENSION IF NOT EXISTS hstore"
end
def self.down
execute "DROP EXTENSION IF EXISTS hstore"
end
end
Schema
To compare the two approaches in code, we’ll create two bug tables, crickets
and horseflies
, for custom fields and hstores respectively. Each will have a name
column, and each will have a set of custom fields collectively called parts
.
For cftable, we’ll create a custom field table (duh) with a polymorphic association to any other table that wants to store custom fields. The migration with GIN indexes for full-text search looks like:
class CreateCustomFields < ActiveRecord::Migration
def up
create_table :custom_fields do |t|
t.text :key
t.text :value
t.references :custom_fieldable, :polymorphic => true
t.timestamps
end
execute "CREATE INDEX custom_fields_gin_key ON custom_fields (key)"
execute "CREATE INDEX custom_fields_gin_value ON custom_fields (value)"
end
def down
drop_table :custom_fields
execute "DROP INDEX custom_fields_gin_key"
execute "DROP INDEX custom_fields_gin_value"
end
end
and the table structure looks like:
Table "public.custom_fields"
Column | Type
-----------------------+----------------------------
id | integer
key | text
value | text
custom_fieldable_id | integer
custom_fieldable_type | character varying(255)
created_at | timestamp without time zone
updated_at | timestamp without time zone
The customfieldable*
columns support the polymorphic association, so that any other table row can be linked to a custom field value.
For hstore, we’ll add a parts
column of type hstore
to the horseflies
table. The migration, again with a GIN index, looks like:
class CreateHorseflies < ActiveRecord::Migration
def up
create_table :horseflies do |t|
t.text :name
t.hstore :parts
t.timestamps
end
execute "CREATE INDEX horseflies_gin_parts ON horseflies USING GIN(parts)"
end
def down
drop_table :horseflies
execute "DROP INDEX horseflies_gin_parts"
end
end
and the table structure looks like:
Table "public.custom_fields"
Column | Type
-----------------------+----------------------------
id | integer
key | text
value | text
custom_fieldable_id | integer
custom_fieldable_type | character varying(255)
created_at | timestamp without time zone
updated_at | timestamp without time zone
Model
We’ll need two model classes for the cftable approach, one for the cricket records and another for the custom field records.
The CustomField
model class just has an accessible key and value, and a child relationship to any other custom_fieldable
model.
class CustomField < ActiveRecord::Base
attr_accessible :key, :value
belongs_to :custom_fieldable, :polymorphic => true
end
The Cricket
model class has a parts
association linked to custom fields. It saves its associated custom fields whenever a cricket record is saved, and it destroys them when the cricket is accidentally stepped on.
class Cricket < ActiveRecord::Base
attr_accessible :name
has_many :parts, :as => :custom_fieldable, :autosave => true,
:class_name => "CustomField", :dependent => :destroy
end
A cricket’s array of parts can be accessed like:
cricket.parts.detect{ |p| p.key == 'arms' }.value
=> '2';
The Horsefly
model class is naturally simpler, with just a serializable parts
attribute (note that the serialization method is obviously distinct from Ruby’s standard hash serialization, which wouldn’t result in data accessible by the RDBMS).
class Horsefly < ActiveRecord::Base
attr_accessible :name, :parts
# Necessary for serializing parts as an hstore
serialize :parts, ActiveRecord::Coders::Hstore
end
A horsefly’s hash of parts can be accessed like:
horsefly.parts['arms']
=> '2'
Getting and setting values
Let’s encapsulate getting and setting the bug parts a little, so users of the models don’t need to directly access the parts attribute. To that end, we’ll add helpers to each of the model classes.
For both approaches, getting and setting parts should look like:
bug.part :arms, '2'
=> '2'
bug.part :arm
=> '2'
bug.part :thumb
=> nil # bugs don't have thumbs, silly
The helper methods could be written in a generic module for reuse in other models with custom fields, but for now we’ll add them to the Cricket
and Horsefly
model classes.
For the Cricket
model, the helper method needs to find or create a custom field record with the appropriate key:
# Find a part in memory by key.
# The speed of this method when there are large numbers of custom fields
# could be increased by using a SQL lookup unless parts_changed?
def part_record key
parts.detect { |p| p.key == key }
end
# Set or get a part.
# o.part :arm, 2
# o.part :arm
def part(*args)
key, value = args
key = key && key.to_s
if args.size == 1
p = part_record(key) and p.value
elsif args.size == 2
raise ArgumentError, "invalid key #{key.inspect}" unless key
# Find or create a custom field object with the appropriate key
p = part_record(key) || self.parts.new(key: key)
p.value = value
else raise ArgumentError, "wrong number of arguments (#{args.size} for 1 or 2)"
end
end
For the Horsefly
model, all the helper method is really doing is making sure the parts
attribute is a hash (rather than nil) and marking the horsefly object as dirty after a key value changes (since Rails’ ORM won’t notice the record change otherwise).
# Set or get a part.
# o.part :arm, 2
# o.part :arm
def part(*args)
key, value = args
key = key && key.to_s
if args.size == 1
parts && parts[key]
elsif args.size == 2
raise ArgumentError, "invalid key #{key.inspect}" unless key
parts_will_change!
self.parts = (parts || {}).merge(key => value)
self.parts[key]
else raise ArgumentError, "wrong number of arguments (#{args.size} for 1 or 2)"
end
end
At this point, comparisons of getting and setting cricket and horsefly parts aren’t very interesting:
cricket.part :arms, '4'
=> '4'
cricket.part :arms
=> '4'
horsefly.part :arms, '4'
=> '4'
horsefly.part :arms
=> '4'
Searching
So let’s move on to searching for bugs with certain keys and values, where the code for the two approaches starts to further diverge.
Searching for crickets based on the existence or value of a key uses standard SQL joins (note that because we’re referencing custom_fields as parts, the join and where methods take different table aliases):
# Find crickets with any arms
Cricket.joins(:parts).where(custom_fields: {key: 'arms'})
# Find crickets with 2 arms
Cricket.joins(:parts).where(custom_fields: {key: 'arms', value: '2'})
# Find crickets with a partial text search
Cricket.joins(:parts)
.where("custom_fields.key = :key AND custom_fields.value LIKE :value",
key: 'middle_name', value: '%calvin%')
# Delete all arms values for all crickets
CustomField.find_by_custom_fieldable_type_and_key('Cricket', 'arms')
.destroy
Searching for horseflies uses custom hstore functions and operators which require custom SQL expressions:
# Find horseflies with any arms
Horsefly.where(:parts :key , key: 'arms')
# Find horseflies with 2 arms
Horsefly.where(:parts > (:key => :value), key: 'arms', value: '2')
# Find horseflies with a partial text search
Horsefly.where('parts' :key LIKE :value),
key: 'middle_name', value: 'hamlet')
# Delete all arms values for all horseflies
Horsefly.delete_key(:parts, :arms)
Wrapping up
From a development standpoint, I can’t say that either approach is categorically better. I prefer the organizational structure that hstore columns provide by keeping the custom fields inside the related record. In another post, I’ll look at performance considerations of these two approaches.
For more on getting started using hstore in Rails: