blog

Performance of custom field schemas

by

Overview of SELECT performance on indexed KVP vs. hstore. Figure 17 from “Key/Value Pair versus hstore” by Michel Ott.

Overview of SELECT performance on indexed KVP vs. hstore. Figure 17 from “Key/Value Pair versus hstore” by Michel Ott.

In a previous post comparing options for storing custom fields in a database app, I considered the implications of each option on code design and clarity.  Of course, another important consideration is the relative performance of the different approaches.  How does searching for values in KVP custom fields (a table with key and value columns) compare to PostgreSQL hstore custom fields (a column containing a special NoSQL-like key/value hash)?

Performance Comparison

A paper titled "Key/Value Pair versus hstore: Benchmarking Entity-Attribute-Value Structures in PostgreSQL" provides some insight.  It was published May 26, 2011, by Michel Ott of the HSR Hochschule fur Technik Rapperswil.

Ott’s benchmarking, which focused solely on SELECT statements using keys and values, finds hstore is a clear winner for large datasets of more than 500 records.  In these tests, using an index on the key column for both approaches, hstore is up to 4 times faster than KVP.  The speed ratio grows alongside the number of rows searched.

Adding an index to the KVP approach that combines the key and value columns increases KVP’s performance, though it still lags behind that of hstore.  However, the two KVP indexes both require quite a bit more disk space than the index on the hstore column.  For 250k records, each of the KVP indexes requires 4 – 6 times more disk space.

Comparison of index sizes for KVP and hstore fields. Figure 19 from “Key/Value Pair versus hstore” by Michel Ott

Comparison of index sizes for KVP and hstore fields. Figure 19 from “Key/Value Pair versus hstore” by Michel Ott.

A full performance comparison should also consider write speed, and preferably more complex searches (e.g. partial text matching).  I haven’t yet found such benchmarks or performed them myself, but I’ll post back here if I do.

Benchmarking Notes

Ott used a typical KVP schema, where a custom field table includes key and value columns and a reference to another table that stores the non-custom field data.

CREATE TABLE bench_kvp_id (bench_id BIGINT PRIMARY KEY);
CREATE TABLE bench_kvp (
 bench_id BIGINT REFERENCES bench_kvp_id(bench_id),
 key TEXT NOT NULL,
 value TEXT
);

For different tests, Ott used indexes on just the key column, and the key and value columns combined.

For the hstore tests, Ott used a single table with two columns: primary key (for the non-custom field record) and hstore.

CREATE TABLE bench_hstore (bench_id BIGINT PRIMARY KEY, bench_hstore HSTORE NOT NULL);

Ott indexed the hstore column using a GiST index (for full text searching).

Ott tested with PostgreSQL 9.0.4 on Ubuntu 10.04.12 LTS, using preconfigured PostgreSQL databases without any specialized performance tuning.  It’s possible that the performance of one approach or another could be impacted by certain configuration settings.

+ more

Accurate Timing

Accurate Timing

In many tasks we need to do something at given intervals of time. The most obvious ways may not give you the best results. Time? Meh. The most basic tasks that don't have what you might call CPU-scale time requirements can be handled with the usual language and...

read more