blog

Photo of 5 story library inside looking down by Tobias Fischer on Unsplash

Optimizing Core Data searches and sorts

by

Core Data provides a capable framework for connecting data to an iOS user interface.  The framework doesn’t so much reduce code as it does abstract the SQLite interface in order to support advanced features.  However, in doing so it masks certain implementation decisions that can result in inefficient, less-than-instant queries when a table reaches 5-10k records.  Since these queries are often run on the main thread for easy integration with the UI, they lead to unresponsive apps.

Solving these slowdowns requires understanding more about SQLite and Core Data’s connection to it.  For both technologies, there are simple analytical tools and standard solutions for common optimization problems.

com.apple.CoreData.SQLDebug

The first useful tool for investigating problems is -com.apple.CoreData.SQLDebug , a runtime flag that enables debug logging and timing of SQL statements.  To enable this flag:

Software screen capture

  1. Edit the product scheme
  2. Select the Run action on the left
  3. Select the Arguments tab
  4. Add “-com.apple.CoreData.SQLDebug 1” to Arguments Passed On Launch

With this flag enabled, SQL statements will be printed to the console like:

CoreData: sql: SELECT 0, t0.Z_PK FROM ZASCREENEE t0 WHERE  t0.ZPROFILE = ?
   ORDER BY t0.ZFIRSTNAME, t0.ZLASTNAME, t0.ZCREATED
CoreData: annotation: sql connection fetch time: 0.0061s
CoreData: annotation: total fetch execution time: 0.0065s for 15002 rows.

If you are targeting iOS, run your optimization tests on an actual device, since an iPad CPU will run significantly slower than your development machine.

explain query plan

Once you have the actual SQL statements that are running too slowly, use SQLite’s EXPLAIN QUERY PLAN command to get a description of the strategy that SQLite will use to execute the query.

In Terminal, run the sqlite binary with your SQLite database used in the iOS simulator. 

sqlite is probably installed at /usr/bin/sqlite3, and the location of your database is printed to the console when the SQLDebug runtime flag is enabled, like:

CoreData: annotation: Connecting to sqlite database file at
"~/Library/Application Support/iPhone Simulator/6.0/Applications/A91DD92E-B0E9-4FEA-BC53-F61DE19638E7/Library/Caches/MyApp.sqlite"

Example command to launch SQLite’s command line utility:

/usr/bin/sqlite3 "~/Library/Application Support/iPhone Simulator/6.0/Applications/A91DD92E-B0E9-4FEA-BC53-F61DE19638E7/Library/Caches/MyApp.sqlite"

Then get details about the query plan for a statement with:

EXPLAIN QUERY PLAN SELECT 0, t0.Z_PK FROM ZASCREENEE t0 WHERE  t0.ZPROFILE = ?;

which will describe the searches performed.  Note that if your SQL statement includes some Apple-specific custom function like NSCoreDataStringSearch , you will get a "no such function error".  These custom functions are described below.

A single output line that uses an index is good:

SEARCH TABLE ZASCREENEE AS t0 USING COVERING INDEX ZASCREENEE_ZPROFILE (ZPROFILE=?) (~10 rows)

But an output line that includes a TEMP B-TREE is probably bad:

USE TEMP B-TREE FOR ORDER BY

The SQLite query planner documentation provides much more detail on how queries are analyzed and how performance is affected.

Indexing slow queries

A temporary b-tree slows a query down significantly, and so the query can often be optimized by ensuring that it uses an index instead.  However, simply indexing an attribute used in the query may not be sufficient.

One problem is that SQLite cannot use multiple indexes in many queries (see the Query Planning document for far more details).  If a query filters with a where clause and sorts with an order by clause, only one index can be used.  If both clauses aren’t satisfied by the same index, a temporary b-tree will be invoked for one of them.

If these two clauses use separate attributes, one potential solution is to create a multi-column index that includes all the necessary attributes.  For example, a query like:

SELECT t0.Z_PK FROM ZASCREENEE t0 WHERE t0.ZPROFILE = ?
ORDER BY t0.ZFIRSTNAME, t0.ZLASTNAME

could use an index that covers all three referenced attributes – profile, firstName, and lastName – for both searching and sorting.Software screen capture

To create a multi-column index in Core Data:

  1. Select the data model
  2. Select the entity that needs indexing
  3. Show the Utilities view and select the Data Model inspector
  4. In the Indexes field, add an index like “profile,firstName,lastName”
  5. In the Versioning section, set the Hash Modifier field to some unique value to ensure that existing databases will be updated during migration

Case sensitive searching

By default, SQLite builds binary indexes, which means that two text strings are compared exactly.  This is fast, but useless if you want to sort or search for case insensitive or localized strings.  "Banana" will be sorted before "apple", "Ez" before "Éa", and "horse" will not match "Horse".

SQLite also provides a NOCASE index collation that is case insensitive, but Core Data does not expose an option to set an index or column collation.  As such, there is no way to use Core Data to create or migrate a SQLite database and also specify use of a case insensitive index.  However, even if there was such an option, the NOCASE collation only folds the 26 upper case characters of ASCII with their lower case equivalents, meaning that accented characters and other alphabets would not be supported.

Moreover, when Core Data converts string comparison predicates with "c" (for case insensitive) and "d" (for diacritic insensitive) modifiers to SQLite queries, it uses custom functions for string comparison.  For example:

[NSPredicate predicateWithFormat:@"firstName CONTAINS[cd] %@", name]

results in the following SQL:

CoreData: sql: SELECT 0, t0.Z_PK FROM ZASCREENEE t0
WHERE NSCoreDataStringSearch( t0.ZFIRSTNAME, ?, 385, 0)

The function NSCoreDataStringSearch cannot make use of any index because it requires loading each compared string into memory and evaluating it through custom C code.  You can run fast or flexible comparisons on case sensitive data, but not both.

Normalized text attributes

Alternatively, for both fast and flexible searches, you can perform the text normalization ("HéLLo" -> "hello") when the record is saved and store the normalized value in a dependent property, and then search on this column using normalized search text.  Apple provides the sample app DerivedProperty to illustrate how to maintain and query a dependent attribute, as well as documentation on updating dependent attributes immediately or on save.

Essentially, this option involves:

  • Adding an attribute to the entity to hold the normalized text, e.g. “firstNameNormalized”
  • Modifying the entity code (either overriding the original property setter or the entity willSave callback) to normalize the property text and set it to the dependent attribute
  • Using the normalized text attribute for sorting in Core Data fetches
  • Normalizing search strings and comparing them to the normalized text attribute with a simple predicate like [NSPredicate predicateWithFormat:@"firstNameNormalized = %@", [self normalize:searchText]]

Maintaining normalized text in additional columns denormalizes the affected table, but allows for useful searches and sorts that are easily an order of magnitude or two faster.

Batch fetching

Even once sorts and searches are optimized through the use of indexes, it’s still important to consider what data Core Data is actually loading.  By default, connecting a fetch of 10,000 rows to NSFetchedResultsController will load all those objects into memory. Instead, set the fetch request’s fetchBatchSize property to some small number related to the number of actual records your UI will be showing.  Core Data will first load and store in memory the primary keys of all the matched records, then issue queries as necessary to load the actual records by ID in the given batch size.

Be careful, though, as even with a batch size specified, calling NSFetchedResultsController’s fetchedObjects property will still (somewhat obviously) load every single record.

Related posts

I found these posts helpful in understanding Core Data with respect to SQLite:

+ more