
Painless Imports With AFIncrementalStore


Before taking this advice, please see my follow-up.

AFIncrementalStore (AFIS) is a brilliant bit of open source code from Mattt Thompson. It is a great help in synchronizing a remote data source with local data storage handled by Core Data. Its magic is turning the Core Data API into a single interface to both what is on the device and what is on the server. Data consumers need not worry about where the data resides before asking for it. Just perform a fetch request, and AFIncrementalStore will query both the local store and the remote side so that the data returned is as up to date as it can be.

There is a use case where this model fails, but that’s not a problem if you know how to deal with it.

The Large Import

Say your application has a fair amount of somewhat static data that is critical for the user to have at any moment. You can’t let AFIS do its thing and request it from the server just in time to fulfill the user’s needs, or the user might not have it when the network is unavailable. You decide to preload the critical data. You will also follow Apple’s guidelines for efficient imports, the critical feature of which is to avoid fetches from Core Data. Also assume there are no existing objects in the local store that need to be updated. The import then comes down to four steps:

  • Download the object representations
  • Create NSManagedObject instances from the representations
  • Assign relationships among the objects
  • Save to the store

The first impulse is to use an NSFetchRequest to retrieve the objects and let AFIS handle the details, as usual. However, this results in one thread per fetch, as well as one server request and multiple Core Data fetches per object. In other words, it is very inefficient. Downloading around 57,000 object representations and saving the resulting objects to the local store took over 30 minutes, 18 of which involved AFIS.

The Solution

The best thing I found to do here is to cut AFIS out of the picture until the very end. I’m going to go very light on the details, but each of these steps has documentation and justification just a search away. You’ll create one thread (or other concurrency mechanism of your choice) to run the import and in that thread do the following:

Create a background NSManagedObjectContext with NSPrivateQueueConcurrencyType
Set the background context's parent to the master context which has a persistent
 store coordinator set up with AFIS
For each entity to be downloaded to the device
   Until 0 objects are returned
      Request one page of representations of that entity
For each representation
   Create an NSManagedObject instance in the background context
   Store instances by Core Data entity along with the object's unique identifier
    and raw representation
For each entity
   For each instance of that entity
      For each relationship in the entity
         Find the related object(s) and assign the relationship(s)
Save the background context once
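
The "until 0 objects are returned" loop in the outline can be sketched as follows. This is a language-neutral illustration written in Python, not AFIS or Core Data code. The names `download_all`, `fetch_page` and `fake_fetch_page` are hypothetical, and the page-number-based paging is an assumption about the server API.

```python
from typing import Callable, Dict, List

Representation = Dict[str, object]


def download_all(
    entity: str,
    fetch_page: Callable[[str, int], List[Representation]],
) -> List[Representation]:
    """Request pages for one entity until a page comes back empty."""
    representations: List[Representation] = []
    page = 0
    while True:
        batch = fetch_page(entity, page)
        if not batch:
            # An empty page means the server has nothing more to send.
            break
        representations.extend(batch)
        page += 1
    return representations


def fake_fetch_page(entity: str, page: int, page_size: int = 2) -> List[Representation]:
    """Hypothetical stand-in for the server: slices a small in-memory list."""
    data = {"Author": [{"id": 1}, {"id": 2}, {"id": 3}]}
    rows = data.get(entity, [])
    start = page * page_size
    return rows[start:start + page_size]
```

Nothing here touches the local store. The representations are simply collected in memory so that object creation and relationship assignment can happen afterwards without any fetches.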

Two pieces of this are critical for performance. One is how quickly you can find objects by id. When assigning the relationships, even for modest numbers of objects like my 57,000, linear look-ups take a long time, because every relationship triggers a scan over the whole collection. I used an NSDictionary keyed by the object’s unique id. I keep the original representations around because my entities don’t store the id of related objects, but the representations do.
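
Here is a minimal sketch of that two-pass idea, again in Python with plain dictionaries standing in for NSManagedObject instances and NSDictionary. The function name `import_objects`, the `related` key and the shape of the representations are assumptions made for illustration, not the format of any real server or of AFIS.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (entity name, unique id)


def import_objects(representations_by_entity: Dict[str, List[dict]]) -> Dict[Key, Tuple[dict, dict]]:
    """Create every object first, then wire up relationships by dictionary lookup."""
    index: Dict[Key, Tuple[dict, dict]] = {}

    # Pass 1: one object per representation, stored with its raw representation.
    for entity, reps in representations_by_entity.items():
        for rep in reps:
            obj = {"entity": entity, "id": rep["id"], "relationships": {}}
            index[(entity, rep["id"])] = (obj, rep)

    # Pass 2: the representation (not the object) knows the related ids,
    # so each target is resolved with a constant-time dictionary lookup.
    for obj, rep in index.values():
        for name, target_key in rep.get("related", {}).items():
            found = index.get(tuple(target_key))
            if found is not None:
                obj["relationships"][name] = found[0]

    return index
```

For example, a book representation carrying `{"author": ("Author", 1)}` under `related` ends up pointing at the very same author object that pass 1 created, without scanning the list of authors.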

The second is that you only want to save the background context once. That save is what kicks off AFIS: the thread’s child context pushes the new objects to the parent context, which does the work of moving those objects between threads.

With these changes my import now takes about 10 minutes, a 3x speed-up, and much of that time is the download phase.

Another trick can effectively eliminate the time for an import, but that’s another story.
