
Image of cocktail glasses on bar by https://www.deviantart.com/paradigm-shifting/gallery/27928644

Client-side Fulltext Searching with Fullproof

Recently, I was engaged in a genial argument with a friend of an older generation, each of us taking an opposing stance on some obscure trivia neither of us was entirely certain about – but which we were both ready to defend with all the wit and rhetoric at our disposal. When we had finally exhausted all attempts to make the other budge on the matter, we turned to an authoritative 3rd-party source to lay the matter to rest for us – a Google search.

What else? My friend had a shelf full of encyclopedias, but it didn’t occur to either of us to consult them when the answer was a 3-second search away. We’ve come to take that near-instantaneous access to information for granted in most contexts, and we feel annoyed or confused when we don’t have access to it. Many well-designed, content-heavy sites offer similar search functionality to help users quickly navigate to whatever they’re looking for; but generally, this either relies on a 3rd-party service (like Google) or delegates the full-text searching to the backend (the database, or a specialized search server like Solr).

This is usually the best way to go about it. But what if your application is expected to provide this sort of search functionality offline, whether temporarily or because it is never meant to go online at all?

It is here that Fullproof comes into its own. I discovered this JavaScript library while working on a complex, web-based user guide that needed to be offline-only. Its contents would come from an XML file that either we or the client could easily update, which we would transform into HTML via XSLT (the subject of a future post, perhaps), and then style with CSS and script with JavaScript, as per usual.

One requirement was that we be able to perform full-text searching on the user guide. The ugly-hack approach to this might involve regexes or all sorts of data-* attributes strewn throughout the document. Neither of these would be a good solution under most circumstances: even a complicated regex (and they can get very complicated, very quickly) isn’t going to offer you results for mild misspellings or near-matches, and will potentially make supporting Unicode a nightmare; and sprinkling data-* attributes throughout your markup to indicate areas of interest is highly limiting and clutters your markup.

We needed a real search engine – one that can parse and normalize our text, and offer us scoring engines that can catch misspellings and near-matches. Consider the following example (from fullproof’s home page):

Software screen capture
Searching MAME Roms with match scoring – we get our results in a ranked order that can be controlled to a degree via what indexes we instantiate.

The Fullproof github page includes a capable tutorial, the contents of which we won’t duplicate here – I suggest you go give it a look, along with the supporting documentation. Instead, let’s look at some specifics regarding setup, and pitfalls encountered when trying to employ fullproof.

Where is the data coming from?

One of the first hurdles for anyone expecting the full-suite treatment you get when running full-text searches against a database is that fullproof isn’t a document management system (as it notes at the top of its github page). It ONLY searches; you’ll need to figure out how to feed it the data you want it to work with. My suggestion is that you have precomposed data available in some form, rather than scraping your page(s) at runtime. In our case, because we’re pulling data from XML in the first place, we can restructure that content into JSON and slap it into our page as static data in a script element, to be pulled back out, parsed, and fed to Fullproof when appropriate. Something like:

// innerHTML is a property (not a method), and the embedded text must be parsed
this.gContent = JSON.parse(
    $('script[type="text/JSON"]')[0].innerHTML); // Our guide content
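Since the embedded text is just a string until it is parsed, it can be worth guarding that step. Here is a minimal sketch, assuming the guide content is a JSON array of objects that each carry a text field (that shape is inferred from the injection code later in this post). The helper name parseGuideContent and the sample data are made up for illustration.

```javascript
// Hypothetical helper: turns the raw text of the embedded script element
// into the array of guide sections that later gets fed to the index.
function parseGuideContent(rawText) {
    var parsed;
    try {
        parsed = JSON.parse(rawText);
    } catch (e) {
        // Malformed JSON: return an empty guide rather than throwing.
        return [];
    }
    if (!Array.isArray(parsed)) {
        return [];
    }
    // Keep only entries that carry a string `text` field (assumed shape).
    return parsed.filter(function (section) {
        return section && typeof section.text === 'string';
    });
}

// Example input, standing in for the script element's innerHTML.
var sample = '[{"title":"Intro","text":"Welcome to the guide"},{"title":"Broken"}]';
var sections = parseGuideContent(sample);
console.log(sections.length); // 1
```

Filtering happens before anything is indexed, so the array positions used as keys during injection still line up with the array used to display results.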

Start the Engine

We’ll then initialize our engine and its indexes. This was the major stumbling block at first, particularly because we’re opening a file on the local file system (which leads to all sorts of security complications). We need to eschew the HTML5 storage options fullproof supports and make do with memory-based storage.

this.engine = new fullproof.ScoringEngine(
        [new fullproof.StoreDescriptor("memorystore", fullproof.store.MemoryStore)]);

If we try to use WebSQL (which isn’t supported by IE or Firefox), we’ll run into a security exception in Chrome (possibly other WebKit-based browsers as well, although I didn’t have the opportunity to test them). Same story for IndexedDB in Firefox and Chrome (Chrome will complain about same-origin problems when loading our XSL as well, which is why we have to statically compile the HTML output of the XSLT transform for that browser; as noted, a topic for another post).

Now, we can instantiate our indexes. The fullproof tutorial notes that any number of indexes can be specified, and the search engine will fall back through them in the specified order when attempting to fulfill a search. Thus, we specify the least aggressive index first (one that only lowercases, removes diacritical marks, and collapses repeated letters), and then a more aggressive index built on metaphone, a phonetic encoding (named stemmedIndex in the code below).

this.indexes = [
        {
            name:"normalIndex",
            analyzer:new fullproof.ScoringAnalyzer(fullproof.normalizer.to_lowercase_nomark,
                fullproof.normalizer.remove_duplicate_letters),
            capabilities:new fullproof.Capabilities().setStoreObjects(false).setUseScores(true)
                .setDbName(this.dbName).setComparatorObject(fullproof.ScoredEntry.comparatorObject),
            initializer:function(injector, callback){that.initIndexes(injector, callback, that.gContent);}
        },
        {
            name:"stemmedIndex",
            analyzer:new fullproof.ScoringAnalyzer(fullproof.normalizer.to_lowercase_nomark,
                fullproof.english.metaphone),
            capabilities:new fullproof.Capabilities().setStoreObjects(false).setUseScores(true)
                .setDbName(this.dbName).setComparatorObject(fullproof.ScoredEntry.comparatorObject),
            initializer:function(injector, callback){that.initIndexes(injector, callback, that.gContent);}
        }
    ];
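To make the idea of ordered indexes a little more concrete, here is a self-contained sketch that does not use fullproof at all. The two normalizers are rough stand-ins I wrote for illustration (the second one is a crude placeholder, not real metaphone), and lookupWithFallback reflects my reading of "fall back through them in order", not the library's actual code.

```javascript
// Rough stand-ins for the two analyzers; these are NOT fullproof's implementations.
function normalizeLight(word) {
    return word
        .toLowerCase()
        .normalize('NFD')                 // split letters from combining marks
        .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
        .replace(/(.)\1+/g, '$1');        // collapse repeated letters
}

// Crude phonetic-style key (a placeholder for metaphone): keep the first
// letter, then drop the remaining vowels.
function normalizeAggressive(word) {
    var light = normalizeLight(word);
    return light.charAt(0) + light.slice(1).replace(/[aeiou]/g, '');
}

// Build a Map from normalized word to the list of document ids containing it.
function buildIndex(docs, normalize) {
    var index = new Map();
    docs.forEach(function (text, id) {
        text.split(/\s+/).forEach(function (word) {
            var key = normalize(word);
            if (!key) { return; }
            if (!index.has(key)) { index.set(key, []); }
            if (index.get(key).indexOf(id) === -1) { index.get(key).push(id); }
        });
    });
    return index;
}

// Try each index in order; the first one that yields hits wins.
function lookupWithFallback(indexes, word) {
    for (var i = 0; i < indexes.length; i++) {
        var hits = indexes[i].index.get(indexes[i].normalize(word));
        if (hits && hits.length) {
            return { usedIndex: indexes[i].name, ids: hits };
        }
    }
    return { usedIndex: null, ids: [] };
}

var docs = ['Café settings', 'Cocktail glasses'];
var indexes = [
    { name: 'normalIndex', normalize: normalizeLight, index: buildIndex(docs, normalizeLight) },
    { name: 'stemmedIndex', normalize: normalizeAggressive, index: buildIndex(docs, normalizeAggressive) }
];

console.log(lookupWithFallback(indexes, 'CAFE'));      // found via normalIndex
console.log(lookupWithFallback(indexes, 'cocktale'));  // misspelling, found via stemmedIndex
```

The ordering matters: the lightly normalized index gets the first chance, so the more aggressive index only comes into play when nothing closer matches.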

You’ll notice the calls to that.initIndexes. This is our initializer function, which actually ‘injects’ our data into the search engine index.

/**
 * Initialize index(es).
 * @method initIndexes
 * @param {Object} injector Provided by index.
 * @param {Function} callback Function to call when indexes are finished initializing.
 * @param {Object} gContent Guide content to inject.
 */
ASearch.prototype.initIndexes = function(injector, callback, gContent){
    var synchro = fullproof.make_synchro_point(callback, gContent.length);
    for (var i=0, len = gContent.length; i < len; i++)
    {
        injector.inject(gContent[i].text, i, synchro);
    }
};
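If the synchronization point seems a bit magical, the underlying idea is just a counter: every finished injection ticks it, and the final callback fires once the expected count is reached. Here is a hand-rolled sketch of that idea. It is my own illustration (makeSyncPoint is a made-up name), not fullproof's implementation of make_synchro_point.

```javascript
// Hand-rolled illustration of a synchronization point; not fullproof's code.
function makeSyncPoint(callback, expectedCount) {
    var completed = 0;
    return function () {
        completed += 1;
        // Fire exactly once, when the last expected completion arrives.
        if (completed === expectedCount) {
            callback();
        }
    };
}

var finished = 0;
var sync = makeSyncPoint(function () { finished += 1; }, 3);

// Simulate three injections completing.
sync();
sync();
console.log(finished); // 0
sync();
console.log(finished); // 1
```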

These injectors are, as noted, provided by the index in question, and are called when we open the engine with the specified indexes, as below:

try{
    this.engine.open(this.indexes,
        fullproof.make_callback(this.ready, true), fullproof.make_callback(this.ready, false));
}catch(e)
{
    progress.find('.bar').addClass('bar-danger');
    progress.find('p').text('Failed to open search indexes. Search not available.');
    this.debug && console.log(e);
}

You’ll notice we have this set up in a try-catch block: fullproof will throw an exception if the engine fails to open, which we can then (hopefully) use to debug the problem, or to provide some meaningful error message for the user.

An aside: If you’re curious about this.debug && console.log(e);, this is simply a convenient way of toggling logging on and off at a single point, without needing to go through your code and root out all of the console.log calls. Because && short-circuits, you simply set this.debug to false, and those console.log calls will never be made.
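A tiny demonstration of that short-circuit behavior, using a stand-in log function and a plain app object (both invented for this example) so we can count how often logging actually happens:

```javascript
// Counts how often the (stand-in) logger actually runs.
var calls = 0;
function log(message) {
    calls += 1;
    console.log(message);
}

var app = { debug: false };
app.debug && log('never printed');  // right-hand side is skipped

app.debug = true;
app.debug && log('printed once');   // right-hand side runs

console.log(calls); // 1
```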

You’ll notice we’ve specified two callbacks, each to the same function (this.ready) passing true in the case of success, and false in the case of failure. In our code, this simply sets the search side panel to available, and enables the search button and field if true is passed, or displays a user-friendly error message otherwise. It would be in this callback function that you’d do whatever is needed to mark for your app that search is now ready to go.
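Here is a rough sketch of how one function can serve as both the success and the failure callback. The makeCallback helper is my own stand-in for a callback factory that pre-binds arguments (I am assuming fullproof.make_callback behaves along these lines), and the panel object is a hypothetical replacement for the real DOM updates.

```javascript
// Hypothetical stand-in for a callback factory that pre-binds arguments.
function makeCallback(fn) {
    var boundArgs = Array.prototype.slice.call(arguments, 1);
    return function () {
        return fn.apply(null, boundArgs);
    };
}

// Hypothetical search panel state; the real app would touch the DOM here.
var panel = { searchEnabled: false, message: '' };

function ready(success) {
    panel.searchEnabled = success;
    panel.message = success ? 'Search is ready.' : 'Search is not available.';
}

var onSuccess = makeCallback(ready, true);
var onFailure = makeCallback(ready, false);

onFailure();
console.log(panel.message); // Search is not available.
onSuccess();
console.log(panel.message); // Search is ready.
```

Note that ready here does not rely on this. If your own handler does, binding it explicitly before handing it over (for example with Function.prototype.bind) is one way to make sure it still sees the right object.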

Things are looking up

Finally, for the sake of completeness, let’s take a quick look at what a (stripped-down) version of the actual lookup is like.

this.engine.lookup(value, function(resultSet){
    if (!resultSet || !resultSet.getSize())
    {
        that.debug && console.log("No Results Found.");
        return;
    }
    // Higher scoring items should be displayed first
    resultSet.setComparatorObject({
        lower_than: function(a,b) {
            return a.score > b.score;
        },
        equals: function(a,b) {
            return a.score === b.score;
        }
    });
    // Filter for unique values in resultSet
    var values = [],
        len = resultSet.data.length;
    while(len--)
    {
        if (values.indexOf(resultSet.data[len].value) !== -1)
        {
            resultSet.data.splice(len, 1);
        }
        else
        {
            values.push(resultSet.data[len].value);
        }
    }
    // For each result in the resultSet, create a direct link to the appropriate section, and
    // display a collapsible div showing context.
    resultSet.forEach(function(entry){
        var result = that.gContent[entry.value];
        // Create links, etc. here
        that.debug && console.log(that.gContent[entry.value]);
    });
});

So, we call this.engine.lookup with the value we want to search for, receiving a resultSet (an object of type fullproof.ResultSet). We then set a comparator on it (in order to get the order we desire), filter it so that only unique entries remain, and, for each entry, create some element to represent the result.
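The rank-then-deduplicate step can also be tried out on plain arrays, without the engine. Below is a minimal sketch, assuming entries shaped like { value, score } as in the lookup code above. The rankUnique helper and the sample scores are invented for illustration.

```javascript
// Plain-array illustration; entries mimic the { value, score } shape used above.
function rankUnique(entries) {
    var seen = {};
    return entries
        .slice()                                               // leave the input untouched
        .sort(function (a, b) { return b.score - a.score; })   // highest score first
        .filter(function (entry) {
            // Keep only the first (best-scoring) entry for each value.
            if (seen[entry.value]) { return false; }
            seen[entry.value] = true;
            return true;
        });
}

var ranked = rankUnique([
    { value: 2, score: 0.4 },
    { value: 0, score: 0.9 },
    { value: 2, score: 0.7 },
    { value: 1, score: 0.1 }
]);
console.log(ranked.map(function (e) { return e.value; })); // [ 0, 2, 1 ]
```

One design choice to be aware of: this sketch sorts first and then filters, so when a section shows up more than once it keeps the highest-scoring entry. That is a deliberate simplification and not necessarily which duplicate the loop above ends up keeping.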

Besides the UI elements involved, that’s about all there is to it! It works quite well (surprisingly so, for an entirely client-side solution to search). If you happen to be involved in a project that can’t rely on a server-side solution to search, fullproof might well be the answer you’ve been looking for.
