Unlocking the Web Audio API

“It’s going to be the coolest thing ever.”

You know enough by now to be doubtful when a client makes this statement, but you’re willing to entertain the idea that this may not, in fact, be a tragedy in the making.

“It’s going to be a music machine – like, full keyboard and everything – but each of the keys is going to be mapped to – wait for it – cat sounds! We’ll call it the ‘Meowsic Machine’! Oh, and we need it to be accessible to everyone via the Web. Which is easy, right?”

You are reminded that the universe can be a cruel place.

It’s now your job to make this happen. Over the course of a few posts, we’re going to look at the Web Audio API, and build the Meowsic Machine together. In the process, we’ll also enjoy a dalliance with Vue.js, and dip our toes into the deep-end with Web Workers. Today, we take the first step in this historic journey—convincing the browser to actually let us play sounds.

PeriodicWave… Isn’t that something the crowd does at a sports game?

The Web Audio API is rather intimidating at first glance if you’re not used to dealing with the low-level vagaries of audio – you feed in an appropriate file to the right OS API, it gets converted to PCM via magic, cat sounds come out of the speakers, right?

(If you’re coming from the world of dealing regularly with said vagaries, you’re probably thinking "this API is weird – there’s a mix of high-level and low-level constructs, and why is setting up a standard ring buffer so hard?" We’ll get to those details/criticisms/workarounds in a future post).

It is, however, an unquestionably better API for playing sounds—especially dynamically created sounds – than any web standard to date. It’s also fairly well supported across the board by browsers, so there’s no reason not to dig in and get to work.

KISS

So, let’s start at the very beginning. If you haven’t already, check out MDN’s Basic concepts behind Web Audio API article. It’s a great introduction to the basics of not just the API, but of playing sounds in general.

We just want to play some cat sounds, though, not delve into the Nyquist-Shannon Sampling Theorem, or reprise the history of the 44.1kHz sampling frequency. Can’t we just skip to the good stuff?

So, let’s assume we have a sound file, meow.mp3. We’re going to rely on the browser having the right codec to decode this file, and we’re not going to try to loop it, alter its gain, or perform any transformations on it—we’re just going to play it.

We could do something this simple with the Audio Element—but we want to do bigger and cooler things in the future. It is worth noting, however, that an audio element can be used as the source for a Web Audio context – we may delve into this more in the future.

For now, let’s get this party started:

const _audioCtx = new (window.AudioContext || window.webkitAudioContext)();

/**
 * Allow the requester to load a new sfx, specifying a file to load.
 * @param {string} sfxFile
 * @returns {Promise<ArrayBuffer>}
 */
async function load (sfxFile) {
    const _sfxFile = await fetch(sfxFile);
    return await _sfxFile.arrayBuffer();
}

/**
 * Load and play the specified file.
 * @param sfxFile
 * @returns {Promise<AudioBufferSourceNode>}
 */
function play (sfxFile) {
    return load(sfxFile)
        // decodeAudioData returns a Promise in spec-compliant browsers,
        // so we chain on it rather than using its result directly.
        .then((arrayBuffer) => _audioCtx.decodeAudioData(arrayBuffer))
        .then((audioBuffer) => {
            const sourceNode = _audioCtx.createBufferSource();
            sourceNode.buffer = audioBuffer;
            sourceNode.connect(_audioCtx.destination);
            sourceNode.start();

            return sourceNode;
        });
}

Okay, let’s unpack it:

First, we create our audio context—if you’ve dealt with the Canvas API this is familiar—this is going to be our audio processing graph object. We look for the standard AudioContext object first, and if we can’t find it, we fall back to the browser-prefixed version, webkitAudioContext, to broaden our browser support.

Then, we declare an async function. This is part of the ECMAScript 2017 spec, but if you’re able to use the Web Audio API, you’re probably able to use this too. Async/Await is a wonderful bit of sugar over Promises, and you should take advantage of it if you can.
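If the sugar is new to you, here’s the same trivial flow written both ways (fetchNumber here is just a stand-in for any promise-returning call, like fetch):

```javascript
// A stand-in for any promise-returning API call.
function fetchNumber() {
    return Promise.resolve(20);
}

// Promise chaining:
function addOneThen() {
    return fetchNumber().then((n) => n + 1);
}

// Async/await - identical behavior, but it reads like synchronous code:
async function addOneAwait() {
    const n = await fetchNumber();
    return n + 1;
}

addOneAwait().then((result) => console.log(result)); // 21
```

Both return a promise resolving to the same value; await just lets us skip the callback nesting.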

Within this async function, load, we rely on the Fetch API to go and get the file for us (my, aren’t we linking to MDN an awful lot in this post!). There’s no reason we couldn’t use XMLHttpRequest here instead, but fetch is wonderfully compact and again, if you can use Web Audio, you can almost certainly use fetch.

We await the response from fetch, and then get the response body as an array buffer and return it.

Then, in play, we perform both the loading and the playing: our API becomes just a call to play with the file location, waiting on the promise returned from our async load function. Next, we decode the audio data, turning it into PCM, and create an AudioBufferSourceNode to play it through (that’s the call to _audioCtx.createBufferSource()).

We then set our buffer source to draw from the audio buffer we’ve created out of the array buffer we created out of the file (whew). We connect it to the destination of the audio context (i.e. the audio context’s output, such as the speakers), and finally we call start on the sourceNode to have it start pumping its audio into the audio context.

Easy-peasy.

Except… if you do this with your meow.mp3 file, you won’t hear anything. The file will successfully be fetched, loaded, decoded… but not played. What’s going on?

Now Hear This

You’ve hit the browser’s autoplay policy. Nobody likes opening a new tab and having whatever website they’ve just navigated to start blaring their addiction to SpongeBob to the whole office. So, most browsers have autoplay policies which restrict what the page can do with media before the user has interacted with it.

For our purposes today, this boils down to not being able to play audio until the user has interacted with the page in some easily measurable way – that is, clicked/tapped on it.

So, in order to unlock our audio, we’ll need to listen for that interaction, and wait to play our sounds until after we’ve received it. We don’t want to have to download and play real audio to make that happen, so we’ll need to create an empty sound buffer that we can use as our stalking horse.

Also, if you tried the above example on iOS, it would have failed regardless of autoplay policies, because iOS is a special snowflake and hasn’t updated its audio APIs in a while. Let’s handle that too.

Oh, and it would be nice not to need to re-download a file every time we want to play it if we anticipate needing to play it multiple times. Let’s throw that in there for good measure.

KISS Redux

Let’s update our example above, and we’ll walk through the new bits together:

(function() {
    const _af_buffers = new Map(),
        _audioCtx = new (window.AudioContext || window.webkitAudioContext)();
    let _isUnlocked = false;

    /**
     * A shim to handle browsers which still expect the old callback-based decodeAudioData,
     * notably iOS Safari - as usual.
     * @param arraybuffer
     * @returns {Promise<any>}
     * @private
     */
    function _decodeShim(arraybuffer) {
        return new Promise((resolve, reject) => {
            _audioCtx.decodeAudioData(arraybuffer, (buffer) => {
                return resolve(buffer);
            }, (err) => {
                return reject(err);
            });
        });
    }

    /**
     * Some browsers/devices will only allow audio to be played after a user interaction.
     * Attempt to automatically unlock audio on the first user interaction.
     * Concept from: http://paulbakaus.com/tutorials/html5/web-audio-on-ios/
     * Borrows in part from: https://github.com/goldfire/howler.js/blob/master/src/howler.core.js
     */
    function _unlockAudio() {
        if (_isUnlocked) return;

        // Scratch buffer to prevent memory leaks on iOS.
        // See: https://stackoverflow.com/questions/24119684/web-audio-api-memory-leaks-on-mobile-platforms
        const _scratchBuffer = _audioCtx.createBuffer(1, 1, 22050);

        // We call this when user interaction will allow us to unlock
        // the audio API.
        const unlock = function () {
            const source = _audioCtx.createBufferSource();
            source.buffer = _scratchBuffer;
            source.connect(_audioCtx.destination);

            // Once the source has fired the onended event, indicating it did indeed play,
            // we can know that the audio API is now unlocked.
            source.onended = function () {
                source.disconnect(0);

                // Don't bother trying to unlock the API more than once!
                _isUnlocked = true;

                // Remove the click/touch listeners.
                document.removeEventListener('touchstart', unlock, true);
                document.removeEventListener('touchend', unlock, true);
                document.removeEventListener('click', unlock, true);
            };

            // Play the empty buffer.
            source.start(0);

            // Calling resume() on a stack initiated by user gesture is
            // what actually unlocks the audio on Chrome >= 55.
            if (typeof _audioCtx.resume === 'function') {
                _audioCtx.resume();
            }
        };

        // Setup click/touch listeners to capture the first interaction
        // within this context.
        document.addEventListener('touchstart', unlock, true);
        document.addEventListener('touchend', unlock, true);
        document.addEventListener('click', unlock, true);
    }

    /**
     * Allow the requester to load a new sfx, specifying a file to load.
     * We store the decoded audio data for future (re-)use.
     * @param {string} sfxFile
     * @returns {Promise<AudioBuffer>}
     */
    async function load (sfxFile) {
        if (_af_buffers.has(sfxFile)) {
            return _af_buffers.get(sfxFile);
        }

        const _sfxFile = await fetch(sfxFile);
        const arraybuffer = await _sfxFile.arrayBuffer();
        let audiobuffer;

        try {
            audiobuffer = await _audioCtx.decodeAudioData(arraybuffer);
        } catch (e) {
            // Browser wants older callback based usage of decodeAudioData
            audiobuffer = await _decodeShim(arraybuffer);
        }

        _af_buffers.set(sfxFile, audiobuffer);

        return audiobuffer;
    }

    /**
     * Play the specified file, loading it first - either retrieving it from the saved buffers, or fetching
     * it from the network.
     * @param sfxFile
     * @returns {Promise<AudioBufferSourceNode>}
     */
    function play (sfxFile) {
        return load(sfxFile).then((audioBuffer) => {
            const sourceNode = _audioCtx.createBufferSource();
            sourceNode.buffer = audioBuffer;
            sourceNode.connect(_audioCtx.destination);
            sourceNode.start();

            return sourceNode;
        });
    }

    // Expose load/play so the rest of the page can actually call them
    // from outside the IIFE. (The window property name is our choice.)
    window.meowsic = { load, play };

    _unlockAudio();
}());

That’s the ticket! Now, when we load our meow.mp3 by calling play, we attempt to fetch the AudioBuffer from our impromptu cache if possible, and fall back to fetching it from the network. (Note that it’s important we cache the decoded AudioBuffer, rather than the fetched array buffer—decoding is expensive! However, if we’re planning to fetch large compressed files, these will be decoded into PCM and can eat up a huge chunk of memory – keep an eye on that tradeoff!)
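To put a rough number on that memory tradeoff: decoded audio lives in memory as 32-bit float samples, one stream per channel, so a buffer’s size is easy to ballpark. A quick back-of-the-envelope helper (the sample rate and channel count defaults here are just illustrative):

```javascript
// Estimate the in-memory size of a decoded (PCM) AudioBuffer.
// Decoded audio is stored as 32-bit floats, per channel.
function decodedSizeBytes(seconds, sampleRate = 44100, channels = 2) {
    const bytesPerSample = 4; // Float32
    return seconds * sampleRate * channels * bytesPerSample;
}

// A half-second meow is cheap...
console.log(decodedSizeBytes(0.5)); // 176400 bytes (~172 KiB)
// ...but three minutes of background music is not.
console.log(decodedSizeBytes(180)); // 63504000 bytes (~60 MiB)
```

Short sound effects are perfect candidates for our cache; long music tracks are exactly the case where an audio element source starts looking attractive instead.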

Then, in load, we also perform the decoding, falling back to a decode shim if the browser throws an error (as iOS does), thanks to its out-of-date API implementation.
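The shim itself is just the standard callback-to-Promise wrapping pattern, and it generalizes to any success/error callback API. A sketch with a toy legacy function (legacyDecode is made up for illustration):

```javascript
// Wrap a (args..., onSuccess, onError) style function in a Promise,
// the same way our _decodeShim wraps the old decodeAudioData signature.
function promisify(fn) {
    return (...args) => new Promise((resolve, reject) => {
        fn(...args, resolve, reject);
    });
}

// A toy callback-based API, standing in for the old decodeAudioData:
function legacyDecode(input, onSuccess, onError) {
    if (typeof input === 'string') {
        onSuccess(input.toUpperCase());
    } else {
        onError(new Error('expected a string'));
    }
}

const decodeAsync = promisify(legacyDecode);
decodeAsync('meow').then((result) => console.log(result)); // MEOW
```

Once wrapped, the old API can sit behind await just like the modern promise-returning one.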

Otherwise, play looks pretty much the same.

And finally, we’ve wrapped all of this in an IIFE, with _unlockAudio getting called as soon as the IIFE is executed, adding touch/click listeners to the document so we can unlock the audio API as soon as the user has indicated they’re willing to interact with our site.

Phew! That’s a fair amount of work to play a single file! Depending on the needs of your project, you may want to explore some of the libraries that can handle some of the grunt work for you, like Howler.js and SoundJS. There’s no magic, though, and you may need to dive into the true depths for your use case.

The Meowsic Machine project is on its way—next time, we prepare to begin working on the UI with Vue.js.


Christopher Keefer

Christopher Keefer is a Senior Software Engineer at Art+Logic. He generally spends his spare time on the computer too, so there isn't much hope for him.



This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.