blog

Image of PDF logo

Generating PDFs: wkhtmltopdf & Heroku

by

So, it has come to this.

Reports, yes, your application will have to have reports – in brand colours, with images and logos abounding, and probably festooned with graphs of various sizes, shapes and degrees of relevance to what was once a nice, streamlined set of data. This report has just become a part of the application ‘product’, meant not just to communicate, but also to entice and enthrall. Form has become just as important as function… and, did I forget to mention? It also needs to be exportable.

Exportable, portable, downloadable, shareable – because as I mentioned, it’s not just data now. It is now something clients/users need to be able to ‘have’, to attach to emails, send to their marketing departments, and incorporate into their powerpoints.

There’s a few ways to make this happen, but generally speaking, it’s time to break out the PDFs.

The Portable Document Format (PDF) has been a staple of web applications for some time, now – it’s well-suited to the sharing of information in a self-contained package, and it’s increasingly well-supported as many modern browsers bake support for the display of PDFs directly into the browser.

There are a number of ways to produce a PDF. We could use one of the various libraries for our language of choice – say, PyPDF2 or pdfrw for Python. PDF is a binary file format with a published specification, so if we really had a lot of free-time on our hands and/or a masochistic streak, we could even do the necessary bit-twiddling ourselves. If we needed to just output some simple text or static images, these might be the way to go (except for the bit-twiddling. Most projects will not benefit from you suddenly deciding you need to devote the next n months to creating a new PDF library from scratch, no matter how fun that might sound).

In the scenario I posit above, however, we’re transitioning from an HTML page which displayed the data, to a full, glossy report – but it still needs to show up in the browser first, and support PDF as an export option. We could have two completely separate code-bases to support the web page display, and to build the PDF, but wouldn’t it be nice if we could just render our web page to a PDF?

wkhtmltopdf

If that line of thinking seems familiar, you’ve probably also heard of wkhtmltopdf. wkhtmltopdf is an open source command line tool (and C library) that relies on the WebKit rendering engine, which underlies Safari and, until the still fairly recent fork to Blink, powered the Chrome browser as well. The rendering engine uses the same HTML and CSS (and, optionally, JavaScript, with some limitations) that would be used to render your web page, and sends the output on to be converted into images and eventually, incorporating a number of specifiable configuration details, into a PDF.

Assuming this sounds just like what the doctor ordered, installing wkhtmltopdf on most linux distros is just an apt-get, yum install, etc. away; or you can download a pre-compiled binary if building from source is giving you trouble (and it might, particularly when attempting to build it on a server without any GUI libraries in place).

Now that we have the binary available, we could just call out to the command-line from our language of choice; we can make our lives a little easier by using a wrapper libary, however. I first used pdfkit years ago with Ruby and it has been my go-to since for both Ruby and Python, but there are any number to choose from if pdfkit doesn’t strike your fancy.

For the purpose of this article, however, we’re going to assume you’re taking my suggestions, so let’s add pdfkit to our requirements.txt file (update the version as needed):

pdfkit==0.6.0

Heroku, wkhtmlopdf-pack, and which wkhtmltopdf

So, this gets you set for working in your local ‘nix dev box, but what about when it comes time to put this on a staging server or go live? If you’re using a similar environment to your development system, the setup will be much the same – but let’s add another piece to this puzzle, and posit that we’re going to need to run on the still very popular Heroku platform.

In that case, we can’t rely on the standard wkhtmltopdf binaries, as they aren’t compatible. Instead, we need to ‘vendor in’ a pre-compiled binary for their platform – that’s the purpose of the various Heroku buildpacks that you may be familiar with if you’ve used Heroku before.

We could compile the necessary binary for ourselves in a heroku dyno via Heroku run, but it would be nicer not to spend the time/money if we don’t have to; and luckily, we’re not the first to think so.

Enter wkhtmltopdf-pack. Based on a similar project for Ruby, this python package simply installs a pre-compiled wkhtmltopdf binary that’s compatible with Heroku’s Cedar-14 platform, quick and easy, so let’s add that to our requirements.txt file as well.

wkhtmltopdf-pack==0.12.3.0
pdfkit==0.6.0

Now, we’re faced with one additional hurdle, which is that we’ll want to use the wkhtmltopdf-pack binary on heroku, but the wkhtmltopdf binary on our ‘nix dev system. So, let’s add an environment setting indicating the name of the binary we’re looking for on heroku. Go to your dashboard, find the app to set this for, and click on settings. Then, add this config variable:

Key: WKHTMLTOPDF_BINARY    Value: wkhtmltopdf-pack

And now, let’s add a little logic to ask the system which wkhtmltopdf binary we should use. This can go into whatever bootstrap config your framework might use (so, for instance, in Django’s settings.py), or simply before your use of wkhtmltopdf/pdfkit.

# Ensure virtualenv path is part of PATH env var
os.environ['PATH'] += os.pathsep + os.path.dirname(sys.executable)
WKHTMLTOPDF_CMD = subprocess.Popen(
    ['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')], # Note we default to 'wkhtmltopdf' as the binary name
    stdout=subprocess.PIPE).communicate()[0].strip()

The value of WKHTMLTOPDF_CMD should now point to the binary we’ll want to use.

pdfkit Example

Finally, let’s take a look at a quick example of what using pdfkit with this binary might look like.
PDFKit offers us fairly easy access to the various configuration options for the binary, and let’s us set which binary we want to use, as shown below. See the documentation for more details.

    pdfkit_config = pdfkit.configuration(wkhtmltopdf=settings.WKHTMLTOPDF_CMD)
    wk_options = {
        'page-size': 'Letter',
        'orientation': 'landscape',
        'title': title,
        # In order to specify command-line options that are simple toggles
        # using this dict format, we give the option the value None
        'no-outline': None,
        'disable-javascript': None,
        'encoding': 'UTF-8',
        'margin-left': '0.1cm',
        'margin-right': '0.1cm',
        'margin-top': '0.1cm',
        'margin-bottom': '0.1cm',
        'lowquality': None,
    }
    # We can generate the pdf from a url, file or, as shown here, a string
    pdfkit.from_string(
        # This example uses Django's render_to_string function to return the result of
        # rendering an HTML template as a string, which we can then pass to pdfkit and on
        # into wkhtmltopdf
        input=render_to_string('reportPDF.html', context=params, request=request),
        # We can output to a variable or a file - in this case, we're outputting to a file
        output_path=os.path.join('filepath', 'filename'),
        options=wk_options,
        configuration=pdfkit_config
    )

 

Further Reading

+ more

Accurate Timing

Accurate Timing

In many tasks we need to do something at given intervals of time. The most obvious ways may not give you the best results. Time? Meh. The most basic tasks that don't have what you might call CPU-scale time requirements can be handled with the usual language and...

read more