When I started my own personal site, I faced a dilemma: I’m a software developer, why should I use a content management system (CMS) on my personal website when I’m supposed to be an expert at making websites? Every CMS I’ve used has angered me in one way or another. The end result was always me hacking together my personal site with PHP (the one thing the language was originally designed for) or playing around with other custom backend solutions for it.
A static website with a dynamic backend seemed wasteful to me, though. Every time somebody visits a page, the web server must regenerate its content. The engineer inside my head was screaming “surely there’s a better way to do this!” I analyzed many different solutions and finally found one that let me design the backend however I wanted, yet still host the site statically: wget. This is no ground-breaking discovery, as some websites have been doing this for years (decades now?). However, I didn’t think to implement it until now.
wget is a very handy command-line application for retrieving files over several protocols, most commonly HTTP. One of its primary uses is downloading single files in a scripted install or setup. For example, downloading a gzipped tar file containing source code, extracting it, and then compiling the source code.
With a few extra parameters, wget turns into a web archiver, which is another one of its primary uses. You can call it to recursively retrieve all the files and resources for a website onto your local computer, creating an offline backup of the entire website that you specified. This can put a lot of load on the web server, so many website owners attempt to thwart this use. wget accepts many flags that minimize the stress it puts on the web server, as well as flags that work around websites that try to block wget’s usage.
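As a rough illustration of the throttling side of this, the sketch below wraps a mirror command with the --wait, --random-wait, and --limit-rate flags. The function name mirror_politely, the WGET override variable, the example URL, and the specific values (a one-second wait, a 200 KB/s cap) are assumptions chosen for illustration, not settings from my own setup.

```shell
# Mirror a site while pausing between requests and capping bandwidth.
# The wait time and rate limit are illustrative values only.
mirror_politely() {
    # $1: URL to mirror. Set WGET=echo to print the arguments instead of
    # contacting the server.
    "${WGET:-wget}" --mirror --page-requisites --no-verbose \
        --wait=1 --random-wait --limit-rate=200k "$1"
}

# Preview the arguments without making any requests:
WGET=echo mirror_politely "https://example.com/"
```

Dropping the WGET=echo prefix runs the real download. Since my own use only ever hits a local server, I have no need for the throttling, but it matters when mirroring somebody else's site.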
In my setup, I have a local web server running on my machine that hosts the dynamic “static” website that I want to push to the remote web hosting environment. However, I only pay for static web hosting, so I cannot and do not want to upload the PHP files. Using wget and a few flags, I can quickly “compile” these PHP files into static HTML files.
wget -m -p -nv http://localhost/
The -m flag, or --mirror, turns on several options that are useful for creating a mirror of a website (currently equivalent to -r -N -l inf --no-remove-listing). For our purposes, the important thing is that it enables recursion of unlimited depth through all the linked pages on our site.
The -p flag, or --page-requisites, tells wget to download all the resources needed to locally view the page (e.g., images, stylesheets).
If you don’t specify -nv, or --no-verbose, you are going to get a ton of information blasted in your face, as wget operates in verbose mode by default. In no-verbose mode, which differs from the completely quiet (-q) mode, you will still get a message for each successful file download as well as error messages.
After running this command, you will have a local copy of your website in your current folder under a sub-folder named after the website’s domain name (e.g., localhost). You can view the files locally to check that they were downloaded successfully. Depending on what URL scheme you use for links, one flag that you may need to add is -k (--convert-links). This converts all URLs into a format that works locally. For example, if you have <a href="http://localhost/contactus/">, that’s not going to work when you upload it to your remote host. Using -k, wget will convert this to ../contactus/index.html (assuming you are one page below the web root). I did not like this option because it does not assume that the web server serves index.html by default, so it clutters every link by appending index.html to it.
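If you skip -k, it helps to confirm that no absolute links to the local server slipped into the mirrored pages. The following is a minimal sketch of such a check. The function name find_absolute_links is hypothetical, and it assumes the mirror sits in a folder of .html files as described above.

```shell
# List mirrored HTML files that still contain links to the given origin.
find_absolute_links() {
    # $1: mirror directory (e.g. ./localhost)
    # $2: origin to look for (e.g. http://localhost/)
    # grep -l prints only the names of matching files; -F treats the
    # origin as a fixed string rather than a regular expression.
    find "$1" -type f -name '*.html' -exec grep -lF -- "$2" {} +
}

# Example (run from the folder where wget was invoked):
# find_absolute_links ./localhost "http://localhost/"
```

Any file it prints still points at the development machine and would break once uploaded. No output means the links are already relative.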
You are free to choose whatever backend languages, technologies, frameworks, and architecture you wish. If your web server can process it, it will work. In my case, I created an object-oriented semi-template system that slightly resembles Jinja templates and has methods for managing linking pages and resources.
The final step in my process, which makes it easy to push my changes to the remote web host, is to integrate rsync. This takes my compiled HTML files and synchronizes them, along with the page resources, to the remote server.
All together, here is the shell script that pushes my personal site to the remote web host:
wget -m -p -nv http://localhost/personal2/
rsync -rtv "$LOCAL_WEBROOT/" user@mywebsite:/remote/www/root
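A slightly more defensive variant of the same script might look like the sketch below. It is not the script I actually run: the DRY_RUN switch, the run and deploy helper names, and the value given to LOCAL_WEBROOT (guessed from where wget saves its mirror) are all assumptions for illustration.

```shell
# Assumed values; adjust to your own layout. wget saves the mirror under a
# folder named after the host, so LOCAL_WEBROOT is guessed to be this path.
SITE_URL="${SITE_URL:-http://localhost/personal2/}"
LOCAL_WEBROOT="${LOCAL_WEBROOT:-localhost/personal2}"
REMOTE="${REMOTE:-user@mywebsite:/remote/www/root}"

# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mirror first; only sync if wget exited successfully.
deploy() {
    run wget -m -p -nv "$SITE_URL" &&
        run rsync -rtv "$LOCAL_WEBROOT/" "$REMOTE"
}

# Preview both commands without touching the network:
DRY_RUN=1 deploy
```

Calling deploy without DRY_RUN=1 performs the real mirror and upload. Because wget exits with a non-zero status when a request fails (a broken internal link, for example), chaining the two steps with && keeps an incomplete mirror from being pushed to the remote host.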