Stop Wasting CPU Cycles on Static PHP Sites

wget

wget is a very handy command-line application for retrieving files over a variety of protocols, most commonly HTTP. One of its primary uses is downloading single files in a scripted install or setup: for example, downloading a gzipped tar file containing source code, extracting it, and then compiling the source code.

With a few extra parameters, wget turns into a web archiver, which is another one of its primary uses. You can call it to recursively retrieve all the files and resources for a website onto your local computer, creating an offline backup of the entire website that you specified. This can put a lot of load on the web server, so a lot of website owners attempt to thwart this use. There are many flags that can be sent to wget to minimize the stress it puts on the web server and to work around websites that try to block wget’s usage.
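As a rough sketch of what a "polite" mirror might look like, the snippet below assembles a wget command using three of wget's throttling flags (--wait, --random-wait, and --limit-rate). The helper name polite_mirror_cmd, the specific values, and the example.com URL are illustrative assumptions, not anything from my own setup. It prints the command instead of running it so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: print a throttled wget mirror command for the given URL.
polite_mirror_cmd() {
    # --wait=1          pause one second between requests
    # --random-wait     vary that pause so requests are less bursty
    # --limit-rate=200k cap download bandwidth at roughly 200 KB/s
    printf 'wget -m -p -nv --wait=1 --random-wait --limit-rate=200k %s\n' "$1"
}

# Placeholder URL; substitute the site you are allowed to mirror.
polite_mirror_cmd "https://example.com/"
```

The right values depend on the server; a longer wait or a lower rate limit is kinder to small sites.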

Static Generation

In my setup, I have a local web server running on my machine that hosts the dynamic “static” website that I want to push to the remote web hosting environment. However, I only pay for static web hosting, so I cannot (and do not want to) upload the PHP files. Using wget and a few flags, I can quickly “compile” these PHP files into static HTML files.

wget -m -p -nv http://localhost/

The -m flag, or --mirror, is shorthand for a set of flags that are useful for creating a mirror of a website: it turns on recursion and timestamping and sets the recursion depth to infinite. For our purposes, the important thing is that it follows all the linked pages on our site, no matter how deep.

The -p flag, or --page-requisites, tells wget to download all the resources needed to locally view the page (e.g., images, stylesheets).

If you don’t specify -nv, or --no-verbose, you are going to get a ton of information blasted in your face, as wget operates in verbose mode by default. In no-verbose mode, which differs from the completely quiet (-q) mode, you will still get a message for each successful file download as well as error messages.

After running this command, you will have a local copy of your website in your current folder under a sub-folder named after the website’s domain name (e.g., localhost). You can view the files locally to check that they were downloaded successfully. One of the flags that you may need to add, depending on what URL scheme you use for links, is -k (--convert-links). This will convert all URLs into a format that works locally. For example, if you have <a href="http://localhost/contactus/">, that’s not going to work when you upload it to your remote host. Using -k, wget will convert this to ../contactus/index.html (assuming you are one page below the web root). I did not like this option because it does not assume the web server defaults to displaying index.html, so it clutters every link by appending index.html to it.
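If you do use -k and want the shorter links back before uploading, one possible workaround is a small sed filter. This is only a sketch, not part of my own process: the function name strip_index_html is made up, and it assumes links are written as double-quoted href attributes that end in /index.html.

```shell
#!/bin/sh
# Hypothetical filter: remove a trailing "index.html" from href attributes.
# Only touches double-quoted hrefs with a directory part (e.g. "../x/index.html");
# a bare href="index.html" is left unchanged.
strip_index_html() {
    sed -E 's#(href="[^"]*/)index\.html"#\1"#g'
}

echo '<a href="../contactus/index.html">Contact</a>' | strip_index_html
# prints: <a href="../contactus/">Contact</a>
```

Note that the stripped links rely on the web server serving index.html for a directory, so they will no longer work when browsing the files straight from disk.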

Backend

You are free to choose whatever backend languages, technologies, frameworks, and architecture you wish. If your web server can process it, then it will work. In my case, I created an object-oriented semi-template system that slightly resembles Jinja templates and has methods for managing links between pages and resources.

Upload

The final step in my process, which makes it easy to push my changes to the remote web host, is rsync. It takes my compiled HTML files and synchronizes them, along with the page resources, to the remote server.

All together, here is the shell script that pushes my personal site to the remote web host:

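#!/bin/sh
# Work from the directory that contains this script.
LOCAL_ROOT=$(dirname "$0")
LOCAL_WEBROOT="$LOCAL_ROOT/localhost/personal2"
cd "$LOCAL_ROOT" || exit 1
# "Compile" the PHP site into static files under ./localhost/personal2.
wget -m -p -nv http://localhost/personal2/
# Sync the generated files (recursive, preserve times, verbose) to the host.
rsync -rtv "$LOCAL_WEBROOT/" user@mywebsite:/remote/www/root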
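Before a real upload, it can be reassuring to preview what rsync would transfer. The sketch below is an optional addition rather than part of my script: the helper rsync_flags and the DRY_RUN variable are hypothetical names, and the only real rsync feature it relies on is -n (--dry-run), which lists the changes without making them.

```shell
#!/bin/sh
# Hypothetical helper: choose rsync flags, adding -n (dry run) when DRY_RUN=1.
rsync_flags() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "-rtvn"   # same flags as the script, plus --dry-run
    else
        printf '%s\n' "-rtv"
    fi
}

# Show the command that would run; paths match the script above.
echo "rsync $(rsync_flags) localhost/personal2/ user@mywebsite:/remote/www/root"
```

Running it with DRY_RUN=1 set in the environment shows the preview variant of the command.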
