Cache Busting via Gulp.js

Have you ever thought about how many HTTP requests your app is wasting? Many developers assume the browser’s native caching is sufficient. But unless a resource was served with an explicit expiration, the browser typically still makes an HTTP request for it on every page load to confirm that the server doesn’t have a newer version. If the resource hasn’t changed, the web server returns a 304 Not Modified. You may not think this is a big deal, but every HTTP request has a cost, and they add up. As you can see, when you load a blog post here on my site, many files return a 304 if you’ve already visited:

A List of HTTP 304 Not Modified Responses

Now you may think this doesn’t matter much today in the age of high bandwidth, but all these 304s aren’t costly because of their size. They’re costly because of latency. You see, over the years, bandwidth has indeed skyrocketed, but way back in the days of modems we’d already nearly hit a theoretical limit on latency. As Paul Irish outlined in Delivering the Goods in 1000ms, latency is an issue for all connections:

From https://www.youtube.com/watch?v=E5lZ12Z889k

The pain of latency is really magnified with mobile networks:

Wireless Latency from https://www.youtube.com/watch?v=E5lZ12Z889k

Factor in the pain and complexity of TCP slow start, and the verdict is clear: Worrying about page size isn’t enough. For the ultimate performance, we must minimize the number of HTTP requests our apps make.

Declaring a War on Latency

So, it’s settled. If you want to provide users the ultimate in performance, you need to eliminate as many HTTP requests as possible. That means setting far-future Expires headers, which tell your visitors’ browsers “Never ask for this file again. Ever. I mean it.” The details vary by web server (IIS and Apache each have their own configuration), but the idea is the same: set an expiration date for your assets in the distant future so the browser won’t check the server for those resources again. Of course, eventually you’re going to change the file, so how do you tell everyone’s browsers that you didn’t mean it? That’s where cache busting techniques come into play. And while there are various ways to pull this off, I prefer using Gulp.
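To make the idea concrete, here is a minimal sketch in Node.js of what far-future caching headers can look like. The helper name `farFutureHeaders` and the one-year lifetime are assumptions chosen for illustration; IIS and Apache express the same thing through their own configuration.

```javascript
// Hypothetical helper: build far-future caching headers for a static asset.
// The one-year lifetime is an assumed value, not a requirement.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

function farFutureHeaders(now) {
  const expires = new Date(now.getTime() + ONE_YEAR_SECONDS * 1000);
  return {
    // Cache-Control: max-age is the HTTP/1.1 mechanism; Expires is the older one.
    'Cache-Control': 'public, max-age=' + ONE_YEAR_SECONDS,
    'Expires': expires.toUTCString()
  };
}

console.log(farFutureHeaders(new Date()));
```

A server would attach these headers only to assets whose filenames change when their contents change, which is exactly what the rest of this post sets up.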

Gulp makes it easy to quickly create powerful build scripts that read much like English. Piping streams together feels really natural for anyone familiar with Linux. And processing files in memory not only gives Gulp a significant performance advantage over Grunt, it also means the configuration is lighter and much easier to read.

Both Gulp and Grunt offer a massive list of plugins, so in either case the hardest part is deciding which mix of plugins best solves your problem. I’ve just created a cache-busting process in Gulp using two different methods. Let’s explore both approaches and consider their merits.

Option 1: Dynamically Inject Script Tag on the Server

This option uses the gulp-rev plugin. Gulp-rev versions your assets by appending a content hash to filenames. So for example, if you run script.js through gulp-rev, it’ll spit out something like script-098f6bcd.js. Handy.
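The naming idea can be sketched in plain Node.js. This is not gulp-rev’s own code: the helper name `revisionedName`, the choice of MD5, and the 8-character hash length are all assumptions made to mirror the example filename.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper, not gulp-rev's implementation: derive a revisioned
// filename from the file contents. MD5 and the 8-character length are assumptions.
function revisionedName(filename, contents) {
  const hash = crypto.createHash('md5').update(contents).digest('hex').slice(0, 8);
  const ext = path.extname(filename);        // ".js"
  const base = path.basename(filename, ext); // "script"
  return base + '-' + hash + ext;
}

console.log(revisionedName('script.js', 'test')); // script-098f6bcd.js
```

Because the hash is computed from the contents, the name only changes when the file itself changes.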

The big problem, of course, is how do you ensure that any files which referenced script.js now reference script-098f6bcd.js instead? Well, handily, the gulp-rev plugin optionally generates a manifest.json file. This file contains JSON that maps each source filename to its new dynamically generated filename. So, using the above example, manifest.json would contain a single entry mapping script.js to script-098f6bcd.js.
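For illustration, that mapping could look like the JSON string below, shown here being parsed in Node.js. The exact contents are an assumption based on the single-file example; a real manifest lists every revisioned file.

```javascript
// Assumed manifest contents for the single-file example in the text.
const manifestJson = '{ "script.js": "script-098f6bcd.js" }';

const manifest = JSON.parse(manifestJson);
console.log(manifest['script.js']); // script-098f6bcd.js
```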

You can probably guess why this is handy. Now all you need to do is read this file to get the new valid filename. Assuming you’re doing server-side rendering, you can simply use your server-side language to open this file, grab the value, and change the src attribute of the corresponding <script> tag accordingly. There are some other approaches in the docs, but they all share the same philosophy of relying on the manifest.json mapping file. I wasn’t in love with this approach or any of the approaches they outlined, so I created my own approach below.
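A server-side lookup along these lines might be sketched as follows, using Node.js as the server-side language. The helper name `scriptTag` and the `/js/` base URL are hypothetical; any server-side language could do the same thing.

```javascript
// Hypothetical server-side helper: look up the revisioned filename in the
// parsed manifest and emit the matching <script> tag.
function scriptTag(manifest, originalName, baseUrl) {
  const revisioned = manifest[originalName];
  if (!revisioned) {
    throw new Error('No manifest entry for ' + originalName);
  }
  return '<script src="' + baseUrl + revisioned + '"></script>';
}

// In a real app the manifest would be loaded from disk,
// e.g. JSON.parse(fs.readFileSync('manifest.json', 'utf8')).
const manifest = { 'script.js': 'script-098f6bcd.js' };
console.log(scriptTag(manifest, 'script.js', '/js/'));
// <script src="/js/script-098f6bcd.js"></script>
```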

Option 2: Replace Script References via Regex

This approach is similar to the one above, but the <script> src attribute is set immediately by Gulp instead. To pull this off, I dropped the gulp-rev plugin and simply created my own suffix using today’s date. I like the clarity of being able to see when the file was last built. But here’s the real win: instead of generating the <script> tag dynamically on the server as in option #1, we use gulp-replace to change the relevant src attribute immediately upon build.
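The two moving parts, the date suffix and the src replacement, can be tried in isolation with plain JavaScript. Both function bodies here are assumptions: `getDate` takes the date as a parameter so it can be tested, and the regex assumes the id attribute comes before src and that both use double quotes.

```javascript
// Hypothetical implementation of the date suffix, e.g. "1-24-2015".
function getDate(date) {
  return (date.getMonth() + 1) + '-' + date.getDate() + '-' + date.getFullYear();
}

// Swap the src of the <script id="bundle"> tag for the new filename.
// Assumes id appears before src in the tag and both use double quotes.
function replaceBundleSrc(html, newSrc) {
  return html.replace(/(<script[^>]*\bid="bundle"[^>]*\bsrc=")[^"]*(")/, '$1' + newSrc + '$2');
}

const filename = 'script-' + getDate(new Date(2015, 0, 24)) + '.js';
const html = '<script id="bundle" src="script-1-1-2015.js"></script>';
console.log(replaceBundleSrc(html, filename));
// <script id="bundle" src="script-1-24-2015.js"></script>
```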

Here’s an example gulpfile:

Let’s dissect this file.

  1. Lines 1-5 pull in the necessary gulp plugins and define a task called js.
  2. Line 6 defines the new filename we want to use. The filename will end up being script-1-24-2015.js for example, assuming that’s the current date. I left the guts of the getDate() function out for brevity, but it simply returns a string in the above format.
  3. Line 7 specifies a glob that will retrieve all .js files in the scripts directory. These files are piped through the following two lines.
  4. Line 8 concatenates all the files into a single file.
  5. Line 9 writes the concatenated file to the specified destination. Again, I left out the initialization of paths.build on this example for brevity.
  6. Line 11 defines a new src that needs our attention. This is the file that needs to reference the new dynamically named bundle we just generated. We specify the path to the filename, and also include a second parameter, base. This parameter is necessary so we can overwrite the file in the final step.
  7. Line 12 is where the magic happens. We simply use a regex to replace the current script tag src with a new value that matches the filename created on line 6. This way our file references the new filename. We access the script tag by using a regex that finds the script tag by id. In this case, we’re looking for a script tag with an id of bundle.
  8. Line 13 writes the updated file to disk.
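The steps above can be sketched as the gulpfile below. Treat it as a reconstruction under assumptions rather than the original listing, so its line numbers will not match the walkthrough. The `scripts/*.js` glob, the `paths.build` value, the body of `getDate()`, and the exact regex are all guesses. It also uses Gulp 4’s `gulp.series` with two small functions, where the walkthrough describes a single `js` task.

```javascript
const gulp = require('gulp');
const concat = require('gulp-concat');
const replace = require('gulp-replace');

// Assumed value; the walkthrough leaves the paths object out for brevity.
const paths = { build: 'build/' };

// Assumed body: returns today's date as month-day-year, e.g. "1-24-2015".
function getDate() {
  const d = new Date();
  return (d.getMonth() + 1) + '-' + d.getDate() + '-' + d.getFullYear();
}

const filename = 'script-' + getDate() + '.js';

// Concatenate every script into one date-stamped bundle.
function bundle() {
  return gulp.src('scripts/*.js')
    .pipe(concat(filename))
    .pipe(gulp.dest(paths.build));
}

// Point the <script id="bundle"> tag at the new bundle, overwriting in place.
// The base option keeps the file's relative path so gulp.dest('./') overwrites it.
function updateReference() {
  return gulp.src('Admin/Default.aspx', { base: './' })
    .pipe(replace(/(<script[^>]*\bid="bundle"[^>]*\bsrc=")[^"]*(")/,
      '$1' + paths.build + filename + '$2'))
    .pipe(gulp.dest('./'));
}

gulp.task('js', gulp.series(bundle, updateReference));
```

With the three plugins installed, running `gulp js` would build the bundle and then rewrite the reference.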

I prefer approach #2 for two reasons:

  1. It occurs on build instead of load. In my opinion, a build task shouldn’t add overhead to every request (as is the case with approach #1 above, since the path is dynamically generated on each request).
  2. It’s more discoverable. All the code that is manipulating your application for your build resides within your Gulp file with option #2.

And now with this wired up, I can save my users from making HTTP requests just to see if resources have changed. Bandwidth and time are saved on every page load. When the time comes to update the file, I have a simple build task to generate a new filename and update the corresponding file(s) that reference it.

One final note: If you prefer for the filename to change only when the file contents have changed, you can continue to use the gulp-rev plugin and simply read the hashed filename that it assigns from the manifest file (since the hash only changes when the file contents change). In my use case, the large bundled file nearly always changes with each release, so I preferred the simplicity of a date-based filename. Another approach often considered is to simply append a cache-busting querystring to a reference to a static filename, but that may cause issues with some proxies, so I recommend truly changing the filename to bust cache as outlined above.
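Combining the two ideas, hashed filenames and build-time replacement, might look something like this sketch. The helper name `applyManifest` and the sample manifest (including the made-up `site-1a2b3c4d.css` entry) are hypothetical, and the plain text match assumes no other filename in the page ends with one of the manifest keys.

```javascript
// Hypothetical build-time helper: rewrite every reference to an original
// filename so it points at the hashed name recorded in the manifest.
function applyManifest(html, manifest) {
  return Object.keys(manifest).reduce(function (result, original) {
    // Escape regex metacharacters such as "." in the filename.
    const escaped = original.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // A function replacer avoids special handling of "$" in the new name.
    return result.replace(new RegExp(escaped, 'g'), function () {
      return manifest[original];
    });
  }, html);
}

const manifest = { 'script.js': 'script-098f6bcd.js', 'site.css': 'site-1a2b3c4d.css' };
const html = '<link href="site.css"><script src="script.js"></script>';
console.log(applyManifest(html, manifest));
// <link href="site-1a2b3c4d.css"><script src="script-098f6bcd.js"></script>
```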

Have another approach to cache busting that you prefer? Please chime in via the comments.

9 replies on “Cache Busting via Gulp.js”

  1. The code output is showing a bunch of ASCII. I don’t think that was your intention. Hard to read.

  2. The problem with your 2nd approach is that every time there is a build, all the cache is invalidated. Maybe you only changed some CSS, but you’re also expiring JavaScript, libraries, templates…

    That’s why the best way to bust it is computing the file hash, which is what gulp-rev does …

    You can still make that happen in the build process though. You just require('manifest.json') and inject every file into the templates, via regex, injection, whatever you want.

  3. I’m not too familiar with gulp, and I’m trying to implement what you’ve done here. What is `paths.build` and `Admin/Default.aspx` supposed to represent? Would really appreciate a response. Thanks, Cory!

    1. Hi Alan – Good catch. I forgot to declare the paths object in my example. I just updated it.

      Admin/Default.aspx represents the HTML file for my app where I’d like to change the src attribute. In retrospect, I should have called it index.html for clarity.

      Thanks for the comment!

      1. Hi again Cory!

        I’m still having issues with this implementation. I’ve tried numerous other workarounds, but they don’t fully work.

        My issue is with lines 10-12. As you’ve explained, this part should allow us to rename the original files. This hasn’t been so. I’m not sure what’s causing the issue. Here’s my SO post: http://stackoverflow.com/questions/41787251/how-to-rename-the-original-files-of-scripts-in-index-html-using-gulp

        BTW, I really found your courses to be REALLY helpful. I feel a little star struck at the moment. LOL.
