NU.nl is a well-known news website in its homeland, the Netherlands, and is actively expanding into other countries. On an average day NU.nl serves 7 million page views; peak traffic days can reach roughly triple that number. In short, it is one of the top 10 Dutch websites in terms of traffic. Previously, on our corporate blog, Erik Snoeijs discussed the technologies deployed while building out the back end of NU.nl in his article “NU.nl; the back end”. In this article we look at the front end that we architected for NU.nl, and how we designed the system to handle both regular traffic and peaks.
When Ibuildings started working on NU.nl, around a year ago, we knew that one of the most important aspects of the project would be performance. With an average of 7 million page views a day, NU.nl is the most frequently visited Dutch news website. If that wasn’t enough, we also needed to support more extreme days, when traffic could double or even triple. We didn’t have to wait long before one of those days hit. On February 25th, 2009, less than 90 days after the new infrastructure was rolled out, it was stress tested when a Turkish Airlines plane crashed at Schiphol. On that day the new site set a single-day traffic record by serving 21 million page views in a 24-hour period, all without any noticeable slowdown and without having to bring additional hardware online to handle the extra load. We don’t have the space to describe all of the technologies that came together to allow NU.nl to weather this storm, but we do want to share a few of the important pieces and some of the lessons we learned.
The Framework: “CodeIgniter”
Our initial assessment of the project led us to believe that Zend Framework would be the best framework to build on. However, after discussing it with the client, it was decided that the new NU.nl would be built on CodeIgniter, as it was up to the task and was their corporate standard. One advantage of CodeIgniter is that it has a very small footprint due to its very basic MVC approach. This gave us great freedom in building layers on top of it while still remaining lightweight.
When we started gathering requirements for the system we recognized that even though PHP is fast, it would be best to have another mechanism in place to avoid executing a PHP script for every page request. While no dynamic language will approach the speed of serving straight HTML, we needed to find a way to serve the pages as close to that speed as possible while preserving the dynamic nature of the site. The answer we arrived at was to design a caching strategy that would cache fragments of a page (which we called snippets) on disk.
When you visit NU.nl and look at the structure of the pages you will see that they have a common layout. Each page consists of a header, a navigation bar on the left, a sidebar on the right, a footer and of course the main content area. Although these fragments are part of every page, they are not the same for every page. However, except for the main content, all fragments exist in only a limited number of variations, meaning they can be re-used. An example of this is the list of other articles below an article. This list changes every time a new article is published within the article’s section, and should be the same for all articles within that section. Wouldn’t it be great if we could somehow include that common snippet in every article page for a given section?
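The snippet idea can be sketched in a few lines. The following is a minimal illustration in Python (not the PHP code used for NU.nl), assuming a hypothetical on-disk layout of one HTML file per snippet kind and key. The names `snippet_path`, `write_snippet` and `render_article_list`, as well as the directory layout, are assumptions made for this example only.

```python
import html
import os
import tempfile
from pathlib import Path
from typing import List


def snippet_path(cache_dir: Path, kind: str, key: str) -> Path:
    """Map a snippet kind (e.g. 'article-list') and a key (e.g. a section) to a file."""
    return cache_dir / kind / (key + ".html")


def write_snippet(cache_dir: Path, kind: str, key: str, markup: str) -> Path:
    """Write a snippet via a temp file and rename, so readers never see a partial file."""
    target = snippet_path(cache_dir, kind, key)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(markup)
    os.replace(tmp_name, str(target))  # replaces the old version in one step
    return target


def render_article_list(titles: List[str]) -> str:
    """Render the 'other articles' list once per section, not once per article."""
    items = "".join("<li>" + html.escape(t) + "</li>" for t in titles)
    return '<ul class="other-articles">' + items + "</ul>"
```

Because the list is keyed by section rather than by article, a single file such as `article-list/sport.html` can be shared by every article page in that section and only needs to be rewritten when a new article is published there.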
Once we had completed our mechanism for caching the individual snippets, we turned our attention to the problem of how to efficiently combine the snippets into a finished page. We quickly discarded both PHP’s native include and Apache’s Server Side Includes. Our first benchmarks showed that both techniques performed approximately the same. Adding opcode caching gave PHP the edge, but the improvement was marginal at best. Even with opcode caching, PHP was a factor of 2 or more slower than plain HTML.
A layer of Varnish
Knowing that we would need better than that, we began looking into a reverse-proxy solution that could be placed in front of the front-end servers. One of the options we considered was Varnish. Many people we discussed the project with had heard good things about Varnish, but we were unable to find anyone who had actually deployed it. We started looking into Varnish, which at that time had a stable 1.2 branch and an experimental 2.0 branch. When we compared the two, we noticed that the 2.0 branch added support for a subset of the Edge Side Includes language. Edge Side Includes (ESI) shares a lot of similarities with Server Side Includes; however, the includes are handled at the proxy level. Because Varnish caches the snippets in memory, the snippets are also combined in memory, without the need to contact the front-end servers until one or more of the snippets expire.
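To make the ESI mechanism concrete, here is a small Python sketch of what a proxy does conceptually when it meets an include tag. It is an illustration only, not how Varnish is implemented: the `assemble` function, the `fetch` callback and the snippet URLs are hypothetical, and expiry and nested includes are left out to keep it short.

```python
import re

# Matches self-closing tags such as <esi:include src="/snippets/header.html"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, cache, fetch):
    """Replace each include tag with its snippet, calling fetch() only on a cache miss."""
    def resolve(match):
        src = match.group(1)
        if src not in cache:
            # Only a miss reaches the front-end server; later pages reuse the copy.
            cache[src] = fetch(src)
        return cache[src]
    return ESI_INCLUDE.sub(resolve, page)
```

A page template then only needs to contain tags like `<esi:include src="/snippets/header.html"/>`. The first request fills the in-memory cache, and every later request that uses the same snippet is assembled without touching the back end.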
Varnish and ESI were exactly what we were looking for to speed up our snippet-based caching system. After performing some benchmarks with both Varnish and other reverse-proxy solutions, we realized that Varnish was the right choice. It offered the right combination of features and performance for NU.nl.
The other moving pieces
The other moving pieces of the system are the ATK-based CMS, used to maintain the content, and a special “glue” piece that brings it all together. As part of the NU.nl project, we created a special task scheduler. When a user performs certain actions in the CMS, for example publishing a news item, the CMS adds tasks to the scheduler. Each front-end server starts executing these tasks right away. One example of such a task is the creation of the static article snippet; another is refreshing the list of other news items for the article’s section. Once these snippets have been generated they are saved on disk as plain HTML files. New snippets are automatically fetched by Varnish when they are requested for the first time. For existing snippets we send a purge request to Varnish so that it picks up the new version of the snippet right away.
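The publish flow described above could look roughly like the following Python sketch. Everything in it is a hypothetical stand-in: `TaskQueue`, `on_publish`, `regenerate`, `build_purge_request`, the snippet URLs and the host name are invented for illustration. It also assumes the purge is sent as an HTTP request with a `PURGE` method, which only works when the Varnish configuration is set up to accept it. The article does not say how the purge was actually sent.

```python
import urllib.request
from collections import deque


class TaskQueue:
    """A minimal FIFO of snippet tasks (hypothetical stand-in for the scheduler)."""

    def __init__(self):
        self._tasks = deque()

    def add(self, func, *args):
        self._tasks.append((func, args))

    def run_all(self):
        """Run every queued task in order and return their results."""
        results = []
        while self._tasks:
            func, args = self._tasks.popleft()
            results.append(func(*args))
        return results


def regenerate(snippet_url):
    # Placeholder: a real task would render the HTML and save it to disk here.
    return snippet_url


def on_publish(queue, article_id, section):
    """Enqueue the two example tasks mentioned in the text for a published item."""
    queue.add(regenerate, "/snippets/article/%d.html" % article_id)
    queue.add(regenerate, "/snippets/list/%s.html" % section)


def build_purge_request(varnish_base, snippet_url):
    """Build (but do not send) a PURGE request for one snippet URL."""
    return urllib.request.Request(varnish_base + snippet_url, method="PURGE")


if __name__ == "__main__":
    queue = TaskQueue()
    on_publish(queue, 42, "sport")
    for url in queue.run_all():
        request = build_purge_request("http://varnish.example:6081", url)
        # urllib.request.urlopen(request) would send it; omitted to stay offline.
        print(request.get_method(), request.full_url)
```

For simplicity the sketch purges every regenerated URL. In the setup described above a purge is only needed for snippets Varnish already holds, since new ones are fetched on first request.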
In the end, the combination of all these techniques ensured that NU.nl can handle loads three times its normal level without issue or downtime. As part of the project, Ibuildings monitored both Varnish and the front-end servers during the peak loads to ensure stability and responsiveness. The system performed so well that we are confident there is additional, as yet untapped, capacity in the existing infrastructure.
There are many architectures available that we could have used to build NU.nl. The tools that were used were all open source and could have been assembled in several different configurations. Ibuildings, drawing on the collective experience of its consultants, was able to select the correct pieces and assemble them in such a manner that the client’s system was able to weather a huge storm. In the case of NU.nl, not only was the client pleased that their application remained online and serving pages but the users were able to get important and timely news as this tragic event unfolded.