In this age of cloud computing, it has become far easier to choose, or even move, the geographical location of your web server. It has also brought high-availability solutions, such as hosting simultaneously in multiple locations, within the reach of even very modest applications. But where is the best place to host your site? And are the improvements worth the effort of distributing your servers across the world?

Armed with a few simple tools, you can go a long way towards answering these questions fairly quickly.

Let’s tackle the first question first: where is the best place to host a site? As with most things, the answer is “it depends”. In the simplest case, all of your site’s users are geographically close to one another, so you should host your site as close to them as possible. There are a number of anecdotal examples showing that reducing load times for a web page can improve user satisfaction and even sales conversion rates; thanks to Canadian company Strange Loop Networks, some of the key headlines from these reports have been collated into a single poster. So if you are lucky enough to have all your users together and you’re not hosting close to them, do it now for a quick win.

If your site doesn’t fall into the “all my users are in the same place” category, the best way to find out where to host is to benchmark. My tool of choice for this is the API provided at webpagetest.org. This is an open source project, so you can download the source and set up your own version, but there is also a hosted version available to use for free, just with limits on the number of requests one person can make per day. In order to use the hosted version you must first contact the site to request an access key, but once you’ve acquired that, the API itself is extremely simple to interact with.

Preparing the Setup

In order to run speed tests, you first need a test server setup. When running through this process myself, I was trying to establish a good place to host servers for the UN’s World Food Programme: Inviqa have been tasked with re-architecting the API that underlies the programme’s online fundraising efforts. The API must be responsive globally, but there is a particular focus on keeping the administration tasks performed in Asia quick.

Four potential server locations were identified, each an Amazon EC2 region. A server instance was booted in each of the four locations using a standard EC2 image. No modifications were made to the servers, other than to place an identical 7KB PNG image, taken from one of the client’s existing websites, into a web-accessible folder. Tests were then run, in the manner outlined below, against each of these four instances from nine locations around the world.

The image used for testing

You can find the code for all of the examples in this article on GitHub at: https://github.com/ams2435/WebTest

Running the Tests

Using functions built directly into PHP, we can communicate with the API in a very simple manner. The file_get_contents() function is capable of reading directly from a URL, and the API we’re interacting with takes all of its parameters from the query string, so we can grab data with just one function call. This is a quick and dirty solution to the problem, but it works well for this example.

The parameters we will pass to the API are:

| Parameter | Description | Value used |
| --- | --- | --- |
| f | The format of the response | xml |
| k | The access key for the API | The access key we obtained from webpagetest.org |
| fvonly | Whether we only require a “first view” of the test URL. The API can also run a “second view” of the same URL for each test, which shows the load times once browser caching has taken place | 1 – for these tests we are not interested in the speed of the cached version |
| runs | How many times to repeat the test | 5 |
| location | The location and speed profile for the test runs | Various locations, each with a DSL speed profile – the equivalent of using a 1.5Mbps broadband connection. The full list of locations can be obtained at http://www.webpagetest.org/getLocations.php |
| url | The URL to test | Various URLs representing each test server |

Since we are requesting our result in XML format, we will also want to convert the returned string using the simplexml_load_string() function. Putting this together gives us the following code snippet:
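The complete version lives in the GitHub repository linked above; the sketch below captures the approach. The runtest.php endpoint and the statusCode/data->testId fields reflect the webpagetest.org XML response as documented at the time, while the property names and error handling are illustrative assumptions rather than the exact code from the repository.

```php
<?php

class WebTest_Runner
{
    const API_URL = 'http://www.webpagetest.org/runtest.php';

    /** @var string API key obtained from webpagetest.org (assumed property) */
    protected $_apiKey;

    /**
     * Ask the API to test $url from $location and return the testId it hands back.
     */
    protected function _runLocation($location, $url)
    {
        $params = array(
            'f'        => 'xml',          // response format
            'k'        => $this->_apiKey, // API access key
            'fvonly'   => 1,              // first view only, no cached repeat view
            'runs'     => 5,              // five runs per test
            'location' => $location,      // location string including the DSL speed profile
            'url'      => $url,           // the URL to test
        );

        // file_get_contents() can read straight from a URL,
        // so one call is enough to submit the test
        $response = file_get_contents(self::API_URL . '?' . http_build_query($params));
        $xml      = simplexml_load_string($response);

        if ((int) $xml->statusCode !== 200) {
            throw new RuntimeException('Test submission failed: ' . $xml->statusText);
        }

        return (string) $xml->data->testId;
    }
}
```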

You can see from the above that we are sending off the request to the API and receiving a testId in response. This is due to the asynchronous nature of the web test API: the test will not actually have been completed yet. We can use the returned testId to ask for the results once the test has completed, so we need to store it somewhere to keep track of which results we still have to gather.

You may already have noticed that the above code is wrapped in a protected function; this function is called from the following piece of code:
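Again, the exact code is in the repository; the sketch below assumes a PDO connection in $this->_db, with made-up table and column names for illustration:

```php
<?php

class WebTest_Runner
{
    // ... _runLocation() as above; $this->_urls, $this->_locations and
    // $this->_db (a PDO connection) are assumed properties of the class ...

    /**
     * Trigger a test for every location/URL combination and record the
     * returned testId so the results can be collected later.
     */
    public function run()
    {
        foreach ($this->_locations as $location) {
            foreach ($this->_urls as $url) {
                $testId = $this->_runLocation($location, $url);

                // Store the id along with the URL, location and date of the run
                $insert = $this->_db->prepare(
                    'INSERT INTO test (test_id, url, location, run_date, gathered)
                     VALUES (?, ?, ?, NOW(), 0)'
                );
                $insert->execute(array($testId, $url, $location));
            }
        }
    }
}
```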

We are simply storing the IDs returned from the _runLocation() function into a database, along with the URL, location and date of the run. You can see from the code that we’re running through two nested foreach statements so we can perform the tests in a number of location/URL combinations and then compare the results. The two functions shown are simply wrapped up in a WebTest_Runner class that also has an addLocation() function which stores each location that is passed to it for use in the above loop. The constructor to the class also takes an array of URLs to test. This leaves the triggering of tests looking as simple as this:
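Something along these lines, with made-up hostnames and location identifiers standing in for the real ones (the actual location strings come from the getLocations.php list mentioned earlier):

```php
<?php

// Two test URLs (one per server under test) and three test locations give
// us six tests, each consisting of five runs.
$runner = new WebTest_Runner(array(
    'http://server-one.example.com/test.png',
    'http://server-two.example.com/test.png',
));

$runner->addLocation('London.DSL');
$runner->addLocation('Sydney.DSL');
$runner->addLocation('NewYork.DSL');

$runner->run();
```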

The code shown here triggers six tests using the web test API, one for each of the URLs in each of the locations specified. We will then have a testId in our database for each of these tests. Remember though that we have asked for five runs for each test, so our six test ids will actually relate to 30 individual speed tests.

Gathering the Results

The next thing we have to do is obtain those test results now that the tests have completed. This is how we can do that:
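The snippet below is an illustrative reconstruction: it assumes the xmlResult.php endpoint and the convention that a statusCode of 200 means the test has finished, as documented by webpagetest.org at the time; the class internals are otherwise made up.

```php
<?php

class WebTest_Result
{
    const API_URL = 'http://www.webpagetest.org/xmlResult.php';

    protected $_testId;

    public function __construct($testId)
    {
        $this->_testId = $testId;
    }

    /**
     * Fetch the results for this testId. Returns the parsed XML when the
     * test has finished, or false if the API says it is still running.
     */
    public function fetch()
    {
        $response = file_get_contents(self::API_URL . '?test=' . urlencode($this->_testId));
        $xml      = simplexml_load_string($response);

        // Anything other than 200 means the results are not ready, or an error occurred
        if ((int) $xml->statusCode !== 200) {
            return false;
        }

        return $xml;
    }
}
```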

This will return either the test results we have asked for or, if the results are not yet available, an error code. We can continue to poll the API for these results as many times as we like, but in order to perform any analysis it is better to store them in our own database. The following function does just that:
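Sketched here as a method on the runner, again with an assumed PDO connection and made-up table names; the XML field names (data->run, firstView->results->loadTime) follow the structure of the webpagetest.org results at the time of writing:

```php
<?php

class WebTest_Runner
{
    // ... run() and _runLocation() as above ...

    /**
     * Fetch and store results for every testId we have not yet gathered.
     */
    public function gatherResults()
    {
        // All the testIds we have not yet collected results for
        $pending = $this->_db
            ->query('SELECT id, test_id FROM test WHERE gathered = 0')
            ->fetchAll(PDO::FETCH_ASSOC);

        foreach ($pending as $test) {
            $result = new WebTest_Result($test['test_id']);
            $xml    = $result->fetch();

            if ($xml === false) {
                continue; // results not ready yet; try again on the next pass
            }

            // Store each of the five runs for this testId individually
            foreach ($xml->data->run as $run) {
                $insert = $this->_db->prepare(
                    'INSERT INTO run (test_fk, run_number, load_time) VALUES (?, ?, ?)'
                );
                $insert->execute(array(
                    $test['id'],
                    (int) $run->id,
                    (int) $run->firstView->results->loadTime,
                ));
            }

            // Mark this result set as gathered so we do not fetch it again
            $this->_db->prepare('UPDATE test SET gathered = 1 WHERE id = ?')
                      ->execute(array($test['id']));
        }
    }
}
```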

This function simply loops through all the existing testIds we have not yet collected results for and fetches their results. It then stores those results in a local database, looping through each of the five runs per testId and storing them individually. Finally, it marks each result set we successfully download and process as having been gathered. The API access shown in the previous code snippet happens inside the WebTest_Result class.

Once this process has been repeated a number of times, you will have built up enough test data to be able to start analysing it. As the data is stored in an SQL database, we have the full power of SQL to analyse the results. Making use of the MIN(), MAX() and AVG() functions of SQL will give results that should begin to show a picture of how your site is responding from different locations.
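For example, against the hypothetical schema used in the earlier sketches, a query along these lines pulls out the average, fastest and slowest load time for each location and server ($db is assumed to be a PDO connection):

```php
<?php

// Average, fastest and slowest load times (in ms) per test location and URL,
// using the made-up schema from the earlier sketches.
$sql = '
    SELECT t.location, t.url,
           AVG(r.load_time) AS avg_ms,
           MIN(r.load_time) AS min_ms,
           MAX(r.load_time) AS max_ms
      FROM test t
      JOIN run r ON r.test_fk = t.id
  GROUP BY t.location, t.url
  ORDER BY t.location, avg_ms';

foreach ($db->query($sql) as $row) {
    printf("%-25s %-45s %6.0f ms\n", $row['location'], $row['url'], $row['avg_ms']);
}
```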

The results below are those gathered by Inviqa for the re-architecting project.

The times shown are the full browser load time for the image, including DNS time, connection time, data transfer time and rendering time. This is clearly not a real world test, but since we have the same simple setup in each location, it is sufficient when simply comparing the different locations. The results were collected at intervals over a three-day period and average load times in milliseconds were recorded for each location.

Server response times by location

| Test location | Tokyo Server | US West (Northern California) Server | EU West Server | US East Server |
| --- | --- | --- | --- | --- |
| Sydney, Australia | 600 | 855 | 1217 | 981 |
| Jiangsu, China | 1124 | 1536 | 3634 | 1888 |
| Geneva, Switzerland | 1213 | 896 | 537 | 692 |
| Seoul, South Korea | 736 | 802 | 1223 | 1056 |
| London, England | 1038 | 725 | 454 | 598 |
| Los Angeles, USA | 757 | 450 | 905 | 602 |
| New York, USA | 874 | 537 | 601 | 329 |
| Sao Paolo, Brazil | 1265 | 989 | 1036 | 749 |
| Singapore | 580 | 894 | 1442 | 1148 |

The first thing to notice is that the results back up our basic assumption that you should host as close to your users as possible. It is clear that, should all of your users be in Singapore, the best of the tested locations is the closest (Tokyo). However, it isn’t always as simple as straight-line distance, due to the way the data trunk cables are laid out across the world. The distance between Seoul, South Korea and the EU West data centre in Ireland is approximately 9,000km, while the distance from Seoul to the US East data centre in Virginia is in excess of 11,000km; yet, as the results above show, the latency from Seoul to US East is actually lower than that to EU West. If you wish to delve deeper into the trunk cable routes, a map can be seen at: http://www.submarinecablemap.com/. The map shows you the different potential routes data can take across oceans, something I have found interesting to look at on more than one occasion.

Selecting a Server Setup

Once you have collected comparable results for the locations you’re interested in, it’s time to select a server setup.

Sticking with a Single Location

The simplest solution is to pick a single location that offers a good speed compromise for all your users. You can see from the results that in some cases the additional expense and complexity of geographically distributing your application would bring a speed increase of fractions of a second, which isn’t much of a return on your investment. I would suggest that if you’re targeting users in London and Los Angeles, rather than hosting in both EU West and US West, taking a 150ms hit from both locations and hosting in US East is a better option. After all, if you find once the application is up and running that 150ms is a deal breaker, you can always move to a more distributed approach at that point. This approach may not follow the Google advice that every millisecond counts, but not everyone has the resources that Google have to implement complex and expensive solutions, and there are other effective ways to speed up a site before resorting to a multi-region setup.

Making it More Robust

Despite picking one location, you don’t have to put all your server eggs in one data centre basket. If you decide to go with one of the Amazon regions, you will find they are all split into “Availability Zones”. An Availability Zone within a region is a completely separate data centre, and Amazon offers a load-balancing service which will balance your traffic across two or more of them. If one Availability Zone becomes unavailable, all of your load will transparently go to the remaining zone, making it very easy to create a highly-available solution with a level of redundancy that is far more difficult and expensive to achieve without a cloud hosting provider. In order to distribute an application across different Availability Zones, it is usually necessary to do some data replication. Within each region, all of Amazon’s Availability Zones have high-speed connections to each other, so doing this kind of data replication quickly is entirely possible. The diagram below shows a potential server setup across two Availability Zones: the web servers make all of their database writes to a single master in one of the zones, while all reads are done from a database in the same Availability Zone as the web server.

Example server setup across Availability Zones

In this setup, if Availability Zone 2 becomes unavailable, nothing needs to be done for the application to continue running. If Availability Zone 1 becomes unavailable, the database slave must be promoted to be the master, but then the rest of the setup will continue working as expected. If you’re using both a MySQL database and EC2, you might even want to look at the Amazon RDS offering. RDS can do this data replication across availability zones for you with virtually zero maintenance effort, including automatic slave promotion and recovery.
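To make the read/write split in the diagram concrete, here is a minimal sketch, assuming one PDO connection to the master and one to the replica in the local zone; the class, DSNs and credentials are all hypothetical:

```php
<?php

/**
 * Minimal sketch of the read/write split described above: writes go to the
 * single master, reads go to the replica in this web server's own
 * Availability Zone. DSNs and credentials are placeholders.
 */
class ZonedDb
{
    private $master; // master database in Availability Zone 1
    private $local;  // replica in the same zone as this web server

    public function __construct($masterDsn, $localDsn, $user, $password)
    {
        $this->master = new PDO($masterDsn, $user, $password);
        $this->local  = new PDO($localDsn, $user, $password);
    }

    /** Reads are served from the replica in the local zone. */
    public function fetchAll($sql, array $params = array())
    {
        $stmt = $this->local->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    /** Writes always go to the master. */
    public function execute($sql, array $params = array())
    {
        $stmt = $this->master->prepare($sql);
        $stmt->execute($params);
        return $stmt->rowCount();
    }
}
```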

Utilising a Content Delivery Network

No matter where you decide to locate your servers, there are other tricks you can employ to improve page load times. A good place to start is a content delivery network (CDN). This is essentially distribution across many different locations, but only for static resources. It is very often the case that the slowest part of a page load is receiving all the images, JavaScript and CSS files associated with the page. By adding these resources to a CDN, users load them from the location nearest to them and see a performance increase, but you haven’t had to do anything complicated to make the application truly multi-region. There are many different providers offering CDN solutions; they work by using a DNS trick to route the client to the fastest-responding location for a domain name. This means you could set “images.mydomain.com” as the root address of your CDN, then reference all the images in your site with that subdomain; they will then always load from the fastest location for each user.
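As a simple illustration of that approach, a small hypothetical helper (using the images.mydomain.com example from the text) can prefix static asset paths with the CDN hostname:

```php
<?php

// Hypothetical helper: build asset URLs on the CDN subdomain so static
// resources are served from the edge location nearest the user.
function cdn_url($path, $cdnHost = 'images.mydomain.com')
{
    return 'http://' . $cdnHost . '/' . ltrim($path, '/');
}

// In a template:
echo '<img src="' . cdn_url('img/logo.png') . '" alt="Logo">';
```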

In addition to employing a CDN for your own static resources, common resources such as the base jquery.js file can be loaded from a public CDN, with the potential benefit that a user already has a cached copy before they visit your site for the first time. This really is a free performance boost: in terms of speed, it’s difficult to do better than a locally-cached copy of a resource. It works because many different sites use the exact same URL for the resource, so as far as the browser is concerned it has already loaded it. Google’s public libraries are very popular for this; a full list of those available can be seen at https://developers.google.com/speed/libraries/devguide.

Full Multi-region Implementation

If you have a compelling reason to go for a full multi-region setup, it is possible to use the same principles that CDNs employ to direct your users to the nearest server setup you have. The technique is called “latency-based routing”. There are a number of providers of such routing, and a recent entry into the market by Amazon, as part of its Route 53 DNS offering, is likely to make the practice more common. The advantage of using Amazon’s Route 53 service for this is how simple it is to integrate with the other Amazon services. To implement latency-based routing, you set your application up in as many different locations as you require and then tell the routing service where all of your locations are. When a user hits your master URL, the location that will serve them the quickest is automatically calculated, and they are directed to that location.

Users are directed to the quickest location when employing latency-based routing with servers distributed around the world

The real difficulty with setting up your application this way is sharing the data between all the locations – it is simply not enough to assume that a user will always hit the same location. Even if your locations never go down, someone might need to get at their data while they are travelling, or something could happen to a key piece of infrastructure that suddenly makes a different location respond quicker for a user. You can see from the diagram above just how complicated the synchronisation of data could be: for the four server locations that have been selected, represented by large dots on the diagram, six different synchronisation paths exist. It is this synchronisation of data that makes full multi-region setups incredibly complex. While it is possible to set up master-slave replication in the same way we would for multiple Availability Zones, the network latency is likely to be far too much of a limiting factor; any speedup achieved by hosting multi-region will be lost. It is therefore necessary to use a master-master replication policy, which tends to require a much greater level of maintenance and is very easy to get wrong!

Keeping a Master Region

Maintaining a master region is a less daunting route to a multi-region setup. This is very much like having a single-location setup, but with some extra server power making sure that the really important transactions are kept speedy. The approach relies heavily on having an asynchronous architecture within your application. The key is making it appear to the end user as though their actions have been completed quickly, even if they haven’t yet been completed. Rather than waiting for a set of database queries to run before returning a response to the user, respond immediately to say the request is being handled. You can then push the request onto a queue, which the master location will pick up and process. While this doesn’t eliminate the need to communicate across the world on each request, the user is not waiting for that communication to take place, keeping the experience fast and satisfaction high.
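A minimal sketch of that pattern, assuming a local job table that a worker in the master region polls (the table name, handler and payload are all hypothetical):

```php
<?php

/**
 * Hypothetical request handler illustrating "respond now, process later":
 * the work is queued locally and a worker in the master region applies it.
 */
function handleUpdateRequest(PDO $localDb, array $payload)
{
    // Queue the change locally; a worker in the master region polls this
    // table (or a proper message queue) and processes the job there.
    $stmt = $localDb->prepare(
        'INSERT INTO job_queue (type, payload, created_at) VALUES (?, ?, NOW())'
    );
    $stmt->execute(array('profile_update', json_encode($payload)));

    // Tell the user immediately that the request has been accepted, rather
    // than making them wait for the round trip to the master region.
    http_response_code(202);
    header('Content-Type: application/json');
    echo json_encode(array('status' => 'accepted'));
}
```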

A major drawback of this solution is ending up with an inconsistent experience for users. While most requests can easily be queued for actioning later, some things cannot wait. If a user wants to pay for a product, we can’t ask them to wait a few minutes while we catch up before taking the credit card details! This means that these immediately-required processes will have slower response times than the rest of the application, due to having to make the user wait for the requests to travel to the master region. Still, with some clever usage of caching, this solution can be really attractive. If you truly have a requirement for global distribution of servers, this option strikes a nice compromise between complexity and performance.

Conclusion

We began our journey with two questions to answer. Hopefully I’ve been able to show you that picking a location to host your site is fairly simple once you know where your users are. The thing I’d most like you to take away from reading this article is that selecting the location for your server is an important decision. There is a vast range of different providers and locations that you could pick, and selecting the right one could make a huge difference to your end users.

Is it worth distributing your servers across the world? I think in most cases the answer to this question is no. It’s important to make your site work as fast as possible; sales can be increased and users made happy by even modest improvements to response times. However, the added complexity and cost of distributing your site across the world must be considered carefully. If you will be selling a product specifically to customers in the UK, do you need the website that sells it to run just as quickly for users in Australia? If you have a compelling case for a multi-region deployment, keeping a single master region is my advice. Slave front-ends in locations that need speeding up and a content delivery network to distribute your static content will keep your users happy most of the time.

Finally, if you’ve never run a speed test against your site using webpagetest.org or a similar service, I urge you to do it at least once. I’ve only really talked about the page load times reported by the API here, but the web interface is excellent for running one-off tests. The wealth of information and recommendations the site is able to offer is well worth the couple of minutes it takes to type your site’s URL into a text box and wait for the test to run.

Code Samples

Remember that all the code examples in this article are available on GitHub at: https://github.com/ams2435/WebTest