In an earlier article written for techPortal, the Hierarchical-Model-View-Controller (HMVC) architecture was explored. Using an example web application called Gazouillement and the Kohana Framework, the article investigated how structuring code using an HMVC methodology can help overcome some common scalability challenges in complex software architectures. The article concluded by demonstrating the relative simplicity of horizontally scaling the HMVC Gazouillement example application, after analysis of the execution bottlenecks.

The previous article was intended to be a reintroduction to HMVC for the web application era. HMVC is not a new concept: it was originally referenced in a Java World article over ten years ago and is based on an idea that dates back forty years. Today's rise in the prominence of HMVC might be due to the popularity it is enjoying in modern frameworks. Or it could be that the similarity in size and scope of modern web applications to their desktop cousins has given developers reason to revisit the HMVC architecture. Given the present interest in HMVC, this is a great time to explore the subject further and answer a few of the questions arising from the previous article.

Hierarchical-MVC has been shown to make large web applications easier to scale out, but there is a price to pay, namely overall performance. This article will investigate ways of improving performance within HMVC web applications using asynchronous processing and some good old caching techniques. The article will predominantly use examples written for the Kohana Framework; however, all the concepts portrayed here could apply to any framework or web application.

What’s wrong with Hierarchical-MVC?

So far there has been nothing but praise for Hierarchical-MVC, but it is far from being a silver bullet. Nothing comes for free, and in this case the clean logical structure applied to the code is paid for in overall performance.

Let's consider two ways a controller might load a user from a resource. Example A uses a direct database data provider, whereas Example B uses the HMVC method to get the same resource.
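
A minimal sketch of the two examples follows. It assumes hypothetical stand-in classes (FakeDatabase, User, Controller_User and Request) written only for illustration; they are not the Kohana API, and an in-memory array plays the role of the users table.

```php
<?php
// Hypothetical stand-ins for illustration only; none of this is Kohana code.

// An in-memory array plays the role of the users table.
class FakeDatabase
{
    private $users = [
        'john@doe.name' => ['email' => 'john@doe.name', 'name' => 'John Doe'],
    ];

    public function select_user(string $email): ?array
    {
        return $this->users[$email] ?? null;
    }
}

// Model that loads itself through an injected database resource.
class User
{
    public $data = [];
    private $db;

    public function __construct(FakeDatabase $db)
    {
        $this->db = $db;
    }

    public function load(string $email): User
    {
        $this->data = (array) $this->db->select_user($email);
        return $this;
    }
}

// Controller of the user MVC triad: answers internal requests with JSON.
class Controller_User
{
    public function action_get(string $email): string
    {
        $user = new User(new FakeDatabase);
        return json_encode($user->load($email)->data);
    }
}

// Very small request object with a single hard-coded route: user/<email>
class Request
{
    private $uri;

    public function __construct(string $uri)
    {
        $this->uri = $uri;
    }

    public function execute(): string
    {
        if (!preg_match('#^user/(?P<email>[^/]+)$#', $this->uri, $match)) {
            throw new RuntimeException('No route matches ' . $this->uri);
        }
        $controller = new Controller_User;
        return $controller->action_get($match['email']);
    }
}

// Example A: direct database data provider
$user_a = (new User(new FakeDatabase))->load('john@doe.name')->data;

// Example B: internal HMVC request, then decode the JSON response
$request = new Request('user/john@doe.name');
$user_b = json_decode($request->execute(), true);
```

Both variables end up holding the same array; the difference lies in how much work is done to get there.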

$user_a and $user_b contain the same data representation. But all else being equal, the data in $user_a will be loaded and available much faster than the data in $user_b. $user_a is using an injected database resource to load its contents, whereas $user_b is creating a request to an internal MVC triad for its data. Database connections are not known for their speed, yet the HMVC methodology still loses out. What is happening here?

First, let's look at the traditional method used to load the user model in Example A. To get the requested data the User model uses a database. Let's look at the instructions within the User::load() method.

  1. Initialise database connection with correct credentials
  2. Use the database defined for this application
  3. Select everything about the user john@doe.name from the users table and return the results
  4. Parse the result into the User model

 

This should be familiar to the vast majority of PHP developers as a standard method for retrieving data from a database. Now let's examine the execution for the second user in Example B, the Request::execute() method.

  1. Parse the URI defined within the request object and match it to a predefined route
  2. Load the controller matched within the route
  3. Ensure the detected action exists in the controller and execute
  4. Initialise database connection with correct credentials
  5. Use the database defined for this application
  6. Select everything about the user john@doe.name from the users table and return the results
  7. Parse the result into a JSON response and return
  8. Decode the JSON response into a variable

 

$user_b requires double the number of instructions that $user_a needs to complete execution. Additionally, steps 4 through 7 within Request::execute() are almost identical to the entire execution stack of User::load(). This is because the MVC triad containing the user logic stores user data in a database. HMVC is only abstracting the interface, not replacing the underlying persistent storage technology.

Example A and B process instruction set juxtaposed with time, visually demonstrating that Example A is almost twice as fast as Example B

The instruction set shown above outlines how a framework such as Kohana handles HMVC. Change the example above to another framework and it is likely that the instruction set will grow further[1]. Move the MVC triad containing the User logic to another server and the stack grows again[2]. We can see from this example that HMVC adds overhead to every internal request, even if it does scale easily, so it is not going to perform as well as traditional MVC in all situations. However, there is much that can be done to reduce the performance discrepancy.

Gimme the cache!

When tasked with optimising a web application, developers will turn to caching as a tried and tested remedy to performance-related ailments. Using a cache removes the need to repeatedly process the same instruction set after the first iteration, reducing the total number of execution steps for a given controller. The first iteration of code execution still has the same performance problem, but subsequent iterations can be served by a cached result if one is available. The example below applies caching to the earlier code examples A & B.
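
A minimal sketch of that idea, assuming a hypothetical ArrayCache class standing in for a Memcache client and a hypothetical load_with_cache() helper. The loader closure represents either Example A or Example B; expiry is left out to keep the sketch short.

```php
<?php
// Hypothetical in-memory cache standing in for a Memcache client.
class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? null;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Return the cached value if present, otherwise run the loader and store the result.
function load_with_cache(ArrayCache $cache, string $key, callable $loader)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $loader();          // the expensive path: database or HMVC request
        $cache->set($key, $value);
    }
    return $value;
}

// Count how often the expensive path is taken.
$calls = 0;
$loader = function () use (&$calls) {
    $calls++;
    return ['email' => 'john@doe.name', 'name' => 'John Doe'];
};

$cache  = new ArrayCache;
$first  = load_with_cache($cache, 'user:john@doe.name', $loader); // runs the loader
$second = load_with_cache($cache, 'user:john@doe.name', $loader); // served from cache
```

Whichever loading method sits behind the loader, only the first call pays its full cost.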

Example A and B process instruction set juxtaposed with time, including caching, demonstrating both A and B are equal with valid cache available

The example above balances the performance between the HMVC and traditional methods of loading the User for all subsequent requests for either model. Memcache is a good choice in this case as it is distributed, allowing the entry to be read by other MVC triads. Other caching technologies could be used; however, Memcache ensures all MVC triads within any domain can access the cached record. Localised cache technologies such as APC or Xcache can be used in Hierarchical-MVC architectures, but only where the cache is required solely within the MVC triad being executed.

At this point the original performance problem has been solved with a working solution. But this solution is not very scalable, as the caching information is explicitly defined within the controller receiving the data. If the application is scaled out and MVC triads are moved to alternate domains, changes to the caching rules would require developers to alter code in disparate parts of the system, thus removing an important advantage of an HMVC architecture.

One of the fundamental features of Hierarchical-MVC is the exclusive use of HTTP interaction with MVC triads through their controller. If the caching rules could be transferred with the model data, the caching logic could be controlled by the MVC triad providing the data. This would ensure the MVC triad responsible for the data is also responsible for the caching rules, rather than the controller or model receiving it as shown above. This sounds appealing, but how can this work in practice?

When retrieving data from a database, there is no easy way to append caching data to the result as metadata without encoding it into the result itself. Ideally, HMVC would supply the caching information as metadata alongside the resulting record. Fortunately, the interaction between MVC triads uses the HTTP protocol as an interface, which provides a long-established method for supplying caching information within responses: the Cache-Control header. The HTTP/1.1 specification (also known as RFC 2616) provides a set of caching rules for the HTTP protocol. The rules are complex, so developers wishing to implement HTTP cache controls are advised to read and understand RFC 2616 fully before starting. PHP developers should use the HTTP PECL extension, which has implemented HTTP Cache-Control logic within HttpResponse::setCache().
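
To make the header concrete, here is a rough sketch of how a client might read a freshness lifetime from a Cache-Control value. The function name cache_control_max_age() is hypothetical, and the parsing is deliberately simplified: it ignores quoted values containing commas, treats no-cache the same as no-store, and covers only a fraction of the RFC 2616 rules.

```php
<?php
// Hypothetical helper: returns the freshness lifetime in seconds,
// 0 when the response must not be reused, or null when nothing is stated.
function cache_control_max_age(string $header): ?int
{
    $directives = [];

    // Split "public, max-age=300" into name/value pairs
    foreach (explode(',', $header) as $part) {
        $pair = explode('=', trim($part), 2);
        $name = strtolower(trim($pair[0]));
        if ($name !== '') {
            $directives[$name] = isset($pair[1]) ? trim($pair[1], " \t\"") : '';
        }
    }

    // Simplification: both directives are treated as "do not reuse"
    if (isset($directives['no-store']) || isset($directives['no-cache'])) {
        return 0;
    }

    // Only accept a purely numeric max-age value
    if (isset($directives['max-age']) && preg_match('/^\d+$/', $directives['max-age'])) {
        return (int) $directives['max-age'];
    }

    return null;
}

$five_minutes = cache_control_max_age('max-age=300');
```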

All caching interpretation within HMVC should be contained within the client, as it is with HTTP. The client in HMVC is the code that executes the request and parses the corresponding response. In the upcoming Kohana Framework version 3.1, the Kohana_Request_Client class executes requests and contains all of the caching logic. In other frameworks it should probably reside within the dispatcher or a similar class, depending on the implementation. Once the framework is ready to handle Cache-Control headers, it is a simple task to include the caching information within responses. Let's take a look at the Messages controller from Gazouillement, updated since the previous article.

The messages controller makes use of a RESTful pattern and there are some caching instructions applied to the response. Let's step through this controller to examine the properties and methods in detail.

The controller has some refactored and new properties that define the behaviour of this controller. $_accept_formats provides the supported response formats, which will be validated against the Accept header of the request. The resolved response format is stored separately in $_response_format and the user context of the request is stored in the original $_user property.

Before the requested controller action is executed by Kohana, the request client will execute the Kohana_Controller::before() method. Zend Framework developers can use the Zend_Controller_Action::preDispatch() method to achieve the same logic. Most other web application frameworks have an equivalent method. This method sets up the controller before each action is executed, not to be confused with the standard constructor that will only be invoked upon instantiation. The before() method defined above ensures that the request Accept header matches the supported response formats and that the user parameter is valid for this context. If either criterion fails inspection, the correct response code is returned from the controller.

Once the Kohana_Controller::before() method has finished executing, the request action method will be executed. In this method the messages are loaded from the relevant model and processed before being applied to the response body. Then the Cache-Control header is set to a max-age of five minutes (300 seconds) and the content type is set to the appropriate value. The Kohana_Controller::$response object is rendered appropriately by the client, so there is no need to render it within controller actions.

Finally, the Controller_Messages::_prepare_response() method formats the response correctly according to the resolved response format. This has no effect on caching of the response ultimately, but it does demonstrate a method for supporting multiple response formats within controllers.
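
The walkthrough above can be approximated with a framework-free sketch. Only the names $_accept_formats, $_response_format, $_user, before() and _prepare_response() come from the description; the class name Sketch_Controller_Messages, the action name, the supported formats, the status codes and the hard-coded message store are all assumptions made for illustration, not the Gazouillement source.

```php
<?php
// Framework-free approximation; everything not named in the text is assumed.
class Sketch_Controller_Messages
{
    // Response formats this controller can produce (assumed list)
    protected $_accept_formats = ['application/json', 'application/xml'];

    // Format resolved from the request Accept header
    protected $_response_format;

    // User context of the request
    protected $_user;

    // Minimal response state: status code, headers and body
    public $status = 200;
    public $headers = [];
    public $body = '';

    // Hypothetical message store keyed by username
    private $messages = [
        'samsoir' => [['msg' => 'This is a test message', 'date' => '2010-10-10 14:51:34 GMT+1']],
    ];

    // Runs before the action: resolve the format and validate the user
    public function before(string $accept, string $user): void
    {
        foreach (explode(',', $accept) as $type) {
            $type = trim(explode(';', $type)[0]); // ignore any q-value
            if (in_array($type, $this->_accept_formats, true)) {
                $this->_response_format = $type;
                break;
            }
        }

        if ($this->_response_format === null) {
            $this->status = 406; // Not Acceptable
        } elseif (!isset($this->messages[$user])) {
            $this->status = 404; // Not Found
        } else {
            $this->_user = $user;
        }
    }

    // The action: load messages, format them and attach caching rules
    public function action_get(): void
    {
        if ($this->status !== 200) {
            return; // before() already rejected the request
        }

        $messages = $this->messages[$this->_user];
        $this->body = $this->_prepare_response($messages);
        $this->headers['Cache-Control'] = 'max-age=300'; // five minutes
        $this->headers['Content-Type'] = $this->_response_format;
    }

    // Format the payload according to the resolved response format
    protected function _prepare_response(array $messages): string
    {
        if ($this->_response_format === 'application/json') {
            return json_encode(['messages' => $messages, 'total' => count($messages)]);
        }

        $xml = '<messages total="' . count($messages) . '">';
        foreach ($messages as $message) {
            $xml .= '<message date="' . htmlspecialchars($message['date']) . '">'
                . htmlspecialchars($message['msg']) . '</message>';
        }
        return $xml . '</messages>';
    }
}

$controller = new Sketch_Controller_Messages;
$controller->before('application/json', 'samsoir');
$controller->action_get();
```

The sketch does not handle wildcard media types such as */*, which a real implementation of Accept negotiation would need to consider.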

Now that the controller is defined, it is ready to receive requests. For the most part the HTTP interaction is hidden from developers, abstracted by the framework or the server language in which the application is implemented. It is important to understand the raw interaction, so for clarity the HTTP request and resulting response are shown below. Users of Kohana can access the true HTTP representation within the upcoming version 3.1 by invoking the render() method on either the request or response object.

Request


GET /samsoir/messages HTTP/1.1
Accept: application/json

Response


HTTP/1.1 200 OK
Cache-Control: max-age=300
Content-Type: application/json
Content-Length: 98

{"messages": [{"msg": "This is a test message", "date": "2010-10-10 14:51:34 GMT+1"}], "total": 1}

The resulting response contains the messages for the user samsoir in JSON format. Included within the response header is the Cache-Control directive providing an instruction to cache the resource for five minutes. The controller responsible for messages has now taken control of the cache settings, ensuring all cache logic is maintained within the MVC triad responsible for the data. All clients interpreting this response will cache the resource for five minutes unless the cache is invalidated.

Caching should be applied to web applications in layers. Using the Cache-Control header for caching instructions has an added benefit of allowing any HTTP interface to store the cache correctly. Reverse proxy servers such as Varnish and Squid can also provide an additional layer of cache if placed between the client and controller. By adding a proxy server that can cache to the architecture, it is straightforward to add an additional caching layer to Gazouillement without changing any controller logic.

How do other controllers and models get messages from this controller now it has caching headers applied? If the client executing the request understands HTTP cache control headers, then no additional code will be required. For existing clients that cannot parse HTTP Cache-Control headers natively, it is highly recommended to use the PECL HTTP extension. Below is an example using Kohana Framework 3.1, which does understand Cache-Control directives.

Messages controller example to demonstrate how caching increases performance of the Gazouillement messages controller

Because the Kohana_Request class understands how to handle the HTTP Cache-Control header, the request client will only execute a new request for resources if the cached response has become stale or invalidated. The code responsible for loading the messages resource does not have to change or add any additional cache control logic. All of the caching information is provided in a format that can be interpreted by numerous consumers, without augmenting the standard structure of the data returned. We have now implemented a scalable way of caching MVC triad responses that respects the Hierarchical-MVC conventions, ensuring all caching logic for each triad is maintained within its own domain.
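
The behaviour described here can be sketched without any framework. The class Sketch_Caching_Client below is hypothetical and is not the Kohana request client: it wraps an executor callback (the code that would really run the request) and a clock callback (injected so that staleness can be demonstrated without waiting), and it only honours max-age and no-store.

```php
<?php
// Hypothetical caching client; not the Kohana implementation.
class Sketch_Caching_Client
{
    private $cache = [];
    private $executor; // callable(string $uri): array with 'headers' and 'body'
    private $clock;    // callable(): int, current time in seconds

    public function __construct(callable $executor, callable $clock)
    {
        $this->executor = $executor;
        $this->clock = $clock;
    }

    public function execute(string $uri): array
    {
        $now = call_user_func($this->clock);

        // Serve the stored response while it is still fresh
        if (isset($this->cache[$uri]) && $this->cache[$uri]['expires'] > $now) {
            return $this->cache[$uri]['response'];
        }

        $response = call_user_func($this->executor, $uri);
        $header = $response['headers']['Cache-Control'] ?? '';

        // Store the response only when the triad supplied a positive max-age
        if (stripos($header, 'no-store') === false
            && preg_match('/(?:^|,)\s*max-age=(\d+)/i', $header, $match)
            && (int) $match[1] > 0
        ) {
            $this->cache[$uri] = [
                'expires'  => $now + (int) $match[1],
                'response' => $response,
            ];
        }

        return $response;
    }
}

// Demonstration with a fake executor and a controllable clock
$now = 1000;
$executions = 0;

$client = new Sketch_Caching_Client(
    function ($uri) use (&$executions) {
        $executions++;
        return [
            'headers' => ['Cache-Control' => 'max-age=300'],
            'body'    => '{"messages": [], "total": 0}',
        ];
    },
    function () use (&$now) {
        return $now;
    }
);

$client->execute('samsoir/messages'); // executes the request
$client->execute('samsoir/messages'); // served from the cache
$now += 301;                          // five minutes and one second later
$client->execute('samsoir/messages'); // stale, so the request executes again
```

The calling code is identical in all three cases, which is the point: the caching rules travel with the response rather than living in the consumer.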

Parallel Processing

So far we have optimised the execution time of single requests to Hierarchical-MVC resources to ensure that they can perform as quickly as more traditional methods for loading data. For many this may be enough to get their applications performing to expected metrics. But all requests are still happening synchronously. A better solution would be to load HMVC resources asynchronously in one operation. Unfortunately, PHP does not lend itself to symmetric multiprocessing: PHP code executes in sequence on a single thread, even when run on multi-core systems.

The architecture of Hierarchical-MVC enables applications to run across multiple systems and software languages because of the HTTP interfaces used between the triads. Because resources are requested rather than directly processed, there is scope to run requests in parallel even in a language that does not support symmetric multiprocessing such as PHP.

Once again let's look at the Gazouillement application, this time at the index controller used for the users’ homepage. This controller pulls a number of resources from other parts of the system before presenting them to the client.

Controller_Index represents a typical Hierarchical-MVC controller that needs to load many different resources from across the system. It uses a RESTful interface when loading resources, and reverse routing to reduce overhead when MVC triads move location. The problem with this controller is that it is very linear. Each instruction must complete before the next is executed, and this means that the total processing time is at the mercy of the connections to the other resources.

Diagram demonstrating the long execution time of the Controller_Index::action_index() method using synchronous processing

It would be better if the loading of those external resources could be processed asynchronously rather than in sequence. To process resources asynchronously, it is important to examine exactly what can be processed and when. Bundling all three of the resource calls up into one asynchronous process within the controller action would cause a logical failure. The requests for relations and messages have a strong dependency on the user being available, therefore the user has to be loaded before the messages and relations can be loaded. But once the user is available, the other related resources can be loaded asynchronously, as neither has any strong dependency on other resources within the action.
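
A small worked example shows what this ordering buys. The response times below are invented purely for illustration; the calculation simply compares running everything in sequence with running the dependent requests together once the user is loaded.

```php
<?php
// Hypothetical response times in milliseconds for each resource
$latency = ['user' => 40, 'messages' => 120, 'relations' => 90];

// Synchronous: every request waits for the previous one to finish
$sequential = array_sum($latency); // 40 + 120 + 90 = 250

// Asynchronous: stages run in order, requests inside a stage run together.
// The user is its own stage because the other two requests depend on it.
$stages = [['user'], ['messages', 'relations']];

$parallel = 0;
foreach ($stages as $stage) {
    $times = array_map(function ($name) use ($latency) {
        return $latency[$name];
    }, $stage);

    // A stage takes as long as its slowest request
    $parallel += max($times);
}
// $parallel is 40 + max(120, 90) = 160
```

With these made-up figures the action drops from 250ms to 160ms; the saving always equals the time of the faster requests in each parallel stage.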

PHP offers two established options for creating asynchronous HTTP requests: curl_multi_exec and HttpRequestPool, the latter being part of the PECL HTTP extension. It should be noted that the HttpRequest class uses cURL internally, but it does provide a nice object-oriented interface to HTTP operations.

Using the HttpRequestPool it is possible to optimise the execution of the Controller_Index class defined before.
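
The same pattern can also be sketched with curl_multi_exec, the other option mentioned above, which needs only PHP's cURL extension. The helper name fetch_parallel() and the URLs in the usage comment are hypothetical, and error handling is reduced to the bare minimum.

```php
<?php
// Hypothetical helper: fetch several URLs at once and return their bodies,
// keyed by the same names used in the input array.
function fetch_parallel(array $urls): array
{
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $name => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_HTTPHEADER, ['Accept: application/json']);
        curl_multi_add_handle($multi, $handle);
        $handles[$name] = $handle;
    }

    // Drive all transfers until none are still running
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity on any handle
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $name => $handle) {
        $bodies[$name] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    curl_multi_close($multi);

    return $bodies;
}

// Usage inside the index action, after the user has been loaded
// (the host and paths are placeholders, not real Gazouillement routes):
//
// $responses = fetch_parallel([
//     'messages'  => 'http://example.com/samsoir/messages',
//     'relations' => 'http://example.com/samsoir/relations',
// ]);
// $messages  = json_decode($responses['messages'], true);
// $relations = json_decode($responses['relations'], true);
```

The user request is deliberately left out of the pool: it has to complete first, because both pooled requests depend on it.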

Diagram demonstrating the shorter execution time of the Controller_Index::action_index() method using asynchronous processing

The Controller_Index action is now executing the requests for messages and relations in parallel, resulting in improved performance for this controller and a better experience for users. It is important to remember to resolve dependencies first when implementing asynchronous requests. There is no guarantee of which request will complete first, so asynchronous requests must have all required resources available before execution. As the User model is required by the messages and relations requests, it must be loaded ahead of the asynchronous process.

We have successfully introduced parallel processing into the execution stack of the Gazouillement application and improved the overall performance by caching the HMVC responses using the caching headers specified in HTTP/1.1. However, there is a compromise to be addressed when this method of parallel processing is used. When using HttpRequestPool we are creating a full HTTP request for a resource (see footnote[2]). This is not an issue when requesting resources that are located elsewhere, but it creates additional overhead when requesting from a local MVC triad. What is needed, therefore, is a way of creating parallel requests within the same MVC triad: enter the Request Worker Pool. A future article (coming soon!) will discuss High Performance HMVC with Request Worker Pools, Clustering & the Cloud, addressing this last problem and providing the answer to localised parallel processing with HMVC.

In this article we have revisited the Hierarchical-MVC architecture with the aim of improving the overall performance of the entire process stack within Gazouillement. The first optimisation applied caching using HTTP Cache-Control headers, ensuring all caching logic was encapsulated within the MVC triad responsible for the data. After this optimisation Gazouillement only loaded resources that were absolutely necessary, reducing the total number of executions required to complete a request. The next optimisation was to implement asynchronous requests where appropriate, allowing the Gazouillement application to process in parallel within controller actions. Combined, these two optimisation techniques ensure that Hierarchical-MVC architectures can scale horizontally whilst maintaining overall performance.

  1. The view action helper with Zend Framework 1.x re-bootstraps the application when creating a request to another controller action.
  2. In addition to the steps outlined, creating a request to another server adds the overhead of opening an HTTP connection to the server, processing the request and bootstrapping the application.