Having recently attended the MongoUK conference in London, I am convinced that MongoDB is fast finding application amongst cutting-edge web developers. NoSQL (Not Only SQL) is a relatively new approach to persistence, and document-oriented databases such as MongoDB in particular are starting to enter the web applications landscape. MongoDB's strength lies in its speed and its ability to cope with dynamic data, which aligns its goals closely with the requirements of many websites around today.

This tutorial will show you how to incorporate MongoDB into new or existing object-oriented applications, covering how to interact and integrate with it and how to deploy applications that use it. Credit for this approach must go to Matthew Weier O’Phinney, who spoke on this topic at the DPC 2010 conference in June. I would suggest reading this tutorial if you are considering using MongoDB for an application and are looking for a starting point from which to build an idea of its features. You may also be interested if you’ve found yourself tied to the persistence layer in the past and are looking for ways to reduce that technology lock-in. If you’ve already implemented MongoDB in many of your systems, then maybe this tutorial will open your eyes to a new way of integrating it. Whatever your background, if you understand object-orientation and want to start using MongoDB, then this tutorial is for you.

I had been wanting to research MongoDB and attempt to build something with it; however, most of the tutorials I found seemed to assume that all business models are arrays. Personally I prefer objects for models, so I searched for a better solution but found no OO implementation. Recalling the talk I attended at DPC, I decided to implement my application using the method suggested there, mixed with some of my experience with Propel. This article is about the lessons I learned along the way.

MongoDB Features

MongoDB is a really useful technology when deployed in the correct situation. That said, it is by no means a silver bullet for persistence issues. The schema flexibility that comes from its dynamic-schema system is very useful for dynamic data, but MongoDB is not relational. If the data for your website is highly relational, for example a record shop, then MongoDB may not be the best choice. It may still be ideal for one area of the record shop, for example custom-built analytics tracking. The data structures best suited to MongoDB are those where the fields are not known in advance, similar to the use cases that would make an entity-attribute-value (EAV) model a good choice.

One of the main benefits of using MongoDB is speed. It is also very easy to shard if the need arises, since it is designed for easy horizontal scaling. This, together with its options for mirroring servers, reduces the amount of work needed to implement high availability.

Storage

Document-oriented storage means storing each item as a document within a collection (the rough equivalent of a table). Each document you retrieve from your database completely describes the entire item, rather than just the part of it represented by a row in a relational system. This also allows the system to implement a dynamic schema, which means that each item has its own schema. The advantage of this design is that documents can be slightly different, drastically different or completely the same; there are no restrictions. For example, we could store objects containing all the data needed to display a page, allowing the specifics of each page to differ. This approach is generally more suited to discrete data sets. An example of a non-discrete data set could be a football league’s data. Scenarios such as advertising campaigns, where the application can be considered a discrete system with strict requirements on speed, flexibility of data and full querying, would be ideal.
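To make the dynamic schema idea concrete, here is a small sketch of two page documents written as PHP arrays. The field names and values are hypothetical and only illustrate that documents in the same collection need not share a structure.

```php
<?php
// Two hypothetical "page" documents destined for the same collection.
// They share "_id" and "title" but otherwise carry different fields.
$pages = array(
    array(
        '_id'   => '/about',
        'title' => 'About us',
        'body'  => 'Who we are and what we do.',
    ),
    array(
        '_id'     => '/gallery',
        'title'   => 'Gallery',
        'images'  => array('one.jpg', 'two.jpg'),
        'columns' => 3,
    ),
);

// Each document describes itself completely, so its fields can be inspected
// without consulting a shared table definition.
foreach ($pages as $page) {
    echo $page['_id'], ': ', implode(', ', array_keys($page)), PHP_EOL;
}
```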

Using MongoDB in an Application

Another useful application of MongoDB is where caching is required but needs to be more structured than a simple in-memory cache. For this you can ignore writing to disk and make use of the best speed MongoDB has to offer. Content management systems often require very fluid data structures, which tends to result in relational database anti-patterns being deployed, and the software then has to compensate for the resulting performance penalties. A document database such as MongoDB could be used to overcome this issue instead. The primary application of MongoDB is suggested by the NoSQL term itself, “not only SQL”: it exists alongside an RDBMS to provide its benefits where suitable. This especially suits a service-oriented architecture, where each service is separate from the others, often resulting in discrete data structures. An example of such a service could be storing user information in a system where users aren’t tied into other data; MongoDB would be ideal for this.

This tutorial will guide you through making a very simple URL storing and tagging tool. It’s accessed by the command line to keep the examples simple, but perhaps you could extend this into your own version of delicious.com.

Setup

For an easy, step-by-step guide to setting up MongoDB, please visit http://www.mongodb.org/display/DOCS/Quickstart and follow the instructions for your platform. Next we need to install the drivers for PHP; there is a comprehensive installation guide at http://www.php.net/manual/en/mongo.installation.php . In this tutorial I connect to the default installation of MongoDB on localhost:27017 with no authentication. For details on connecting and authenticating, please read http://www.php.net/manual/en/mongo.connecting.php .

Taking our First Steps

The first step in this tutorial is to create the basic business logic for the application. As a software engineer, I like to think about the business model first before trying to create the persistence model. This allows you to make your business logic fit the task, rather than concern itself with how to persist everything. So in this step, we need to create the Plain Old PHP Objects (POPO, taken from the idea of POJO from the Java world).

[Diagram: Link Object]

The above diagram shows the Link object we’re going to build in step 1. The diagram notation will be familiar to people who have studied OO before and especially people who understand UML. The code to achieve this is written below, making a basic business model. The __toString method is purely to allow us to print it out with ease.
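A minimal sketch of such a class might look like the following. Only the url and tags properties and the __toString method come from the description above; the constructor, the getter and setter names and the output format are my assumptions for illustration.

```php
<?php
// Plain Old PHP Object for a tagged link. Method names other than
// __toString() are illustrative assumptions.
class Link
{
    private $url;
    private $tags = array();

    public function __construct($url = null, array $tags = array())
    {
        if ($url !== null) {
            $this->setUrl($url);
        }
        $this->setTags($tags);
    }

    public function getUrl()
    {
        return $this->url;
    }

    public function setUrl($url)
    {
        // Basic sanity check: only accept non-empty strings.
        if (!is_string($url) || $url === '') {
            throw new InvalidArgumentException('URL must be a non-empty string');
        }
        $this->url = $url;
    }

    public function getTags()
    {
        return $this->tags;
    }

    public function setTags(array $tags)
    {
        // Store each tag once, as a string, with sequential keys.
        $this->tags = array_values(array_unique(array_map('strval', $tags)));
    }

    // Purely a convenience for printing the object.
    public function __toString()
    {
        return sprintf('%s [%s]', $this->url, implode(', ', $this->tags));
    }
}

$link = new Link('http://www.mongodb.org', array('database', 'nosql'));
echo $link, PHP_EOL; // prints "http://www.mongodb.org [database, nosql]"
```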

The next step is to add methods to retrieve and insert all relevant data about the object. In this example all of our relevant data is “tags” and “url”. The methods to interact with the data structures are toArray() and fromArray(), which we will add to our class. Do remember not to bypass any security / sanity checks that are generally expected of public methods; you may wish to use these methods elsewhere in your application.
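As a sketch, the two methods could be added along these lines. The class is trimmed to the parts needed here, and the validation rules are assumptions; the point is that fromArray() goes through the setters rather than writing to the properties directly.

```php
<?php
class Link
{
    private $url;
    private $tags = array();

    public function setUrl($url)
    {
        if (!is_string($url) || $url === '') {
            throw new InvalidArgumentException('URL must be a non-empty string');
        }
        $this->url = $url;
    }

    public function setTags(array $tags)
    {
        $this->tags = array_values(array_unique(array_map('strval', $tags)));
    }

    // Export everything needed to persist the object.
    public function toArray()
    {
        return array('url' => $this->url, 'tags' => $this->tags);
    }

    // Populate via the setters so the sanity checks are not bypassed.
    public function fromArray(array $data)
    {
        if (isset($data['url'])) {
            $this->setUrl($data['url']);
        }
        if (isset($data['tags']) && is_array($data['tags'])) {
            $this->setTags($data['tags']);
        }
        return $this;
    }
}

$link = new Link();
$link->fromArray(array('url' => 'http://php.net', 'tags' => array('php', 'docs', 'php')));
var_export($link->toArray()); // the duplicate "php" tag is stored once
```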

Now we have a way to interact with the business model from outside the class without compromising its integrity. The next step is to construct a database-agnostic way of interacting with the data source. I have traditionally tried to use ORMs that handle this for me, but document storage is so similar to the way an object works that I am starting to change my viewpoint. I’ve borrowed the naming convention for these classes from the ORM Propel: each is named after the class it serves, with the suffix “Peer”. In this example the object we’re interacting with is called “Link”, so the peer class is “LinkPeer”. The following diagram shows the methods we make available on the peer class.

[Diagram: Link and LinkPeer Objects]

As you can see from the LinkPeer specification, the methods it contains are mostly CRUD actions. The fetchAll function is also included for convenience. One key thing to remember in this tutorial is that we always pass the object into the Peer class where possible. This ensures that we store the correct object in the persistence layer and helps with retrieval. Now we need to start constructing this class. The following code shows the class declaration, its properties and its constructor.
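A minimal sketch of that starting point is below. The constant and property names are assumptions, and the constructor is left untyped so the sketch loads without the driver installed; in practice the argument would be a MongoDB database object from the legacy mongo extension.

```php
<?php
class LinkPeer
{
    // Name of the collection that holds links (an assumed name).
    const COLLECTION = 'links';

    // Database handle, e.g. a MongoDB object from the legacy driver.
    protected $db;

    public function __construct($db)
    {
        $this->db = $db;
    }
}

// Usage, assuming the legacy driver and a database called "example":
// $mongo = new Mongo('mongodb://localhost:27017');
// $peer  = new LinkPeer($mongo->selectDB('example'));
```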

The db is the connection to the database that will be used. In this example we’re passing in the object and allowing the Peer class to retrieve the collection. This may not be desired functionality for other applications. The following methods retrieve a collection from the database object.
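One way to sketch this is a single lazy accessor. It assumes the injected object offers selectCollection(), as the legacy driver's MongoDB class does; the method name getCollection() and the 'links' collection name are my assumptions.

```php
<?php
class LinkPeer
{
    const COLLECTION = 'links'; // assumed collection name

    protected $db;
    protected $collection;

    public function __construct($db)
    {
        $this->db = $db;
    }

    // Fetch the collection on first use and reuse it on later calls.
    // MongoDB::selectCollection() is part of the legacy PHP driver.
    public function getCollection()
    {
        if ($this->collection === null) {
            $this->collection = $this->db->selectCollection(self::COLLECTION);
        }
        return $this->collection;
    }
}
```

Keeping the lookup in one place means the collection name is defined once, which is also what makes it easy to move into a base peer class later.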

Getting a collection is much like selecting a table to work with. You can have multiple collections in a database and you can have multiple databases on a server. Unlike relational database tables, collections cannot be joined. Also unlike schema-oriented databases, you can store any structure within the collection. This is purely a convenience method and in a real application this would often be inherited from a base peer class.

Storing an item in MongoDB automatically produces an id that is stored in the “_id” field. Since we already know the unique identifier for a link, namely its “url”, we can rename the “url” property to “_id” whenever an item is stored to the database. It’s also important to remember to convert it back afterwards. The next step is to add two methods: one to convert the array to and from a database-compatible version, and one to extract the id from the array.

The getIdFromDataset method simply extracts the id from the dataset if it exists. The translateDataset method converts the data backwards and forwards between the object’s format and the format stored in the database. The $toDb parameter denotes which direction the data is being prepared for. If incorrect data is supplied then false is returned.
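The two helpers could be sketched as follows. The method names and the $toDb flag come from the text above; the exact checks, and returning null when no id is present, are assumptions.

```php
<?php
class LinkPeer
{
    // Return the stored identifier, or null when the dataset has none.
    public function getIdFromDataset(array $data)
    {
        return isset($data['_id']) ? $data['_id'] : null;
    }

    // Swap between the object's "url" key and MongoDB's "_id" key.
    // $toDb = true prepares data for storage; false prepares it for the object.
    public function translateDataset($data, $toDb = true)
    {
        $from = $toDb ? 'url' : '_id';
        $to   = $toDb ? '_id' : 'url';

        if (!is_array($data) || !isset($data[$from])) {
            return false; // incorrect data for the requested direction
        }

        $data[$to] = $data[$from];
        unset($data[$from]);
        return $data;
    }
}

$peer = new LinkPeer();
$row  = $peer->translateDataset(array('url' => 'http://php.net', 'tags' => array('php')));
var_export($row); // the "url" key has been replaced by "_id"
```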

With all the preparation in place, we can start to build the CRUD methods. The first methods are CREATE (in this application “insert”) and UPDATE. Because we always specify the id ourselves, these end up being the same action in this application.

Once again, it’s important that any interaction passes an object as far as it will go. The above example shows you that the insert action is exactly the same as the update action, both of which use the save method on the MongoCollection. The important thing to note about the interaction with MongoDB in the above example is that we pass plain PHP arrays straight to the MongoCollection. This is one of the best features of MongoDB’s PHP API.
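The idea can be sketched as below, assuming the legacy driver, whose MongoCollection::save() inserts a document or replaces the one with the same "_id". The Link class is a trimmed stand-in, and the private save() helper and 'links' collection name are assumptions.

```php
<?php
// Minimal stand-in for the Link business object.
class Link
{
    private $url;
    private $tags;

    public function __construct($url, array $tags = array())
    {
        $this->url  = $url;
        $this->tags = $tags;
    }

    public function toArray()
    {
        return array('url' => $this->url, 'tags' => $this->tags);
    }
}

class LinkPeer
{
    protected $db;

    public function __construct($db)
    {
        $this->db = $db; // e.g. a MongoDB object from the legacy driver
    }

    public function insert(Link $link)
    {
        return $this->save($link);
    }

    public function update(Link $link)
    {
        return $this->save($link);
    }

    // Both actions funnel into MongoCollection::save().
    private function save(Link $link)
    {
        $data        = $link->toArray();
        $data['_id'] = $data['url']; // the URL doubles as the identifier
        unset($data['url']);

        return $this->db->selectCollection('links')->save($data);
    }
}
```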

Now for RETRIEVE, which will hopefully return any data we’ve inserted into the collection. As I said in the introduction, this example isn’t going to dive into complex retrieval of data. However, we do have to deal with returning objects from the Peer class, and we do that through a factory method.

The fetchAll method simply queries the collection without supplying a query and then calls the factory method on each of the returned results. As for fetchByUrl, we simply create a data structure with the required attributes specified and query the collection for it. In a more ideal scenario we would use the Link object to create the data structure we use to query the collection. Finally, the factory method changes the data structure returned by the database back to an object-friendly format. It then creates a new object and applies the values. To take this a step further, we could ensure the peer class doesn’t create copies of any of the persistent objects unless necessary, but that is beyond the scope of this tutorial.
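A sketch of the three methods follows, assuming the legacy driver's MongoCollection::find() (which returns an iterable cursor) and findOne() (which returns an array or null). The Link class is a trimmed stand-in, and returning null for a missing URL is an assumption.

```php
<?php
// Minimal stand-in for the Link business object.
class Link
{
    private $url;
    private $tags = array();

    public function fromArray(array $data)
    {
        if (isset($data['url'])) {
            $this->url = (string) $data['url'];
        }
        if (isset($data['tags']) && is_array($data['tags'])) {
            $this->tags = array_values($data['tags']);
        }
        return $this;
    }

    public function toArray()
    {
        return array('url' => $this->url, 'tags' => $this->tags);
    }
}

class LinkPeer
{
    protected $db;

    public function __construct($db)
    {
        $this->db = $db; // e.g. a MongoDB object from the legacy driver
    }

    public function fetchAll()
    {
        $links = array();
        // find() with no query returns a cursor over every document.
        foreach ($this->db->selectCollection('links')->find() as $row) {
            $links[] = $this->factory($row);
        }
        return $links;
    }

    public function fetchByUrl($url)
    {
        // The URL is stored as the document's "_id".
        $row = $this->db->selectCollection('links')->findOne(array('_id' => $url));
        return $row === null ? null : $this->factory($row);
    }

    // Turn a stored document back into a Link object.
    protected function factory(array $row)
    {
        $row['url'] = $row['_id'];
        unset($row['_id']);

        $link = new Link();
        return $link->fromArray($row);
    }
}
```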

Finally we’ll implement the DELETE action, which will remove an object from the database.

The delete method accepts only Link objects, which helps to ensure the correct business object is being removed from the persistence layer. The “fsync” option is used to ensure that the action is synchronized to disk before responding. All of the actions we perform on the collection can accept this option. Each application should review its use of this option: leaving it out gives a significant speed boost, but it also means that you can lose data. In any real application, logging and correct error reporting would also be implemented. Once again, this is outside the scope of the tutorial.
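As a sketch, assuming the legacy driver's MongoCollection::remove(), which takes the criteria and an options array, the delete action might look like this. The Link class is a trimmed stand-in and the 'links' collection name is an assumption.

```php
<?php
// Minimal stand-in for the Link business object.
class Link
{
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function toArray()
    {
        return array('url' => $this->url);
    }
}

class LinkPeer
{
    protected $db;

    public function __construct($db)
    {
        $this->db = $db; // e.g. a MongoDB object from the legacy driver
    }

    // The type hint guarantees only Link objects can be deleted.
    public function delete(Link $link)
    {
        $data = $link->toArray();

        return $this->db->selectCollection('links')->remove(
            array('_id' => $data['url']), // the URL is stored as "_id"
            array('fsync' => true)        // wait for the sync to disk
        );
    }
}
```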

Using MongoDB in the Real World

This tutorial has shown you how to get started with MongoDB, and I hope it has illustrated how easy it is to work with. Before you start using MongoDB, make sure it is right for you and for the application you will build. It has some fantastic advantages, but it is definitely not a silver bullet and its uses are generally more specific than those of standard relational databases. The full application code and test harness for the examples shown in this article are available on GitHub; go to http://github.com/paul-matthews/Mongo-Db-Example to download them. If you’re using MongoDB or about to get started, leave a comment. It is always interesting to hear about the experiences of others.

Resources

Useful resources for further reading or filling in the gaps: