Having recently attended the MongoUK conference in London, I found it clear that MongoDB is fast finding application amongst cutting-edge web developers. NoSQL (Not Only SQL) is a relatively new approach to persistence, and document-oriented databases such as MongoDB in particular are starting to enter the web applications landscape. MongoDB's strength lies in its speed and its ability to cope with dynamic data, so its goals align closely with the requirements of many websites around today.
This tutorial will show you how to incorporate MongoDB into new or existing object-oriented applications, by showing how to interact and integrate with applications and how to deploy applications using MongoDB. Credit for this approach must go to Matthew Weier O'Phinney, who spoke on this topic at the DPC 2010 conference in June. I would suggest reading this tutorial if you are considering using MongoDB for an application and are looking for a starting point on which to build an idea of its features. You may also be interested in this tutorial if you've found yourself getting tied to the persistence layer in the past and are looking for ways to reduce that technology lock-in. If you've already implemented MongoDB in many of your systems, then maybe this tutorial will open your eyes to a new way of integrating it. Whatever your background, if you understand object-orientation and want to start using MongoDB, then this tutorial is for you.
I had been wanting to research MongoDB and attempt to build something with it. However, most of the tutorials I found seemed to assume that all business models are arrays. Personally I prefer objects for models, so I searched for a better solution but found no OO implementation. Recalling the talk I attended at DPC, I decided to implement my application using the method suggested, mixed with some of my experience with Propel. This article is about the lessons I learned along the way.
MongoDB Features
MongoDB is a really useful technology when deployed in the correct situation. That said, it is by no means a silver bullet for persistence issues. Its dynamic schema gives it a flexibility that is very useful for dynamic data, but it is not relational. Therefore if the data for your website is highly relational, for example a record shop, then MongoDB may not be the best choice. It may still be ideal for one area of the record shop, for example custom-built analytics tracking. The data structures best suited to MongoDB are those where the fields are not known in advance, similar to the use cases that would make an entity-attribute-value (EAV) model a good choice.
One of the main benefits of using MongoDB is speed. It is also very easy to shard if the need arises, since it is designed for easy horizontal scaling. Together with its options for mirroring servers, this reduces the effort needed to implement high availability.
Storage
Document-oriented storage uses the concept of storing each item as a document within a collection (roughly the equivalent of a table in a relational database). This means that each document you retrieve from your database completely describes the entire item, rather than just a part of it represented by a row in a relational system. This also allows the system to implement a dynamic schema, which means that each item has its own schema. The advantage of this design is that each document can be slightly different, drastically different or completely the same; there are no restrictions. For example, we could store objects containing all the data to display a page, allowing the specifics of each page to differ. This approach is generally more suited to discrete data sets. An example of a non-discrete data set could be a football league's data. Scenarios such as advertising campaigns, where the application could be considered a discrete system with strict requirements on speed, flexibility of data and full querying, would be ideal.
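As a rough illustration of the dynamic schema described above, the sketch below builds two page documents that could live in the same collection even though their fields differ. The field names and values are invented for this example and do not come from the tutorial's application; plain PHP arrays are used because that is how documents are handed to the PHP driver.

```php
<?php
// Two hypothetical "page" documents destined for the same collection.
// All field names and values here are invented for illustration.
$articlePage = array(
    '_id'   => '/news/launch',
    'title' => 'Product launch',
    'body'  => 'Full article text...',
    'tags'  => array('news', 'product'),
);

$galleryPage = array(
    '_id'    => '/gallery/summer',
    'title'  => 'Summer gallery',
    'images' => array(
        array('src' => 'beach.jpg', 'caption' => 'The beach'),
        array('src' => 'pier.jpg'), // caption omitted: no schema requires it
    ),
);

// Fields present in both documents, and fields only the gallery page has.
$shared = array_keys(array_intersect_key($articlePage, $galleryPage));
$onlyGallery = array_keys(array_diff_key($galleryPage, $articlePage));

echo 'Shared fields: ', implode(', ', $shared), PHP_EOL;
echo 'Gallery-only fields: ', implode(', ', $onlyGallery), PHP_EOL;
```

Each document carries its own structure, so adding a new kind of page would not require a schema migration.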
Using MongoDB in an Application
Another example of a useful application of MongoDB could be where caching is required but it needs to be more structured than a simple in-memory cache. For this you can skip waiting for writes to reach disk and make use of the best speed MongoDB has to offer. Content management systems often require very fluid data structures, which often results in relational database anti-patterns being deployed, after which the software has to try to compensate for the resulting performance penalties. A document database such as MongoDB could be used instead to overcome this issue. The primary application of MongoDB is suggested by the NoSQL term "not only SQL": it exists alongside an RDBMS and provides its benefits where suitable. This especially suits a service-oriented architecture, where each service is separate from the others, often resulting in discrete data structures. An example of a service could be storing user information in a system where users aren't tied into other data, and MongoDB would be ideal for this.
This tutorial will guide you through making a very simple URL storing and tagging tool. It's accessed by the command line to keep the examples simple, but perhaps you could extend this into your own version of delicious.com.
Setup
For an easy, step-by-step guide to setting up MongoDB, please visit http://www.mongodb.org/display/DOCS/Quickstart and follow the instructions for your platform. Then we need to install the drivers for PHP; again, there is a really comprehensive installation guide at http://www.php.net/manual/en/mongo.installation.php . In this tutorial I connect to the default installation of MongoDB on localhost:27017 with no authentication. For details on connecting and authenticating, please read http://www.php.net/manual/en/mongo.connecting.php .
Taking our First Steps
The first step in this tutorial is to create the basic business logic for the application. As a software engineer, I like to think about the business model first before trying to create the persistence model. This allows you to make your business logic fit the task, rather than concern itself with how to persist everything. So in this step, we need to create the Plain Old PHP Objects (POPO, taken from the idea of POJO from the Java world).
The above diagram shows the Link object we're going to build in step 1. The diagram notation will be familiar to people who have studied OO before and especially people who understand UML. The code to achieve this is written below, making a basic business model. The __toString method is purely to allow us to print it out with ease.
class Link
{
    private $url;
    private $tags = array();

    public function __construct()
    {
        // Not required to do anything at this time.
    }

    public function getUrl()
    {
        return $this->url;
    }

    public function getTags()
    {
        return $this->tags;
    }

    public function setUrl($url)
    {
        $this->url = (string) $url;
    }

    public function addTag($tag)
    {
        $this->tags[] = (string) $tag;
    }

    public function removeTag($tag)
    {
        $success = false;
        // look through the current tags for the specified tag
        foreach ($this->tags as $key => $tagName) {
            if ($tagName == $tag) {
                // remove the tag from the list
                unset($this->tags[$key]);
                $success = true;
            }
        }
        // re-index so the tags remain a plain sequential list
        $this->tags = array_values($this->tags);
        return $success;
    }

    public function __toString()
    {
        // example: http://www.google.com/ [usa, search]
        return sprintf('%s [%s]', $this->url, implode(', ', $this->tags));
    }
}
The next step is to add methods to retrieve and insert all relevant data about the object. In this example all of our relevant data is "tags" and "url". The methods to interact with the data structures are toArray() and fromArray(), which we will add to our class. Do remember not to bypass any security / sanity checks that are generally expected of public methods; you may wish to use these methods elsewhere in your application.
...
/**
 * toArray convert a link object to an array representation
 *
 * @access public
 * @return array the link representation
 */
public function toArray()
{
    return array(
        // the url is the unique identifier of a link; the peer class
        // maps it to the database's _id field when persisting
        'url' => $this->url,
        'tags' => $this->tags,
    );
}

/**
 * fromArray read the properties of the link from an array
 *
 * Caution: overwrites existing data
 *
 * @param array $link the properties as produced from the toArray() method
 * @access public
 * @throws Exception
 * @return Link this object
 */
public function fromArray($link)
{
    // Ensure the url is set
    if (empty($link['url']) || !is_string($link['url'])) {
        throw new Exception('Incorrect data supplied');
    }
    $this->url = $link['url'];

    // Only set tags if they conform to our structure
    $tags = array();
    if (!empty($link['tags']) && is_array($link['tags'])) {
        foreach ($link['tags'] as $tag) {
            if (is_string($tag)) {
                // append the individual tag (not the whole list)
                $tags[] = $tag;
            }
        }
    }
    $this->tags = $tags;

    return $this;
}
...
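To make the purpose of toArray() and fromArray() concrete, here is a small round-trip sketch. It uses a condensed copy of the Link class (only the methods needed here) so that it runs on its own; the URL and tag values are arbitrary examples.

```php
<?php
// Condensed copy of the Link model so this example is self-contained.
class Link
{
    private $url;
    private $tags = array();

    public function setUrl($url) { $this->url = (string) $url; }
    public function addTag($tag) { $this->tags[] = (string) $tag; }
    public function getTags() { return $this->tags; }

    public function toArray()
    {
        return array('url' => $this->url, 'tags' => $this->tags);
    }

    public function fromArray($link)
    {
        if (empty($link['url']) || !is_string($link['url'])) {
            throw new Exception('Incorrect data supplied');
        }
        $this->url = $link['url'];

        $tags = array();
        if (!empty($link['tags']) && is_array($link['tags'])) {
            foreach ($link['tags'] as $tag) {
                if (is_string($tag)) {
                    $tags[] = $tag;
                }
            }
        }
        $this->tags = $tags;

        return $this;
    }
}

$original = new Link();
$original->setUrl('http://www.php.net/');
$original->addTag('php');
$original->addTag('manual');

// Export to an array, then rebuild a second object from that array.
$copy = new Link();
$copy->fromArray($original->toArray());

// Data without a url is rejected rather than producing a broken Link.
try {
    $broken = new Link();
    $broken->fromArray(array('tags' => array('orphan')));
    $rejected = false;
} catch (Exception $e) {
    $rejected = true;
}
```

Because the array form is the only thing that crosses the class boundary, the same pair of methods can serve any storage back end that accepts arrays.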
Now we have a way to interact with the business model from outside the class without compromising its integrity. The next step is to construct a database-agnostic way of interacting with the data source. I have traditionally tried to use ORMs that handle this for me, but document storage is so similar to the way an object works that I am starting to change my viewpoint. I've borrowed the naming convention I use for these classes from the Propel ORM: each class takes the name of the business class plus the suffix "Peer". In this example the object we're interacting with is called "Link", so the peer class is "LinkPeer". The following diagram will display the methods we make available for the peer class.
As you can see from the LinkPeer specification, the methods it contains are mostly CRUD actions. The fetchAll function is also included for convenience. One key thing to remember in this tutorial is that we always pass the object into the Peer class where possible. This ensures that we store the correct object in the persistence layer and helps with retrieval. Now we need to start constructing this class. The following code shows the class definition, its properties and its constructor.
class LinkPeer
{
    const COLLECTION = 'link';
    const ID_FIELD = 'url';
    const DB_ID_FIELD = '_id';

    private $db;
    private $collection;

    public function __construct(MongoDB $db)
    {
        $this->db = $db;
    }
    ...
}
The db is the connection to the database that will be used. In this example we're passing in the object and allowing the Peer class to retrieve the collection. This may not be desired functionality for other applications. The following methods retrieve a collection from the database object.
...
private function getCollection()
{
    // if the collection is not already cached
    if (empty($this->collection)) {
        // selects the collection if it exists
        $this->collection = $this->db->selectCollection(self::COLLECTION);
        // or creates a new one if it doesn't
        // (MongoDB also creates a collection implicitly on first write,
        // so this fallback is a precaution rather than a requirement)
        if (empty($this->collection)) {
            $this->collection = $this->db->createCollection(
                self::COLLECTION
            );
        }
    }
    return $this->collection;
}
...
Getting a collection is much like selecting a table to work with. You can have multiple collections in a database and you can have multiple databases on a server. Unlike relational database tables, collections cannot be joined. Also unlike schema-oriented databases, you can store any structure within the collection. This is purely a convenience method and in a real application this would often be inherited from a base peer class.
Storing an item in MongoDB automatically produces an id that is stored in the "_id" field. As we already know the unique identifier for links (the "url"), we can ensure that each item stored to the database has its "url" property renamed to "_id". It's also important to remember to convert it back afterwards. The next step is to add two methods: one to convert the array to and from a database-compatible version, and the other to extract the id from the array.
The getIdFromDataset method simply extracts the id from the dataset if it exists. The translateDataset method converts the data backwards and forwards between the object's format and the format compatible with the database. The $toDb parameter denotes the direction of the conversion: true prepares the data for the database, false converts it back. If incorrect data is specified then false is returned.
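The listing for these two helpers is not reproduced here, so the following is only a sketch reconstructed from the description above and from how the methods are called elsewhere in this tutorial; the author's actual implementation is in the GitHub repository linked at the end and may differ. For the sake of a self-contained example they are written as free functions with top-level constants, whereas in LinkPeer they would be private methods using the class constants. Treating a missing identifier as the "incorrect data" case is an assumption.

```php
<?php
// Sketch only: bodies are assumptions based on the surrounding description.
const ID_FIELD = 'url';     // identifier name used by the Link object
const DB_ID_FIELD = '_id';  // identifier name used by MongoDB

/**
 * Rename the identifier key between object form and database form.
 * $toDb = true converts url -> _id, false converts _id -> url.
 * Returns false when the expected identifier key is absent.
 */
function translateDataset($data, $toDb = true)
{
    $from = $toDb ? ID_FIELD : DB_ID_FIELD;
    $to   = $toDb ? DB_ID_FIELD : ID_FIELD;

    if (!is_array($data) || !isset($data[$from])) {
        return false;
    }

    $data[$to] = $data[$from];
    unset($data[$from]);

    return $data;
}

/**
 * Return the database identifier from a dataset, or null if there is none.
 */
function getIdFromDataset($data)
{
    if (is_array($data) && isset($data[DB_ID_FIELD])) {
        return $data[DB_ID_FIELD];
    }
    return null;
}

// Usage: prepare a link array for storage, then convert it back.
$forDb = translateDataset(array('url' => 'http://www.php.net/', 'tags' => array('php')));
$forObject = translateDataset($forDb, false);
```

Defaulting $toDb to true matches the way fetchByUrl calls translateDataset with a single argument later in this tutorial.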
With all the preparation in place, we can start to build the CRUD methods. The first methods are CREATE – in this application "insert" – and UPDATE. In MongoDB these are actually the same action, since we specify the id.
...
public function insert(Link $link)
{
    return $this->update($link);
}

public function update(Link $link)
{
    $data = $this->translateDataset($link->toArray(), true);
    if ($data && $this->getCollection()->save($data)) {
        return true;
    }
    return false;
}
...
Once again, it's important that any interaction passes an object as far as it will go. The above example shows you that the insert action is exactly the same as the update action, both of which use the save method on the MongoCollection. The important thing to note about the interaction with MongoDB, in the above example, is that we are literally passing in arrays to the MongoCollection. This is one of the best features about MongoDB's PHP API.
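To see why insert and update can share one code path, the sketch below uses a tiny in-memory stand-in for a collection. InMemoryCollection is a hypothetical class written for this illustration; it is not the driver's MongoCollection and only mimics the "insert or replace by _id" behaviour of save described above.

```php
<?php
// Hypothetical in-memory stand-in; NOT the driver's MongoCollection.
class InMemoryCollection
{
    private $documents = array();

    // Insert the document, or replace the stored one with the same _id.
    // (Simplification: this stand-in refuses documents without an _id.)
    public function save(array $document)
    {
        if (!isset($document['_id'])) {
            return false;
        }
        $this->documents[$document['_id']] = $document;
        return true;
    }

    // Look a document up by its _id only.
    public function findOne(array $query)
    {
        $id = isset($query['_id']) ? $query['_id'] : null;
        if ($id !== null && isset($this->documents[$id])) {
            return $this->documents[$id];
        }
        return null;
    }

    public function count()
    {
        return count($this->documents);
    }
}

$collection = new InMemoryCollection();

// First save acts as an insert.
$collection->save(array('_id' => 'http://www.php.net/', 'tags' => array('php')));

// Second save with the same _id acts as an update, not a second insert.
$collection->save(array('_id' => 'http://www.php.net/', 'tags' => array('php', 'docs')));

$stored = $collection->findOne(array('_id' => 'http://www.php.net/'));
```

Because the url is used as the _id, saving the same link twice replaces the stored document instead of creating a duplicate.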
Now for RETRIEVE, which will hopefully return any data we've inserted into the collection. As I said in the introduction, this example isn't going to dive into complex retrieval of data. However, we do have to deal with returning objects from the Peer class, and we do that through a factory method.
...
public function fetchAll()
{
    $results = array();
    // perform a find with a blank query
    foreach ($this->getCollection()->find() as $result) {
        // Create a Link out of each of the results
        $results[] = $this->factory($result);
    }
    return $results;
}

public function fetchByUrl($url)
{
    $query = array('url' => $url);
    $mongoQuery = $this->translateDataset($query);
    $response = $this->getCollection()->findOne(
        $mongoQuery
    );
    // if the database returns a result return the link
    if ($response) {
        return $this->factory($response);
    }
    // Otherwise return null
    return null;
}

private function factory(array $linkArray)
{
    if (!isset($linkArray['_id'])) {
        throw new Exception('Missing data');
    }
    $linkArray = $this->translateDataset($linkArray, false);
    $tmp = new Link();
    return $tmp->fromArray($linkArray);
}
...
The fetchAll method simply queries the collection without supplying any query criteria and then calls the factory method on each of the returned results. As for fetchByUrl, we simply create a data structure with the required attributes specified and query the collection for it. In a more ideal scenario we would use the Link object to create the data structure we use to query the collection. Finally, the factory method changes the data structure returned by the database back to an object-friendly format. It then creates a new object and applies the values. To take this a step further we could ensure the peer class doesn't create copies of any of the persistent objects unless necessary, but that is beyond the scope of this tutorial.
Finally we'll implement the DELETE action, which will remove an object from the database.
...
public function delete(Link $link)
{
    $link = $this->translateDataset($link->toArray(), true);
    $linkKey = $this->getIdFromDataset($link);
    $key = array(
        '_id' => $linkKey,
    );
    if ($linkKey) {
        $options = array(
            'fsync' => true, // Forces the update to be synced to disk
        );
        try {
            if ($this->getCollection()->remove($key, $options)) {
                return true;
            }
        } catch (MongoException $e) {
            // @TODO log the error
        }
    }
    return false;
}
...
The delete method accepts only Link objects, which helps to ensure the correct business object is being removed from the persistence layer. The "fsync" option is used to ensure that the action is synchronized to disk before responding. All of the actions we perform on the collection can accept this option. Each application should review whether to use it: leaving it out gives a significant speed boost, but it also means that you can lose data if the server fails before the write reaches disk. In any real application, logging and correct error reporting would also be implemented, but that is once again outside the scope of this tutorial.
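One way to make that durability decision explicit is to build the options array in a single place. The helper below is a hypothetical sketch: the function name and the $durable flag are invented for illustration, and only the "fsync" key is taken from the delete example above.

```php
<?php
// Hypothetical helper: name and $durable flag are invented for illustration.
// Only the 'fsync' key comes from the tutorial's delete() example.
function buildWriteOptions($durable)
{
    // Durable writes wait for the sync to disk; fast writes pass no options.
    return $durable ? array('fsync' => true) : array();
}

$durableOptions = buildWriteOptions(true);
$fastOptions = buildWriteOptions(false);
```

The resulting array would be passed as the options argument in the same way as $options in the delete method, so the trade-off between speed and durability is chosen per call rather than hard-coded.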
Using MongoDB In the Real World
This tutorial has shown you how to get started with MongoDB and I hope it has illustrated how easy it is to work with. Before you start using MongoDB, make sure it is right for you and the application that you will build. It has some fantastic advantages, but it is definitely not a silver bullet and its uses are generally more specific than those of standard relational databases. The full application code and test harness for the examples shown in this article are available on GitHub; go to http://github.com/paul-matthews/Mongo-Db-Example to download it. If you're using MongoDB or about to get started, leave a comment. It is always interesting to hear about the experiences of others.
Resources
Useful resources for further reading or filling in the gaps:
- http://uk3.php.net/manual/en/book.mongo.php - the MongoDB extension for PHP, a really good reference guide for interacting with the database.
- http://try.mongodb.org/ - an interactive tutorial for getting to grips with the database.
- http://rickosborne.org/blog/index.php/2010/02/09/infographic-migrating-from-sql-to-mapreduce-with-mongodb/ - fantastic example of a complex task for MongoDB, however it does highlight the fact that not all data is appropriate for MongoDB

18 Responses
Nice tutorial, but why are you creating the collection explicitly? It is created on first find/insert/save/update or whatever, by mongo, so this step isn't needed.
You forgot to mention certain durability concerns. They should be explicitly noted.
Really good article, I always wanted to look into a NoSQL system like Mongo or Couch. This will get me up to speed pretty quick. So I now promised myself to put this on top of my list for 2011.
The only thing I am still wondering: what are the biggest differences between MongoDB and its PHP driver and CouchDB in combination with PHPillow?
Michael:
Yes, you're completely correct.
It was added while I hunted for disk space issues but unfortunately I forgot to remove it. I'll do so as soon as possible.
I suppose there's a lesson to be learnt there: if you've got a full disk and you don't explicitly specify the "fsync" option, then it'll all appear to work until you try and read.
Your comment is much appreciated.
Regards,
Paul Matthews.
Hi,
I am Basmah and I am beginning to design a survey development tool which people can use to build surveys. I designed an ERD for it but then I decided to use some NoSQL database for performance and speed reasons. My tool will allow users to register, upload files, and create, edit or delete their surveys. They can have some custom settings or notifications or activating conditions too. After reading your article, I am in doubt whether MongoDB will allow me to do all these operations. Your suggestion would be really useful and appreciated. Thanks !!!
Really Awesome Tutorial.
Thanks