Here at Ibuildings, we encourage our developers to continue to develop themselves professionally and to gain relevant certifications. Most have the Zend PHP Certification (ZCE) and some have the Zend Framework Certification (ZFCE) and/or the MySQL Certifications. I have recently been working to gain the MySQL Certified Developer qualification which is taken as two exams; each one is an hour and a half and covering a wide range of topics relevant to developing applications with MySQL (there is a separate qualification for administrators). I found some of the course material pretty easy as I have used it repeatedly over the years but there was plenty that was fairly new to me; some features that I haven’t been using and haven’t noticed other PHP developers using or talking about very much. Having learned more about them, I thought they were worth highlighting. I’ve enjoyed finding out about these features and look forward to using some of these elements such as subqueries, views and stored procedures effectively in future projects.

As developers, we typically think about the application domain problem and logic and then design our architecture and consequently the database tables from that. We see the database as merely holding the data and our programs as the part that applies logic and processing to that data. We develop our scripts in isolation from the rest of the finished system and rarely think about their impact on other processes and whether we are making good use of what PHP and MySQL are good at. There may sometimes be excellent reasons for this, but I suspect it is more often the case that:

  • We are not really aware of what MySQL has to offer, or if we are we are not confident in how to use the features.
  • The use of ORM (Object Relational Mapping) is growing and this tends to distance us from the actual database and the SQL. It is perfectly possible now to create an application without writing an SQL query at all.
  • We like to keep control in our hands and using some of these features mean the database doing more and our code a lot less which feels threatening.
  • We tend to develop in an isolated environment where the only thing running is our script; where we don’t have to worry about competing for resources and speed over a network. Increasingly our live applications are actually running on a network of servers and complex applications need all the help they can get to meet the requirements of page load times and to avoid contention for resources.
  • As PHP developers we traditionally create a single web application with a database back-end that is just used for that application so re-use doesn’t seem such an issue. However modular or service-level applications are growing; I’ve worked with several of them in recent years.

If all our programs are doing are simple CRUD (Create, Read, Update and Delete) operations on single tables then just using the most basic features of MySQL is probably sufficient, but if it is more complicated then we could be missing out on features that will really improve the performance and resilience of our applications. We could also save overall development time through re-use of database elements. Our overall aim is to produce fast, reliable, robust applications and we let ourselves and our clients down if we don’t use all the tools at our disposal.

MySQL Tips and Features

So what are these features I’m referring to? I propose to review the options here and then cover a few of them in a bit more detail. I’ll start with the things that some of you may already be using and then move on to the more obscure. As a general rule the aims here are to move less data from database to program, to have each query execute as quickly as possible, and to keep our data as secure as we can.

Fetch a Limited Dataset

Limiting the number of columns you select, limiting the number of rows with LIMIT and careful use of the WHERE clause all reduce the amount of data being transferred. Another useful and often underused option is using the GROUP BY clause along with aggregate functions to read summary data, rather than reading a whole dataset and then performing calculations on it in your program. Imagine you want to get some summary information about a company’s departments and their members of staff. You could produce a query that reads all the staff records and then process that in your program, but a more efficient approach is to get MySQL to do it for you by using the GROUP BY feature.

Link Tables Efficiently

Use joins to read from two or more related tables at once. Inner joins, either explicitly using the JOIN keyword or by listing tables in a comma separated list, get all the records where there is a matching record in each table. Outer joins (LEFT or RIGHT) select all the records from one of the tables and then the values for the subsequent tables if found, or give NULL values if there are no matches. There is also a self join for those occasions when you need to compare the values in different rows of the same table. Joins are a key concept and well worth mastering if you are not already comfortable with them. What is less well known is that joins can also be useful when doing multiple table updates and deletes, although they are used implicitly with a comma separated list in this context rather than with the keyword JOIN.

Sub-queries can also be used in a number of ways to create complex queries that select data from a number of tables. There can be a lot of overlap with joins, in that often the same query can be written using either a join or a sub-query, and a join may well be more efficient. A sub-query can produce a scalar value for use in an expression, a column of single field results, a single row with several fields or a whole table of data that takes the place of a regular table. To find out more the mysql documentation is to be found at http://dev.mysql.com/doc/refman/5.5/en/subqueries.html.

Analyse Your Queries – Can They Be Improved?

Use MySQL’s explain keyword when writing your queries to determine the most efficient way to write them when there are alternative ways of doing the same thing. EXPLAIN analyses the query and shows how many table rows have to be read to produce the output and which indices are used. Judicious use of indices can improve performance but poor use of them can actually make things worse, so they are worth giving careful thought to if you have sufficient access to/control of the database. Using EXPLAIN can also help you determine if changing the order of tables in a JOIN may improve the efficiency of a query.

It is very simple to use; when you think you have your query written, just run it with EXPLAIN in front of it. Then try some alternative ways of achieving the same output, or tweak the indices and compare the results.
eg.

Use Prepared Statements

Any security article covering PHP and MySQL will cover the desirability of using the prepared statements feature in PDO (PHP Data Objects) to guard against SQL injection. This is where the query is defined with place holders for changeable parameters which are then bound to it at execution. It also improves performance if the query is to be run a number of times with different data, as the query is only parsed when prepared rather than every time it is run as with a normal query repeated. I would view using these wherever possible as mandatory.

Transfer Data Efficiently

LOAD DATA INFILE (and the accompanying mysqlimport command line tool) is the most efficient way to insert a number of records into a table. Next time you need to do some sort of bulk import consider whether it is practical to have an appropriately formatted csv file transferred to the server (perhaps by FTP or equivalent ) and then use LOAD DATA INFILE to process it rather than reading it line-by-line in your script and inserting each record individually. You will have a much faster script that will have less impact on the server.

Enforce Data Consistency with Triggers and without Network Traffic

Triggers are defined statements that are run either before or after INSERT/UPDATE/DELETE operations on a table. They can be quite complex operations with the use of a block of statements and conditions. They can save the programmer having to remember to implement this logic each time the data is used and therefore help to force the integrity of the data. As an example, related records could also be deleted or updated to reflect the change or an entry created in an archive table when a record is deleted. They tend to have a bad reputation amongst developers; I suspect it is because they instinctively don’t like something they aren’t controlling. Perhaps if we thought of them in terms of the observer pattern we would be more at ease with them!
eg.

Pick the Right Database Engine for Your Tables

Database Engines are an important element to consider when designing a database. A default engine is set for a database, but can be changed per table. Think about what sort of table you are defining, and don’t just let MySQL default to the MYISAM engine. MYISAM tables are good for tables largely used for lookup and searches but if it is to be updated frequently the INNODB engine which comes with the option of transactions is a better choice as the use of transactions will improve data integrity.

Take Advantage of MySQL Views in Your Application

Views are stored select statements that can then be used in a query as if it was an ordinary table. They can select data from tables and also other views using most of the constructs available to any SELECT query. The only proviso is that there has to be some solid data behind it, so temporary constructs such as user variables and temporary tables are not available in views. The SELECT statement in the view is actually run when the view is used in a query; either the SELECT criteria merged with those of that query or run first to produce a temporary table which is then queried further by the end query.

Views can also be used to hide table structure changes, such as a table split into two. A view could be created with the old table name and effectively the same data, and then users would be able to query that as before without disruption.

Views particularly come into their own when reports are being generated. The view can be designed to select the base data the report(s) are based on, and then a report query can select and perform calculations based on this set of data – without the writer of that query having to know all the ins and outs of where the data is obtained. If the temporary table option is used then the underlying data tables are not being tied up for longer than necessary and are free for other users. A standard temporary table could be created for the same purposes but that has the disadvantage of being more likely to get out of date and using a view also means that the SQL for creating the “temporary table” is kept in the database and is accessible by all users with the relevant permission. Depending on the exact circumstances, this may or may not be appropriate.

To illustrate a view in action (although there are doubtless better ways of achieving what this does!)

If there is a clear one-to-one correlation between a view’s output rows and the underlying data then views can also be used in UPDATE and DELETE statements.

Procedures and Functions in MySQL

A few years ago, before joining Ibuildings, I spent some months working on a major application that used PHP with the Oracle database and the application made very heavy use of Oracle’s procedures and functions; the SQL in the code itself was largely confined to call statements to use the procedures. This functionality was new to me as it wasn’t available in MySQL at that time (it was introduced with version 5), so MySQL developers don’t have a history of using these in the same way that users of other databases do.

The MySQL documentation defines a stored routine as:

A stored routine is a set of SQL statements that can be stored in the server. Once this has been done, clients don’t need to keep reissuing the individual statements but can refer to the stored routine instead.

It is effectively a mini SQL program that is stored in the database; it has inputs, outputs and processing. Stored Routines is a term used to refer to two constructs that have many similarities but also some important differences.

Whilst MySQL provides a wide range of built in functions such as CONCAT() or YEAR(), it is also possible to write your own. Just like stored procedures, they can take input parameters and do some complicated processing but they must return a single value of a declared type. A function is used in an expression within an SQL statement just as the built-in functions are, for example:

A stored procedure is more complex, it has both input and output parameters and the results of a SELECT statement are returned to the calling code if not part of a cursor, just as with a standard SELECT query. It is actioned by using the CALL statement. Quite simply it would look like this:

What do Stored Procedures Have to Offer?

In a nutshell:

  • You can write considerably more complicated SQL than in a simple query, expressing more complex logic. There are control structures available, such as if/else and loops. With cursors you can iterate through the output of a SELECT statement.
  • By doing the logic processing within the database less data needs to be transferred from the database server, improving performance, although it does make more demands on the database server’s memory.
  • There is error handling available, so a failure doesn’t have to be a show-stopper, improving robustness.
  • The code can be re-used by a range of applications and each developer does not have to re-invent the wheel. With the use of parameters both in and out the procedures can be written to be context dependent, increasing their flexibility and usefulness.
  • As with the client/server contract between classes the calling program doesn’t have to know how the database is organised so it is easier to change this without breaking applications or spending a considerable amount of time tracking down where the data is used.
  • The procedure can be written to use the privileges of the user who creates it so enabling securer use of sensitive data which can be processed without being returned to the client and without the script’s users needing to have the level of privileges needed to access the data directly.

How to write a stored procedure is a tutorial subject in it’s own right, so I don’t propose to go into it here. The mysql documentation at http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html is a useful starting point.

There are many different options for using MySQL with PHP; here I will give a quick example of calling a stored procedure from PHP using PDO, which is my preferred method. Suppose we had a stored procedure which takes an input string, does some clever stuff with some tables and then returns a single output value and a set of records from a select within the routine. In PHP you might see something like:

Getting the results of the select will look familiar, but getting the output value is a bit more complicated this makes use of MySQL’s user variables – denoted by @. This can be initialised beforehand with a SET query (SET @output := 0;). However using it without prior initialisation will initialise it to null and then the procedure populates it with a value. To get the contents of a user variable you need to select it.

Making the Most of MySQL

Although this has been a lightening tour, I hope I’ve given you a taste of the features in MySQL that I’ll be using in my applications in the future – and that I’ve whetted your appetite to find out more about them and to be more adventurous with MySQL. If there’s a particular feature that you think belongs on the list, or you can personally recommend a tutorial site, then leave us a comment and tell us what it is and why!