With the growing use of continuous integration and of static analysis tools hooked into those processes, there is one analysis tool that is very useful but rarely mentioned: phploc, the PHP Lines of Code tool. Its functionality has been part of PHPUnit for some time, but it has now been released as a separate project in the PHPUnit PEAR channel. The nature of PHPUnit means that many of these statistics can be collected while the tests are running, which is why the feature was added to that tool in the first instance.
The phploc project does more than just count lines of code: it measures a whole range of features of a codebase and presents them as a report. As an example, I ran the tool over a WordPress installation, with this command:
This gives the following output:
phploc 1.3.2 by Sebastian Bergmann.

Directories:                                        29
Files:                                             295

Lines of Code (LOC):                            138661
  Cyclomatic Complexity / Lines of Code:          0.19
Comment Lines of Code (CLOC):                    43498
Non-Comment Lines of Code (NCLOC):               95163

Interfaces:                                          0
Classes:                                           168
  Abstract:                                          0 (0.00%)
  Concrete:                                        168 (100.00%)
  Lines of Code / Number of Classes:               377
Methods:                                          1973
  Scope:
    Non-Static:                                   1972 (99.95%)
    Static:                                          1 (0.05%)
  Visibility:
    Public:                                       1964 (99.54%)
    Non-Public:                                      9 (0.46%)
  Lines of Code / Number of Methods:                32
  Cyclomatic Complexity / Number of Methods:      5.44

Functions:                                        1599

Constants:                                         272
  Global constants:                                272
  Class constants:                                   0
Straight away we can start to form some impressions about this code. It uses OOP, since there are classes. But look closer, and we see that it's not super-theoretical OOP with lots of complicated inheritance, since neither abstract classes nor interfaces are declared (although the non-public declarations show that some PHP 5 features are in use). I was particularly impressed by the averages the report includes, which for example give me a feel for how big the methods are. It is also useful to see how many lines of comment there are in comparison to the total number of lines; at roughly 31% of all lines, it seems quite generous on this project, but that's definitely a positive feature of a publicly released codebase. There's also a complexity measure. The number as it stands means very little to me, but I'm sure that if this tool were used against a few familiar projects, I'd soon get a feel for what the various values indicate.
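To make the comparison between comment lines and total lines concrete, here is a small sketch of the arithmetic, using only the figures from the report above. The variable names are my own and are not something phploc produces; the ratios are simple divisions, not additional phploc metrics.

```python
# Figures copied by hand from the phploc report above.
loc = 138661        # Lines of Code (LOC)
cloc = 43498        # Comment Lines of Code (CLOC)
ncloc = 95163       # Non-Comment Lines of Code (NCLOC)
methods = 1973      # Methods
non_public = 9      # Non-Public methods

# Sanity check: comment and non-comment lines add up to the total.
assert cloc + ncloc == loc

# Share of all lines that are comments.
comment_share = cloc / loc
# Comment lines per line of non-comment code.
comment_to_code = cloc / ncloc
# Share of methods that are not public.
non_public_share = non_public / methods

print(f"Comments as a share of all lines: {comment_share:.1%}")
print(f"Comment lines per non-comment line: {comment_to_code:.2f}")
print(f"Non-public methods: {non_public_share:.2%}")
```

The last figure reproduces the 0.46% that phploc itself reports for non-public methods, which is a handy check that the numbers were copied correctly.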
Using static analysis tools like these can tell us a lot about the topology of a software project, and it can be interesting to watch how the numbers change over time, which is what makes them such useful additions to regularly run batches, such as a continuous integration setup. Having an idea of what your project looks like, and what that means, will help you to understand the project as it moves forward.
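As a rough illustration of watching the numbers change over time, the sketch below compares two snapshots of a few metrics and reports what moved. The function name `metric_deltas` and the dictionary keys are hypothetical, the first snapshot uses the figures from the report above, and the second snapshot is invented purely for the example. How the snapshots get collected from each build is left out.

```python
def metric_deltas(previous, current):
    """Return {metric: change} for metrics in both snapshots that changed."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous and current[name] != previous[name]
    }


# First snapshot: figures from the phploc report above.
baseline = {"loc": 138661, "cloc": 43498, "classes": 168, "methods": 1973}

# Second snapshot: made-up values, for illustration only.
later = {"loc": 139000, "cloc": 43498, "classes": 170, "methods": 1990}

for name, change in metric_deltas(baseline, later).items():
    # The + in the format spec shows the sign for growth as well as shrinkage.
    print(f"{name}: {change:+d}")
```

Unchanged metrics are dropped from the result so that a regular build report only lists what actually moved.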