Murray Picton

Building a PHPDoc parser in PHP

This is the second in a series of posts about how I built Doqumentor – The Runtime PHP Documentor. If you haven’t already, check out the first post, on how I used the PHP Reflection API to document your code.

The next step in building Doqumentor was to write the PHPDoc parser: something that reads the comments in my code and returns the descriptions and variables in a form that is easy to use. First of all, I think we should go over what PHPDoc comments are:

/**
* PHPDoc parser for use in Doqumentor
*
* Simple example usage:
* $a = new Parser($string);
* $a->parse();
*
* @author Murray Picton
* @copyright 2010 Murray Picton
*/

What is a PHPDoc comment?

PHPDoc comments are defined on the PHPDocumentor website. A PHPDoc comment block always starts with /** and ends with */; everything between these markers is the comment to be parsed. Each line of the block contains either descriptive text or a variable, and variables always start with an @.

The first step is to retrieve the comment from my PHP file using the Reflection API, as described in my previous post. We do this in the following way:

$item->getDocComment(); //$item = an instance of a Reflection API class that supports this method

This returns our comment as a string, including the /** start and */ end tags, or false if the item has no doc comment.
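As a minimal sketch of this step, the snippet below fetches the doc comment of a class through ReflectionClass. The class name DocCommentDemo and the variable $raw are made up for this example and are not part of Doqumentor.

```php
<?php
/**
 * Example class used only to demonstrate getDocComment().
 *
 * @author Example Author
 */
class DocCommentDemo {}

// DocCommentDemo is a hypothetical class for illustration only
$item = new ReflectionClass('DocCommentDemo');

// getDocComment() returns the raw comment, or false when there is none
$raw = $item->getDocComment();
if ($raw === false) {
    $raw = '';
}

echo $raw, PHP_EOL;
```

The same call is available on ReflectionMethod, ReflectionProperty and ReflectionFunction, so the parser does not need to care which kind of item the comment came from.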

Getting the lines from the comment

Now I need to remove the *s from the start of each line and parse each line to get the variables out. I do this in the following way:

//Get the comment (preg_match returns 0 when nothing matches and false only on error)
if(preg_match('#^/\*\*(.*)\*/#s', $this->string, $comment) !== 1)
	die("Error");

$comment = trim($comment[1]);

//Get all the lines and strip the * from the first character
//The text captured for each line ends up in $lines[1]
if(preg_match_all('#^\s*\*(.*)#m', $comment, $lines) === false)
	die('Error');

This should make sense to most programmers, so I am not going to go through each regex in detail. In short, the first expression strips the /** from the start and the */ from the end of the comment, and I then trim the surrounding whitespace. The second expression pulls every line into the $lines array (the captured text is in $lines[1]) with the * removed from the start of the line.
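To see the two expressions working together outside of a class, here is a small standalone sketch. The function name extractDocLines is hypothetical and not part of Doqumentor, and returning an empty array instead of calling die() is a choice made for this example.

```php
<?php
// Hypothetical helper (not part of Doqumentor): returns the lines of a PHPDoc
// comment with the /** and */ delimiters and the leading * removed.
function extractDocLines($docComment)
{
    // Capture everything between the opening /** and the closing */
    if (preg_match('#^/\*\*(.*)\*/#s', $docComment, $comment) !== 1) {
        return array(); // Not a PHPDoc block
    }

    // Capture the text that follows the leading * on each line
    if (!preg_match_all('#^\s*\*(.*)#m', trim($comment[1]), $lines)) {
        return array(); // No lines found, or a regex error
    }

    // $lines[1] holds the captured group for every matching line
    return array_map('trim', $lines[1]);
}

$doc = "/**\n * Short description\n *\n * @author Murray Picton\n */";
print_r(extractDocLines($doc));
```

For the sample comment this gives three entries: the description line, an empty string for the bare * line, and the @author line.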

Saving and parsing the variables

Now that we have our lines in an array, we need to loop through them, get out all the variables and save them away:

private function parseLine($line) {

	//Trim the whitespace from the line
	$line = trim($line);

	if($line === '') return false; //Empty line

	if(strpos($line, '@') === 0) {
		$space = strpos($line, ' ');
		if($space === false) $space = strlen($line); //Variable with no value, e.g. @deprecated
		$param = substr($line, 1, $space - 1); //Get the parameter name
		$value = (string) substr($line, $space + 1); //Get the value (empty if there is none)
		if($this->setParam($param, $value)) return false; //Parse the line and return false if the parameter is valid
	}

	return $line;
}

This is the function I use to parse each line. First of all, I trim the whitespace from the beginning and end of the line. Next, I check whether the line starts with an @. If it does, it is a variable and I need to work out which variable I am setting. If it doesn’t, it is a standard part of the comment, and we will see how I parse this later on. Once I know this is a variable, I get the parameter name by taking a substring from the 2nd character up to the first space, and the value by taking the rest of the line. Finally, I set the variable within my class; if it is accepted, the function returns false so that the line is not added to the description.
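The name/value split can also be tried on its own. The sketch below is a standalone variation, not Doqumentor's code: the function name splitTagLine and the returned array shape are assumptions, and it splits on any run of whitespace with preg_split rather than on the first space.

```php
<?php
// Hypothetical helper (not Doqumentor's API): splits "@name value" into its
// parts. Returns null for lines that are not variables.
function splitTagLine($line)
{
    $line = trim($line);
    if ($line === '' || $line[0] !== '@') {
        return null;
    }

    // Split on the first run of whitespace; the value is optional
    $parts = preg_split('/\s+/', substr($line, 1), 2);

    return array(
        'name'  => $parts[0],
        'value' => isset($parts[1]) ? $parts[1] : '',
    );
}

var_dump(splitTagLine('@author Murray Picton'));
var_dump(splitTagLine('@deprecated'));
var_dump(splitTagLine('Just some text'));
```

Splitting on whitespace means a variable written with a tab or several spaces before its value still parses, and a variable with no value at all simply gets an empty string.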

Parsing the rest of the comment

The rest of the comment is parsed very simply. The first part of the comment is the short description; this is followed by an empty line and then the full description. To parse this, we simply need to check for a blank line when looping through the comment lines. This is done like this:

private function parseLines($lines) {
	$desc = array(); //Lines collected for the current description
	foreach($lines as $line) {
		$parsedLine = $this->parseLine($line); //Parse the line

		if($parsedLine === false && empty($this->shortDesc)) {
			$this->shortDesc = implode(PHP_EOL, $desc); //Store the lines so far as the short description
			$desc = array();
		} elseif($parsedLine !== false) {
			$desc[] = $parsedLine; //Store the line in the long description
		}
	}
	$this->longDesc = implode(PHP_EOL, $desc);
}

My parseLine function returns false when the line is empty or when it has set a variable. The first time this happens, I know the description that has been stored up until this point is the short description, and everything after that can be stored in the long description. The rest of this function should make sense.
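Here is a rough standalone sketch of the same short/long split, working on lines that have already been cleaned. The function name splitDescription is hypothetical, and the fallback at the end (treating a comment with no blank line as a short description only) is an assumption of this example rather than something my class does.

```php
<?php
// Hypothetical helper (not part of Doqumentor): splits cleaned comment lines
// into a short and a long description. Variable lines (@...) are skipped.
function splitDescription(array $lines)
{
    $short = '';
    $desc = array();

    foreach ($lines as $line) {
        $line = trim($line);
        $isText = ($line !== '' && $line[0] !== '@');

        if ($isText) {
            $desc[] = $line;
        } elseif ($short === '' && $desc) {
            // First blank or variable line: everything so far is the short description
            $short = implode(PHP_EOL, $desc);
            $desc = array();
        }
    }

    // Assumption: with no separator at all, the text is a short description only
    if ($short === '') {
        $short = implode(PHP_EOL, $desc);
        $desc = array();
    }

    return array('short' => $short, 'long' => implode(PHP_EOL, $desc));
}

$result = splitDescription(array(
    'PHPDoc parser for use in Doqumentor',
    '',
    'Simple example usage:',
    '$a = new Parser($string);',
    '$a->parse();',
    '',
    '@author Murray Picton',
));
print_r($result);
```

Note that blank lines inside the long description are dropped, just as in the parseLines function above, so paragraphs in the long description end up joined together.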

If you want to know more about my parsing class, it is available to download from Doqumentor – The Runtime PHP Documentor. I have released it under the GPL2 licence, so it is available for use and improvement as you require. If you liked this post, please subscribe to my RSS feed and let me know if you have any comments. Thanks for reading!