
Processor Instructions

Ilhan Yumer edited this page May 16, 2022 · 7 revisions

Processing instructions are part of the XML 1.0 specification and carried over into HTML via XHTML. A processing instruction lets a document carry instructions that are meant to be read, and usually acted on, by an application (a processor) rather than displayed. PHP's own <?php ... ?> tags use the same syntax, so PHP code blocks are a familiar example.

We chose to support processing instructions in the HTML5-PHP library because they can be put to good use on the server side. HTML5 does not allow processing instructions, so to remain compliant, make sure to remove them from the document before sending it to a client.
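With PHP's built-in DOM extension, removing them can be a single XPath query. The following is a minimal sketch, assuming the document is already a \DOMDocument (such as the one the parser returns); strip_processing_instructions() is a hypothetical helper, not part of HTML5-PHP.

```php
<?php

/**
 * Hypothetical helper (not part of HTML5-PHP): removes every processing
 * instruction from a DOMDocument and returns how many were removed.
 */
function strip_processing_instructions(\DOMDocument $doc): int
{
    $xpath = new \DOMXPath($doc);

    // Copy the matches into an array before detaching them from the tree.
    $instructions = iterator_to_array($xpath->query('//processing-instruction()'));

    foreach ($instructions as $pi) {
        $pi->parentNode->removeChild($pi);
    }

    return count($instructions);
}

// A small document with one processing instruction inside <html>.
$doc = new \DOMDocument();
$html = $doc->appendChild($doc->createElement('html'));
$html->appendChild($doc->createProcessingInstruction('foo', 'bar'));

$removed = strip_processing_instructions($doc);
```

Calling a helper like this just before serializing keeps server-side instructions out of the HTML that reaches the browser.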

The HTML5-PHP library provides two ways of handling processing instructions:

  1. (Default) Insert the processing instructions into the DOM.
  2. Run the instructions through a processor (that you define) and put the results into the DOM.

Here's how those two modes work.

The Default Mode

Take the document:

<!DOCTYPE html>
<html>
  <?foo bar?>
</html>

Here <?foo bar?> is a processing instruction. A processing instruction starts with <?, is followed by a name, called the target (foo in this case), and optional data (bar), and ends with ?>.
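To make the syntax concrete, here is a minimal PHP sketch that splits the text of a processing instruction into its name and data. split_instruction() is a hypothetical helper for illustration only; its name pattern is simplified (real XML names are more restricted) and it is not how the HTML5-PHP tokenizer works.

```php
<?php

/**
 * Hypothetical helper: splits the text of a processing instruction into
 * its name (target) and data. Returns null if the text is not one.
 * The name is taken as any run of characters other than whitespace and "?".
 */
function split_instruction(string $text): ?array
{
    if (!preg_match('/^<\?([^\s?]+)\s*(.*?)\s*\?>$/s', $text, $m)) {
        return null;
    }

    return ['name' => $m[1], 'data' => $m[2]];
}

$pi = split_instruction('<?foo bar?>');
```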

When this is parsed using \HTML5::loadHTML(), the processing instruction becomes a \DOMProcessingInstruction node whose nodeName property is foo and whose data property is bar.
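\DOMProcessingInstruction is PHP's own DOM class, so its properties can be inspected without HTML5-PHP. The sketch below assumes only that PHP's DOM extension is available and builds the node by hand with \DOMDocument::createProcessingInstruction() instead of parsing; it shows that nodeName (and the target property) hold foo while data holds bar.

```php
<?php

// Build <html> containing the instruction foo with data "bar" by hand.
$doc = new \DOMDocument();
$html = $doc->appendChild($doc->createElement('html'));
$html->appendChild($doc->createProcessingInstruction('foo', 'bar'));

// The first (and only) child of <html> is the processing instruction.
$pi = $html->firstChild;

echo get_class($pi), "\n"; // DOMProcessingInstruction
echo $pi->nodeName, "\n";  // foo
echo $pi->target, "\n";    // foo
echo $pi->data, "\n";      // bar
```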

With a Custom InstructionProcessor

There is another way of handling processing instructions: you can process them at parse time and replace them in the DOM tree with other DOM nodes. (In other words, you can "render" your processing instructions while parsing.) This section explains how that works.

Using instruction processors

Processing instructions become useful when something acts on them, for example by manipulating the DOM. An instruction processor takes an instruction and acts on it. Instruction processors implement the \HTML5\InstructionProcessor interface, which has a single method, process(). As an example, let's create a dummy processor that counts instructions.

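<?php

use HTML5\InstructionProcessor;

// A dummy processor that counts the processing instructions it sees.
class foo implements InstructionProcessor {

  // Number of instructions processed so far.
  public $bar = 0;

  public function process(\DOMElement $element, $name, $data) {
    $this->bar++;

    // Return the element unchanged; nothing is added to the DOM.
    return $element;
  }
}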

This class is deliberately simple. Every time the parser encounters a processing instruction, the counter is incremented and the element passed to process() is returned unchanged. The tree builder carries on from whatever element process() returns, so a processor that wants to replace an instruction with different markup should build that markup in the DOM and return the appropriate element.
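A processor that actually renders something follows the same pattern. The sketch below is hypothetical: it declares a local InstructionProcessorStandIn interface with the same process() signature so that it runs without the library, turns a foo instruction into a span element appended to the element it receives, and calls process() by hand in place of the tree builder.

```php
<?php

/* Local stand-in with the same signature as \HTML5\InstructionProcessor. */
interface InstructionProcessorStandIn
{
    public function process(\DOMElement $element, $name, $data);
}

/* Hypothetical processor: renders a "foo" instruction as a span holding its data. */
class SpanProcessor implements InstructionProcessorStandIn
{
    public $count = 0;

    public function process(\DOMElement $element, $name, $data)
    {
        $this->count++;

        if ($name === 'foo') {
            // Build the replacement markup and attach it to the current element.
            $doc = $element->ownerDocument;
            $span = $doc->createElement('span');
            $span->setAttribute('class', $name);
            $span->appendChild($doc->createTextNode($data));
            $element->appendChild($span);
        }

        // Return the element so building continues from the same place.
        return $element;
    }
}

// Stand in for the tree builder: pretend it met a foo instruction with data "bar".
$doc = new \DOMDocument();
$html = $doc->appendChild($doc->createElement('html'));

$processor = new SpanProcessor();
$current = $processor->process($html, 'foo', 'bar');

echo $doc->saveXML($html), "\n"; // <html><span class="foo">bar</span></html>
```

Against the real library, the class would implement \HTML5\InstructionProcessor instead of the stand-in interface.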

Passing the instruction processor to the parser

To be used, the instruction processor must be attached to the DOM tree builder, which requires a custom parsing function. Because the building blocks already exist, this is quite simple.

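use HTML5\Parser\DOMTreeBuilder;
use HTML5\Parser\Scanner;
use HTML5\Parser\Tokenizer;

function my_parser(\HTML5\Parser\InputStream $input) {

  // Create an instance of the instruction processor.
  $foo = new foo();
  $events = new DOMTreeBuilder();

  // Attach it to the event-based DOM tree builder.
  $events->setInstructionProcessor($foo);

  // The tokenizer reads the input and sends events to the tree builder.
  $scanner = new Scanner($input);
  $parser = new Tokenizer($scanner, $events);
  $parser->parse();

  // Return the DOMDocument the tree builder assembled.
  return $events->document();
}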

To parse a document, call my_parser() instead of one of the built-in parsers; the instruction processor will then be called for each processing instruction in the document.
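The two modes described earlier come down to one branch in the tree builder. The following is a heavily simplified, hypothetical model of that branch; MiniTreeBuilder, CountingProcessor and the stand-in interface are illustrations, not HTML5-PHP classes. It assumes, as described above, that the builder inserts a plain \DOMProcessingInstruction when no processor is set, and it additionally assumes that the element returned by process() becomes the builder's current insertion point.

```php
<?php

/* Local stand-in with the same signature as \HTML5\InstructionProcessor. */
interface InstructionProcessorStandIn
{
    public function process(\DOMElement $element, $name, $data);
}

/* Hypothetical, simplified model of a tree builder's handling of instructions. */
class MiniTreeBuilder
{
    private $doc;
    private $current;
    private $processor = null;

    public function __construct()
    {
        $this->doc = new \DOMDocument();
        // Start with an <html> element as the current insertion point.
        $this->current = $this->doc->appendChild($this->doc->createElement('html'));
    }

    public function setInstructionProcessor(InstructionProcessorStandIn $processor)
    {
        $this->processor = $processor;
    }

    // Event handler a tokenizer would call for each processing instruction.
    public function processingInstruction($name, $data = '')
    {
        if ($this->processor !== null) {
            // Custom mode: the processor decides what, if anything, enters the DOM.
            $result = $this->processor->process($this->current, $name, $data);
            if ($result instanceof \DOMElement) {
                $this->current = $result;
            }
            return;
        }

        // Default mode: keep the instruction as a DOMProcessingInstruction node.
        $this->current->appendChild($this->doc->createProcessingInstruction($name, $data));
    }

    public function document()
    {
        return $this->doc;
    }
}

/* Same idea as the foo counter: count instructions and change nothing. */
class CountingProcessor implements InstructionProcessorStandIn
{
    public $count = 0;

    public function process(\DOMElement $element, $name, $data)
    {
        $this->count++;
        return $element;
    }
}

// Default mode: no processor attached.
$default = new MiniTreeBuilder();
$default->processingInstruction('foo', 'bar');

// Custom mode: the counter is attached before any events arrive.
$custom = new MiniTreeBuilder();
$counter = new CountingProcessor();
$custom->setInstructionProcessor($counter);
$custom->processingInstruction('foo', 'bar');
```

In this model the default builder ends up with the instruction inside <html>, while the custom builder leaves <html> empty and only the counter changes.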

For more details on how this works, take a peek inside \HTML5\Parser\DOMTreeBuilder.