Dodo

From mediawiki.org

Dodo is a pure PHP implementation of the HTML DOM, based on a rigorous PHP binding of the WHATWG WebIDL specification of the DOM API. It aims to be modern, correct, and, where possible, complete. Like the domino JavaScript library that inspired it, Dodo is intended for server-side DOM manipulation, so it deliberately omits the parts of the DOM that deal with dynamic loading, browser layout, and similar features. It also aims to be fast.

Installation

In MediaWiki

Dodo will become available in MediaWiki as a composer dependency of wikimedia/parsoid when it is mature.

It is expected to gradually replace all usage of the native PHP DOM extension, which is buggy, poorly maintained, and out of date.

Everywhere else

Install the wikimedia/dodo package from Packagist:

composer require wikimedia/dodo

Dodo follows semantic versioning: the major version number is incremented for every change that breaks backwards compatibility.

Architecture overview

For full reference documentation, please see the documentation generated from the source (or the source itself).

The API implemented by Dodo is mostly defined by the PHP binding for WebIDL. This is described in the IDLeDOM documentation.

Examples

use Wikimedia\Dodo\DOMImplementation;

function demo( DOMImplementation $impl ) {
    $doc = $impl->createHTMLDocument( "Test document" );
    $doc->getBody()->setInnerHTML( "<p>Look at me!</p>" );
    // Direct property access style is also supported,
    // although it will be slower
    return $doc->body->innerHTML;
}

The sample above first constructs an HTML document with the given title, then parses an HTML string to populate the document's <body> element. To demonstrate property-style access, it then re-serializes the body and returns the result.

Performance

Dodo has not yet been fully benchmarked, but we hope it will be competitive with the native PHP DOM extension.

There are two aspects of performance: memory usage and speed.

In order to minimize memory usage, the number of fields in each DOM Node has been minimized wherever possible. Fields like Node::nodeType and Node::nodeName are not actually stored in the DOM Node, but implemented via dynamic dispatch based on the type of the object.
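To illustrate the idea of computing nodeType and nodeName by dispatch rather than storing them, here is a minimal sketch. The classes ToyNode, ToyElement, and ToyText are hypothetical and are not Dodo's actual classes; they only show how a per-class method can stand in for a per-instance field.

```php
<?php
// Illustrative sketch only: these toy classes are NOT part of Dodo.

abstract class ToyNode {
    // Only structural data is stored per instance.
    public ?ToyNode $parent = null;

    // There are no $nodeType or $nodeName fields; each subclass
    // answers these questions with a method instead.
    abstract public function getNodeType(): int;
    abstract public function getNodeName(): string;
}

final class ToyElement extends ToyNode {
    private string $localName;

    public function __construct( string $localName ) {
        $this->localName = $localName;
    }

    public function getNodeType(): int {
        return 1; // ELEMENT_NODE in the DOM spec
    }

    public function getNodeName(): string {
        // Computed on demand from data the element already holds.
        return strtoupper( $this->localName );
    }
}

final class ToyText extends ToyNode {
    public string $data;

    public function __construct( string $data ) {
        $this->data = $data;
    }

    public function getNodeType(): int {
        return 3; // TEXT_NODE in the DOM spec
    }

    public function getNodeName(): string {
        return '#text';
    }
}
```

Because the answer depends only on the class, every element or text node saves the fields that would otherwise hold the same constant in each instance.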

For speed, Dodo uses a fairly common optimization: node children are kept in a circular linked list, and the backing arrays required by the spec (e.g., to implement Node::childNodes) are not created unless they are requested. Iterating with Node::firstChild and Node::nextSibling is therefore faster than iterating over the Node::childNodes array, as it is in most browser DOM implementations.
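The linked-list layout described above can be sketched roughly as follows. ToyListNode is a hypothetical class written for this illustration, not Dodo's implementation: it keeps children in a circular doubly linked list and builds an array of children only when getChildNodes() is called. For brevity, appendChild() assumes the child is newly created and not yet attached anywhere.

```php
<?php
// Illustrative sketch only: ToyListNode is NOT part of Dodo.

final class ToyListNode {
    public string $name;
    public ?ToyListNode $parent = null;
    // Siblings form a circular doubly linked list.
    public ToyListNode $next;
    public ToyListNode $prev;
    // Head of the ring of children, or null if there are none.
    public ?ToyListNode $first = null;

    public function __construct( string $name ) {
        $this->name = $name;
        // A detached node is a ring of one.
        $this->next = $this;
        $this->prev = $this;
    }

    public function appendChild( ToyListNode $child ): void {
        $child->parent = $this;
        if ( $this->first === null ) {
            $this->first = $child;
            return;
        }
        // The last child is the one just before the head of the ring.
        $last = $this->first->prev;
        $last->next = $child;
        $child->prev = $last;
        $child->next = $this->first;
        $this->first->prev = $child;
    }

    public function getFirstChild(): ?ToyListNode {
        return $this->first;
    }

    public function getNextSibling(): ?ToyListNode {
        // The ring wraps around, so the last child reports no next sibling.
        if ( $this->parent === null || $this->next === $this->parent->first ) {
            return null;
        }
        return $this->next;
    }

    /** Build the array of children only when a caller asks for it. */
    public function getChildNodes(): array {
        $out = [];
        for ( $n = $this->getFirstChild(); $n !== null; $n = $n->getNextSibling() ) {
            $out[] = $n;
        }
        return $out;
    }
}

$ul = new ToyListNode( 'ul' );
foreach ( [ 'a', 'b', 'c' ] as $name ) {
    $ul->appendChild( new ToyListNode( $name ) );
}

// Sibling-style iteration: no intermediate array is allocated.
$names = [];
for ( $n = $ul->getFirstChild(); $n !== null; $n = $n->getNextSibling() ) {
    $names[] = $n->name;
}
```

A real implementation would probably cache the array once it has been requested and keep it in sync with later mutations; this sketch rebuilds it on each call to stay short.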

For maximum performance, the getters and setters required by the DOM spec are implemented as explicit methods; for example, the nodeType property is accessed by Node::getNodeType(). The complex getter/setter behavior required by the DOM can't be implemented directly with PHP properties, but property-style access (e.g., $node->nodeType) is supported via the magic methods __get and __set. These impose a performance penalty, however, so client code should prefer explicit calls to the getter and setter methods over property-style access.
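As a rough sketch of how property-style access can be layered on top of explicit accessors, the hypothetical class below routes __get and __set to the matching getter or setter by name. ToyAccessorNode and its name-based lookup are assumptions made for illustration; Dodo's own dispatch may be implemented differently.

```php
<?php
// Illustrative sketch only: ToyAccessorNode is NOT part of Dodo.

final class ToyAccessorNode {
    private string $value = '';

    public function getNodeType(): int {
        return 3; // TEXT_NODE in the DOM spec
    }

    public function getNodeValue(): string {
        return $this->value;
    }

    public function setNodeValue( string $v ): void {
        $this->value = $v;
    }

    // Called by PHP when an inaccessible or undefined property is read.
    public function __get( string $name ) {
        $getter = 'get' . ucfirst( $name );
        if ( method_exists( $this, $getter ) ) {
            return $this->$getter();
        }
        throw new LogicException( "Unknown property: $name" );
    }

    // Called by PHP when an inaccessible or undefined property is written.
    public function __set( string $name, $value ): void {
        $setter = 'set' . ucfirst( $name );
        if ( method_exists( $this, $setter ) ) {
            $this->$setter( $value );
            return;
        }
        throw new LogicException( "Cannot set property: $name" );
    }
}

$node = new ToyAccessorNode();
$node->nodeValue = 'hello';          // routed through __set to setNodeValue()
$viaProperty = $node->nodeValue;     // routed through __get to getNodeValue()
$viaMethod = $node->getNodeValue();  // direct call, no magic-method hop
```

Both styles return the same value, but each property-style access pays for an extra magic-method call and a lookup before reaching the real accessor, which is the overhead the paragraph above refers to.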

See also