Parsoid/API

From mediawiki.org
On Wikimedia wikis, Parsoid's API is not accessible on the public Internet. On these wikis, you can access Parsoid's content via RESTBase's REST API (e.g.: https://en.wikipedia.org/api/rest_v1/ ).

Parsoid provides the following REST API endpoints for converting MediaWiki wikitext to XHTML5 + RDFa and back.

Common HTTP headers supported in all entry points

Accept-Encoding
Clients should accept gzip-encoded responses.
Cookie
Cookie header that will be forwarded to the MediaWiki API. This makes it possible to use Parsoid with private wikis. Setting a cookie implicitly disables all caching for security reasons, so do not send a cookie for public wikis if you care about caching.
X-Request-Id
A request id that will be forwarded to the MediaWiki API.
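As a sketch, these headers can be attached to a request with Python's standard library; the URL, cookie, and request id below are placeholder values, and no request is actually sent:

```python
# Sketch: attaching the common Parsoid headers to a request object.
# The URL, cookie, and request id are placeholder values.
import urllib.request

url = "http://localhost:8000/localhost/v3/page/html/Main_Page"  # hypothetical endpoint
req = urllib.request.Request(url, headers={
    "Accept-Encoding": "gzip",      # accept gzip-compressed responses
    "Cookie": "session=...",        # forwarded to the MediaWiki API; disables caching
    "X-Request-Id": "req-0001",     # forwarded to the MediaWiki API
})
# urllib.request.urlopen(req) would send it to a running Parsoid server.
```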

v3 API

Common path parameters across all requests

domain
The hostname of the wiki.
title
Page title -- needs to be urlencoded (percent encoded).
revision
Revision id of the title.
format
Input / output format of content - wikitext, html, or pagebundle
wikitext
Plain text that is treated as wikitext. Content type is text/plain.
html
Parsoid's XHTML5 + RDFa output, which includes inlined data-parsoid attributes. The HTML conforms to the MediaWiki DOM spec. Content type is text/html.
pagebundle
A JSON blob containing the above html with the data-parsoid attributes split out and ids added to each node. Content type is application/json.

Pagebundle blobs have the form,

{
  "html": {
    "headers": {
      "content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"
    },
    "body": "<!DOCTYPE html> ... </html>"
  },
  "data-parsoid": {
    "headers": {
      "content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"
    },
    "body": {
      "counter": n,
      "ids": { ... }
    }
  }
}
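For illustration, a client might unpack a pagebundle blob as below; the JSON here is a simplified, hand-written stand-in, not real Parsoid output:

```python
# Sketch: splitting a pagebundle blob into its html and data-parsoid parts.
# The JSON below is a simplified, hand-written stand-in for a real response.
import json

raw = """
{
  "html": {
    "headers": {"content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"},
    "body": "<!DOCTYPE html><html><body><p id=\\"mwAA\\">Hi</p></body></html>"
  },
  "data-parsoid": {
    "headers": {"content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"},
    "body": {"counter": 0, "ids": {"mwAA": {"dsr": [0, 2, 0, 0]}}}
  }
}
"""
bundle = json.loads(raw)
html = bundle["html"]["body"]                  # the document markup
data_parsoid = bundle["data-parsoid"]["body"]  # per-node data, keyed by element id
```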

Common payload / querystring parameters across all formats

For wikitext -> HTML requests

body_only
Optional boolean flag; return only the HTML body.innerHTML instead of a full document.

For HTML -> wikitext requests

scrub_wikitext
Optional boolean flag, which normalizes the DOM to yield cleaner wikitext than might otherwise be generated.

GET

Wikitext -> HTML

GET /:domain/v3/page/:format/:title/:revision?

revision
Revision is optional; however, GET requests without a revision id should be considered a convenience method. If no revision id is provided, the request redirects to the latest revision.
format
One of html or pagebundle

Some querystring parameters are also accepted: body_only
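The path pieces above can be assembled mechanically. The helper below is illustrative (`page_url` is not part of Parsoid) and assumes a server on localhost:8000; note that titles must be fully percent-encoded, including "/" and ":":

```python
# Sketch: building a v3 page GET URL. Titles must be fully percent-encoded,
# including "/" and ":", hence quote(..., safe="").
from urllib.parse import quote

def page_url(domain, fmt, title, revision=None, base="http://localhost:8000"):
    # page_url is a hypothetical helper, not part of Parsoid's API
    url = f"{base}/{domain}/v3/page/{fmt}/{quote(title, safe='')}"
    if revision is not None:
        url += f"/{revision}"
    return url
```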

POST

The content type for the POST payload can be: application/x-www-form-urlencoded, application/json, or multipart/form-data

Wikitext -> HTML

POST /:domain/v3/transform/:from/to/:format/:title?/:revision?

from
wikitext
format
One of html or pagebundle

The payload can contain,

{
  "wikitext": "...",  // if omitted, a title is required to fetch wt source
  "body_only": true,  // optional
  "original": {
    "title": "...",  // optional, and instead of in the path
    "revid": n,  // optional, and instead of in the path
  }
}

Some other fields exist (including previous for expansion reuse). See Parsoid's API test suite for their use.
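As a sketch, the payload above can be built and serialized like this; the field values are placeholders:

```python
# Sketch: composing a wikitext -> HTML transform payload. Values are
# placeholders; POST the JSON body with Content-Type: application/json
# to /:domain/v3/transform/wikitext/to/html/
import json

payload = {
    "wikitext": "== h2 ==",
    "body_only": True,                 # optional: return only body.innerHTML
    "original": {"revid": 696653152},  # optional: instead of in the path
}
body = json.dumps(payload)
```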

HTML -> Wikitext

POST /:domain/v3/transform/:from/to/:format/:title?/:revision?

from
One of html or pagebundle
format
wikitext

The payload can contain,

{
  "html": "...",
  "scrub_wikitext": true,  // optional
  "original": {
    "title": "...",  // optional, and instead of in the path
    "revid": n,  // optional, and instead of in the path
    "wikitext": "...",  // optional, but the following three provide original data used in the selective serialization strategy
    "html": "...",
    "data-parsoid": { ... }
  }
}

Parsoid serializes HTML to a normalized form of wikitext. To avoid "dirty diffs" (differences outside the edited region of content) when serializing HTML generated from a given wikitext source, pass in the revision (either as revision in the path or original.revid in the payload) and, optionally, the original source (original.wikitext) and the unedited HTML (original.html and original['data-parsoid']). The optional fields are an optimization, since Parsoid will fetch or generate them if they are missing. This strategy is known as "selective serialization"; an example can be seen in the test suite.
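A selective-serialization payload might be assembled as below; every value is a placeholder rather than real Parsoid output:

```python
# Sketch: an HTML -> wikitext payload prepared for selective serialization.
# The original.* fields carry the unedited revision's data so Parsoid can
# limit changes to the edited region; every value here is a placeholder.
import json

payload = {
    "html": "<p>edited text</p>",         # the edited HTML to serialize
    "scrub_wikitext": True,               # optional: normalize the DOM first
    "original": {
        "revid": 696653152,               # the revision that was edited
        "wikitext": "original text",      # original source (optimization)
        "html": "<p>original text</p>",   # unedited Parsoid HTML (optimization)
        "data-parsoid": {"counter": 0, "ids": {}},
    },
}
body = json.dumps(payload)
```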

HTML -> HTML

POST /:domain/v3/transform/pagebundle/to/pagebundle/:title?/:revision?

Parsoid exposes an API which transforms Parsoid-format HTML (encapsulated as a page bundle) to itself, performing a number of possible transformations. T114413 discusses some of the transformations, both actual and potential.

The payload is of the form:

{
  original: {
    html: {
      headers: {
        'content-type': 'text/html; charset=utf-8; profile="https://mediawiki.org/wiki/Specs/DOM/1.2.1"'
      },
      body: '<html>...</html>'
    }
  },
  updates: {
    transclusions: ...,
    media: ...,   // Could specify the exact image to update later.
    redlinks: { ... },
    variant: { ... }
  }
}

The original field is a pagebundle blob, as described above.

The updates field specifies the desired transformations, which are described in more detail below.

XXX: write me

Variant

See T43716.

XXX: write me

Content up/downgrade

XXX: write me

Wikitext -> Lint

POST /:domain/v3/transform/wikitext/to/lint/:title?/:revision?

Parsoid also exposes an API to get wikitext "syntax" errors for a given page, revision or wikitext.

The payload can contain:

{
  "wikitext": "...",  // if omitted, a title or revision is required to fetch lint errors
}

Examples

For more intricate examples, see Parsoid's API test suite.

Wikitext -> HTML

GET

Some simple GET requests to a Parsoid HTTP server bound to localhost:8000.

http://localhost:8000/en.wikipedia.org/v3/page/html/User:Arlolra%2Fsandbox/696653152

Returns text/html

http://localhost:8000/en.wikipedia.org/v3/page/pagebundle/User:Arlolra%2Fsandbox/696653152?body_only=true

Returns application/json

POST

POSTing the following blob,

{
  "wikitext": "== h2 =="
}

to,

http://localhost:8000/localhost/v3/transform/wikitext/to/html/

returns,

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head ...>...</head><body data-parsoid='{"dsr":[0,8,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki" dir="ltr"><h2 data-parsoid='{"dsr":[0,8,2,2]}'> h2 </h2></body></html>

HTML -> Wikitext

POST

POSTing the following blob,

{
  "html": "<html><body>foo <b>bar</b></body></html>"
}

to http://localhost:8000/localhost/v3/transform/html/to/wikitext/ returns

foo '''bar'''

Wikitext -> Lint

POST

POSTing the following blob

{
  "wikitext": "<div/>"
}

to http://localhost:8000/localhost/v3/transform/wikitext/to/lint returns

[
  {
    "type": "self-closed-tag",
    "params": {
      "name": "div"
    },
    "dsr": [
      0,
      6,
      6,
      0
    ]
  }
]

Using curl, this works well. Replace "LinterTest" with the appropriate wiki page; the -L follow-redirect option makes the request go to the most recent revision:

$ curl -X POST http://localhost:8080/rest.php/localhost/v3/transform/wikitext/to/lint/LinterTest -L -H "Content-Type: application/x-www-form-urlencoded" -d ""

Produces:

[{"type":"misnested-tag","dsr":[21,33,3,0],"params":{"name":"i"}},{"type":"misnested-tag","dsr":[78,90,3,0],"params":{"name":"i"}},{"type":"obsolete-tag" ...

Content Negotiation

Accept
text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.0.0"

When making a parse request (wikitext->HTML), passing an Accept header that specifies an acceptable spec version induces Parsoid to return HTML satisfying that version, following Semantic Versioning caret semantics, or to fail with a 406 status code.
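Caret semantics can be illustrated with a small version check. This is a simplified reimplementation for illustration (it ignores the stricter 0.x caret rules), not Parsoid's actual matching code:

```python
# Sketch: Semantic Versioning caret matching, simplified.
# offered: the spec version Parsoid can produce; accepted: the version
# named in the Accept header. Real caret rules treat 0.x specially;
# that case is ignored here for brevity.
def caret_satisfies(offered: str, accepted: str) -> bool:
    o = tuple(int(x) for x in offered.split("."))
    a = tuple(int(x) for x in accepted.split("."))
    return o[0] == a[0] and o >= a  # same major, and at least the requested version
```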

Older entry points

These versions have been deprecated.