API:Parsing wikitext

From mediawiki.org

GET/POST request to parse content of a page and obtain the output.

API documentation

action=parse

Parses content and returns parser output.

See the various prop-modules of action=query to get information from the current version of a page.

There are several ways to specify the text to parse:

  1. Specify a page or revision, using page, pageid, or oldid.
  2. Specify content explicitly, using text, title, revid, and contentmodel.
  3. Specify only a summary to parse. prop should be given an empty value.
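The three input modes above correspond to three different parameter sets. A minimal sketch using only the standard library; the helper name `build_parse_url` and the sample values are illustrative assumptions, not part of the API:

```python
from urllib.parse import urlencode

# Endpoint used by the examples on this page.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(**params):
    """Return a GET URL for action=parse (hypothetical helper)."""
    query = {"action": "parse", "format": "json", **params}
    return API_URL + "?" + urlencode(query)


# 1. Parse an existing page (oldid or pageid would work the same way).
by_page = build_parse_url(page="Pet door")

# 2. Parse explicit content; contentmodel is required when title is omitted.
by_text = build_parse_url(text="'''Hello'''", contentmodel="wikitext")

# 3. Parse only an edit summary; prop is given an empty value.
by_summary = build_parse_url(summary="Fixed [[typo]]", prop="")

for url in (by_page, by_text, by_summary):
    print(url)
```

Long text values are usually better sent in a POST body than in a URL; the URL form is used here only to make the parameters visible.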
Specific parameters:
Other general parameters are available.
title

Title of page the text belongs to. If omitted, contentmodel must be specified, and API will be used as the title.

text

Text to parse. Use title or contentmodel to control the content model.

revid

Revision ID, for {{REVISIONID}} and similar variables.

Type: integer
summary

Summary to parse.

page

Parse the content of this page. Cannot be used together with text and title.

pageid

Parse the content of this page. Overrides page.

Type: integer
redirects

If page or pageid is set to a redirect, resolve it.

Type: boolean (details)
oldid

Parse the content of this revision. Overrides page and pageid.

Type: integer
prop

Which pieces of information to get:

text
Gives the parsed text of the wikitext.
langlinks
Gives the language links in the parsed wikitext.
categories
Gives the categories in the parsed wikitext.
categorieshtml
Gives the HTML version of the categories.
links
Gives the internal links in the parsed wikitext.
templates
Gives the templates in the parsed wikitext.
images
Gives the images in the parsed wikitext.
externallinks
Gives the external links in the parsed wikitext.
sections
Gives the sections in the parsed wikitext.
revid
Adds the revision ID of the parsed page.
displaytitle
Adds the title of the parsed wikitext.
subtitle
Adds the page subtitle for the parsed page.
headhtml
Gives parsed doctype, opening <html>, <head> element and opening <body> of the page.
modules
Gives the ResourceLoader modules used on the page. To load, use mw.loader.using(). Either jsconfigvars or encodedjsconfigvars must be requested jointly with modules.
jsconfigvars
Gives the JavaScript configuration variables specific to the page. To apply, use mw.config.set().
encodedjsconfigvars
Gives the JavaScript configuration variables specific to the page as a JSON string.
indicators
Gives the HTML of page status indicators used on the page.
iwlinks
Gives interwiki links in the parsed wikitext.
wikitext
Gives the original wikitext that was parsed.
properties
Gives various properties defined in the parsed wikitext.
limitreportdata
Gives the limit report in a structured way. Gives no data when disablelimitreport is set.
limitreporthtml
Gives the HTML version of the limit report. Gives no data when disablelimitreport is set.
parsetree
The XML parse tree of revision content (requires content model wikitext)
parsewarnings
Gives the warnings that occurred while parsing content (as wikitext).
parsewarningshtml
Gives the warnings that occurred while parsing content (as HTML).
headitems
Deprecated. Gives items to put in the <head> of the page.
Values (separate with | or alternative): categories, categorieshtml, displaytitle, encodedjsconfigvars, externallinks, headhtml, images, indicators, iwlinks, jsconfigvars, langlinks, limitreportdata, limitreporthtml, links, modules, parsetree, parsewarnings, parsewarningshtml, properties, revid, sections, subtitle, templates, text, wikitext, headitems
Default: text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle|iwlinks|properties|parsewarnings
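Requesting only the properties you need keeps the response small. The sketch below joins several prop values with `|`; the helper `parse_params` and the `KNOWN_PROPS` subset are assumptions made for illustration:

```python
from urllib.parse import urlencode

# A subset of the prop values listed above; extend as needed.
KNOWN_PROPS = {
    "text", "categories", "links", "sections",
    "wikitext", "displaytitle", "revid",
}


def parse_params(page, props):
    """Build action=parse parameters for the given props (hypothetical helper)."""
    unknown = set(props) - KNOWN_PROPS
    if unknown:
        raise ValueError(f"unrecognised prop values: {sorted(unknown)}")
    return {
        "action": "parse",
        "page": page,
        "prop": "|".join(props),  # multiple values are separated with |
        "format": "json",
    }


params = parse_params("Pet door", ["sections", "categories"])
print(urlencode(params))
```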
wrapoutputclass

CSS class to use to wrap the parser output.

Default: mw-parser-output
usearticle

Use the ArticleParserOptions hook to ensure the options used match those used for article page views.

Type: boolean (details)
parsoid

Generate HTML conforming to the MediaWiki DOM spec using Parsoid.

Type: boolean (details)
pst

Do a pre-save transform on the input before parsing it. Only valid when used with text.

Type: boolean (details)
onlypst

Do a pre-save transform (PST) on the input, but don't parse it. Returns the same wikitext, after a PST has been applied. Only valid when used with text.

Type: boolean (details)
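The difference between pst and onlypst can be sketched as two parameter sets. Boolean API parameters count as true whenever they are present, so the flag is either included or left out. The helper `pst_params` and the sample text are assumptions for illustration:

```python
def pst_params(wikitext, title="API", parse_after=True):
    """Parameters for a pre-save transform of `wikitext` (hypothetical helper).

    parse_after=True  -> pst: transform the input, then parse it.
    parse_after=False -> onlypst: transform the input and return wikitext.
    """
    params = {
        "action": "parse",
        "text": wikitext,  # pst and onlypst are only valid with text
        "title": title,
        "format": "json",
    }
    params["pst" if parse_after else "onlypst"] = "1"
    return params


# A signature (~~~~) is one of the things a pre-save transform expands.
print(pst_params("Hello ~~~~", parse_after=False))
```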
effectivelanglinks
Deprecated.

Includes language links supplied by extensions (for use with prop=langlinks).

Type: boolean (details)
section

Only parse the content of the section with this identifier.

When new, parse text and sectiontitle as if adding a new section to the page.

new is allowed only when specifying text.

sectiontitle

New section title when section is new.

Unlike page editing, this does not fall back to summary when omitted or empty.
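A request that previews text as a new section might be assembled as follows. This is a sketch under the constraints stated above (section=new requires text); the helper name and the sample values are hypothetical:

```python
def new_section_params(page_title, section_title, text):
    """Parameters for parsing `text` as a new section (hypothetical helper)."""
    if not text:
        # section=new is allowed only when specifying text.
        raise ValueError("section=new requires text")
    return {
        "action": "parse",
        "title": page_title,
        "text": text,
        "section": "new",
        # Must be given explicitly: there is no fallback to summary.
        "sectiontitle": section_title,
        "prop": "text",
        "format": "json",
    }


print(new_section_params("Talk:Pet door", "Sources", "Are there better sources?"))
```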

disablepp
Deprecated.

Use disablelimitreport instead.

Type: boolean (details)
disablelimitreport

Omit the limit report ("NewPP limit report") from the parser output.

Type: boolean (details)
disableeditsection

Omit edit section links from the parser output.

Type: boolean (details)
disablestylededuplication

Do not deduplicate inline stylesheets in the parser output.

Type: boolean (details)
showstrategykeys

Whether to include internal merge strategy information in jsconfigvars.

Type: boolean (details)
generatexml
Deprecated.

Generate XML parse tree (requires content model wikitext; replaced by prop=parsetree).

Type: boolean (details)
preview

Parse in preview mode.

Type: boolean (details)
sectionpreview

Parse in section preview mode (enables preview mode too).

Type: boolean (details)
disabletoc

Omit table of contents in output.

Type: boolean (details)
useskin

Apply the selected skin to the parser output. May affect the following properties: text, langlinks, headitems, modules, jsconfigvars, indicators.

One of the following values: apioutput, authentication-popup, cologneblue, fallback, json, minerva, modern, monobook, timeless, vector, vector-2022
contentformat

Content serialization format used for the input text. Only valid when used with text.

One of the following values: application/json, application/octet-stream, application/unknown, application/x-binary, text/css, text/javascript, text/plain, text/unknown, text/x-wiki, unknown/unknown
contentmodel

Content model of the input text. If omitted, title must be specified, and default will be the model of the specified title. Only valid when used with text.

One of the following values: Chart.JsonConfig, GadgetDefinition, Json.JsonConfig, JsonSchema, Map.JsonConfig, MassMessageListContent, NewsletterContent, Scribunto, SecurePoll, Tabular.JsonConfig, css, flow-board, javascript, json, sanitized-css, text, translate-messagebundle, unknown, wikitext
mobileformat

Return parse output in a format suitable for mobile devices.

Type: boolean (details)
templatesandboxprefix

Template sandbox prefix, as with Special:TemplateSandbox.

Separate values with | or alternative.
Maximum number of values is 50 (500 for clients that are allowed higher limits).
templatesandboxtitle

Parse the page using templatesandboxtext in place of the contents of the page named here.

templatesandboxtext

Parse the page using this page content in place of the page named by templatesandboxtitle.

templatesandboxcontentmodel

Content model of templatesandboxtext.

One of the following values: Chart.JsonConfig, GadgetDefinition, Json.JsonConfig, JsonSchema, Map.JsonConfig, MassMessageListContent, NewsletterContent, Scribunto, SecurePoll, Tabular.JsonConfig, css, flow-board, javascript, json, sanitized-css, text, translate-messagebundle, unknown, wikitext
templatesandboxcontentformat

Content format of templatesandboxtext.

One of the following values: application/json, application/octet-stream, application/unknown, application/x-binary, text/css, text/javascript, text/plain, text/unknown, text/x-wiki, unknown/unknown
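The templatesandbox* parameters let you see how a page would render with a modified template without saving the template. A minimal sketch; the helper name, the template title, and the template text are hypothetical:

```python
def sandbox_params(page, template_title, template_text):
    """Parse `page` as if `template_title` contained `template_text`
    (hypothetical helper)."""
    return {
        "action": "parse",
        "page": page,
        "templatesandboxtitle": template_title,
        "templatesandboxtext": template_text,
        "templatesandboxcontentmodel": "wikitext",
        "prop": "text",
        "format": "json",
    }


params = sandbox_params(
    "Pet door",
    "Template:Example",               # hypothetical template name
    "<b>draft version</b> {{{1|}}}",  # hypothetical replacement content
)
print(params)
```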


Example 1: Parse content of a page

GET request

Response

{
    "parse": {
        "title": "Pet door",
        "pageid": 3276454,
        "revid": 852892138,
        "text": {
            "*": "<div class=\"mw-parser-output\"><div class=\"thumb tright\"><div class=\"thumbinner\" style=\"width:222px;\"><a href=\"/wiki/File:Doggy_door_exit.JPG\" class=\"image\"><img alt=\"\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/7/71/Doggy_door_exit.JPG/220px-Doggy_door_exit.JPG\" width=\"220\" height=\"165\" class=\"thumbimage\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/7/71/Doggy_door_exit.JPG/330px-Doggy_door_exit.JPG 1.5x, 
            ...
        }
    }
}
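The `"*"` key in the response above is the default (formatversion=1) JSON layout. With formatversion=2 the HTML is returned as a plain string instead. A small sketch that accepts either shape; the helper name `extract_html` and the shortened sample values are assumptions:

```python
def extract_html(data):
    """Return the parsed HTML from an action=parse response dict."""
    text = data["parse"]["text"]
    # formatversion=1 wraps the HTML as {"*": "..."};
    # formatversion=2 returns the HTML string directly.
    if isinstance(text, dict):
        return text["*"]
    return text


html = '<div class="mw-parser-output"></div>'  # shortened sample value
legacy = {"parse": {"title": "Pet door", "text": {"*": html}}}
modern = {"parse": {"title": "Pet door", "text": html}}

print(extract_html(legacy) == extract_html(modern))
```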

Sample code

Python

#!/usr/bin/python3

"""
    parse.py

    MediaWiki API Demos
    Demo of `Parse` module: Parse content of a page

    MIT License
"""

import requests

S = requests.Session()

URL = "https://en.wikipedia.org/w/api.php"

PARAMS = {
    "action": "parse",
    "page": "Pet door",
    "format": "json"
}

R = S.get(url=URL, params=PARAMS)
DATA = R.json()

print(DATA["parse"]["text"]["*"])

PHP

<?php
/*
    parse.php

    MediaWiki API Demos
    Demo of `Parse` module: Parse content of a page

    MIT License
*/

$endPoint = "https://en.wikipedia.org/w/api.php";
$params = [
    "action" => "parse",
    "page" => "Pet door",
    "format" => "json"
];

$url = $endPoint . "?" . http_build_query( $params );

$ch = curl_init( $url );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
$output = curl_exec( $ch );
curl_close( $ch );

$result = json_decode( $output, true );

echo( $result["parse"]["text"]["*"] );

JavaScript

/**
 * parse.js
 *
 * MediaWiki API Demos
 * Demo of `Parse` module: Parse content of a page
 *
 * MIT License
 */
 
const url = "https://en.wikipedia.org/w/api.php?" +
    new URLSearchParams({
        origin: "*",
        action: "parse",
        page: "Pet door",
        format: "json",
    });

try {
    const req = await fetch(url);
    const json = await req.json();
    console.log(json.parse.text["*"]);
} catch (e) {
    console.error(e);
}

MediaWiki JS

/**
 * parse.js
 *
 * MediaWiki API Demos
 * Demo of `Parse` module: Parse content of a page
 * MIT License
 */

const params = {
	action: 'parse',
	page: 'Pet door',
	format: 'json'
};
const api = new mw.Api();

api.get(params).done(data => {
	console.log(data.parse.text['*']);
});

Example 2: Parse a section of a page and fetch its table data

GET request

Response

{
    "parse": {
        "title": "Wikipedia:Unusual articles/Places and infrastructure",
        "pageid": 38664530,
        "wikitext": {
            "*": "===Antarctica===\n<!--[[File:Grytviken church.jpg|thumb|150px|right|A little church in [[Grytviken]] in the [[Religion in Antarctica|Antarctic]].]]-->\n{| class=\"wikitable\"\n|-\n| '''[[Emilio Palma]]'''\n| An Argentine national who is the first person known to be born on the continent of Antarctica.\n|-\n| '''[[Scouting in the Antarctic]]'''\n| Always be prepared for glaciers and penguins.\n|}"
        }
    }
}
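The section parameter takes a section identifier rather than a heading. One way to find it is to request prop=sections first and look the heading up in the result. The sketch below assumes each entry carries `line` (heading text) and `index` fields; the sample list is made up for illustration and is not taken from the live page:

```python
def find_section_index(sections, heading):
    """Return the index of the first section whose heading equals `heading`,
    or None if there is no such section."""
    for section in sections:
        if section.get("line") == heading:
            return section.get("index")
    return None


# Illustrative data only, shaped like the output of prop=sections.
sample_sections = [
    {"toclevel": 1, "level": "2", "line": "Worldwide", "index": "1"},
    {"toclevel": 2, "level": "3", "line": "Antarctica", "index": "5"},
]

print(find_section_index(sample_sections, "Antarctica"))
```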

Sample code

parse_wikitable.py
#!/usr/bin/python3

"""
    parse_wikitable.py

    MediaWiki Action API Code Samples
    Demo of `Parse` module: Parse a section of a page, fetch its table data and save
    it to a CSV file

    MIT license
"""

import csv
import requests

S = requests.Session()

URL = "https://en.wikipedia.org/w/api.php"

TITLE = "Wikipedia:Unusual_articles/Places_and_infrastructure"

PARAMS = {
    'action': "parse",
    'page': TITLE,
    'prop': 'wikitext',
    'section': 5,
    'format': "json"
}


def get_table():
    """ Parse a section of a page, fetch its table data and save it to a CSV file
    """
    res = S.get(url=URL, params=PARAMS)
    data = res.json()
    wikitext = data['parse']['wikitext']['*']
    lines = wikitext.split('|-')
    entries = []

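    for line in lines:
        line = line.strip()
        if line.startswith("|"):
            # Drop the leading "| " and split the row into its cells
            table = line[2:].split('||')
            cells = table[0].split("|")
            # First cell: strip bold/link markup; second cell: strip whitespace
            entry = cells[0].strip("'[]\n"), cells[1].strip()
            entries.append(entry)

    # newline="" lets the csv module control line endings
    with open("places_and_infrastructure.csv", "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerows(entries)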

if __name__ == '__main__':
    get_table()
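Separating the row parsing from the network and file I/O makes it possible to check the logic against the sample response without calling the API. A sketch, assuming a simple two-column table in which every cell starts on its own line; `table_rows` is a hypothetical helper and `SAMPLE` is abridged from the response shown above:

```python
SAMPLE = (
    "===Antarctica===\n"
    '{| class="wikitable"\n'
    "|-\n"
    "| '''[[Emilio Palma]]'''\n"
    "| An Argentine national who is the first person known to be born"
    " on the continent of Antarctica.\n"
    "|-\n"
    "| '''[[Scouting in the Antarctic]]'''\n"
    "| Always be prepared for glaciers and penguins.\n"
    "|}"
)


def table_rows(wikitext):
    """Return (title, description) tuples from a simple two-column wikitable."""
    rows = []
    for chunk in wikitext.split("|-"):
        chunk = chunk.strip()
        if not chunk.startswith("|"):
            continue  # heading and table opener, not a row
        # Collect one cell per line, skipping the closing "|}".
        cells = [
            line[1:].strip()
            for line in chunk.splitlines()
            if line.startswith("|") and not line.startswith("|}")
        ]
        if len(cells) >= 2:
            # Strip bold (''') and link ([[ ]]) markup from the title cell.
            rows.append((cells[0].strip("'[]"), cells[1]))
    return rows


for title, description in table_rows(SAMPLE):
    print(title, "->", description)
```

Tables that use piped links or put several cells on one line with `||` would need a fuller parser.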

Possible errors

Code Info
missingtitle The page you specified doesn't exist.
nosuchsection The requested section does not exist in the page.
pagecannotexist Namespace doesn't allow actual pages.
invalidparammix
  • The parameters page, pageid, oldid, text cannot be used together.
  • The parameters page, pageid, oldid, title cannot be used together.
  • The parameters page, pageid, oldid, revid cannot be used together.
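API errors are reported in the response body, so client code should check for an `error` object before reading `parse`. A minimal sketch, assuming the default error layout with `code` and `info` fields; the exception class and the helper name are hypothetical:

```python
class ParseApiError(Exception):
    """Raised when the API response contains an error object."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def raise_for_api_error(data):
    """Return `data` unchanged, or raise ParseApiError if it reports an error."""
    error = data.get("error")
    if error:
        raise ParseApiError(error.get("code", "unknown"), error.get("info", ""))
    return data


sample_error = {
    "error": {
        "code": "missingtitle",
        "info": "The page you specified doesn't exist.",
    }
}

try:
    raise_for_api_error(sample_error)
except ParseApiError as exc:
    print(exc.code)
```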


Parameter history

  • v1.38: Introduced showstrategykeys
  • v1.32: Deprecated disabletidy
  • v1.31: Introduced disablestylededuplication
  • v1.30: Introduced revid, useskin, wrapoutputclass

See also