Wikidata Query Service/User Manual/MWAPI
MediaWiki API Query Service
The MediaWiki API Service lets you call the MediaWiki API from SPARQL and use the results inside the SPARQL query.
Service
The query is initiated by a SERVICE clause with the URL wikibase:mwapi.
Input
Query inputs are specified as service parameters, in a clause such as this:
bd:serviceParam mwapi:language "en" .
This defines the parameter "language" in the query.
There are two special parameters, api and endpoint, which are mandatory and identify, respectively, the specific API used and the endpoint host (wiki). The APIs and endpoints are pre-defined in the configuration; see below.
On the Wikidata Query Service, all Wikimedia sites are allowed (all Wikipedia language editions, Wikimedia Commons, Wikidata itself, etc.).
Example:
bd:serviceParam wikibase:api "EntitySearch" .
bd:serviceParam wikibase:endpoint "www.wikidata.org" .
The rest of the parameters use predicates starting with mwapi:, where the local name is the name of the parameter in the API query, e.g. mwapi:srsearch. The value is the value that will be used in the query; it can be a constant or a variable. If the variable is not bound when the query is run, the parameter is skipped or, if it is required (as defined by the configuration), an error occurs.
It is permissible to add input parameters not specified in the configuration; they will be passed to the service query. Please refer to the API documentation for the parameters each service has. The configuration also defines some default parameters. If you override them in the service parameters, you may want to merge the default value with your desired value to ensure the output looks as expected by MWAPI (e.g. use mwapi:ppprop "wikibase_item|page_image_free" instead of just mwapi:ppprop "page_image_free").
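Merging a default with an extra value is plain string work on the pipe-separated list. The following is a minimal Python sketch of that step; the helper name merge_pipe_values is hypothetical, and the default "wikibase_item" is taken from the example above, so check the configuration for the actual default before relying on it.

```python
def merge_pipe_values(default: str, extra: str) -> str:
    """Join two pipe-separated MediaWiki API value lists.

    Order is preserved and duplicates are dropped, so the configured
    default values stay in front of the values you add.
    """
    merged = []
    for value in default.split("|") + extra.split("|"):
        if value and value not in merged:
            merged.append(value)
    return "|".join(merged)


# Assumed default (from the example in the text) plus the value we want to add.
ppprop = merge_pipe_values("wikibase_item", "page_image_free")

# The triple to put into the SERVICE clause.
print(f'bd:serviceParam mwapi:ppprop "{ppprop}" .')
```

The same helper works for any multi-value parameter, for example extending a default prop list with revisions.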
Pagination
By default, WDQS will try to fetch all the results of the query, using continuations. You can set the page size with service-specific limit parameters, but the service will still use continuation to fetch the following pages. This can be controlled by the wikibase:limit parameter, which limits the total number of rows fetched by the service call:
bd:serviceParam wikibase:limit 50 .
This will fetch only 50 rows, regardless of pagination settings. Note that the SPARQL LIMIT clause has the same effect on the result, but it is applied only after the API calls have finished: all the data is fetched and then the extra data is discarded. It is more efficient to use the internal limit.
A special value of "once" disables the continuation mechanism:
bd:serviceParam wikibase:limit "once" .
This makes the service stop after the first API call and not use continuations. The number of results depends on service parameters.
There is a hard-coded limit of 10000 rows which always applies.
Two additional limit parameters exist to fine-tune some requests (see the original Gerrit patch):
bd:serviceParam wikibase:limitContinuations "3" .
limits the number of internally requested continuations (default: 1000 continuations; hard-coded limit: 1000).
bd:serviceParam wikibase:limitEmptyContinuations "200" .
limits the number of consecutive empty continuations; when N consecutive empty continuations are reached, the overall request to the MediaWiki API is ended (default: 25 empty continuations; hard-coded limit: 1000).
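How these limits interact is easier to see in a small model. The Python sketch below is not the WDQS implementation; fetch_rows, make_fake_api and the (rows, token) page shape are hypothetical, and counting the first response towards the empty streak is a simplifying assumption. Only the default values (1000, 25) and the 10000-row cap come from the text above.

```python
from typing import Callable, List, Optional, Tuple

HARD_ROW_LIMIT = 10000  # hard-coded row cap mentioned in the text

Page = Tuple[List[str], Optional[str]]  # (rows, continuation token or None)


def fetch_rows(
    call_api: Callable[[Optional[str]], Page],
    limit: Optional[int] = None,          # models wikibase:limit <number>
    once: bool = False,                   # models wikibase:limit "once"
    limit_continuations: int = 1000,      # models wikibase:limitContinuations
    limit_empty_continuations: int = 25,  # models wikibase:limitEmptyContinuations
) -> List[str]:
    max_rows = HARD_ROW_LIMIT if limit is None else min(limit, HARD_ROW_LIMIT)
    rows: List[str] = []
    token: Optional[str] = None
    continuations = 0
    empty_streak = 0
    while True:
        page_rows, token = call_api(token)
        rows.extend(page_rows)
        if len(rows) >= max_rows:
            return rows[:max_rows]  # row limit reached, drop the surplus
        empty_streak = 0 if page_rows else empty_streak + 1
        if once or token is None:
            return rows  # single call requested, or nothing left to continue
        if empty_streak >= limit_empty_continuations:
            return rows  # too many consecutive empty responses
        if continuations >= limit_continuations:
            return rows  # continuation budget used up
        continuations += 1


def make_fake_api(pages: List[List[str]]) -> Callable[[Optional[str]], Page]:
    """Stand-in for the MediaWiki API: the token is the index of the next page."""
    def call_api(token: Optional[str]) -> Page:
        index = 0 if token is None else int(token)
        next_token = str(index + 1) if index + 1 < len(pages) else None
        return pages[index], next_token
    return call_api


api = make_fake_api([["a", "b"], [], [], ["c"], ["d", "e"]])
print(fetch_rows(api))              # all pages are followed
print(fetch_rows(api, limit=3))     # stops as soon as three rows are collected
print(fetch_rows(api, once=True))   # first API response only
```

In this model, a numeric limit stops the loop early rather than trimming afterwards, which is the efficiency argument made above for preferring wikibase:limit over a SPARQL LIMIT clause.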
Output
MediaWiki API services are asked to return the results in XML, from which they are extracted to output parameters using XPath. There are two ways to specify output parameters: by referring to the configured parameter and by specifying XPath directly.
The configured output parameter could be used like this:
?title wikibase:apiOutput mwapi:title .
This results in the variable ?title being bound to the XPath result defined by the "title" output variable. Note that the configured output parameters expect certain input parameters to be present, and if you override the configured defaults the output parameters may break: for example, the configuration specifies a default parameter of prop=info|pageprops, and if you override it with mwapi:prop "revisions", then the mwapi:lastrevid output parameter will no longer be able to get the lastrevid from the response. (You would fix this by looking up the default prop in the configuration and then changing your query to use mwapi:prop "info|pageprops|revisions".)
The direct XPath expression is used like this:
?ns wikibase:apiOutput "@ns" .
In this case, the object of the triple is an XPath string and not an mwapi: URI.
In both cases, the XPath is evaluated relative to the item path specified in the service configuration. XPath evaluation is the same for both forms.
The predicates that can be used are wikibase:apiOutput, wikibase:apiOutputItem and wikibase:apiOutputURI. The first form results in a string taken literally; the second interprets the string as an Item ID and constructs a full item URI; and the third interprets the string as a URI and constructs a URI value.
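The difference between the three predicates can be pictured as a small conversion step applied to the string that XPath returned. This Python sketch is only an illustration: bind_output and the (kind, value) tuples are hypothetical, and the entity prefix is the one used for Wikidata items, which is an assumption for any other Wikibase installation.

```python
from typing import Tuple

# Assumed entity URI prefix (the one used for Wikidata items).
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def bind_output(predicate: str, value: str) -> Tuple[str, str]:
    """Model how an extracted string becomes a SPARQL value.

    Returns a (kind, value) pair where kind is "literal" or "uri".
    """
    if predicate == "wikibase:apiOutput":
        return ("literal", value)             # string taken literally
    if predicate == "wikibase:apiOutputItem":
        return ("uri", ENTITY_PREFIX + value)  # item ID expanded to a full item URI
    if predicate == "wikibase:apiOutputURI":
        return ("uri", value)                  # string already holds a URI
    raise ValueError(f"unsupported output predicate: {predicate}")


print(bind_output("wikibase:apiOutput", "Q42"))
print(bind_output("wikibase:apiOutputItem", "Q42"))
print(bind_output("wikibase:apiOutputURI", "https://example.org/page"))
```

Using wikibase:apiOutputItem is what allows the result to be joined directly against triples such as ?item wdt:P31 ?type, since those use full item URIs rather than ID strings.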
The predicate wikibase:apiOrdinal defines a variable that is set to the position of the result in the original API output. Because of the multithreaded nature of the SPARQL engine, results are not necessarily returned in the order in which they were received from the API. For queries like search, where result order is important, wikibase:apiOrdinal lets you recover the original order. The object of the triple is currently ignored.
?num wikibase:apiOrdinal true.
Supported services
Currently the following services are supported:
Service | Inputs | Outputs | Description |
---|---|---|---|
Generator | generator, prop, ppprop | title, item, pageid, lastrevid, timestamp | Call any generator API. Use the "generator" parameter to specify the generator, and generator-specific parameters to further refine the search (see the examples below). |
Categories | titles, cllimit | category, title | Get a list of categories on the page. |
Search | srsearch, srwhat, srlimit | title | Full-text search in wiki. |
EntitySearch | search, language, type, limit | item, label | Wikibase entity search, by title. |
Required parameters are in bold. Other input parameters may also be supported – please refer to the service documentation (linked in Service column) for supported API parameters and their meaning.
The full list of services can be seen in the production config file.
Configuration
The service is configured by a JSON file listing particular API configurations and a list of endpoints. Please see the example configuration.
On the top level, there is a list of endpoint hostnames under "endpoints" and API configs under "services", with the keys being service names for use with wikibase:api.
Only endpoints listed in endpoints are allowed. The list can also contain hostname suffixes: for example, .wikipedia.org allows any hostname that ends with .wikipedia.org.
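The endpoint check described above amounts to an exact match or a suffix match. The Python sketch below illustrates that rule under one assumption that the text does not state explicitly: only entries beginning with a dot are treated as suffixes. The function name is_allowed_endpoint and the sample list are hypothetical.

```python
from typing import Iterable


def is_allowed_endpoint(host: str, allowed: Iterable[str]) -> bool:
    """Return True if host matches an entry exactly or ends with a dotted suffix entry."""
    for entry in allowed:
        if entry.startswith("."):
            # Suffix entry: ".wikipedia.org" matches "en.wikipedia.org".
            if host.endswith(entry):
                return True
        elif host == entry:
            # Plain entry: the hostname must match exactly.
            return True
    return False


# Hypothetical "endpoints" list, shaped like the one in the configuration.
endpoints = ["www.wikidata.org", ".wikipedia.org"]

for host in ["en.wikipedia.org", "www.wikidata.org", "example.org"]:
    print(host, is_allowed_endpoint(host, endpoints))
```

Keeping the leading dot in the suffix entry means that a lookalike such as notwikipedia.org does not match .wikipedia.org.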
Individual service configuration
The service configuration has two required elements: params for input parameters and output for output parameters. All other elements are currently ignored by the code, but @note and @docs are used to store a description of, and a link to documentation about, the particular API. It is safe to assume that config entries starting with @ will always be ignored by the code.
Input
Input parameters are keyed by parameter name. The value can be either a constant, in which case the parameter is fixed, always present, and its value cannot be changed, or an object that describes a variable parameter.
A variable parameter is bound by the query template or variable. The configuration specifies its type, which is currently ignored but may be enforced later. The configuration can also specify a default, which will be used if the parameter is not specified or not bound. If the default is "", the parameter will be omitted if it is not specified or not bound. If no default is specified, the parameter is treated as required, and omitting it will result in an error.
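These rules can be summarised in a short resolution routine. The Python sketch below models them as described; it is not the WDQS code. The function resolve_params and the search_config dictionary are hypothetical (the latter is loosely modelled on the Search service row above, with made-up constants and types), and an unbound variable is represented by None.

```python
from typing import Dict, Optional, Union

ParamSpec = Union[str, Dict[str, str]]  # constant value, or {"type": ..., "default": ...}


def resolve_params(
    config: Dict[str, ParamSpec],
    supplied: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Combine configured and query-supplied parameters into one API request."""
    resolved: Dict[str, str] = {}
    for name, spec in config.items():
        if isinstance(spec, str):
            resolved[name] = spec  # constant: always present, cannot be overridden
            continue
        value = supplied.get(name)  # None means "not specified or not bound"
        if value is None:
            if "default" not in spec:
                raise ValueError(f"required parameter {name!r} is missing or unbound")
            value = spec["default"]
            if value == "":
                continue  # empty default: leave the parameter out entirely
        resolved[name] = value
    # Parameters not named in the configuration are passed through unchanged.
    for name, value in supplied.items():
        if name not in config and value is not None:
            resolved[name] = value
    return resolved


# Hypothetical configuration fragment, for illustration only.
search_config: Dict[str, ParamSpec] = {
    "action": "query",
    "list": "search",
    "srsearch": {"type": "string"},               # no default: required
    "srwhat": {"type": "string", "default": ""},  # omitted unless supplied
    "srlimit": {"type": "int", "default": "max"},
}

print(resolve_params(search_config, {"srsearch": "cheese", "srnamespace": "0"}))
```

Here srsearch is required, srwhat is dropped because its default is empty, srlimit falls back to its default, and the unconfigured srnamespace is passed through.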
Output
Output is defined by items, which specifies the XPath for extracting the individual result elements for the query, and a set of output variables under vars. Each key in vars is a variable name (to be used with wikibase:apiOutput) and the value is the XPath to that particular value, evaluated relative to the items path. Note that it is also possible to specify absolute paths for specific variables, but in that case you will get a copy of that value in every result row.
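The relationship between items and vars can be tried out on a small XML document. The Python sketch below uses the standard library's ElementTree, which supports only a subset of XPath, so the evaluate helper handles just element paths and a trailing @attribute step; absolute variable paths are not covered. The XML sample, the output_config dictionary and the item IDs are made up for illustration, and the items path is written relative to the root element because ElementTree does not accept absolute paths on elements.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Made-up response, loosely shaped like a MediaWiki API XML result.
SAMPLE_XML = """<api>
  <query>
    <pages>
      <page pageid="11" ns="0" title="Page A">
        <pageprops wikibase_item="Q111"/>
      </page>
      <page pageid="12" ns="14" title="Category:Page A"/>
    </pages>
  </query>
</api>"""

# Hypothetical output configuration with an items path and per-variable XPaths.
output_config = {
    "items": "query/pages/page",
    "vars": {
        "title": "@title",
        "item": "pageprops/@wikibase_item",
        "ns": "@ns",
    },
}


def evaluate(element: ET.Element, xpath: str) -> Optional[str]:
    """Evaluate a tiny XPath subset: 'a/b', '@attr' or 'a/b/@attr'."""
    if "@" in xpath:
        path, attr = xpath.rsplit("@", 1)
        path = path.rstrip("/")
        target = element.find(path) if path else element
        return None if target is None else target.get(attr)
    target = element.find(xpath)
    return None if target is None else target.text


def extract_rows(xml_text: str, config: dict) -> List[Dict[str, Optional[str]]]:
    """Produce one row per element matched by items, with each var evaluated relative to it."""
    root = ET.fromstring(xml_text)
    return [
        {name: evaluate(item, xpath) for name, xpath in config["vars"].items()}
        for item in root.findall(config["items"])
    ]


for row in extract_rows(SAMPLE_XML, output_config):
    print(row)
```

The second page has no pageprops element, so its item value is None; in a query, the corresponding variable would simply have nothing to bind to for that row.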
Examples
Find all entities with labels "cheese" and get their types
- Properties used: subclass of (P279), instance of (P31)
SELECT * WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "cheese";
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
    ?num wikibase:apiOrdinal true.
  }
  ?item (wdt:P279|wdt:P31) ?type
} ORDER BY ASC(?num) LIMIT 20
Find articles in Wikipedia
SELECT * WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Search";
                    mwapi:srsearch "cheese".
    ?title wikibase:apiOutput mwapi:title.
  }
} LIMIT 20
Find articles in Wikipedia speaking about cheese and see which Wikibase items they correspond to
- Properties used: instance of (P31), subclass of (P279)
SELECT ?item ?itemLabel ?type ?typeLabel WHERE {
  {
    SELECT ?item WHERE {
      SERVICE wikibase:mwapi {
        bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                        wikibase:api "Generator";
                        mwapi:generator "search";
                        mwapi:gsrsearch "cheese";
                        mwapi:gsrlimit "max".
        ?item wikibase:apiOutputItem mwapi:item .
      }
    } LIMIT 100
  }
  hint:Prior hint:runFirst "true".
  ?item wdt:P31|wdt:P279 ?type.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 100
Find category members
- Items used: parking lot (Q6501349)
- Properties used: topic's main category (P910)
SELECT * WHERE {
  wd:Q6501349 wdt:P910 ?category .
  ?link schema:about ?category; schema:isPartOf <https://en.wikipedia.org/>; schema:name ?title .
  SERVICE wikibase:mwapi {
    # Input parameters
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "categorymembers";
                    mwapi:gcmtitle ?title;
                    mwapi:gcmprop "ids|title|type";
                    mwapi:gcmlimit "max".
    # Output variables
    ?member wikibase:apiOutput mwapi:title.
    ?ns wikibase:apiOutput "@ns".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  # Uncomment to keep only members in the Category namespace (14)
  # FILTER (?ns = "14")
}
Get categories
- Items used: Kaisaniemen puistokuja 6 (Q23040282)
- Properties used: image (P18)
SELECT * WHERE {
  BIND(wd:Q23040282 AS ?item)
  ?item wdt:P18 ?image.
  BIND(CONCAT("File:", STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/")) AS ?fileTitle)
  # Query the MediaWiki API Query Service for
  SERVICE wikibase:mwapi {
    # Categories that contain these pages
    bd:serviceParam wikibase:api "Categories";
                    wikibase:endpoint "commons.wikimedia.org";
                    wikibase:limit 50;
                    mwapi:titles ?fileTitle.
    # Output the page title and category
    ?title wikibase:apiOutput mwapi:title.
    ?category wikibase:apiOutput mwapi:category .
  }
}
Get imageusage
- Items used: Kaisaniemen puistokuja 6 (Q23040282)
- Properties used: image (P18)
SELECT * WHERE {
  BIND(wd:Q23040282 AS ?item)
  ?item wdt:P18 ?image.
  BIND(CONCAT("File:", STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/")) AS ?fileTitle)
  # Query the MediaWiki API Query Service for pages on fi.wikipedia.org that use the file
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "fi.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "imageusage";
                    mwapi:giutitle ?fileTitle;
                    mwapi:giulimit "max".  # imageusage generator parameters use the "giu" prefix
    ?image_used_in_fiwiki wikibase:apiOutputItem mwapi:item .
  }
}
Get imageinfo via Allpages generator
Copied from phab:T157798#5919648
- Items used: Douglas Adams (Q42)
- Properties used: image (P18)
SELECT * WHERE {
  BIND(wd:Q42 AS ?item)
  ?item wdt:P18 ?image.
  BIND(STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    wikibase:limit "once";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?fileTitle;
                    mwapi:gapnamespace 6; # NS_FILE
                    mwapi:gaplimit 1;
                    mwapi:prop "imageinfo";
                    mwapi:iiprop "dimensions".
    ?size wikibase:apiOutput "imageinfo/ii/@size".
    ?width wikibase:apiOutput "imageinfo/ii/@width".
    ?height wikibase:apiOutput "imageinfo/ii/@height".
  }
}
Get all images uploaded by user
SELECT * WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "allimages";
                    mwapi:gailimit 1000;
                    mwapi:gaiuser "zache";
                    mwapi:gaisort "timestamp";
                    mwapi:prop "url".
    ?title wikibase:apiOutput mwapi:title.
    ?pageid wikibase:apiOutput mwapi:pageid.
  }
  BIND(URI(CONCAT("https://commons.wikimedia.org/entity/M", STR(?pageid))) AS ?sdc_item)
}
Get wikicode of the revision via Allpages generator
- Items used: Douglas Adams (Q42)
- Properties used: image (P18)
SELECT * WHERE {
  BIND(wd:Q42 AS ?item)
  ?item wdt:P18 ?image.
  BIND(STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    wikibase:limit "once";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?fileTitle;
                    mwapi:gapnamespace 6; # NS_FILE
                    mwapi:gaplimit 1;
                    mwapi:prop "revisions";
                    mwapi:rvprop "content".
    ?contentmodel wikibase:apiOutput 'revisions/rev/@contentmodel'.
    ?contentformat wikibase:apiOutput 'revisions/rev/@contentformat'.
    ?content wikibase:apiOutput 'revisions/rev/text()' .
  }
}
Get list of pages linked from wikipage
SELECT DISTINCT ?item ?title WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "links";
                    mwapi:titles "List of bird species described in the 2020s";
                    mwapi:gplnamespace "0" ;
                    mwapi:redirects "1" ;
                    mwapi:gpllimit "max".
    ?item wikibase:apiOutputItem mwapi:item .
    ?title wikibase:apiOutput mwapi:title .
  }
}
See also
- MW2SPARQL - experimental SPARQL endpoint for Wikimedia wikis, working on top of the MediaWiki database.
- SPARQL/SERVICE : mwapi - Wikibooks tutorial page