Wikidata Query Service/Implementation/Standalone

From mediawiki.org

WDQS can be run as a service for any Wikibase instance, not just Wikidata. You can still follow the instructions in the documentation, with the changes described below.

To generate a dump of your database, use the dumpRdf.php script in the repo/maintenance directory of the Wikibase extension. Depending on your requirements, you may still want to run the munge.sh script, or you may load the resulting RDF directly into the database.

For development, you may also consider using the Docker-based setup at https://github.com/wmde/wikibase-release-pipeline .

Note that Blazegraph and the Updater require a significant amount of memory. When running in a VM, a VM-like setup such as Docker, or any other memory-restricted environment, allocate enough memory; 4 to 8 GB is a good guideline.

Required setup

For WDQS to work properly, the Wikibase instance should currently fulfil the following conditions. Given the top-level URL of the Wikibase install as WIKIBASE_URL:

  • RecentChanges API should be accessible at WIKIBASE_URL/w/api.php
  • Entity data dump should be accessible at WIKIBASE_URL/wiki/Special:EntityData/Q123.ttl for entity Q123.
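
The two conditions above can be sketched as a pair of small shell helpers that build the expected URLs from a base URL. This is only an illustration: the function names api_url and entity_data_url are made up for this example, and https://my-wikibase:8081 is a placeholder host.

```shell
# Sketch: derive the two URLs WDQS expects from a Wikibase base URL.
# WIKIBASE_URL defaults to a placeholder; set it to your own instance.
WIKIBASE_URL="${WIKIBASE_URL:-https://my-wikibase:8081}"

# Print the RecentChanges API URL for a base URL (a trailing slash is dropped).
api_url() {
  printf '%s/w/api.php\n' "${1%/}"
}

# Print the Turtle entity data URL for a base URL and an entity ID.
entity_data_url() {
  printf '%s/wiki/Special:EntityData/%s.ttl\n' "${1%/}" "$2"
}

api_url "$WIKIBASE_URL"
entity_data_url "$WIKIBASE_URL" Q123
```

To check that both URLs actually respond, each one can be passed to an HTTP client of your choice, for example curl -fsSI "$(api_url "$WIKIBASE_URL")".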

If your Wikibase instance uses a different URL scheme, the recommended way is to create web server redirects for these two URLs, although these parts will be customizable as of WDQS 0.3.69 (via --apiPath /w/api.php and --entityDataPath /wiki/Special:EntityData/). See below about URL customization.

You can also set the Wikibase entity concept URL separately. Given the base URL as CONCEPT_URL, the assumptions are:

  • The entity prefix is CONCEPT_URL/entity/
  • The data prefix is CONCEPT_URL/wiki/Special:EntityData/

You can verify these by looking at the wd: and wdata: prefixes in the entity data dump at the URL described above, e.g. https://www.wikidata.org/wiki/Special:EntityData/Q4.ttl
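
As a rough aid for this check, the following sketch prints the two prefix declarations one would expect to find in the Turtle output for a given concept URL, so they can be compared with the real export. The function name expected_prefixes is hypothetical, and the exact whitespace of the declarations in an actual export may differ.

```shell
# Sketch: print the wd: and wdata: prefix declarations implied by a concept URL.
# CONCEPT_URL defaults to a placeholder; set it to your own concept URL.
CONCEPT_URL="${CONCEPT_URL:-https://my-wikibase:8081}"

expected_prefixes() {
  base="${1%/}"   # drop a trailing slash, if any
  printf '@prefix wd: <%s/entity/> .\n' "$base"
  printf '@prefix wdata: <%s/wiki/Special:EntityData/> .\n' "$base"
}

expected_prefixes "$CONCEPT_URL"
```

The printed lines can then be compared by eye, or with grep, against the @prefix lines at the top of an entity's .ttl export.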

WDQS Configurations

The two main things you may need to change are the Wikibase endpoint (the URL at which your Wikibase instance is accessible) and the concept URI (the URI which prefixes the RDF URIs describing data in your instance). Note that these are related by default but are controlled independently and do not have to match. By default, both settings match Wikidata.

If you are running a copy of Wikidata on your own domain, you may need to change the Wikibase endpoint. If you are running your own dataset, you also need to change the concept URI.

Setting Wikibase endpoint

This setting controls the URL at which the Wikibase instance is found. See above for the list of URLs that are expected to work relative to this URL.

For Updater:

  • Use the --wikibaseUrl URL option when running the Updater to set the Wikibase URL.

For Munger:

  • No changes are needed since Munger does not communicate with Wikibase

For Blazegraph:

  • No changes are needed since Blazegraph does not communicate with Wikibase

Setting concept URI

For Munger:

  • Use the --conceptUri URL option when running Munger. The rules for the URL are the same as for the Updater below.

Example:

bash munge.sh -f mydump.ttl.gz -d data/split -- --conceptUri https://my-wikibase:8081

For Updater:

  • Use the --conceptUri URL option when running the Updater. The URL should match the one seen in the wd: prefix of the TTL export. For example, if the prefix is defined as @prefix wd: <http://test.wikidata.org/entity/> . then the URL is http://test.wikidata.org

Example:

bash runUpdate.sh -- --wikibaseUrl https://my-wikibase:8081 --conceptUri https://my-wikibase:8081
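
The rule for deriving the concept URI from the wd: prefix can be sketched as a small filter that reads Turtle on standard input. The function name concept_uri_from_ttl is hypothetical, and the pattern assumes the declaration sits on a single line as in the example above.

```shell
# Sketch: extract the concept URI from a Turtle "@prefix wd:" declaration.
# Reads TTL on stdin and prints the part of the wd: prefix before /entity/.
concept_uri_from_ttl() {
  sed -n 's|^@prefix[[:space:]]*wd:[[:space:]]*<\(.*\)/entity/>.*$|\1|p' | head -n 1
}

# Demo with the prefix line used as the example in the text.
printf '%s\n' '@prefix wd: <http://test.wikidata.org/entity/> .' | concept_uri_from_ttl
# prints http://test.wikidata.org
```

In practice the input would be the .ttl export of an entity from your own instance, fetched with any HTTP client and piped into the function.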

For Blazegraph:

  • Set the wikibaseConceptUri Java property when running Blazegraph. If you only change the hostname, you can use wikibaseHost instead.

Example:

BLAZEGRAPH_OPTS="-DwikibaseConceptUri=https://my-wikibase:8081" bash ./runBlazegraph.sh
BLAZEGRAPH_OPTS="-DwikibaseHost=www.my-wikibasehost.org" bash ./runBlazegraph.sh

Setting entity namespaces

For Updater:

The Updater looks for changes in namespaces 0 and 120 by default, which are the Item and Property namespaces on Wikidata. In a default Wikibase installation, Item and Property are instead namespaces 120 and 122. If your installation follows this setup, add the option --entityNamespaces 120,122 when running the Updater. (If you have other entity namespaces, e.g. for lexicographical data, make sure to add them to the list.)
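
Combined with the endpoint and concept URI options described earlier, a full Updater invocation for a default Wikibase installation might be assembled as in the sketch below. The helper updater_command is hypothetical and only prints the command line for review; the host is a placeholder.

```shell
# Sketch: assemble (but do not run) an Updater command line.
# Arguments: Wikibase URL, concept URI, comma-separated entity namespaces.
updater_command() {
  printf 'bash runUpdate.sh -- --wikibaseUrl %s --conceptUri %s --entityNamespaces %s\n' \
    "$1" "$2" "$3"
}

# Placeholder values for a default Wikibase installation (Item 120, Property 122).
updater_command https://my-wikibase:8081 https://my-wikibase:8081 120,122
```

Printing the command first makes it easy to confirm the namespace list before starting the Updater from the WDQS service directory.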

GUI configurations

To configure the GUI, you have to use the source (not built/minimized) version of the GUI: https://github.com/wikimedia/wikidata-query-gui

In the wikibase/config.js file there are two settings you may want to change:

  • api.sparql.uri: The URL of the SPARQL endpoint (can be relative to the GUI main URL)
  • api.wikibase.uri: The URL of the Wikidata API endpoint (including the path, e.g. https://www.wikidata.org/w/api.php).