Talk:Wikidata Query Service/User Manual/MWAPI


Document more Input params

Vladimir Alexiev

This page and Wikidata Query Service/User Manual#MediaWiki API give examples using the following params:

gsrsearch, gsrlimit, gcmprop, gcmlimit

Where are they documented?

The page says "It is permissible to add input parameters not specified in the configuration, they will be passed to the service query. Please refer to the API documentation for the lists of parameters each service has". I searched in API:Query#Generators and can't find them there.

It would be very useful for SPARQL devs to have a full list of params listed on this page, maybe with links to their definitions in the MW API page.

Smalyshev (WMF)

These are the same parameters you put in an actual API request, e.g. when using the API sandbox. There's no full list of parameters, because each API module has its own parameters and those can be anything. So I would suggest using an API tool such as the API sandbox first to assemble the API call and make sure it works properly, and then copying the parameter names and values from there into the MWAPI call in WDQS.
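Following that workflow, the step from a working API sandbox request to MWAPI service parameters is mechanical. The Python sketch below (the helper name `api_query_to_mwapi` is hypothetical) parses an api.php query string and emits one `bd:serviceParam` line per parameter. It assumes that `action` and `format` are supplied by the MWAPI service configuration and skips them; the `wikibase:api` and `wikibase:endpoint` lines still have to be written by hand.

```python
from urllib.parse import parse_qsl

# Assumption: the MWAPI service configuration supplies these itself,
# so they are not copied into the SPARQL query.
SKIPPED_PARAMS = {"action", "format", "formatversion"}


def api_query_to_mwapi(query_string: str) -> list[str]:
    """Convert an api.php query string into bd:serviceParam lines (hypothetical helper)."""
    lines = []
    # parse_qsl undoes URL encoding, e.g. "ids%7Ctitle" -> "ids|title"
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name in SKIPPED_PARAMS:
            continue
        # Escape backslashes and double quotes for a SPARQL string literal
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'bd:serviceParam mwapi:{name} "{literal}" .')
    return lines
```

For example, `api_query_to_mwapi("action=query&generator=search&gsrsearch=meaning&prop=info")` returns the three lines for `generator`, `gsrsearch` and `prop`.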

Vladimir Alexiev

It would be very useful if you could illustrate how to find this information in the API sandbox. E.g., I wanted to see the parameters for "Generator", but the sandbox field "action" has no such choice, and API:Query#Generators doesn't mention "gsrsearch".

Please make it easier for folks who know SPARQL but not MWAPI to use this extension. Thanks in advance!

120.21.201.69

I agree with Vladimir: I found these through a random StackOverflow answer. Further attempts to understand how they work (or even what they do) have been fruitless, because there seems to be zero, zip, nada, no documentation whatsoever. I can't even find the source code to read!

Mormegil

Vladimir Alexiev

I'm sure the interactive API and the automatically generated documentation are very useful as a reference, if you already know the MWAPI well. But they are hard to use if it's the first time you're trying MWAPI.

Mormegil

When you select action=query, a section for action=query appears in the main tree on the left. When you click that section, all available parameters for action=query are offered, and generator is among them. You can choose search in its dropdown; after that, a generator=search section appears in the main tree. When you click it, all parameters for generator=search appear, among them gsrsearch and gsrlimit. (gcmprop and gcmlimit are parameters of generator=categorymembers, so if you choose categorymembers for generator and then click the generator=categorymembers section that appears in the main tree, you'll see those.)

Vladimir Alexiev

Take a look at the Examples at the bottom of the last page:

Search for meaning.
api.php?action=query&list=search&srsearch=meaning
Search texts for meaning.
api.php?action=query&list=search&srwhat=text&srsearch=meaning
Get page info about the pages returned for a search for meaning.
api.php?action=query&generator=search&gsrsearch=meaning&prop=info

A new user like me would have the following questions:

  • What's the difference between list=search and generator=search, and how do I know which to use in which case?
  • Why do two links use srsearch but one uses gsrsearch, and how do I know which to use when?

And what is the difference between the three examples?

  • How is "search for" different from "search texts for"? The two examples return the same data.
  • OK, I get how the third call is different: it returns page titles and metadata, not search hits. From this I can surmise that generator returns a list of pages, whereas list returns a list of search hits.
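The srsearch/gsrsearch question comes down to a naming convention described at API:Query: each module has a short parameter prefix (sr for list=search, cm for list=categorymembers, and so on), and when a module is used as a generator its parameters get an extra leading g. The practical difference is that list=search returns search hits, while generator=search passes the matching pages on to prop modules such as prop=info, which matches the guess above. The Python sketch below rebuilds the first and third example requests from a small, hand-picked prefix table; the table and the helper names are illustrative only, not part of any API.

```python
from urllib.parse import urlencode

# A few Action API module prefixes, hand-picked for this sketch;
# the API help pages list the full set.
MODULE_PREFIXES = {
    "search": "sr",           # list=search
    "categorymembers": "cm",  # list=categorymembers
    "allpages": "ap",         # list=allpages
    "links": "pl",            # prop=links
}


def param_name(module: str, param: str, as_generator: bool) -> str:
    """Prefix a module parameter; generator use adds a leading 'g'."""
    prefix = MODULE_PREFIXES[module]
    return ("g" if as_generator else "") + prefix + param


def search_request(term: str, as_generator: bool) -> str:
    """Build the query string for list=search or generator=search."""
    params = {"action": "query"}
    if as_generator:
        params["generator"] = "search"
        params["prop"] = "info"  # page info for each generated page
    else:
        params["list"] = "search"  # raw search hits
    params[param_name("search", "search", as_generator)] = term
    return "api.php?" + urlencode(params)
```

`search_request("meaning", False)` reproduces the first example URL exactly; the generator form differs from the third example only in parameter order, which the API does not care about.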

Text nodes in output

Mxn

Can this service handle output elements that contain text nodes? I'm struggling to get back any output for the pageviews property:

SELECT ?title ?wd ?pageviews WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam mwapi:titles "List of mountain peaks by prominence" .
    bd:serviceParam mwapi:generator "links" .
    bd:serviceParam mwapi:gplprop "ids|title|type" .
    bd:serviceParam mwapi:gpllimit "max" .
    bd:serviceParam mwapi:pvipmetric "pageviews" .
    bd:serviceParam mwapi:pvipdays "1" .
    bd:serviceParam wikibase:limit 50 .
    
    ?title wikibase:apiOutput mwapi:title.
    ?wd wikibase:apiOutputItem mwapi:item.
    ?pageviews wikibase:apiOutput "pageviews/pvip/text()".
  }
}

I've tried a variety of XPaths, even pageviews/pvip/@date, but the ?pageviews column always ends up empty.

Each item in the API response looks like this:

      <page _idx="220167" pageid="220167" ns="0" title="Aconcagua">
        <pageviews>
          <pvip date="2021-04-27" xml:space="preserve">1266</pvip>
        </pageviews>
      </page>
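One way to see what the output path selects: for the Generator service, MWAPI appears to evaluate each apiOutput XPath relative to a page element of the response. The Python sketch below applies the same path to the element quoted above with the standard library's ElementTree. ElementTree supports only a subset of XPath, so the text() and @date steps become `.text` and `.get("date")`. If the API call does not include prop=pageviews, there is no pageviews element, the path matches nothing, and the SPARQL variable stays unbound.

```python
import xml.etree.ElementTree as ET

# The <page> element from the API response quoted above
PAGE_XML = """
<page _idx="220167" pageid="220167" ns="0" title="Aconcagua">
  <pageviews>
    <pvip date="2021-04-27" xml:space="preserve">1266</pvip>
  </pageviews>
</page>
""".strip()


def pageviews_for(page: ET.Element) -> list[tuple[str, str]]:
    """Return (date, views) pairs found under pageviews/pvip, relative to <page>."""
    return [(pvip.get("date"), pvip.text) for pvip in page.findall("pageviews/pvip")]


page = ET.fromstring(PAGE_XML)
print(page.get("title"), pageviews_for(page))

# Without prop=pageviews the element is absent, so nothing matches
print(pageviews_for(ET.fromstring('<page title="Aconcagua"/>')))
```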
91.159.71.53

This seems to work:

SELECT ?title ?wd ?pageviews WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam mwapi:titles "List of mountain peaks by prominence" .
    bd:serviceParam mwapi:generator "links" .
    bd:serviceParam mwapi:gplprop "ids|title|type" .
    bd:serviceParam mwapi:gpllimit "max" .
    bd:serviceParam mwapi:prop "info|pageprops|pageviews" .
    bd:serviceParam mwapi:pvipdays "1" .
    bd:serviceParam wikibase:limit 50 .
    
    ?title wikibase:apiOutput mwapi:title.
    ?wd wikibase:apiOutputItem mwapi:item.
    ?pageviews wikibase:apiOutput "pageviews/pvip/text()".
  }
}
Mysterious bug

PAC2

The following query counts the items linked from a Wikipedia article, grouped by gender.

SELECT ?gender ?genderLabel (COUNT(?item) AS ?count) 
WHERE {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "fr‧wikipedia.org"; 
                     wikibase:api "Generator";
                     mwapi:generator "links"; 
                     mwapi:titles "Sociologie";. 
     ?item wikibase:apiOutputItem mwapi:item.
  } 
  FILTER BOUND (?item)                                         # Safeguard to not get a timeout from unbound items when using ?item below
  ?item wdt:P21 ?gender .                                  
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }    
}
GROUP BY ?gender ?genderLabel


I've been using this query regularly for the last two weeks without any problem, but since last night (2 March 2021) it no longer works. Do you have any explanation for this change in behaviour? PAC2 (talk) 20:18, 3 March 2021 (UTC)

Dipsacus fullonum

I very much doubt that this query has ever worked. The MWAPI endpoint is misspelled: "fr‧wikipedia.org" (the dot is not a regular full stop) instead of "fr.wikipedia.org".
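A character like this is hard to spot by eye. A quick way to check a pasted endpoint string is to list its non-ASCII characters with Python's unicodedata module. The example assumes the dot in the query is U+2027 HYPHENATION POINT, which is what it appears to be; the helper name is hypothetical.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[int, str, str]]:
    """List (position, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


print(non_ascii_chars("fr\u2027wikipedia.org"))  # endpoint as pasted (assumed U+2027)
print(non_ascii_chars("fr.wikipedia.org"))       # the intended endpoint
```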

Dipsacus fullonum

PS: I recommend using (COUNT(*) AS ?count) instead of (COUNT(?item) AS ?count). There is no need here to check that ?item is bound and that its value is error-free, which the latter version does. The change won't save much time here, but it is a good habit to use it when possible, as it can save considerable time when counting large numbers of results.
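The difference between the two aggregates can be modelled in a few lines: COUNT(*) counts every solution in a group, while COUNT(?item) counts only the solutions where ?item is bound. The Python sketch below represents solutions as dicts, with an unbound variable simply missing; the rows are made up for illustration. In the query above, ?item is always bound once ?item wdt:P21 ?gender has matched, so both forms give the same numbers there; COUNT(*) just skips the per-row check.

```python
def count_star(solutions: list[dict], group_var: str) -> dict:
    """COUNT(*): every solution in a group counts."""
    counts: dict = {}
    for row in solutions:
        key = row.get(group_var)
        counts[key] = counts.get(key, 0) + 1
    return counts


def count_var(solutions: list[dict], group_var: str, var: str) -> dict:
    """COUNT(?var): only solutions where ?var is bound count."""
    counts: dict = {}
    for row in solutions:
        key = row.get(group_var)
        # The group still exists (with count 0) if none of its rows binds ?var
        counts[key] = counts.get(key, 0) + (1 if var in row else 0)
    return counts


# Made-up solutions; the third one has ?item unbound
rows = [
    {"gender": "wd:Q6581097", "item": "item-1"},
    {"gender": "wd:Q6581097", "item": "item-2"},
    {"gender": "wd:Q6581072"},
]
```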


Is it possible to get the list of articles created by a user in SPARQL

PAC2

Following the examples at https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Examples, I tried to get the list of articles created by a user inside SPARQL. I tried the following request:

SELECT ?title WHERE {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "fr.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "usercontribs"; 
                     mwapi:user "PAC2";
                     mwapi:show "new";.    
     ?title wikibase:apiOutput mwapi:title.
  } 
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }    
}


But this doesn't work. Is it possible to do this or not?

Dipsacus fullonum

As you can see at API:Query, "usercontribs" is only available in action=query API calls as a list module, not as a generator. List queries are not directly supported as a service by MWAPI. A possible workaround, however, is to make a query API call with both a generator section and a list section. The drawback is that only one result can be fetched (into the same variables) per API call, because the output configuration applies to the generator part of the call.
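To make the idea of a combined call concrete, the Python sketch below builds one action=query request that carries both a generator section and a list=usercontribs section. The generator choice (generator=allpages with gaplimit=1) is a hypothetical stand-in, not necessarily what the linked example uses; ucuser and ucshow mirror PAC2's query. Note the different prefixes: gap for the generator, uc for the list module.

```python
from urllib.parse import urlencode


def combined_request(user: str) -> str:
    """One action=query call with a generator part and a list part (sketch)."""
    params = {
        "action": "query",
        # Generator section (hypothetical choice, parameter prefix "gap")
        "generator": "allpages",
        "gaplimit": "1",
        # List section (parameter prefix "uc"), returned as a separate part of the response
        "list": "usercontribs",
        "ucuser": user,
        "ucshow": "new",  # only edits that created a page
    }
    return "api.php?" + urlencode(params)


print(combined_request("PAC2"))
```

The same parameters, minus action, would go into the SPARQL query as mwapi: service parameters.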

There is a recent example of an MWAPI call with a combined generator and list query for usercontribs at https://www.wikidata.org/wiki/Wikidata:Request_a_query/Archive/2021/02#Wikidata_items_I_created

PAC2

Thanks for the tip. That's mysterious but really powerful. PAC2 (talk) 06:56, 3 March 2021 (UTC)


MWAPI source code or configuration

Zache

How is MWAPI technically implemented? I.e., is it a Blazegraph extension, or is there some external code on GitHub, etc.?

Abbe98

EntitySearch returns empty result

77.183.100.132

Hi,

I tried to query Wikidata with EntitySearch and got no results. A few weeks ago everything was working.

Also, the first example in this article does not return any results.

Has anything changed or is this a temporary issue?

2607:FEA8:91E0:1170:C0A9:6CF7:3B40:995D

Same here: EntitySearch returns empty results, even when one of the examples in the article is used.

Kdutia

using MWAPI with wcqs-beta

Jarekt

Hi, now that https://wcqs-beta.wmflabs.org is up and running, I was experimenting with how to combine SDC SPARQL queries with information stored in the SQL database, like category membership, presence of specific templates, etc. I could not find any way except the wikibase:mwapi service. I tried

SELECT  ?file ?wd ?fileStr {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     ?pageid wikibase:apiOutputItem mwapi:pageid.
     ?ns     wikibase:apiOutput "@ns".
  }
  #?file schema:contentUrl ?url .
  FILTER (?ns = "6") # files only
  BIND (replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M')  as ?fileStr)
  BIND (str(?file)  as ?fileStr)
  ?file wdt:P6243 ?wd .
}


but so far I have not managed to get it to work. I was thinking that since

SELECT  ?file ?wd ?fileStr {
  BIND (str(?file)  as ?fileStr)
  ?file wdt:P6243 ?wd .
} limit 10


and

SELECT  ?fileStr {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     ?pageid wikibase:apiOutputItem mwapi:pageid.
     ?ns     wikibase:apiOutput "@ns".
  }
  #?file schema:contentUrl ?url .
  FILTER (?ns = "6") # files only
  BIND (replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M')  as ?fileStr)
} limit 10


both create ?fileStr values like "https://commons.wikimedia.org/entity/M9094174", then I can combine them in order to query SDC statements within a category. Any idea how to get this to work?

Zache

I think that just converting ?fileStr to a URI should make it a proper M-item for SDC. However, my example query below is pretty slow, so I think it may need to be split in two (like here).

SELECT  ?file ?p6243 {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
     bd:serviceParam mwapi:gcmtitle "Category:Artworks with mismatching structured data P6243 property" .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmtype "page" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam mwapi:gcmsort "timestamp" .
     ?pageid wikibase:apiOutputItem mwapi:pageid.
     ?ns     wikibase:apiOutput "@ns".
  }
  #?file schema:contentUrl ?url .
  FILTER (?ns = "6") # files only
  BIND (URI(replace(str(?pageid),'http://www.wikidata.org/entity/','https://commons.wikimedia.org/entity/M'))  as ?file)
  ?file wdt:P6243 ?p6243 
} limit 10

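The string manipulation in the BIND line can be checked outside the query. The Python sketch below mirrors that replace() call: it takes the URI that apiOutputItem appears to produce from the page ID (a Wikidata entity IRI) and rewrites it into the Commons MediaInfo entity URI, "M" followed by the page ID. The function names are hypothetical. SPARQL's REPLACE treats its pattern as a regular expression, so the dots in the query's pattern match any character; for these inputs the result is the same as Python's literal str.replace.

```python
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"
COMMONS_PREFIX = "https://commons.wikimedia.org/entity/M"


def to_mediainfo_uri(pageid_uri: str) -> str:
    """Mirror REPLACE(STR(?pageid), WIKIDATA_PREFIX, COMMONS_PREFIX) from the query."""
    return pageid_uri.replace(WIKIDATA_PREFIX, COMMONS_PREFIX)


def mediainfo_uri_from_pageid(pageid: int) -> str:
    """Same result built directly from a numeric page ID."""
    return f"{COMMONS_PREFIX}{pageid}"
```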

Jarekt

How I can read content of revision (wikitext) using MWAPI?

Summary by Zache

Solved; the correct answer was added to the opening message.

Zache

Hi, I tried to fetch a revision like this, but I could not figure out how to access the actual content, which should be under the key "*". Do you know how I should do that?

SOLVED: Example is now fixed based on answer below

SELECT * WHERE {
  BIND(wd:Q42 AS ?item)
  ?item wdt:P18 ?image.
  BIND(STRAFTER(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)

  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    wikibase:limit "once";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?fileTitle;
                    mwapi:gapnamespace 6; # NS_FILE
                    mwapi:gaplimit 1;
                    mwapi:prop "revisions";
                    mwapi:rvprop "content".
    ?contentmodel wikibase:apiOutput 'revisions/rev/@contentmodel'.
    ?contentformat wikibase:apiOutput 'revisions/rev/@contentformat'.
    ?content wikibase:apiOutput 'revisions/rev/text()' .
  }
}


Dipsacus fullonum

There is no key "*". MWAPI requests output in XML format from the API and uses the XPath query language to find the wanted elements in the XML output. In the XML, the content is the text of a "rev" element that has "revisions" as its parent element, so you have to add the triple


  ?content wikibase:apiOutput 'revisions/rev/text()' .


to the "SERVICE wikibase:mwapi" call in your SPARQL query.
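The "*" key comes from the API's JSON output (with the older formatversion=1), whereas MWAPI works on the XML output, where the same wikitext is the text node of the rev element. The Python sketch below contrasts the two shapes; the sample responses are simplified by hand for illustration, not copied from the API.

```python
import json
import xml.etree.ElementTree as ET

# Hand-simplified response fragments (assumed shapes, not real API output)
JSON_RESPONSE = '{"revisions": [{"contentmodel": "wikitext", "*": "== Summary =="}]}'
XML_RESPONSE = (
    "<page><revisions>"
    '<rev contentmodel="wikitext" xml:space="preserve">== Summary ==</rev>'
    "</revisions></page>"
)

# JSON: the wikitext sits under the "*" key
from_json = json.loads(JSON_RESPONSE)["revisions"][0]["*"]

# XML: the same wikitext is the text of revisions/rev, i.e. 'revisions/rev/text()'
rev = ET.fromstring(XML_RESPONSE).find("revisions/rev")
from_xml = rev.text
model = rev.get("contentmodel")  # what 'revisions/rev/@contentmodel' selects
```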

Zache

It worked! Thank you very much.

recursive category members?

SCIdude

Is there an example of how to get the WD items of all members of a WP category recursively? Other tools maybe?

Zache

You can do it with PetScan if you just need a list of Wikidata IDs:

  1. Select the target categories and wiki in the "Categories" tab.
  2. Set the "use wiki" value to "Wikidata" in the "Other sources" tab, so that it fetches the Wikidata IDs.
  3. Select the preferred format in the "Output" tab.

Example query - https://petscan.wmflabs.org/?psid=17439495
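Under the hood, recursive category membership is a walk over the subcategory graph. The Python sketch below shows that idea as a breadth-first traversal with a depth limit and a visited set, since category graphs can contain cycles. fetch_members is a hypothetical callable standing in for list=categorymembers requests (returning title and namespace pairs); mapping the resulting titles to Wikidata items is left out, though one option is prop=pageprops, whose wikibase_item value holds the item ID.

```python
from collections import deque
from typing import Callable, Iterable

# A member is (title, namespace); namespace 14 is Category in MediaWiki.
Member = tuple[str, int]
CATEGORY_NS = 14


def recursive_members(
    root: str,
    fetch_members: Callable[[str], Iterable[Member]],
    max_depth: int,
) -> set[str]:
    """Collect non-category members of root and its subcategories up to max_depth."""
    seen_categories = {root}
    pages: set[str] = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for title, ns in fetch_members(category):
            if ns != CATEGORY_NS:
                pages.add(title)
            elif depth < max_depth and title not in seen_categories:
                # Track visited categories so cycles don't loop forever
                seen_categories.add(title)
                queue.append((title, depth + 1))
    return pages
```

A made-up graph with a cycle (A contains B, B contains A and C) shows the depth limit at work: depth 0 returns only A's own pages, depth 2 reaches C.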


read imageinfo metadata?

Jura1

Can this be done? e.g. with

      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam mwapi:generator "imageinfo" .
      bd:serviceParam mwapi:gcmprop "metadata" .      
      bd:serviceParam mwapi:gcmtitle "File:Iphone 3GS grass.jpg" .
Smalyshev (WMF)

If API returns it as generator, then MWAPI service should support it.

Jura1

Does that mean it already does (if queried correctly), or that it should do so in the future (once developed)?

Smalyshev (WMF)

Check out https://w.wiki/3p7 - is this something you've been looking for?


Smalyshev (WMF)

Unfortunately, it looks like it's a bit tricky to extract the metadata itself, as it returns multiple values and the current MWAPI syntax allows only a single value per row (since SPARQL doesn't have arrays or any other structures).

Jura1
Jura1
Smalyshev (WMF)

Try this one: https://w.wiki/3sq

Smalyshev (WMF)
Jura1

It seems to work, at least for getting the two fields.

I was trying to get just 1-5 files per category, but that part didn't quite work out.

Jura1

Is there a way to get only 1-5 results for each category from the categorymembers-generator?
