Talk:Wikibase/Indexing
Gremlin query examples
- Top 10 countries by population:
g.listOf('Q6256').filter{it.P1082 !=null}.order{it.b.P1082 as int <=> it.a.P1082 as int}.map('labelEn', 'P1082')[0..10]
- People born in a city with more than 100k inhabitants:
g.listOf('Q1549591').in('P19').filter({it.out('P31').has('wikibaseId', 'Q5').hasNext()})
- Largest 10 cities in Europe that have a female mayor:
g.listOf('Q515').filter{it.P1082 != null}.as('cities').out('P30').has('wikibaseId', 'Q46').back('cities').out('P6').out('P21').has('wikibaseId', 'Q6581072').back('cities').order{it.b.P1082 as int <=> it.a.P1082 as int}[0..10]
--Smalyshev (WMF) (talk) 22:57, 29 November 2014 (UTC)
Examples data model v2
Countries by population
g.listOf('Q6256').as('c').groupBy{it}{it.claimValues('P1082').preferred().latest()}.cap .scatter.filter{it.value.size()>0}.transform{it.value = it.value.P1082value.collect{it?it as int:0}.max(); it} .order{it.b.value <=> it.a.value}.transform{[it.key.wikibaseId, it.key.labelEn, it.value]}
List of occupations
tree[28640][][31,279]
g.wd('Q28640').treeIn('P279').instances().dedup().namesList()
List of potential nationalities
Warning: big query, do not run unbounded.
tree[350][][17,131] AND (claim[31:6256] OR claim[31:15634554])
claim[31:5] AND claim[19] AND between[569,1750]
g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value != 'somevalue' && it.P569value > Date.parse('yyyy', '1750')} .back('humans').claimVertices('P19').toCountry().as('countries').select(['humans', 'countries']){it.labelEn}{it.labelEn}
People born before 1880 having no date of death
Warning: big query, do not run unbounded.
claim[31:5] AND noclaim[570] AND between[569,0,1880]
g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value && it.P569value < DateUtils.fromYear(1880)} .back('humans').filter{!it.out('P570').hasNext()}[0..10]
instance of human; occupation writer; not occupation author
claim[31:5] AND claim[106:36180] AND noclaim[106:482980]
g.wd('Q36180').in('P106').has('P31link', CONTAINS, 'Q5').filter{!it.out('P106').has('wikibaseId', 'Q482980').hasNext()}[0..100]
Places in the U.S. that are named after Francis of Assisi
(TREE[30][150][17,131] AND CLAIM[138:676555])
g.wd('Q676555').in('P138').filter{it.toCountry().has('wikibaseId', 'Q30').hasNext()}.namesList()
All items in the taxonomy of the Komodo dragon
TREE[4504][171,273,75,76,77,70,71,74,89]
g.wd('Q4504').as('loop').out('P171').loop('loop'){true}{true}.dedup().namesList()
All animals on Wikidata
TREE[729][][171,273,75,76,77,70,71,74,89]
g.wd('Q729').as('loop').in('P171').loop('loop'){it.object.in('P171').hasNext()}{true}.dedup().namesList()
Bridges in Germany
(CLAIM[31:(TREE[12280][][279])] AND TREE[183][150][17,131])
g.wd('Q12280').treeIn('P279').in('P31').as('b').toCountry().has('wikibaseId', 'Q183').back('b').namesList()
Bridges across the Danube
(CLAIM[31:(TREE[12280][][279])] AND CLAIM[177:1653])
g.wd('Q12280').treeIn('P279').in('P31').as('b').out('P177').has('wikibaseId', 'Q1653').back('b').namesList()
Items with VIAF string "64192849"
STRING[214:'64192849']
g.E.has('P214value', '64192849').outV.namesList()
People who were born 1924-1925, and died 2012-2013
(BETWEEN[569,+00000001924-00-00T00:00:00Z,+00000001926-00-00T00:00:00Z] AND BETWEEN[570,+00000002012-00-00T00:00:00Z,+00000002014-00-00T00:00:00Z])
g.listOf('Q5').outE('P569').interval('P569value', DateUtils.fromYear(1924), DateUtils.fromYear(1926)) .outV.outE('P570').interval('P570value', DateUtils.fromYear(2012), DateUtils.fromYear(2014)).outV
Items 15km around the center of Cambridge, UK
AROUND[625,52.205,0.119,15]
g.E.has('P625value', WITHIN, Geoshape.circle(52.205,0.119,15)).outV.namesList()
Who was born in 427 BCE
g.E.has('property','P569').has('P569value', DateUtils.fromYear(-427)).outV.namesList()
Reconciliation from OpenRefine
NOW IMPLEMENTED ! - SEE [OpenRefine-Wikidata interface]
Hey guys. I might be on a different page from what you have in mind for this feature, but it would be great if the service could act as a reconciliation service for OpenRefine.
For those who haven't worked with OpenRefine, it's a web tool for data cleaning - the way I see it, a spreadsheet application with all the right buttons and features for data analysis. Reconciliation is a semi-automated process of matching text names to database IDs (keys), and it currently works out of the box with Freebase. This is usually enough for English-language data, as Freebase extracts some of its data from Wikipedia, but I found that it doesn't work so well with other languages. As Wikidata is much more multilingual and (hopefully) much more dynamic than Freebase, it would really help a lot if OpenRefine users could connect directly to Wikidata.
Some implementation notes:
- most of what reconciliation does can be done by calling the Wikidata API and parsing the return value with some scripts (a sketch follows below). Of course, this is not as straightforward for non-programmers.
- there is an OpenRefine extension that allows reconciliation against SPARQL endpoints and RDF dumps, so this might be a quick way to have this functionality.
Looking forward to your input on this request.--Strainu (talk) 20:48, 12 December 2014 (UTC)
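A rough sketch of the "call the Wikidata API and parse the result" route mentioned in the notes above, using the existing wbsearchentities and wbgetentities actions. The helper names and the example properties are illustrative only; paging, batching beyond 50 IDs per call, and error handling are left out.

import requests

API = "https://www.wikidata.org/w/api.php"

def search_items(name, language="en", limit=5):
    # Step 1: match a text name to candidate item IDs.
    r = requests.get(API, params={
        "action": "wbsearchentities", "search": name, "language": language,
        "type": "item", "limit": limit, "format": "json"})
    return [hit["id"] for hit in r.json().get("search", [])]

def get_properties(item_ids, properties):
    # Step 2: fetch selected claims for a batch of items (up to 50 IDs per request).
    r = requests.get(API, params={
        "action": "wbgetentities", "ids": "|".join(item_ids),
        "props": "claims", "format": "json"})
    entities = r.json().get("entities", {})
    return {qid: {p: entity.get("claims", {}).get(p, []) for p in properties}
            for qid, entity in entities.items()}

# Example: reconcile one name, then ask only for P31 (instance of) and P19 (place of birth).
candidates = search_items("Douglas Adams")
print(get_properties(candidates, ["P31", "P19"]))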
- It would be interesting to know what kind of requests such a tool would need from the Wikidata API. --Smalyshev (WMF) (talk) 07:15, 17 December 2014 (UTC)
- I'm not sure I understand the question: are you referring to the API between OpenRefine and Wikidata, or to the information that one could extract from Wikidata? Could you please elaborate?--Strainu (talk) 22:53, 17 December 2014 (UTC)
- More what kind of queries OpenRefine needs to run - i.e. what types of queries, how big the result sets would be, etc. --Smalyshev (WMF) (talk) 05:24, 23 December 2014 (UTC)
- Well, it would work in two steps: first, it would do a search by name (possibly filtered by language) for all the entries in the list. This can be arbitrarily large (up to several tens of thousands of elements), but I suppose it does support paging. Next, individual requests for different properties would be made for each element. In worst-case scenarios, it would request all the properties of the Wikidata entry, but more often it would be specific to 1 or 2 properties. These requests can also be grouped in batches. --Strainu (talk) 10:59, 4 January 2015 (UTC)
- Search by name may be a bit tricky, since names are not unique, and it may already be covered by existing ElasticSearch. In general, this looks like something that is mostly already covered - you can already search by name and extract the data via JSON exports, isn't that the case? Maybe I'm missing something and other wikidata people could chime in here, or maybe even better on the IRC or mailing list. --Smalyshev (WMF) (talk) 22:09, 10 January 2015 (UTC)
- For reference, I've built an OpenRefine interface for Wikidata, on top of the existing APIs: https://tools.wmflabs.org/openrefine-wikidata/ . It essentially uses the strategy described by Strainu. Feedback and pull requests welcome! − Pintoch (talk) 08:27, 29 March 2017 (UTC)
- Pintoch, that's great! I'll forward this to a few people using OpenRefine and hopefully gather some meaningful feedback!--Strainu (talk) 08:53, 29 March 2017 (UTC)
Production use
While there may be some minor points regarding WDQ, it is used. It is used a lot; it is even vital for Wikidata users. All the others are, at this stage, pie in the sky. This is not a tabula rasa situation.
There has also been a previous development for Wikidata query. Where can we read about that? It was shelved for reasons that were not made public. It would have been standards compliant and it would have been in use for many months now. Thanks, GerardM (talk) 08:09, 26 December 2014 (UTC)
Tools
I would like to know if it will be possible to start a Gremlin query from tools developed by users and use the result for some automatic editing. --Molarus (talk) 23:32, 11 January 2015 (UTC)
- At some point we'll have a user-facing query language, but it will likely have restrictions that Gremlin doesn't have. We'll install this in labs, and we'll release a relatively easy-to-install local version that you can use for tooling. NEverett (WMF) (talk) 23:11, 28 January 2015 (UTC)
SQL-based service
Hello, one more suggestion: use a read-only SQL DB as the query engine. E.g. the query "STRING[214:'64192849']" would look like:
- "SELECT EntityID FROM Claim WHERE Property = 214 AND StringValue = '64192849'"
Points:
- Robust, scalable, well-known engine (for example MySQL).
- Many users know SQL language already.
- SQL has very rich query syntax.
- Very simple service implementation (redirect the request to the DB and translate the result to JSON; see the sketch below).
Negative:
- Queries are more complex than they could be.
This service can be used as Phase 1 or a Low Level Service until a better solution appears. Ivan A. Krestinin (talk) 12:03, 19 January 2015 (UTC)
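A minimal sketch of the "redirect the request to the DB and translate the result to JSON" step described above. The Claim(EntityID, Property, StringValue) table is hypothetical (taken from the example query), the stored row is placeholder data, and sqlite3 merely stands in for the MySQL instance the proposal mentions.

import json, re, sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Claim (EntityID INTEGER, Property INTEGER, StringValue TEXT)")
db.execute("INSERT INTO Claim VALUES (1, 214, '64192849')")  # placeholder row, not real data

def string_query(query):
    # Translate STRING[<property>:'<value>'] into SQL and return the matches as JSON.
    m = re.fullmatch(r"STRING\[(\d+):'([^']*)'\]", query)
    if not m:
        raise ValueError("unsupported query: " + query)
    rows = db.execute("SELECT EntityID FROM Claim WHERE Property = ? AND StringValue = ?",
                      (int(m.group(1)), m.group(2))).fetchall()
    return json.dumps({"items": [row[0] for row in rows]})

print(string_query("STRING[214:'64192849']"))  # -> {"items": [1]}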
WikiGrok Needs
- More detailed explanation on how "Item eligibility" and "Potential claims generation" are supposed to work with regard to the Query Engine
- Top 5/10 list of high priority queries, including
- Query description
- Expected output (yes/no, list - where, in which format)
- Interactivity - online/offline, up-to-date requirements for lists - i.e. how frequently the list should be updated
- For lists - do we need an exhaustive list or just a sample?
- More queries with lower priority, along the same lines
Synchronization of BlazeGraph
[edit]Can someone point me at a description of how the BlazeGraph instance is synchronized from Wikidata? I'm interested in setting up my own clone so that I can hammer it without compunction. Is the synchronization pipeline based on open source scripts? Does it use public APIs? Thanks, Bovlb (talk) 23:53, 3 March 2016 (UTC)