Wikibase/Indexing/Benchmarks
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. The Wikidata query service is likely what you are looking for.
Titan benchmarks
Run on einsteinium with an external Cassandra cluster.
Shorter lookups
These are short lookups that must be fast.
Checking random element without fetching property
w.measure(10000) { def a = g.V('wikibaseId','Q'+(random.nextInt(10000000) as String)).hasNext(); }
[18816, 13342, 15188, 12626, 12289]
Average: 14452.2
Time: 1.44522 ms
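The Average and Time lines follow from the five run totals. Below is a minimal Python sketch of that arithmetic. It assumes each bracketed entry is the total time in milliseconds of one round of N calls, where N is the argument to `w.measure` (the helper `w` itself is not shown on this page):

```python
def summarize(run_totals_ms, iterations):
    """Return (average total ms per round, average ms per call).

    Assumes each entry of run_totals_ms is the total time of one round
    of `iterations` calls, matching the Average/Time lines on this page.
    """
    average = sum(run_totals_ms) / len(run_totals_ms)
    return average, average / iterations

# Numbers from "Checking random element without fetching property"
average, per_call = summarize([18816, 13342, 15188, 12626, 12289], 10000)
print(f"Average: {average:.1f}")      # Average: 14452.2
print(f"Time: {per_call:.5f} ms")     # Time: 1.44522 ms
```

The same arithmetic gives 0.96068 ms per call for the fixed-node run (average 9606.8 over 10000 calls).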
Checking random element
w.benchmark { 10000.times { def a = g.V('wikibaseId','Q'+(random.nextInt(10000000) as String)).labelEn.hasNext(); } }
[39330, 28555, 30037, 27755, 35049]
Average: 32145.2
Time: 3.21452 ms
Checking fixed node
This mostly measured cache performance.
w.measure(10000) { def a = g.V('wikibaseId', 'Q30').labelEn.hasNext(); }
[10889, 9779, 8969, 8930, 9467]
Average: 9606.8
Time: 0.96068 ms
Checking supernode
This mostly measured cache performance, but for a supernode that has a very large number of incoming edges.
w.measure(10000) { def a = g.V('wikibaseId', 'Q5').labelEn.next(); }
[9611, 8339, 8174, 8360, 8815]
Average: 8659.8
Time: 0.86598 ms
Checking supernode out - first human
Navigating a "wide" link out of the supernode.
w.measure(100) { def a = g.V('wikibaseId', 'Q5').in("P31")[0].next(); }
[8689, 7015, 7194, 8082, 8515]
Average: 7899
Time: 0.7899 ms
Random human
This may stretch the cache a little more, but should still be cacheable.
w.measure(10000) { def a = g.V('wikibaseId', 'Q5').in("P31")[random.nextInt(10000)].next(); }
[21395, 21192, 21288, 20017, 21699]
Average: 21118.2
Time: 2.11182 ms
Random human with name, bigger spread
This is probably outside the current cache size. Also, [] probably does a linear scan, so it performs considerably worse, as expected.
w.measure(100) { def a = g.V('wikibaseId', 'Q5').in("P31")[random.nextInt(100000)].labelEn.next(); }
[27543, 24389, 24191, 23185, 26852]
Average: 25232
Time: 252.32 ms
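The note above suggests that `[]` on a pipe does a linear scan. If the pipe is a lazy iterator, reaching position k means stepping over every element before it. Below is a minimal Python sketch of that effect, with a plain generator standing in for the Gremlin pipe. That stand-in is an assumption: the actual Gremlin/Titan range implementation is not shown here.

```python
from itertools import islice

def counting_stream(n, counter):
    """Yield n items, recording in counter[0] how many have been pulled."""
    for i in range(n):
        counter[0] += 1
        yield i

def nth(iterable, k):
    """Return element k of an iterator: the only way is to skip the k items before it."""
    return next(islice(iterable, k, None))

for k in (0, 10_000, 100_000):
    pulled = [0]
    nth(counting_stream(200_000, pulled), k)
    print(k, pulled[0])  # prints 0 1, then 10000 10001, then 100000 100001
```

Under this model the number of elements pulled grows linearly with k. A random index drawn from a 10x larger range therefore costs about 10x more steps per lookup on average. Any further slowdown would have to come from elsewhere, such as cache misses or the extra `labelEn` fetch.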
Random human with name - cached
def a = g.listOf('Q5')[0].next()
Check if random entry is a human - non-cached
This uses the "out" link to Q5.
w.measure(1000) { def a = g.V('wikibaseId', 'Q'+(random.nextInt(10000000) as String)).out("P31").has('wikibaseId', 'Q5').hasNext(); }
[6509, 3882, 4626, 4165, 3371]
Average: 4510.6
Time: 4.5106 ms
Check if random entry is a human - cached
This uses the "link" property on the vertex itself. Surprisingly, there is not much difference!
w.measure(10000) { def a = g.V('wikibaseId', 'Q'+(random.nextInt(10000000) as String)).has('P31link', CONTAINS, 'Q5').hasNext(); }
[54131, 52634, 43485, 41180, 44011]
Average: 47088.2
Time: 4.70882 ms
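The two "is human" checks differ in where the P31 values live:

- The non-cached variant follows `P31` out-edges to the target vertices and compares their `wikibaseId`.
- The cached variant reads a multi-valued `P31link` property stored on the vertex itself.

Below is a toy in-memory model of the two layouts. Hypothetical dictionaries stand in for the graph; this is not Titan's storage format.

```python
# Hypothetical toy graph: vertex -> properties, plus outgoing (label, target) edges.
vertices = {
    "v1": {"wikibaseId": "Q42", "P31link": ["Q5"]},
    "v2": {"wikibaseId": "Q5"},
    "v3": {"wikibaseId": "QX1", "P31link": ["QX2"]},  # hypothetical non-human
    "v4": {"wikibaseId": "QX2"},
}
out_edges = {"v1": [("P31", "v2")], "v3": [("P31", "v4")]}

def is_human_via_edges(v):
    """Like .out("P31").has('wikibaseId', 'Q5'): hop to each target, read its id."""
    return any(label == "P31" and vertices[target]["wikibaseId"] == "Q5"
               for label, target in out_edges.get(v, []))

def is_human_via_property(v):
    """Like .has('P31link', CONTAINS, 'Q5'): membership test on the vertex's own property."""
    return "Q5" in vertices[v].get("P31link", [])

print(is_human_via_edges("v1"), is_human_via_property("v1"))  # True True
print(is_human_via_edges("v3"), is_human_via_property("v3"))  # False False
```

The property layout avoids loading the target vertex, yet in the measurements above the two came out close (4.51 ms vs 4.71 ms per call).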
Check if random entry is human and not disambiguation
Simplistic approach: just follow the out links.
w.measure(1000) { def a = g.V('wikibaseId', 'Q'+(random.nextInt(10000000) as String)).as('x').out("P31").has('wikibaseId', 'Q5').back('x').filter{!it.out('P31').has('wikibaseId', 'Q4167410').hasNext()}.hasNext(); }
[9069, 7610, 5076, 4825, 6499]
Average: 6615.8
Time: 6.6158 ms
More sophisticated condition handling using the link property:
w.measure(1000) { def a = g.V('wikibaseId', 'Q'+(random.nextInt(10000000) as String)).filter{'Q5' in it.P31link && !('Q4167410' in it.P31link);}.hasNext(); }
[4489, 3696, 3677, 3597, 3480]
Average: 3787.8
Time: 3.7878 ms
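The link-property variant collapses the whole condition into one predicate over the vertex's `P31link` values. The predicate requires instance of human (Q5) and excludes Wikimedia disambiguation page (Q4167410). Below is the same predicate in Python, with a plain list standing in for the property value. Treating a missing property as "no values" is an assumption of this sketch.

```python
HUMAN = "Q5"
DISAMBIGUATION = "Q4167410"

def is_human_not_disambiguation(p31_links):
    """Mirror of the Groovy filter {'Q5' in it.P31link && !('Q4167410' in it.P31link)}."""
    links = set(p31_links or ())  # a vertex without the property is treated as having no values
    return HUMAN in links and DISAMBIGUATION not in links

print(is_human_not_disambiguation(["Q5"]))               # True
print(is_human_not_disambiguation(["Q5", "Q4167410"]))   # False
```

Everything is decided from one vertex's own data, which is why this variant needs no extra hop to the Q5 or Q4167410 vertices.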
Collect 1000 non-empty names
Using the link property:
w.measure(1000) {t = []; g.V('P31link', 'Q5').labelEn.filter{it != null}[0..1000].aggregate(t).iterate(); assert t.size() == 1001;}
[29682, 29685, 31022, 30879, 28966]
Average: 30046.8
Time: 30.0468 ms
Using the "in" edge. Now there is a big difference:
w.measure(100) {t = []; g.V('wikibaseId', 'Q5').in('P31').labelEn.filter{it != null}[0..1000].aggregate(t).iterate(); assert t.size() == 1001;}
[13203, 11387, 11429, 11385, 11359]
Average: 11752.6
Time: 117.526 ms
Find country
This would be heavily cached.
w.measure(1000) { def a = g.V('wikibaseId', 'Q1013639').toCountry().labelEn.next(); }
[2905, 2625, 2504, 2358, 2436]
Average: 2565.6
Time: 2.5656 ms
Find country of random neighborhood
This one may have less luck with caching.
w.measure(100) { def a = g.listOf('Q123705').shuffle()[0].toCountry().labelEn.hasNext(); }
[17432, 17212, 16752, 16681, 16310]
Average: 16877.4
Time: 168.774 ms
Check if random neighborhood is in Finland?
w.measure(100) { g.listOf('Q123705').shuffle()[0].toCountry().has('wikibaseId', 'Q33').hasNext(); }
[17707, 17807, 17310, 17461, 18288]
Average: 17714.6
Time: 177.146 ms
Longer list queries
These may generate long lists and are expected to be slower.
List of countries by population
The list is small, so it is most probably cacheable.
w.measure(100) { t= []; g.listOf('Q6256').as('c').groupBy{it}{it.claimValues('P1082').preferred().latest()}.cap.scatter.filter{it.value.size()>0}.transform{it.value = it.value.P1082value.collect{it?it as int:0}.max(); it}.order{it.b.value <=> it.a.value}.transform{[it.key.wikibaseId, it.key.labelEn, it.value]}.aggregate(t).iterate(); }
[2885, 2838, 2811, 2803, 2776]
Average: 2822.6
Time: 28.226 ms
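The Gremlin pipeline above runs in this order:

1. Group each country (Q6256) with its population (P1082) claims.
2. Drop countries with no claims.
3. Take the largest value, counting missing values as 0.
4. Sort descending and emit [wikibaseId, labelEn, population].

Below is an in-memory Python sketch of the same aggregation. It assumes the custom steps `claimValues('P1082').preferred().latest()` have already narrowed each country's claims; their exact behaviour is not documented on this page, and the data is hypothetical.

```python
# Hypothetical input: (wikibaseId, label) -> population values left after
# claimValues('P1082').preferred().latest(); None stands for a claim without a value.
claims = {
    ("QX1", "Alphaland"): ["5000", None],
    ("QX2", "Betaland"): [],
    ("QX3", "Gammaland"): ["12000"],
}

def countries_by_population(claims_by_country):
    """Rows of [wikibaseId, label, population], largest population first."""
    rows = []
    for (wikibase_id, label), values in claims_by_country.items():
        if not values:                 # .filter{it.value.size()>0}
            continue
        # .collect{it?it as int:0}.max(): missing values count as 0
        population = max(int(v) if v else 0 for v in values)
        rows.append([wikibase_id, label, population])
    rows.sort(key=lambda row: row[2], reverse=True)  # .order{it.b.value <=> it.a.value}
    return rows

print(countries_by_population(claims))
# [['QX3', 'Gammaland', 12000], ['QX1', 'Alphaland', 5000]]
```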
List of all occupations
This is probably cached too.
w.measure(100) { t = []; g.wd('Q28640').treeIn('P279').instances().dedup().aggregate(t).iterate(); assert t.size() == 2777}
[4647, 4530, 4593, 4549, 4479]
Average: 4559.6
Time: 45.596 ms
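This sketch makes two assumptions about the custom steps. First, `treeIn('P279')` collects the starting item plus everything reachable over incoming subclass-of (P279) edges. Second, `instances()` follows incoming P31 edges from each class in that tree. Neither step is defined on this page. Under those assumptions, the query is a breadth-first walk followed by a deduplicated union of instances:

```python
from collections import deque

def subclass_tree(root, subclasses_of):
    """Root plus all transitive subclasses; each class is visited once, so cycles are safe."""
    seen = {root}
    queue = deque([root])
    while queue:
        cls = queue.popleft()
        for sub in subclasses_of.get(cls, []):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return seen

def instances_of_tree(root, subclasses_of, instances_of):
    """Instances of any class in the tree, deduplicated like .dedup()."""
    result, seen_items = [], set()
    for cls in subclass_tree(root, subclasses_of):  # set iteration order is arbitrary
        for item in instances_of.get(cls, []):
            if item not in seen_items:
                seen_items.add(item)
                result.append(item)
    return result

# Hypothetical toy hierarchy, including a subclass cycle between "a" and "c".
subclasses_of = {"root": ["a", "b"], "a": ["c"], "c": ["a"]}
instances_of = {"root": ["x"], "a": ["y"], "b": ["y", "z"], "c": ["w"]}
print(sorted(instances_of_tree("root", subclasses_of, instances_of)))  # ['w', 'x', 'y', 'z']
```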
List of potential nationalities
WDQ produces 571815 results.
g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value != 'somevalue' && it.P569value > Date.parse('yyyy', '1750')} .back('humans').claimVertices('P19').toCountry().as('countries').select(['humans', 'countries']){it.labelEn}{it.labelEn}
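The query above proceeds as follows:

1. Keep humans (Q5) whose date of birth (P569) is known and after 1750.
2. Go back to the human and follow place of birth (P19).
3. Map the place to a country.

Below is an in-memory Python sketch of the same logic, using hypothetical toy data. A dictionary stands in for the custom `toCountry()` step, whose implementation is not shown here.

```python
from datetime import date

# Hypothetical toy data; real items come from the graph.
CUTOFF = date(1750, 1, 1)  # Date.parse('yyyy', '1750') parses to the start of 1750

humans = [
    {"label": "Person A", "P569": [date(1801, 5, 2)], "P19": ["Place 1"]},
    {"label": "Person B", "P569": ["somevalue"], "P19": ["Place 1"]},  # unknown birth date
    {"label": "Person C", "P569": [date(1700, 1, 1)], "P19": ["Place 2"]},
]
country_of = {"Place 1": "Country X", "Place 2": "Country Y"}  # stand-in for toCountry()

def potential_nationalities(humans, country_of):
    """(human label, country label) pairs for humans born after the cutoff."""
    rows = []
    for human in humans:
        # Skip 'somevalue' (unknown) dates before comparing, as the Groovy filter does.
        born_after = any(d != "somevalue" and d > CUTOFF for d in human["P569"])
        if not born_after:
            continue
        for place in human["P19"]:  # place of birth
            country = country_of.get(place)
            if country is not None:
                rows.append((human["label"], country))
    return rows

print(potential_nationalities(humans, country_of))  # [('Person A', 'Country X')]
```

The Gremlin version can emit a human once per matching birth-date value. This sketch instead checks the dates with `any()`, so it emits each human at most once per birthplace.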
List of humans having occupation writer but not author
This one has 36K+ entries and takes a lot of time. There may be a more efficient way to write the same query.
w.benchmark { g.V.has('P106link', 'Q36180').filter{'Q5' in it.P31link && !('Q482980' in it.P106link)}.dump("authors", "wikibaseId", "labelEn") }
w.benchmark { t = []; g.V.has('P106link', 'Q36180').as('w').has('P106link', 'Q482980').aggregate(t).optional('w').except(t).dump("authors", "wikibaseId", "labelEn") }
86.017 s
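The first query selects vertices whose occupation links (`P106link`) include writer (Q36180). It then keeps those that are human (Q5) and drops any that also list author (Q482980). Below is an in-memory Python sketch of that filter, using hypothetical items with the same property names:

```python
WRITER, AUTHOR, HUMAN = "Q36180", "Q482980", "Q5"

# Hypothetical toy items; "QX9" is a placeholder for some non-human class.
items = [
    {"id": "QX1", "label": "Writer only", "P31link": ["Q5"], "P106link": ["Q36180"]},
    {"id": "QX2", "label": "Writer and author", "P31link": ["Q5"], "P106link": ["Q36180", "Q482980"]},
    {"id": "QX3", "label": "Not a human", "P31link": ["QX9"], "P106link": ["Q36180"]},
]

def writers_not_authors(items):
    """(id, label) of human writers whose occupations do not include author."""
    return [(item["id"], item["label"]) for item in items
            if WRITER in item.get("P106link", ())
            and HUMAN in item.get("P31link", ())
            and AUTHOR not in item.get("P106link", ())]

print(writers_not_authors(items))  # [('QX1', 'Writer only')]
```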
List of humans with no date of death
WDQ produces 14431 results.
w.benchmark { g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value && it.P569value < Date.parse('yyyy', '1880')}.back('humans').filter{!it.out('P570').hasNext()}.dump("undead", "wikibaseId", "labelEn"); }
4763.817 s
Too slow; this probably needs a value index.
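A value index would let the date-of-birth (P569) condition be answered as a range lookup instead of a scan over all humans. Only the candidates found that way would then need the date-of-death (P570) check. Below is a minimal Python sketch of the idea, with a sorted list and `bisect` standing in for a real index. The item IDs and years are hypothetical, and Titan's own index support is not shown here.

```python
import bisect

# Hypothetical value index: (birth year, item id) pairs kept sorted by year.
birth_index = sorted([(1850, "QA"), (1901, "QB"), (1700, "QC"), (1879, "QD")])
has_date_of_death = {"QA": True, "QC": False, "QD": False}  # hypothetical P570 presence

def born_before(index, year):
    """Items with birth year < year, found by binary search rather than a full scan."""
    cut = bisect.bisect_left(index, (year,))  # (year,) sorts before every (year, id) pair
    return [item for _, item in index[:cut]]

undead = [item for item in born_before(birth_index, 1880)
          if not has_date_of_death.get(item, False)]
print(undead)  # ['QC', 'QD']
```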