Talk:Quarry

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. This is the discussion area for Quarry itself and for help with individual queries.


Query for wikipedia user registrations by day

Felipeangelim (talkcontribs)

Hi! I was trying to get the timeseries of new user registrations in wikipedia, by day (two columns, day and count), but my query is really slow. Am I doing something wrong?

SELECT
  DATE(user_registration) AS registration_date,
  COUNT(*) AS user_count
FROM user
WHERE user_editcount > 10
GROUP BY DATE(user_registration)
ORDER BY registration_date;
Matěj Suchánek (talkcontribs)

There is no index on user_registration or user_editcount, so you are doing nothing wrong; it is just impossible to use a better query plan than scanning the whole table.
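If a full scan is unavoidable, one workaround (a sketch, not a tested recipe) is to split the scan into primary-key batches and merge the per-batch counts outside the database, so no single query runs long enough to hit Quarry's limits; the user_id bounds below are placeholders:

```sql
-- Hypothetical batching sketch: scan one user_id range per run and
-- aggregate the partial results yourself afterwards.
SELECT DATE(user_registration) AS registration_date,
       COUNT(*) AS user_count
FROM user
WHERE user_id BETWEEN 1 AND 500000   -- advance this window on each run
  AND user_editcount > 10
  AND user_registration IS NOT NULL
GROUP BY DATE(user_registration);
```

Note that the same registration_date can appear in several batches, so the partial counts for a day must be summed when merging.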

Matěj Suchánek (talkcontribs)

You can try querying Special:Log/newusers instead. For example:

SELECT LEFT(log_timestamp, 8), COUNT(*)
FROM logging_logindex
JOIN user ON user_name = REPLACE(log_title, '_', ' ')
WHERE log_type = 'newusers'
AND log_action IN ('create', 'newusers')
AND log_timestamp > '2006'
AND user_editcount > 10
GROUP BY LEFT(log_timestamp, 8);

There are even more possible values for log_action, but I'm not sure whether the JOIN would work for them, too.
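To see which log_action values actually occur for account creations, an exploratory sketch (untested; the replicas may differ):

```sql
-- Count account-creation log entries per action type.
SELECT log_action, COUNT(*) AS n
FROM logging
WHERE log_type = 'newusers'
GROUP BY log_action;
```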

All files in a category (for categories with million+ files)

Schlurcher (talkcontribs)

Hi, I'm trying to efficiently get a list of all files in a category with 1,000,000+ files, for example all files on Commons in "Category:Flickr images reviewed by FlickreviewR 2". This is the simplest query I could come up with, but it takes forever to produce output: https://quarry.wmcloud.org/query/83527

Is there a better way or does this require another tool (which would that be)?

TheDJ (talkcontribs)

You would have to write a dedicated tool that divides the request into multiple chunks and writes the results to a file. You could use the Special:MyLanguage/API:Categorymembers API, for instance, listing 500 entries per request (the limit) and passing the continue parameter from one request to the next. Or install a database dump locally and export them from there. A million is a lot; it is not the type of request that the Wikimedia infrastructure optimises for.

Also, depending on what you are trying to achieve, there might be alternative, better ways to reach that goal than listing a million-plus entries in one go.
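If staying in SQL, the chunking described above can be sketched as keyset pagination over categorylinks, assuming the usual (cl_to, cl_type, cl_sortkey) index is available; run the query repeatedly, feeding the last cl_sortkey seen back in as the lower bound:

```sql
-- Hypothetical keyset-pagination sketch: one batch of 10000 members.
SELECT cl_from, cl_sortkey
FROM categorylinks
WHERE cl_to = 'Flickr_images_reviewed_by_FlickreviewR_2'
  AND cl_type = 'file'
  AND cl_sortkey > ''   -- replace with the last cl_sortkey from the previous batch
ORDER BY cl_sortkey
LIMIT 10000;
```

Each batch stays small enough to finish quickly, at the cost of a hundred or so round trips for a million rows.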

Help finding articles with a given template

Wizmut (talkcontribs)

I am trying to find articles that use "Template:Auto short description" but not "Template:Short description". Is this possible?

Matěj Suchánek (talkcontribs)

Though it is possible using an SQL query, you can also use the default search.

Wizmut (talkcontribs)

Thank you. But I also need to sort by the length of the induced short description, so it has to be SQL.

Matěj Suchánek (talkcontribs)

I think I misunderstood your request. "Template:Short description" is used inside articles, but "Template:Auto short description" is only for documentation purposes. So you actually want articles with a template documented as "short description inducing" without "Template:Short description", right?

I made this query (hope it's correct):

SELECT page_title, LENGTH(pp_value) AS len, pp_value
FROM page_props
JOIN page ON page_id = pp_page
LEFT JOIN templatelinks ON tl_from = page_id AND tl_target_id = 137  # Short_description
WHERE pp_propname = 'wikibase-shortdesc'
AND tl_target_id IS NULL
AND page_namespace = 0
AND page_is_redirect = 0
AND EXISTS (
 SELECT 1 FROM templatelinks
 JOIN linktarget ON lt_id = tl_target_id
 JOIN page ON page_namespace = lt_namespace AND page_title = lt_title
 JOIN categorylinks ON cl_from = page_id
 WHERE tl_from = page_id AND cl_to = 'Templates_that_generate_short_descriptions'
);

It took 10 minutes and the only returned entry was for en:Main_Page.

Wizmut (talkcontribs)

The "AND tl_target_id = 137" seems to filter in only articles with explicit SDs. But changing it to "AND tl_target_id <> 137" gives a list of template links that aren't T:short_description. I'm not sure how to make it return a list of pages instead of template links.
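A `<>` filter keeps a page as long as it links to any other template, so it does not express "pages without Template:Short description". One way to get a list of pages instead of template links (a sketch, assuming 137 is still the linktarget id used in the query above) is an anti-join:

```sql
-- Pages in the main namespace with no templatelinks row for target 137.
SELECT page_id, page_title
FROM page
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND NOT EXISTS (
    SELECT 1
    FROM templatelinks
    WHERE tl_from = page_id
      AND tl_target_id = 137   -- Short_description, per the query above
  );
```

On its own this scans every article, so in practice it should be combined with a narrowing condition such as the pp_propname filter from the earlier query.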

Quarry appears to have stopped working properly

The Anome (talkcontribs)

I seem to be having problems with queries running and/or completing. Are any other users seeing this?

SQL query in Quarry for getting the current statements on a Wikidata item given its QID

Rdrg109 (talkcontribs)

I am learning how to use Quarry to query Wikidata data. I have written these two queries:

  1. Get the number of statements on a Wikidata item given its QID
  2. Get all revisions on a Wikidata item given its QID

I now want to write a SQL query that gets the current statements on a Wikidata item given its QID (the returned rows should be the same as those returned by this SPARQL query). This is my initial idea on how to do it: I know that the revision table contains all the revisions of a given page and that the comment table contains information on the introduced changes, so one way to do what I want is to parse the comment_text column (see an example in the results of query 2 above), which describes the changes, and determine the latest ones. I think this method is complex to implement in SQL, since I need to determine which changes were not superseded by any later changes. I wonder if there's a simpler approach, or a table that already contains the current statements on a Wikidata item.

I know that the current statements on a Wikidata item can be easily obtained in SPARQL (this query already does that), but since I'm learning how to query data in Quarry (i.e. using SQL), reading a SQL query that does that would help me to understand more about how data is stored and should be queried in Quarry.

Matěj Suchánek (talkcontribs)

"I wonder if there's a simpler approach or a table that already contains this information." SQL queries are not suitable for the Wikidata data model; that's why the Wikidata Query Service exists. Some SQL queries are possible, but they are rather management-oriented, not data-oriented.
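As an illustration of "management-oriented": page-level metadata about an item is easy to get from the replicas, while its statements are not. A sketch against the wikidatawiki database (Q42 is just an example item):

```sql
-- Metadata about the page that stores the item, not the item's claims.
SELECT page_id, page_len, page_latest, page_touched
FROM page
WHERE page_namespace = 0
  AND page_title = 'Q42';
```

The statement data itself lives in the item's JSON blob, which the replica databases do not expose in queryable form.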

Rdrg109 (talkcontribs)

For the record, I asked a similar question in Libera Chat's room #wikimedia-cloud and some user replied the following:

18:50 <rdrg109> For the record, I have created a topic with that question in Talk:Quarry here: https://www.mediawiki.org/wiki/Topic:Y3u6dz3ci6eqlura
19:08 <+wm-bb> <<hidden user>> rdrg109: it’s basically not doable. it’s best to use SPARQL instead
19:30 <rdrg109> <<hidden user>>: Ok, thanks for the help!
Dipsacus fullonum (talkcontribs)

It is impossible to reconstruct all claims from comment_text. There are several Wikibase API commands that create or edit entities without specifying in the comment text which claims are created, removed or modified. See for example the creation of d:Q125692383 today which was created with 5 claims at once without it being visible in the comment.

Internal server error when trying to stop a query

Tom.Reding (talkcontribs)

I tried using 2 different browsers, Chrome & Firefox, but get the same result. Queries 79955 & 82289 have been in "queued" status for ~18 hours.

task T363644

Get values for a parameter of a template

Ameisenigel (talkcontribs)

Is there any possibility to query the values of a specific parameter of a template? What I would like to have is something like a list of the values that are used for parameter "X" in template "Y".

BDavis (WMF) (talkcontribs)

There is no native MediaWiki data maintained to track this interesting semantic data. There once was a community maintained tool called Templatetiger that maintained a database of this type of information. Templatetiger's database was constructed by reading the monthly database dumps and parsing the wikitext content they contain. Technically the tool still exists, but the data it contains has not been updated in many years.
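What the replicas do track is which pages transclude a template, via templatelinks, but not the parameter values passed in. A sketch (the template name is a hypothetical example):

```sql
-- Pages transcluding a given template; parameter values are not recorded.
SELECT COUNT(*) AS transclusions
FROM templatelinks
JOIN linktarget ON lt_id = tl_target_id
WHERE lt_namespace = 10               -- Template:
  AND lt_title = 'Infobox_person';    -- hypothetical example
```

Anything finer-grained than this requires parsing the wikitext itself, as Templatetiger did from the dumps.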

Ameisenigel (talkcontribs)

Thanks to both of you for your answers!

Get the latest page assessment

2601:40F:4002:4DD0:E567:3C2B:9BAC:7198 (talkcontribs)

What is the best way to get the latest assessment value per page per project? I only care about pages assigned to projects.

I tried to fetch the whole "page_assessments_projects" table but waited for hours.

thank you

2601:40F:4002:4DD0:E567:3C2B:9BAC:7198 (talkcontribs)

* Correction: I tried to get the whole "page_assessments" table....
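For what it's worth, a sketch of the per-page, per-project join (untested; fetch it in batches rather than dumping the whole table at once):

```sql
-- One assessment row per (page, project), with the project title.
SELECT pa.pa_page_id, pap.pap_project_title,
       pa.pa_class, pa.pa_importance
FROM page_assessments pa
JOIN page_assessments_projects pap
  ON pap.pap_project_id = pa.pa_project_id
LIMIT 10000;   -- paginate, e.g. by pa_page_id ranges, instead of one huge result
```

The page_assessments table already holds the current assessment per (page, project), so no "latest revision" logic is needed on top of it.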

Where are non-local file pages stored?

StefenTower (talkcontribs)

On enwiki, I noticed that when I query the Page table, it gives me only local File: pages but not Commons File: pages used on enwiki. In what table do I find info on those? Sometimes they are in WikiProjects by way of their talk pages, so I'd like to connect them to see which of these files are in a WikiProject.

BDavis (WMF) (talkcontribs)

Non-local File pages are not stored in the wiki's local database. Instead they are exposed via remote access to the foreign wiki's database either directly (true for most Wikimedia hosted wikis -> Commons) or via API (InstantCommons). Pages like Manual:$wgForeignFileRepos give some clues about how this works on the backend.

If I'm understanding your use case, I think it is similar to the problems described by T268240: Provide a mechanism for detecting duplicate files in commons and a local wiki and/or T267992: Provide mechanism to detect name clashed media between Commons and a Local project, without needing to join tables across wiki-db's. The TL;DR there is that it is not possible to perform an SQL join across the backing database partitions for enwiki and commonswiki. This type of computation currently needs to be done in custom software that makes separate requests to both sides of the partition.

StefenTower (talkcontribs)

Thank you for your quick response. I have created a workaround that assumes that the matching File: page is there, as oddly enough, their associated File talk: pages are tracked in the Page table.
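The workaround can be sketched as a query over local "File talk:" pages (namespace 7), which do live in the local page table even when the file itself is hosted on Commons:

```sql
-- Local "File talk:" pages; each implies a file page of the same title.
SELECT page_title
FROM page
WHERE page_namespace = 7      -- File talk:
  AND page_is_redirect = 0
LIMIT 100;
```

Joining these titles against categorylinks on the talk page would then surface which files are tagged by WikiProjects.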

BDavis (WMF) (talkcontribs)

File talk: pages would be local. Nice find! :)
