Talk:Quarry/Flow

About this board

Discussion area for Quarry itself and for help with individual queries.

OperationalError('table resultsets already exists')

5 comments • 12:39, 9 November 2024 1 month ago

Enhancing999 (talkcontribs)

Any idea why that happens? I get it after long running queries. Sample: quarry:query/86096. If I happen to have the window open, I sometimes see the actual results before, meaning the query was successful. Usually I forget to export it before it disappears.

Reply 09:02, 10 September 2024 3 months ago

Framawiki (talkcontribs)

Hello! It seems to be an internal bug in Quarry. Could you open a bug report on Phabricator for it? Thanks!
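For context, the wording of the error matches what Python's sqlite3 module raises when the same CREATE TABLE statement runs twice against one database file. The sketch below only reproduces the message. That Quarry hits it this way is an assumption, and the column list is hypothetical (only the table name `resultsets` comes from the error above).

```python
import sqlite3

# Hypothetical schema: only the table name comes from the error message above.
CREATE_SQL = "CREATE TABLE resultsets (id INTEGER PRIMARY KEY, headers TEXT)"


def second_create_error():
    """Run the same CREATE TABLE twice and return the resulting error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(CREATE_SQL)
        conn.execute(CREATE_SQL)  # second attempt fails: the table is already there
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


def create_if_missing_twice():
    """Same statement with IF NOT EXISTS: the second call is a no-op."""
    guarded = CREATE_SQL.replace("CREATE TABLE", "CREATE TABLE IF NOT EXISTS")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(guarded)
        conn.execute(guarded)
        return True
    finally:
        conn.close()


message = second_create_error()
print(message)
```

If the cause really is a repeated write into the same result file, an IF NOT EXISTS guard would hide the symptom but not explain why the write happens twice, so it is worth mentioning in a bug report rather than treating as a fix.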

Reply 11:51, 16 September 2024 3 months ago

Prototyperspective (talkcontribs)

Just had the same problem at query 86864. Is there a bug report about it by now? I could not find one with that error so probably not (please link it here). I refreshed the page and then submitted the query again, hopefully it works now.

Reply 08:39, 11 October 2024 2 months ago

Enhancing999 (talkcontribs)

No, I might be mistaken, but frequently nothing happens once one adds a problem to phab. Just creates cost at WMF and work for bug sorters.

Reply 06:57, 12 October 2024 2 months ago

Prototyperspective (talkcontribs)

It wouldn't be an issue, or unexpected, if that happened frequently, but it's the most common outcome, even for major problems, including major issues that have been open for over a decade. I think changing that is step 1, and I made a concrete proposal for that here (and several more that are linked there).

Reply 12:35, 12 October 2024 2 months ago

Two feature ideas

3 comments • 11:18, 5 November 2024 1 month ago

Zar2gar1 (talkcontribs)

Hi there,

I've started trying out Quarry recently, and I really like the service. It's focused, snappy, and makes results easy to link to. However, I've hit a couple limitations in what I want to do.

I figured from the start I'll ultimately need to code up something more complex at Toolforge, but I realized a couple things would allow for more heavy-lifting with Quarry:

Is there any reason the watchlist table is entirely redacted, instead of just user fields being blanked? I guess this one's more a general question about the DB replicas
How exactly is the result-set data stored from successful queries? And would it be possible to provide it through SQL, even temporarily in a cache somehow? Maybe similarly to the ToolDB databases? My thinking is that could allow decomposing queries, then joining or filtering their result-sets, all asynchronously through Quarry.

I can fill out feature request tickets at Phabricator, but I thought I'd ask here first in case I'm missing something obvious.

Reply 23:51, 3 November 2024 1 month ago

BDavis (WMF) (talkcontribs)

> Is there any reason the watchlist table is entirely redacted

T59617: Make watchlist table available as curated foo_p.watchlist_count on labsdb

> How exactly is the result-set data stored from successful queries?

https://github.com/toolforge/quarry/blob/main/quarry/web/results.py

Reply 16:28, 4 November 2024 1 month ago

Zar2gar1 (talkcontribs)

Perfect, that answers my questions exactly. I'll look into it further and maybe I can contribute some on the software end.

Reply 11:18, 5 November 2024 1 month ago

Character count

2 comments • 21:26, 20 October 2024 1 month ago

Liz (talkcontribs)

Is there a character count for Query names? Because there is a new editor, Yesh0305, who is writing ridiculously long query names and the table at https://quarry.wmcloud.org/query/runs/all gets all out-of-shape. I've posted to their talk page but I don't think they even realize that they have a talk page. I've looked at their global contributions to reach out to them on their home Wikipedia (which I think is tewiki) but they had none so they must use a different username on Quarry. Maybe there could be a reasonable character limit on names, like 20-30 characters. What do you think?

Reply Edited 05:57, 18 October 2024 2 months ago

Enhancing999 (talkcontribs)

You mean https://quarry.wmcloud.org/Yesh0305 ? @User:Yesh0305

I don't think a name such as "Compare each top editor's total edits against the overall average: How much more than average as percentage" is problematic. It's actually fairly descriptive as a name.

Personally, I'm either too lazy or try to keep them short because it becomes the download name, but in principle, in a list of queries, the above can be sensible.

Reply 21:26, 20 October 2024 1 month ago

Why are queries getting stopped?

4 comments • 15:21, 12 October 2024 2 months ago

Prototyperspective (talkcontribs)

It says the query was stopped but I did not stop it.

Reply 11:59, 11 October 2024 2 months ago

Prototyperspective (talkcontribs)

Bug report: https://phabricator.wikimedia.org/T377010 I can't run any queries because they get stopped!

Reply 17:45, 11 October 2024 2 months ago

Enhancing999 (talkcontribs)

I had that too. I assumed a dbadmin stopped it as it was running for a long time.

Reply 06:54, 12 October 2024 2 months ago

Prototyperspective (talkcontribs)

That could be the case. It could also be because there were issues with the database or because some limit was hit. I think at a minimum it should display some error message / info. One of the queries did run through now and with the bug report above I guess this is solved here.

Reply 12:33, 12 October 2024 2 months ago

use from python

4 comments • 09:26, 1 October 2024 2 months ago

Enhancing999 (talkcontribs)

From locally running python, what's the best way to run a query and download the result?

Supposedly Manual:Pywikibot/MySQL can't work with Quarry, except from toolserver.

Reply 18:28, 28 September 2024 2 months ago

Matěj Suchánek (talkcontribs)

There has never been support for Quarry in Pywikibot, but recently support for Wikimedia Superset was added. Use SupersetPageGenerator in code or -supersetquery from command line.

Reply 08:08, 29 September 2024 2 months ago

Enhancing999 (talkcontribs)

Interesting suggestion: I should try to figure out how to get Superset to work. The access to create new datasets seems to be limited.

Reply 11:50, 29 September 2024 2 months ago

Enhancing999 (talkcontribs)

Is there a way to trigger a re-run of a query from Python and then load the result of the most recent run without knowing the run number?
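For the second half of the question (loading the most recent result without a run number), a minimal sketch follows. It assumes Quarry serves the latest result set at `/query/<id>/result/latest/<resultset>/<format>` and that the JSON payload carries `headers` and `rows` keys; both are assumptions to verify against the live site. The helper names, the User-Agent string, and the sample payload are made up for illustration. Triggering a new run is not covered here, since as far as I know that requires a logged-in session.

```python
import json
from urllib.request import Request, urlopen

QUARRY = "https://quarry.wmcloud.org"


def latest_result_url(query_id, resultset=0, fmt="json"):
    """Build the (assumed) URL of the latest result set of a query."""
    return f"{QUARRY}/query/{query_id}/result/latest/{resultset}/{fmt}"


def rows_as_dicts(payload):
    """Turn an (assumed) {"headers": [...], "rows": [[...], ...]} payload into dicts."""
    headers = payload["headers"]
    return [dict(zip(headers, row)) for row in payload["rows"]]


def fetch_latest(query_id):
    """Download and decode the latest result; performs a network request."""
    request = Request(
        latest_result_url(query_id),
        headers={"User-Agent": "example-quarry-client/0.1"},  # hypothetical value
    )
    with urlopen(request) as response:
        return rows_as_dicts(json.load(response))


# Offline demonstration with a made-up payload (no network needed).
sample = {"headers": ["page_title", "edits"], "rows": [["Foo", 3], ["Bar", 5]]}
print(rows_as_dicts(sample))
```

Calling `fetch_latest(86096)` would then return the rows of that query's last successful run, provided the URL pattern assumed above holds.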

Reply 09:26, 1 October 2024 2 months ago

Page deletions from 2023 seem off

2 comments • 19:08, 25 September 2024 2 months ago

Clayoquot (talkcontribs)

According to https://quarry.wmcloud.org/query/71599, the English Wikipedia deleted 440,817 pages in 2022 and only 54,216 pages in 2023. Does anyone know of a possible explanation for this?

Reply 18:27, 25 September 2024 2 months ago

Matěj Suchánek (talkcontribs)

I think COUNT(log_namespace = 0) is incorrect. When I reproduce the stats using:

SELECT LEFT(log_timestamp, 4) AS year, COUNT(*)
FROM logging_logindex
WHERE log_namespace = 0 AND log_type = 'delete' AND log_action = 'delete'
GROUP BY LEFT(log_timestamp, 4);

I get 89,872 deleted main space pages in 2022 and 81,099 in 2023. For even namespaces (log_namespace % 2 = 0), it's 440,817 and 391,807.
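The reason COUNT(log_namespace = 0) misleads is that COUNT(expr) counts every row where the expression is not NULL, and a comparison that evaluates to false is still not NULL. A small sketch using SQLite from Python (not the MariaDB replicas, and with made-up rows in a stand-in table) shows the difference between counting the expression, summing it, and filtering in WHERE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table with made-up rows; only the column name mirrors the real one.
conn.execute("CREATE TABLE logging (log_namespace INTEGER)")
conn.executemany(
    "INSERT INTO logging VALUES (?)",
    [(0,), (0,), (1,), (2,), (None,)],
)

# Counts rows where the comparison is non-NULL (true OR false).
count_expr = conn.execute(
    "SELECT COUNT(log_namespace = 0) FROM logging"
).fetchone()[0]

# Adds up the 1s produced by true comparisons, so only matching rows contribute.
sum_expr = conn.execute(
    "SELECT SUM(log_namespace = 0) FROM logging"
).fetchone()[0]

# Filters first, then counts what is left.
count_where = conn.execute(
    "SELECT COUNT(*) FROM logging WHERE log_namespace = 0"
).fetchone()[0]

print(count_expr, sum_expr, count_where)
conn.close()
```

Here the first number covers all four non-NULL rows, while the other two cover only the two rows in namespace 0.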

Reply 19:08, 25 September 2024 2 months ago

What's more efficient: AND NOT, not in, <>

2 comments • 15:30, 6 September 2024 3 months ago

Enhancing999 (talkcontribs)

Which is more efficient?

AND NOT ( lt_title = "ABC" ) AND NOT ( lt_title = "XYZ")
AND lt_title NOT IN ( "ABC", "XYZ")
AND lt_title <> "ABC" AND lt_title <> "XYZ"

Agree that none is ideal.

Reply 14:34, 6 September 2024 3 months ago

Matěj Suchánek (talkcontribs)

As the first thing, the query engine builds a query plan, then executes the query according to the plan. You can check if the query plan is always the same (e.g., using Toolforge SQL Optimizer). If it is, there is no difference.

I prefer NOT IN.
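To illustrate the plan-comparison idea, here is a sketch that runs the three variants against SQLite from Python and prints each plan. SQLite's planner is not MariaDB's, so this only demonstrates the method; the table is a made-up stand-in, and for the real replicas the plan should be checked with EXPLAIN or the optimizer tool mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up stand-in table; only the column name mirrors the one in the question.
conn.execute("CREATE TABLE linktarget (lt_title TEXT)")
conn.executemany(
    "INSERT INTO linktarget VALUES (?)",
    [("ABC",), ("XYZ",), ("Foo",), ("Bar",), (None,)],
)

variants = {
    "AND NOT": "NOT (lt_title = 'ABC') AND NOT (lt_title = 'XYZ')",
    "NOT IN": "lt_title NOT IN ('ABC', 'XYZ')",
    "<>": "lt_title <> 'ABC' AND lt_title <> 'XYZ'",
}

results = {}
plans = {}
for label, condition in variants.items():
    sql = f"SELECT lt_title FROM linktarget WHERE {condition} ORDER BY lt_title"
    results[label] = [row[0] for row in conn.execute(sql)]
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    plans[label] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

for label in variants:
    print(label, results[label], plans[label])
conn.close()
```

All three forms return the same rows, and all three drop rows where lt_title is NULL, so the choice mostly comes down to readability unless the plans differ.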

Reply 15:30, 6 September 2024 3 months ago

Using 2 databases

7 comments • 06:08, 5 September 2024 3 months ago

Enhancing999 (talkcontribs)

How to specify tables from two different databases (wikidatawiki_p and commonswiki_p)?

Reply Edited 12:21, 2 September 2024 3 months ago

Enhancing999 (talkcontribs)

I tried

SELECT * FROM `commonswiki_p`.`pages` LIMIT 1
USE DATABASE commonswiki_p
USE commonswiki_p;

to override what's specified in the GUI.

Reply 12:33, 2 September 2024 3 months ago

Matěj Suchánek (talkcontribs)

It's not possible since around 2021. See Topic:W6tzj276xib56phf.

Reply 13:26, 2 September 2024 3 months ago

TheDJ (talkcontribs)

They are completely separate DB servers, you cannot make queries across multiple servers in the same query.
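One possible workaround, sketched here under assumptions rather than as an established recipe, is to run one query per database, download both result sets, and combine them client-side. The function name, the column names, and the sample rows below are all hypothetical.

```python
def join_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key, entirely in memory."""
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        for right in right_by_key.get(left[key], []):
            # On a column-name clash, the value from the left row wins.
            joined.append({**right, **left})
    return joined


# Made-up rows standing in for one result set per database.
commons_rows = [
    {"qid": "Q1", "file": "File:A.jpg"},
    {"qid": "Q2", "file": "File:B.jpg"},
    {"qid": "Q9", "file": "File:C.jpg"},
]
wikidata_rows = [
    {"qid": "Q1", "label": "first"},
    {"qid": "Q2", "label": "second"},
]

joined = join_on_key(commons_rows, wikidata_rows, "qid")
unmatched = [row["qid"] for row in commons_rows
             if row["qid"] not in {r["qid"] for r in wikidata_rows}]
print(joined)
print(unmatched)
```

This only scales to result sets that fit in memory, and the two queries run at different moments, so the joined data can be slightly out of step.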

Reply 20:13, 3 September 2024 3 months ago

Enhancing999 (talkcontribs)

Apparently it was possible (see sample in the topic referenced by Matej) but then un-featured.

I found an easier solution, as the gap between Wikidata and Commons is only partial: one table at Commons is updated (wbc_entity_usage), but not the other (page_props): quarry:query/86040

Reply 04:06, 4 September 2024 3 months ago

TheDJ (talkcontribs)

"Apparently it was possible " Yes, until the infrastructure ran into scaling problems.

Reply 09:58, 4 September 2024 3 months ago

Enhancing999 (talkcontribs)

Are there any measures in place to keep the databases in sync? The gap mentioned above is minor in percentages (maybe 0.1%), but in absolute numbers 4600 is a lot.

Reply 06:08, 5 September 2024 3 months ago
