User talk:Legoktm/parsoid re-parse

From mediawiki.org
Latest comment: 7 months ago by TripleCamera2022 in topic Update this project?

Update this project?

@Legoktm @SSastry (WMF) Hello.

We are using this script to refresh Lint errors on our own wikis (as far as we know, this is the only way to do so). However, the script is missing some functionality.

I found this page via Google, and it says that WMF is internally using a modified version of this script. Is it possible to make it public or publish it to GitHub & pip so that people can contribute?

Thanks. TripleCamera2022 (talk) 08:30, 20 October 2023 (UTC)

Known issues:
  1. How to run this script? My current approach is cd parsoid-reparse and then python __init__.py .... I don't think this is the right way.
  2. Cannot change number of threads in arguments (32 is too big).
  3. (Retracted) Parsoid URL (in build_parsoid_url) suits WMF wikis, but not non-WMF wikis. Update: this is not an issue. For non-WMF wikis, the parsoid_server argument should be the path to rest.php.
  4. Cannot be interrupted by Ctrl + C.
  5. The script sometimes requests the same page indefinitely, while the counter stays at TRY # 1.
I don't know whether these are fixed in the internal version; I'm just listing them here.
TripleCamera2022 (talk) 09:50, 16 December 2023 (UTC) (edited)
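Issues 2 and 4 above could be addressed along the following lines. This is a minimal sketch, not the script's actual code: the `--threads` flag, `parse_args`, `run_all`, and the `worker` callable are all hypothetical names introduced here for illustration. It assumes Python 3.9+ (for `cancel_futures`) and that each worker call uses a request timeout so in-flight work can finish.

```python
import argparse
import concurrent.futures
import threading


def parse_args(argv=None):
    # Hypothetical CLI: expose the worker count instead of hard-coding 32.
    parser = argparse.ArgumentParser(description="Re-parse pages (sketch)")
    parser.add_argument("--threads", type=int, default=4,
                        help="number of worker threads")
    return parser.parse_args(argv)


def run_all(titles, worker, threads):
    """Run worker(title) for every title on a bounded thread pool.

    On Ctrl+C, queued work is cancelled and the interrupt is re-raised.
    Calls that are already running are not killed; they are expected to
    return on their own (for example via a request timeout).
    """
    stop = threading.Event()
    results = []

    def guarded(title):
        # Skip work that starts after an interrupt was requested.
        if stop.is_set():
            return None
        return worker(title)

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=threads)
    try:
        futures = [pool.submit(guarded, t) for t in titles]
        for fut in concurrent.futures.as_completed(futures):
            results.append(fut.result())
    except KeyboardInterrupt:
        stop.set()
        # cancel_futures requires Python 3.9 or newer.
        pool.shutdown(wait=False, cancel_futures=True)
        raise
    else:
        pool.shutdown(wait=True)
    return results
```

With this shape, something like `run_all(titles, reparse_page, parse_args().threads)` would let the operator choose a smaller pool, where `reparse_page` stands for whatever function issues the request for one page.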
I suspect this script comes from the Tidy days, so my memory of it is faint, since we no longer use it. But can you post a link to the script (I imagine you found this in some repo -- Linter / core / Parsoid?) so that I can offer better suggestions? SSastry (WMF) (talk) 20:03, 23 October 2023 (UTC)
OK. The source code is at https://git.legoktm.com/legoktm/parsoid-reparse. There are also a few related links:
By the way, may I ask why this script is no longer used? Is it because there is an alternative way to clear the Parsoid cache? TripleCamera2022 (talk) 16:29, 24 October 2023 (UTC)
This script was used back in the day when we were still fine-tuning the linting code and linter categories, and we needed to quickly populate all lints across an entire wiki (for all wikis on the cluster) without having to wait for the pages to be edited (and hence reparsed and relinted). Once the lints have been initialized, we don't need to do this again -- as pages are edited, lints get updated when the pages are reparsed by Parsoid. When we introduce new lints, we might want to reparse everything, but (a) that is very infrequent and (b) we are happy to have the lints repopulate organically.
It is possible this may be needed again some time in the future, but we will cross that bridge if / when we get there. SSastry (WMF) (talk) 16:57, 24 October 2023 (UTC)
My friends have configured Linter recently, and they are using this script to refresh Lint errors. According to them, there are only two ways to clear the Parsoid cache: one is to access rest.php, the other is to edit a page using VisualEditor. Besides, they say that running refreshLinks.php doesn't help (contrary to what Extension:Linter#Bootstrap_or_reprocess_all_pages says). Is this true? TripleCamera2022 (talk) 08:42, 26 October 2023 (UTC)
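For readers unsure what "access rest.php" looks like in practice, here is a minimal sketch of building such a URL for a non-WMF wiki. It assumes the commonly used Parsoid v3 route layout `<rest.php>/<domain>/v3/page/html/<title>`; the function name `build_rest_url` and the example host are hypothetical, and this is not the script's own `build_parsoid_url`. It only constructs the URL; whether requesting it triggers a fresh parse and lint update on a given setup is not confirmed here.

```python
from urllib.parse import quote


def build_rest_url(rest_base, domain, title):
    """Build a Parsoid page-HTML URL under a wiki's rest.php entry point.

    rest_base: path to rest.php, e.g. "https://wiki.example.org/w/rest.php"
    domain:    the wiki's domain as configured for Parsoid (assumption)
    title:     page title; spaces become underscores, the rest is URL-encoded
    """
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"{rest_base.rstrip('/')}/{domain}/v3/page/html/{encoded}"
```

Encoding the title with `safe=""` matters for subpages and namespaced titles, because a literal `/` in the title would otherwise be read as a path separator.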
A gentle ping~ TripleCamera2022 (talk) 16:43, 8 November 2023 (UTC)
I found the cause of issue #5: the retry counter is only decremented on timeouts, so any other error leaves it unchanged and the same page is retried forever (source code). TripleCamera2022 (talk) 16:15, 30 April 2024 (UTC)
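One way to avoid this kind of endless retry is to let every failure consume an attempt, regardless of the error type. The following is a minimal sketch under that assumption, not the script's actual code; `fetch_with_retries`, `fetch`, and `max_tries` are hypothetical names.

```python
def fetch_with_retries(fetch, url, max_tries=3):
    """Call fetch(url) up to max_tries times, then give up.

    The attempt number advances on every failure, not only on timeouts,
    so a persistent non-timeout error cannot loop forever.
    """
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # deliberately broad: any error uses a try
            last_error = exc
            print(f"TRY # {attempt} failed for {url}: {exc!r}")
    raise RuntimeError(
        f"giving up on {url} after {max_tries} tries"
    ) from last_error
```

Driving the loop with `range` rather than a manually adjusted counter removes the code path where the counter is forgotten for one error type.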