I am used to Special:LintErrors on nl.wiktionary showing estimated numbers. But for the last few weeks the "missing end-tag" line has shown a steadily increasing total, which does not reflect the fact that all new errors have been corrected. Has there been a change in the way the total is estimated? At present the page is becoming pretty useless.
Topic on Help talk:Extension:Linter/Flow
There has been no change to how it is done. It is based on estimates provided by mysql.
But how is this estimate computed? It looks like the removal of errors has no effect, or only a very slow effect, on the estimate. If it is meant to encourage the removal of errors, this seems counterproductive. And even as a tool it has become useless: I have to look at the subpage to find out whether there are any new errors.
The estimation code is pretty wacky, see https://www.percona.com/blog/2011/10/13/when-explain-estimates-can-go-wrong/ , https://stackoverflow.com/questions/1037471/why-the-rows-returns-by-explain-is-not-equal-to-count and https://phabricator.wikimedia.org/rMW6261b4187a91099f4f2fd900562506cf2b3df984 . The hard part is probably fixing the estimate.
The Stack Overflow link suggests that using ANALYZE instead of EXPLAIN will yield better results (at least with MariaDB, the two seem to output different results).
Anyway, the alternative is pretty simple: whenever the number of rows returned by the estimate is less than 5000, simply run the full SQL query. Or, instead of getting the estimate, just re-run the query client-side using the JavaScript API once the special page loads.
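To illustrate the client-side recount idea, here is a rough sketch (in Python rather than JavaScript, purely for illustration). It assumes the Linter extension exposes a `list=linterrors` query module with `lntcategories` and `lntlimit` parameters and returns results under `query.linterrors`; those names are assumptions and should be checked against the wiki's API help. `count_lint_errors` and `fetch` are hypothetical names: `fetch` is whatever function sends the parameters to `api.php` and returns the parsed JSON. Only the `continue` handling is the standard MediaWiki continuation mechanism.

```python
from typing import Any, Callable, Dict


def count_lint_errors(
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]],
    category: str,
    cap: int = 5000,
) -> int:
    """Count lint errors in one category by paging through API results.

    Stops early once `cap` results have been seen, so a huge category
    does not trigger an unbounded number of requests.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "linterrors",       # assumed module name
        "lntcategories": category,  # assumed parameter name
        "lntlimit": "max",          # assumed parameter name
    }
    total = 0
    cont: Dict[str, Any] = {}
    while True:
        # Merge the continuation values from the previous response, if any.
        data = fetch({**params, **cont})
        total += len(data.get("query", {}).get("linterrors", []))
        cont = data.get("continue") or {}
        if not cont or total >= cap:
            return total
```

For a category with a dozen entries, like the one discussed here, this would be a single request, so the cost of an exact number on the client is small.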
https://nl.wiktionary.org/wiki/Speciaal:LintErrors/missing-end-tag currently estimates 26, and lists 12. I don't know what the numbers were when Marco looked at it yesterday.
If I remember correctly: estimate 23, listed 10 (8 of which are more or less permanent; the other 2 were resolved). Now we have 4 new errors on new pages, bringing the total back to 12. But most of the time the actual number remains close to 8. A few weeks ago there was one occasion when a single page contained a larger number of errors, pushing the total over 20 for just one day. Somehow I get the feeling that the estimate keeps using this single instance as a basis.
The new errors were corrected this afternoon (local time 13:30), so the actual number became 8 again. At the moment (local time: 19:45) the total is back to 23.
All this is just the behavior of the database estimation tool.
As the anon user above indicated, the solution for this is for us to update the code to do an actual count when the number of results is estimated to be under some threshold that has acceptable performance from the DBA point of view.
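The threshold idea can be sketched roughly as follows. This is not the Linter extension's code: it is a minimal Python illustration using an in-memory SQLite table, and the table name `lint_errors`, the function `displayed_total`, and the constant `EXACT_COUNT_THRESHOLD` are all hypothetical. The value 5000 is just the figure floated earlier in this thread; the real cut-off would have to be agreed with the DBAs.

```python
import sqlite3

# Hypothetical cut-off; 5000 is only the number suggested in this thread.
EXACT_COUNT_THRESHOLD = 5000


def displayed_total(conn, category, estimate, threshold=EXACT_COUNT_THRESHOLD):
    """Return (total, is_exact) for one lint category.

    If the database's estimate is at or above the threshold, keep the
    cheap estimate. Below it, an exact COUNT(*) is affordable, so use that.
    """
    if estimate >= threshold:
        return estimate, False
    row = conn.execute(
        "SELECT COUNT(*) FROM lint_errors WHERE category = ?", (category,)
    ).fetchone()
    return row[0], True


# Toy data: 8 real errors, mirroring the situation described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lint_errors (page TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO lint_errors VALUES (?, ?)",
    [(f"Page{i}", "missing-end-tag") for i in range(8)],
)
```

With this toy table, an estimate of 23 falls below the threshold, so `displayed_total(conn, "missing-end-tag", 23)` counts the rows and reports the exact figure, while a very large estimate is passed through unchanged and flagged as inexact.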
Is there anything I can do to help bring this about?
It requires development to be done on the Linter codebase. We are currently heads down on the high-priority work of porting Parsoid to PHP and we are unable to pick up linting improvements till we are past that milestone.
But, if you are a developer or know someone who wants to work on linter improvements before then, I could probably squeeze in time to help with code review and guiding the work.