User:Schnark/October 2011 Coding Challenge

From mediawiki.org

If you are interested, here are some notes on what I thought about when I took part in the October 2011 coding challenge.

First thoughts

Since I read wikitech-l, I heard about the contest before it started, and since I've already written lots of user scripts on de.wikipedia, I was interested in it. But when I saw the different challenges, well, I didn't really like them.

I don't have a mobile device, so I won't upload images with one, and can't test any code to do so.

Slideshows: On the one hand that's too easy. Many projects already have some code for slideshows in their MediaWiki:Common.js, and there are lots of jQuery plugins for slideshows; put some of this existing code together, and it will work. On the other hand it's too difficult: any script that is enabled for all users and blocks direct access to the image description page must be able to recognize all information relevant for the license, including strange templates in languages you can't even read and description pages predating standardized license templates.

Okay, let's have a look at the remaining challenges: I don't use social networks, so I won't code a share button, and page views can't be retrieved by a user script alone.

Article update ping? The article you are reading has just been vandalized, do you want to see the new version? Or: The article has been updated, but it isn't flagged yet, so just ignore it. Or: Someone corrected a typo you didn't notice. Do you want to see the version without the typo anyway? That doesn't sound like a great idea to me.

So, the competition is a nice idea, but the challenges aren't. Perhaps next time, if there is a next time.

Second thoughts

The next day I remembered how I became a regular editor of Wikipedia. I had tried different things first: I had written some (well, actually two) short articles, I had fixed links to disambiguation pages, and now I was watching the recent changes to revert vandalism, but that wasn't my favorite way to contribute to Wikipedia either. And then I saw an edit to a maintenance page that I hadn't heard of before. I looked at this maintenance page, and from then until today the majority of my edits have been related to that page. At that time this page was probably one of the most edited pages in the German Wikipedia, so I would have noticed it sooner or later just by looking at the recent changes.

First idea

So a page that is edited many times in a short period was interesting to me. Will such pages be interesting to others? Yes: new users, like I was, can find a place to contribute to. For regular editors it can answer questions like: Is there an ongoing discussion about some guidelines? Are there new interesting projects? Is there an important poll I've overlooked? It can find edits by a new user who needs help (or at least a hint to use the preview button). It can detect edit wars and pages that are heavily vandalized. Even for readers it can be interesting: Why is this article edited so many times? There must be interesting news about it!

So let's write a tool that shows the pages with the most edits in the recent past!

Second ideas

What do we need for this? It sounds like a typical idea for a toolserver tool, but I don't write toolserver tools, I write user scripts. The script is related to Special:RecentChanges, but that page is already full of information, so it is not a good place to add even more. It's probably best to use Special:BlankPage with a faked action, and link it from the toolbar and from the recent changes.

What parameters should the user be able to set? Looking at the parameters for the recent changes, apart from the time, one important thing is the namespace. Readers don't care about edits to project pages, so they should be able to hide them. But all the other parameters aren't really necessary for the script:

Exclude own edits: If your edits are significant to the selection of the most edited pages, you either don't have time to use the script or you use it to find out whether you really pushed the page to the top.

Exclude bot edits: Again, if bots are significant, this is either a bot edit war or a page that is being created in many other languages. Both are interesting.

The only thing that could make sense to exclude are minor edits. But if someone wants this feature, it can be done later.

What information is needed? This shouldn't need more than api.php?action=query&list=recentchanges. It definitely would be interesting to see how many edits are reverted but for this we need either the complete content (which is too much for a nice little script), a hash of the content (which doesn't exist at the moment), or we have to parse and believe the comment (which doesn't seem to be a good idea either). Let's look at the information we can get: user is interesting, then we can show something like x edits by y users (z anonymous users). comment can be used to find out if there is a special section where all the edits are done (Is there a discussion in the village pump I should have a look at?). Instead of finding out which revisions have been reverted we can at least show the changes in size, so ask the API for sizes too. All the other information doesn't seem very interesting, we could show how many minor edits occurred, but since we didn't care about minor edits above, why should we now?
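The per-page summary described above ("x edits by y users (z anonymous users)" plus the change in size) can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `summarizeByPage` and the sample data are made up, and it assumes each entry carries `title`, `user`, `oldlen` and `newlen`, with an `anon` property present for anonymous editors.

```javascript
// Sketch: summarize recentchanges entries per page.
// Assumption: entries have title, user, oldlen, newlen, and an `anon`
// property (with any value) when the editor was not logged in.
function summarizeByPage(changes) {
  const pages = new Map();
  for (const rc of changes) {
    let page = pages.get(rc.title);
    if (!page) {
      page = { title: rc.title, edits: 0, users: new Set(), anons: new Set(), sizeChange: 0 };
      pages.set(rc.title, page);
    }
    page.edits += 1;
    page.users.add(rc.user);
    if ('anon' in rc) {
      page.anons.add(rc.user);
    }
    // Net size change: sum of the individual differences.
    page.sizeChange += rc.newlen - rc.oldlen;
  }
  return pages;
}

// Hypothetical sample data for illustration only.
const sample = [
  { title: 'Village pump', user: 'Alice', oldlen: 100, newlen: 150 },
  { title: 'Village pump', user: '192.0.2.1', anon: '', oldlen: 150, newlen: 140 },
  { title: 'Village pump', user: 'Alice', oldlen: 140, newlen: 200 },
  { title: 'Sandbox', user: 'Bob', oldlen: 10, newlen: 0 }
];
const summary = summarizeByPage(sample);
```

Using sets for the users keeps the "y users" count distinct even when one user edits the same page many times.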

So all we need are some calls to api.php?action=query&list=recentchanges&rcstart=user input&rclimit=max&rcnamespace=user input&rcprop=user|comment|title|sizes until we have all data up to now. Then we count and sort the pages, and show them.
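As a rough sketch of those two steps, building the request and then counting and sorting, something like the following would do. The function names are hypothetical, `format=json` is an assumption (the text does not name an output format), and the repeated requests needed to page through all results are left out.

```javascript
// Build the query string for one recentchanges request.
// rcstart and rcnamespace come from user input; format=json is an assumption.
function buildRecentChangesQuery(rcstart, rcnamespace) {
  const params = {
    action: 'query',
    list: 'recentchanges',
    format: 'json',
    rcstart: rcstart,
    rclimit: 'max',
    rcnamespace: rcnamespace,
    rcprop: 'user|comment|title|sizes'
  };
  return 'api.php?' + Object.keys(params)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(params[key]))
    .join('&');
}

// Count edits per title and return the `limit` most edited pages,
// most edited first.
function mostEdited(changes, limit) {
  const counts = new Map();
  for (const rc of changes) {
    counts.set(rc.title, (counts.get(rc.title) || 0) + 1);
  }
  return Array.from(counts, ([title, edits]) => ({ title, edits }))
    .sort((a, b) => b.edits - a.edits)
    .slice(0, limit);
}
```

A `Map` is used for the counts so that page titles such as "constructor" cannot collide with built-in object properties.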

It works! The interface is still missing, the texts aren't localized yet, everything looks a bit ugly, but it works. JSHint doesn't complain about anything (at least if you tick off all options that cause complaints ...) and the code shows that I've not only read Manual:Coding conventions, but also tried to follow them. What you can't see are some really stupid mistakes that are only visible in the history of the copy of the script in the private wiki I'm using for testing. (Well, actually there are some of them you can see.)

What needs to be done next?

  1. Most important: the interface. It must be possible to select the namespace, the period to look at (at the moment a hardcoded hour), and the pages to show (at the moment ten). There also must be some links pointing to the page, typing in the URL is not the thing a normal user does.
  2. The information you get is very limited. We need to differentiate between logged-in users and anons, and we need to collect the same information for the sections. Ignoring the minor flag wasn't a good idea. The information needs to be formatted in a nice way, too.
  3. In tests on en.wikipedia and de.wikipedia the script was a bit slow, I had to wait several seconds before the list showed up. At least a spinner should be shown to the user. The number of API calls should be limited, too.
  4. Links to sections don't work, the anchors must be encoded.
  5. A possibility to localize the texts is necessary.

Okay, here is the interface. It still isn't possible to put the time in the URL, but the parameters namespace, invert and associated are recognized. Links to sections now work (and if you think encodeSectionLink should be in core, feel free to put it into mw.util). The number of API calls is now limited (for which I had to change the directions and related stuff, and of course introduced a bug by this), additionally a spinner is shown while the data is fetched. (Well, at least in my test wiki that runs the trunk version of MediaWiki. Why isn't this in 1.18?) More information is shown.
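To illustrate the section links: the section name can be taken from the automatic part of the edit summary and then turned into an anchor. The sketch below is not the script's encodeSectionLink. The function names are hypothetical, and the encoding only approximates MediaWiki's legacy anchor format (spaces become underscores, percent-encoding uses "." instead of "%"), so a few punctuation characters may come out differently than on the server.

```javascript
// Extract the section name from an automatic edit summary such as
// "/* Section name */ some comment"; returns null if there is none.
function sectionFromComment(comment) {
  const match = /^\/\*\s*(.*?)\s*\*\//.exec(comment);
  return match ? match[1] : null;
}

// Approximation of the legacy anchor encoding: spaces become underscores,
// other special characters are percent-encoded with "." in place of "%",
// and colons are kept as they are.
function encodeAnchor(section) {
  return encodeURIComponent(section.replace(/ /g, '_'))
    .replace(/%3A/g, ':')
    .replace(/%/g, '.');
}

// Build "Title#anchor" for an edit, or just the title without a section.
function sectionLinkTarget(title, comment) {
  const section = sectionFromComment(comment);
  return section === null ? title : title + '#' + encodeAnchor(section);
}
```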

Messages seem to come from mess. At least when there is no proper way to get the messages. So I just copied them and put them in using mw.messages.set. Once {{PLURAL:}} is handled by mw.msg, it will be really fun to work with. But it looks great even now, at least if you use a language for which I localized the script (English and German, including some variants). The bugs mentioned above should be fixed now. Hey, this is the first version that actually works! And I think that it is really useful; at least I have already found interesting pages with my tool.

Two bugs remain. The first is a typo that prevents the "associated" checkbox from being checked when it was checked on the recent changes page. The second is that, due to my strange approach to calculating the size change, log entries at the beginning or end result in nonsense. Since the texts say "edits" and not "edits and log actions", the best thing is to throw out the log entries entirely.
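Throwing out the log entries could look roughly like this on the client side. It is only a sketch with a hypothetical function name, assuming each entry carries the `type` field that list=recentchanges reports; asking the API for edits and page creations only (the rctype parameter) would be an alternative.

```javascript
// Keep only edits and page creations; drop log entries.
// Assumption: each entry has a `type` of 'edit', 'new' or 'log'.
function withoutLogEntries(changes) {
  return changes.filter((rc) => rc.type === 'edit' || rc.type === 'new');
}
```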

There is another thing that should be changed: Almost every project seems to have set the link to the recent changes in the sidebar to a portlet different from the default position, which causes the new link to appear at a more or less random place. The script should try to find the correct portlet and put the link there.

If something isn't really broken it needs more features! What about an arrow that shows if the number of edits seems to be increasing or decreasing? Normally I'd count the edits up to the half time, but we don't know the end time (it can be earlier than expected since we limit the number of API calls). So let's calculate the average time instead. We need to fetch the timestamp for each edit, convert it to UNIX time, and sum up the differences to the current time, and compare the mean to the difference of the last (i.e. the first) edit (which we will know right at the end).
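One way to read this as code: compute the age of every edit relative to now, take the mean, and compare it with half the age of the oldest edit. The sketch below assumes exactly that comparison (the text leaves the threshold implicit), uses a hypothetical function name, and works in milliseconds as returned by `Date.parse` rather than in UNIX seconds.

```javascript
// Returns 'up' if edits cluster towards the present, 'down' if they
// cluster towards the start of the period, and 'flat' otherwise.
// `timestamps` are ISO 8601 strings; `now` is milliseconds since the epoch.
function editTrend(timestamps, now) {
  if (timestamps.length === 0) {
    return 'flat';
  }
  const ages = timestamps.map((ts) => now - Date.parse(ts));
  const meanAge = ages.reduce((sum, age) => sum + age, 0) / ages.length;
  // The oldest edit marks the real start of the covered period, which may be
  // later than requested when the number of API calls is limited.
  const halfPeriod = Math.max(...ages) / 2;
  if (meanAge < halfPeriod) {
    return 'up';
  }
  if (meanAge > halfPeriod) {
    return 'down';
  }
  return 'flat';
}
```

With evenly spread edits the mean age sits near the middle of the period, so only a clear clustering at either end moves the arrow.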

The script lies sometimes: If there are too many edits in the selected period, the period will be shortened silently. On first thought the fix is simple: Port formatTimePeriod from Language.php to JavaScript and put the result in a nice sentence. This doesn't work, at least not without damaging the language used: The sentence sounds worse than my English. But even just showing the shortened period without trying to put it in a sentence is bad, since {{PLURAL:}} doesn't work but is badly needed. Okay, there actually is {{PLURAL:}} support in mediawiki.language, which is even loaded by default, and I already used a hack to make mw.html.element behave the way it should, so I have no scruples about doing the same with mw.msg. There it is: a quick-and-dirty solution for a mw.msg supporting {{PLURAL:}} (and {{SITENAME}}).
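To give an idea of what such a quick-and-dirty solution involves, here is a standalone sketch. It is not the hack from the script and does not touch mw.msg or mediawiki.language: `formatMessage` is a hypothetical name, and the plural rule is hardcoded for languages with two forms, such as English and German.

```javascript
// Replace $1, $2, ... with the parameters, then resolve
// {{PLURAL:n|singular|plural}} using a simple "1 is singular" rule.
function formatMessage(template, ...params) {
  const substituted = template.replace(/\$(\d+)/g, (whole, index) => {
    const value = params[Number(index) - 1];
    return value === undefined ? whole : String(value);
  });
  return substituted.replace(/\{\{PLURAL:([^|{}]*)\|([^{}]*)\}\}/g, (whole, count, forms) => {
    const options = forms.split('|');
    // If only one form is given, use it for both singular and plural.
    const plural = options.length > 1 ? options[1] : options[0];
    return Number(count) === 1 ? options[0] : plural;
  });
}

const text = formatMessage('$1 {{PLURAL:$1|edit|edits}} by $2 {{PLURAL:$2|user|users}}', 5, 1);
```

Parameters are substituted first so that the count inside {{PLURAL:}} is already a number when the plural form is chosen.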

The script has grown into something that both has some interesting code and does something interesting. I was already able to find other participants working on their scripts, and I will definitely use it in the future. When I think another script of mine needs {{PLURAL:}} support, I now have a place to copy the code from, though I hope that something like that will go into core JavaScript soon so I can remove the hacks.

After the contest: The enhanced mw.html.element was backported, and there is this new mediawiki.jqueryMsg, so it's time to remove the hacks. The great thing: It still works!

TODO

What remains to be done?

  • Remove all the hacks once this is possible. With MediaWiki 1.19 it is possible to put the spinner in with a jQuery module, and the hack for mw.html.element is no longer needed then. Once mw.msg cares about {{PLURAL:}}, that hack is obsolete, too. And with a new version of the Gadget extension the mess with the messages should be a thing of the past.
  • There are circumstances (strange skins, other scripts) under which the interface won't look the expected way. If somebody cares he could try to find better selectors etc.
  • Many issues only show up when the script is heavily used. So if you find a bug let me know.