Engineering metrics in May:
Major news in May include:
Berlin hackathon (1–3 June 2012, Berlin, Germany)
- The Wikimedia technical community prepared tutorials and plans for the event. MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists looked forward to learning about and working on Lua, Git, Gadgets changes, security, Wikidata, RENDER, and other Wikimedia technologies. More information will be available in the June engineering report.
Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)
- Katie Filbert, Gregory Varnum, and Sumana Harihareswara are organizing a hybrid inreach/outreach hackathon occurring just prior to Wikimania, and aim to make it welcoming for both novices and experts. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
Site infrastructure
Data Centers
- May has been a busy month for racking, stacking and provisioning newly purchased servers, especially by Rob Halsell, Chris Johnson, Leslie Carr and Mark Bergsma. We recently purchased new hardware to refresh servers (many are out of warranty and over 4 years old), to add capacity and redundancy, and to support new projects, including servers for Search, Analytics, Fundraising, OpenStreetMap, databases, Varnish, Memcached and backups. Much effort went into OS installation and server networking; the servers are now ready for the various system and application deployments.
- IPv6 work went into full swing as well, in order to be ready for IPv6 Launch Day on June 6. As of the end of May, the database schemas had been updated, and work had started on refactoring LVS, PayPal, Varnish, Squid, DNS, Nagios monitoring and puppetization.
- In April, we deployed the newly built Search cluster at our Ashburn datacenter, and disabled the Tampa search cluster. This month, Peter Youngmeister went through the exercise of upgrading the 4-year-old Tampa search cluster infrastructure, and brought it back up. We now have a cross-datacenter hot standby for the 'Search' service.
- With Ubuntu 12.04 (Precise Pangolin) available, we have packaged it and started using it selectively on some of our systems, including Search at Tampa and half of the LVS servers. Next, we will set up the Apache servers at the Ashburn data center using Precise as well.
- Recently, a few systems were rebooting themselves. Faidon Liambotis investigated and reported a bug in our Ubuntu 10.04 kernel that caused servers to reboot after about 208 days of uptime. We applied the necessary kernel and security patches to the affected servers.
- Ben Hartshorne has been working with the SwiftStack folks on enhancing Swift to provide additional Swift-specific monitoring to our Ganglia tool. Next, they will work on identifying potential Swift performance bottlenecks (when under load) in our implementation and recommend mitigations. Ben has started testing an upgrade from our current version to the just-released 1.4.8, which should improve the stability of the software.
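As a side note on the reboot bug above: the report does not state the root cause, but an uptime of about 208 days is a suspicious number. If one assumes (and this is only an assumption) a nanosecond counter whose usable range is 2^54, for example a 64-bit intermediate value with 10 bits reserved for fixed-point scaling, the arithmetic lands almost exactly there. The function name below is hypothetical and the sketch only checks the arithmetic.

```python
# Assumption (not from the report): a nanosecond counter that effectively
# wraps at 2**54, i.e. a 64-bit value with 10 bits lost to scaling.
NANOSECONDS_PER_DAY = 24 * 60 * 60 * 10**9


def days_until_overflow(usable_bits: int) -> float:
    """Return how many days a nanosecond counter of this width can count."""
    return 2**usable_bits / NANOSECONDS_PER_DAY


overflow_days = days_until_overflow(54)
print(f"{overflow_days:.1f} days")
```

Under that assumption the counter runs out after roughly 208.5 days, which matches the uptime at which the affected servers rebooted.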
Testing environment
Wikimedia Labs
- The Labs infrastructure had a couple of outages, due to excess load and the GlusterFS system. As a result, Ryan Lane, Faidon Liambotis and Andrew Bogott are working on a get-well plan, which includes finding a suitable replacement for GlusterFS. The short-term plan currently in the works, however, would leave us with a non-redundant infrastructure, because it places the instances in local storage on each node. Longer-term plans include evaluating Ceph and possibly writing a new filesystem mode in OpenStack to use DRBD in a way similar to Ganeti. Faidon implemented a new way of managing Puppet that allows users to test all of their changes locally before pushing them in for review. Sara Smollett moved her changes for Ganglia from Labs to production. Andrew Bogott has been working on bringing up the new cluster in eqiad, for testing Ceph, testing the upgrade of OpenStack from Diablo to Essex, and preparing for the new zone we'll add there. Ryan Lane wrote a new software deployment system, slated for use in Labs and in production, using git-deploy and SaltStack.
Backups and data archives
Data Dumps
- We've been busy creating bundles of media in use per project and the first set of files is almost complete. For each wiki, there are now one or more files containing all media uploaded locally to the wiki, and one or more files containing all media used by the wiki but uploaded to Commons. We've also been preparing for the media back-end switch to Swift; since we won't be able to make copies of all media files in the usual way, some scripts were hacked together which will check the image and imagelinks tables and will retrieve and/or update media files via HTTP as needed. Your.org and Masaryk University mirrors officially came online; we're still looking for other partners to host media backups and pageview statistics.
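The retrieval step described above might look roughly like the sketch below. It is not the actual dump script. It assumes the usual MediaWiki hashed upload layout, in which a file is stored under the first one and first two hex digits of the MD5 of its underscore-normalized name. The function names and the base URL passed in are hypothetical.

```python
import hashlib
from urllib.parse import quote


def hashed_media_path(title: str) -> str:
    """Build the relative path for a media file name.

    Assumption: the hashed layout <h>/<hh>/<name>, where h and hh are the
    first one and two hex digits of the MD5 of the normalized name.
    """
    name = title.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{quote(name)}"


def media_url(base_url: str, title: str) -> str:
    """Join a (hypothetical) upload base URL with the hashed path."""
    return f"{base_url.rstrip('/')}/{hashed_media_path(title)}"
```

A script could then take each name found in the image table, build its URL with `media_url`, and fetch the file over HTTP only when the local copy is missing or out of date.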
Visual editor
The team completed release planning for June and welcomed James Forrester as the Technical Product Analyst for the project. Ongoing work on the Visual Editor and Parsoid is tracked on-wiki.
Gabriel Wicke set up a very basic Parsoid service that lets users browse the English Wikipedia as Parsoid sees it, and convert wikitext to HTML DOM and vice versa.
Article feedback
Fabrice Florin worked with OmniTI and new WMF engineer Matthias Mullie to develop a range of new features for version 5 of the Article Feedback Tool (AFT5). This month, the team deployed a new look and feel for the article feedback page (based on a streamlined design by Pau Giner) as well as a central feedback page, where editors can monitor posts from all articles on Wikipedia. We also developed a final feedback form (scroll to bottom of page), which gradually engages users to contribute to the encyclopedia.
Dario Taraborelli, Oliver Keyes and Aaron Halfaker collected and analyzed data on how posting feedback impacts both conversion and newcomer quality. Based on this analysis, we now project over 2 million feedback posts per month on the English Wikipedia when the tool is widely deployed later this year (on par with the total number of edits per month). Our research suggests that posting feedback encourages a substantial number of users to productively edit articles on Wikipedia, which is expected to help reverse the recent decline in both new and existing editors.
Roan Kattouw continued to review our code and deploy weekly releases, while training our team to deploy their own code over time. We are planning for a wider deployment by the end of June, with full deployment a couple of months later.
Page Triage
Ryan Kaldari, Benny Situ, Ian Baker, Brandon Harris, Oliver Keyes, Fabrice Florin and Howie Fung deployed the first prototype of a list view for Page Triage on the English Wikipedia. This new tool, called New Pages Feed, provides an enhanced list of pages for review by community patrollers. The team started work on a new curation toolbar to appear on article pages, enabling patrollers to get more article info, mark pages as reviewed, tag them or nominate them for deletion. Current goals are to complete development of this new curation toolbar this month, then deploy an integrated release version the following month, along with the Article Creation landing system. Check out the current prototype on the English Wikipedia, as well as the latest version, now under development on Wikimedia Labs.
Article Creation Workflow
Fabrice Florin, Benny Situ, Ryan Kaldari, Ian Baker, Oliver Keyes and Brandon Harris have put this Article Creation feature on hold, in order to make more progress on the New Pages Feed project (code-named Page Triage). We are not comfortable deploying this feature until the New Pages Feed is released, because it may create more work for page patrollers.
Dario Taraborelli prepared a streamlined metrics plan to test this feature, and determine whether or not to build a special drafts tool. These metrics will be collected next month, once the New Pages Feed project is further along. The current ACW prototype is available for testing on Wikimedia Labs.
ResourceLoader
Work was mostly delayed until the June Berlin hackathon, as engineering resources have been devoted to Platform engineering (automated testing) and Visual Editor this month.
Wikipedia Education Program
Jeroen De Dauw completed the project. Sam Reed is now reviewing the code.
2012 Wikimedia fundraiser
The fundraising team developed and deployed new filters to help identify and stop fraudulent transactions. In addition, the team made employment offers to two candidates, both of whom accepted. The new staff will be integrated into the team, which will be fully staffed before Wikimania.
Internationalization and localization tools
The team continued integrating the first round of UI design for the Universal Language Selector (ULS) for desktop and mobile browsers. The prototype to showcase the first version of ULS was completed and demonstrated. The team completed development and deployed enhancements to the Translate extension with notification support, added more language support to the Narayam extension, fixed bugs, reviewed code for i18n support in MediaWiki, and completed a first draft for language impact metrics. The team also participated in IRC office hours with the community.
Editor engagement experiments
The team started the development of the Timestamp Position Modification experimental feature, which was deployed, then disabled due to a conflict between the ClickTracking feature and the MediaWiki API. Further testing and tuning continues, as well as analysis, redesign and development of the ClickTracking extension. We are gathering requirements for the next experiment on analyzing post-edit feedback, and we continue to hire software engineers.
Mobile Contact Us
Arthur Richards finished his changes to the Contact us feature, and we've deployed the first version to production. We'll be collecting feedback over the next week to figure out what other features to add.
Wiki Loves Monuments mobile application
Phil Chang, Lindsey Smith, Yuvaraj Pandian, and Brion Vibber spent the month defining specifications, prototyping, and implementing the first version of the Wiki Loves Monuments (WLM) app. Phil and Lindsey worked with various members of the WLM community (including Elke Wetzig and Maarten Dammers) to better understand the requirements of the contest.
Mobile design/Wikipedia navigation
Phil Chang, Lindsey Smith, and Jon Robson continued work on the navigation in May. The basic navigation has gone into beta testing and we've implemented article-level features like interwiki links and contents navigation, along with navigation features like random, settings, and contact.
Wikimedia Apps
The mobile team spent the month of May converting the Wikipedia app to use the API, increasing the number of supported platforms, and porting it to the latest PhoneGap codebase.
Yuvaraj Pandian and Max Semenik used the new mobile API to fully decouple the Wikipedia app and started beta testing. By using the new mobile API, the Wikipedia app no longer has to screen-scrape the site, allowing us to make design changes that don't break the app experience.
Brion Vibber released the Windows 8 version of the Wikipedia app. This new Wikipedia app is built following the Windows 8 Metro Style guidelines and only uses CSS, HTML5, and JavaScript. Patrick finished work porting the Wikipedia app over to PhoneGap 1.7. These changes will be included in version 1.3 of the Wikipedia app.
Wikipedia Zero
May saw the first official launch of Wikipedia Zero in Malaysia. Patrick Reilly, Dan Foy and others conducted tests in Tunisia, Kenya, Uganda, Cameroon, Niger and Ivory Coast. Patrick worked with Dan to further improve the Wikipedia Zero extension.
MobileFrontend/J2ME app
We've settled on a vendor and are completing the contract.
Mobile support in MediaWiki core
Patrick Reilly, Max Semenik, and Arthur Richards worked on the very ambitious project of moving MobileFrontend to MediaWiki core. We now have a dedicated set of tasks for the project and have started to process them. Max added modular device detection support to core, and Arthur migrated HTMLForm.
Mobile default for sibling projects
Arthur Richards defined the specifications for redirecting mobile devices to the mobile site by default on Wikipedia's sister projects. We now have a schedule posted and have started to reach out to our various communities to let them know of the change.
Improved Mobile Device Detection
Diederik van Liere and Patrick Reilly defined the specifications and built an early prototype for the Apache Device Map project. Over the next months, we'll use it for simple data collection.
Kiwix
- We set up a fully virtualized compilation farm using technologies like Buildbot, VirtualBox and QEMU. This will allow for better continuous integration and more frequent releases. We have also developed our first proof of concept for kiwix-mobile using cordova-qt. Kiwix was featured as "project of the week" on SourceForge in the last week of May, which helped us reach the milestone of 25,000 monthly software downloads for the first time.
MediaWiki 1.20/Roadmap
Git conversion
Chad Horohoe and Ryan Lane upgraded Gerrit to version 2.3, which brings a variety of fixes and features; notably, a less cluttered diff interface, and the ability to have the "mediawiki/extensions" meta-repository that holds all extensions be automatically updated. Many new repositories were created for extensions and other uses, including a repository for packaging MediaWiki to easily install on Windows servers. We have also managed to fix our long-standing UTF-8 issues with help from Marcin Cieślak, so users can all now use Unicode for their commits, comments and usernames. Image changes can now be shown via the UI, rather than being downloaded as a ZIP file to be compared locally (example).
Brion Vibber has agreed to lead a process for evaluating Gerrit (and possible alternatives), which will conclude in early August.
David Schoonover is currently writing up a list of alternatives to Gerrit which he plans to publish on mediawiki.org and announce on wikitech-l.
Multimedia
Antoine Musso and the Labs team have unblocked the deployment prep issues; Labs is now closely tracking production MediaWiki. Most of the features (upload, play, full screen, etc.) are now in testing, and upload seems to be faster than before as well.
Aaron Schulz and Ben Hartshorne deployed a new version of the thumbnail handler to Commons, test, test2, and mediawiki.org, that uses our Swift FileBackend code. It should provide us with useful production testing prior to using Swift FileBackend for handling original files. Cleanup of corrupted thumbnails is now finished. Aaron deployed a SiteStats fix that should make uploads much faster and fix some timeout problems. Ben and Aaron will also roll out the FileBackend-based thumbnail handler to the rest of the wikis.
Lua scripting
OAuth
Code review management
We continue to handle the bulk of code review via cross-review among team members plus the Wikimedia engineering 20% policy for reviewing volunteer code. Diederik van Liere is working on getting Gerrit stats published so that we can establish a trendline on our backlog. In addition to code review in Gerrit, we continue to keep an eye on Bugzilla, RFCs and extensions to review.
Security auditing and response
QA and testing
Beta cluster
Chris McMahon, Sam Reed, Antoine Musso, Faidon Liambotis, and Ryan Lane met in San Francisco the week of May 7 to bootstrap work on this project, kickstarting a process of aligning the configuration with our production cluster. Apache web server instances are now completely configured automatically using Puppet classes. A few key Wikimedia configuration files that were previously managed via a private Subversion repository are now managed in a public Git repository. Much work remains to make this a stable testing environment, which will continue in June.
Continuous integration
Timo Tijhof continued to work on the TestSwarm rewrite. The team is considering moving the continuous integration environment into Wikimedia Labs. The new TestSwarm version will probably be first deployed in the new environment instead of the current environment.
Analytics/Reportcard
Fabian Kaelin and Erik Zachte updated the datasets to include April's data, and the whole team contributed to improving the graphs' appearance. David Schoonover implemented the high-priority requests from people who use Reportcard in their presentations: for example, the front page has been streamlined, and now loads only the "core" graphs. Finally, the team has been working behind the scenes to make the framework behind the Reportcard, named "Limn", a best-of-breed project for general use. While not ready for public consumption, we implemented a GUI for selecting and manipulating datasets, and began work to support multiple visualization types. We now have several staging environments, including both test and dev targets. We hope to be in a place to open-source the framework in June.
Analytics/Pageview logging
Our plan to improve logging sources (Squid, Varnish, nginx, etc.) includes adding more fields, and also allowing us to add arbitrary fields in the future without breaking features. Changing the field formats of the logging sources requires coordination with the Operations team. The format changes have been committed, but not yet deployed.
udp-filter has been modified so that it is more flexible, and a few features have been added as well: it now can geocode and anonymize inline in the same field as the IP address, so that later log parsers don't have to try to detect a new field.
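To illustrate what rewriting the IP address field in place means, here is a minimal sketch. It is not udp-filter's implementation: the space-separated layout, the field position, the function names and the choice of truncating addresses to a /24 (IPv4) or /48 (IPv6) prefix are all assumptions made for the example. Geocoding is left out because it needs an external database.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host part of an address (assumed policy: /24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize_line(line: str, ip_field: int = 4) -> str:
    """Replace the IP field of a space-separated log line in place.

    The field index is hypothetical. The number of fields is unchanged,
    so a downstream parser that splits on spaces keeps working.
    """
    fields = line.split(" ")
    fields[ip_field] = anonymize_ip(fields[ip_field])
    return " ".join(fields)
```

Because the anonymized value occupies the same position as the original address, existing parsers see the same number of fields and need no changes to handle the new output.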
Analytics/Kraken
Bug management
Mark Hershberger wrote a triaging guide and the Engineering Community Team is now encouraging volunteers to use it to respond to new bugs.
Summer of Code 2012/management
The nine Google Summer of Code students have begun their twelve weeks of design and coding.
Wikimedia Foundation engineering project documentation
Volunteer coordination and outreach
Sumana Harihareswara continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor new contributors. She granted developer access and Gerrit project ownership requests, and planned upcoming events.
Wikimedia engineering 20% policy
- The Wikidata project is funded and executed by Wikimedia Deutschland.
The team made good progress on their work on interwiki links. The demo system shows the current state of development. They published a draft showing how interwiki links should work in the future, which was amended after the recent work done on the universal language selector. They published another document explaining how data from Wikidata is going to be included in Wikipedia sites, also rewritten based on community feedback. Last, members of the Wikidata team attended a lot of events (like LinuxTag, re:publica and the 2nd ESWC Summer School) and held IRC office hours. At the end of the month, the team met with Foundation staff and community members in Berlin at the Wikidata/RENDER summit to present the work done so far, and discuss important decisions for the future of the project.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.