Engineering metrics in August:
Major news in August includes:
Wikipedia Engineering Meetup (15 August 2012, San Francisco, USA)
- Approximately 100 people attended the first Wikipedia Engineering Meetup in San Francisco, in a series meant to showcase Wikimedia's interesting engineering problems and products to the local developer community. Tentatively, the meetup will happen every two months at the Wikimedia offices in San Francisco, and will consist of three 15-minute engineering presentations, followed by a question & answer period bracketed by mingling. The inaugural meetup featured talks about Mobile engineering, Analytics and the VisualEditor.
Wikimedia's internationalization and mobile teams are tentatively planning a volunteer outreach event in Bangalore, India, November 9-11. More information will come in September.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
Site infrastructure
- Continuing from his earlier MySQL work, Asher Feldman built additional MySQL servers for each of the clusters in Ashburn, all in preparation for the primary data center migration in the coming quarter. In the Tampa datacenter, he added a new server to the English Wikipedia (en.wp) cluster and replaced the en.wp master with newer hardware. A database tree chart provides the latest information on our database clusters.
- Thanks to support from Varnish Software, we have a new build of Varnish that includes persistent caching and the video streaming bug fix. Mark Bergsma tested the build on one of the mobile Varnish servers, and so far it has been stable. In the coming days, Mark will update the 'upload' Varnish cluster in Ashburn (Eqiad) and move traffic onto it.
- Mark has also successfully updated and deployed the NetApp storage servers and enabled replication from Tampa to Ashburn. He has started migrating some of the systems that mount nfs1 to this new server. With this, Mark has resolved another critical-path item in the migration to the new primary data center. In addition, Jeff Green started using nas1-a to archive the Fundraising banner logs.
Network Infrastructure
- The usual traffic surge from the new school year caused an increase in packet loss on our Tampa internal network. With Chris Johnson's help, Mark upgraded the links between the racks. Earlier this month, Leslie Carr and Chris installed a new passive optics (CWDM) system between the two floors of the Tampa data center that host our servers, effectively giving us a 4X capacity increase.
Fundraising Infrastructure
- Jeff continues to make progress on the Fundraising infrastructure buildup in Ashburn (EQIAD). With Leslie's help, the new firewall was set up, and Jeff deployed a build host, a logging host and the application cluster, and built the pxeboot, preseed and puppet configurations. He has also enabled nagios-nsca monitoring for the new hosts.
Object Store/Swift
- 'Originals' were successfully copied over to the Swift cluster from ms7 (an NFS filer for images). In addition to serving thumbnails (completed last month), Swift is now also the primary object store for images and multimedia content. In the current setup, MediaWiki reads only from Swift but writes to both the Swift cluster and the legacy NFS servers (ms5 and ms7). In the coming months, we will disable ms5 and ms7 and run solely on Swift.
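Reading from one store while writing to both is a common way to migrate storage without a cutover outage. The Python sketch below illustrates that arrangement under loose assumptions: `DictStore` and `MigratingBackend` are hypothetical names, in-memory dictionaries stand in for Swift and the NFS filers, and this is not MediaWiki's actual file backend code.

```python
from typing import Dict, List, Optional


class DictStore:
    """Hypothetical in-memory stand-in for an object store or NFS filer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


class MigratingBackend:
    """Reads only from the primary store; writes to primary and legacy stores."""

    def __init__(self, primary: DictStore, legacy: List[DictStore]) -> None:
        self.primary = primary
        self.legacy = legacy

    def write(self, key: str, data: bytes) -> None:
        # Write the primary first so reads see the new object immediately.
        self.primary.put(key, data)
        # Keep the legacy stores in sync until they are retired.
        for store in self.legacy:
            store.put(key, data)

    def read(self, key: str) -> Optional[bytes]:
        # Legacy stores are never consulted on reads.
        return self.primary.get(key)

    def retire_legacy(self) -> None:
        # Final migration step: stop writing to the legacy stores.
        self.legacy = []
```

Once `retire_legacy()` is called, the legacy stores receive no further writes and can be switched off, which corresponds to the planned step of disabling ms5 and ms7.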
Wikimedia Labs
- This month was mostly spent upgrading all of the Labs infrastructure. OpenStack Nova and Glance were upgraded to the Essex release. The Keystone service was added and now handles all authentication for Labs-related OpenStack services. OpenStackManager was upgraded to support Keystone, to use the OpenStack API rather than the EC2 API, and to have multi-region support, in anticipation of the new region we'll be bringing up in Eqiad. Testing of Ceph as a replacement for Gluster for project storage continued this month; more testing is required. A lot of puppet work has been done to start moving our spaghetti-code-style repository into modules.
Data Dumps
- We've been focusing on the media infrastructure, working on the migration to Swift and taking a hard look at scaled media usage and storage. Since scaled media (thumbnails) can be regenerated at will from the originals, we are going to evaluate treating thumb storage as a medium-to-long-term cache rather than as permanent disk storage, as we have been doing. Running the numbers on existing thumbs turned up some interesting results. We're still bringing mirrors online; we've worked out all the hardware and network issues with WANSecurity and have started copying over the data. The WANSecurity mirror will have files most mirrors don't host: page view files, archives and more, as well as a full copy of our media files.
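Treating thumbnails as a cache means a missing thumb is simply regenerated from its original on demand. The sketch below shows one way that could work, assuming a bounded least-recently-used (LRU) eviction policy; `ThumbCache` and the `render` callback are hypothetical, and the text above does not say which eviction policy, if any, was chosen.

```python
from collections import OrderedDict
from typing import Callable, Dict


class ThumbCache:
    """Bounded LRU cache of thumbnails; misses are regenerated from originals.

    Hypothetical sketch: keys are (file name, width) pairs, `originals` is the
    durable store, and `render` stands in for the image scaler.
    """

    def __init__(self, originals: Dict[str, bytes],
                 render: Callable[[bytes, int], bytes], capacity: int) -> None:
        self.originals = originals
        self.render = render
        self.capacity = capacity
        self._thumbs = OrderedDict()  # (name, width) -> thumbnail bytes
        self.renders = 0  # number of times a thumbnail had to be regenerated

    def get(self, name: str, width: int) -> bytes:
        key = (name, width)
        if key in self._thumbs:
            self._thumbs.move_to_end(key)  # mark as most recently used
            return self._thumbs[key]
        # Miss: the original is the durable copy, so the thumb can be rebuilt.
        thumb = self.render(self.originals[name], width)
        self.renders += 1
        self._thumbs[key] = thumb
        if len(self._thumbs) > self.capacity:
            self._thumbs.popitem(last=False)  # evict the least recently used
        return thumb
```

Because losing a cached thumb costs only a re-render, a cache like this can be sized against a storage budget instead of having to hold every thumbnail ever generated.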
VisualEditor
In August, the team focused on overhauling the code design of VisualEditor so that it is more modular and easier to extend. This involves creating and documenting a number of formal APIs at each point in the architecture, so that a developer does not have to understand the entire code base to add new features. The early version of VisualEditor on mediawiki.org was updated twice (wmf9 and wmf10), fixing a number of bugs and adding a much-improved link inspector to help users build links, as well as a save dialog that better guides users on what to do.
Parsoid
The Parsoid team reached a major milestone in August by implementing a template output encapsulation algorithm, and started to use it to support expanded template round-tripping. In parallel with this and the usual smaller tweaks, work on a C++ port of the parser was started. The port is expected to allow efficient integration with PHP and Lua, improve performance and, in the longer term, allow the parser to be parallelized.
Article feedback
This month, we developed a range of new features for Article Feedback, which is now deployed on 10% of the English Wikipedia. Improvements include the ability to view feedback from one's watched pages, hide one's own posts and give feedback on help pages, as well as enabling editors to clear all flags and administrators to protect articles to limit feedback on controversial pages. These and other features can be tested on this sample article feedback page or on the central feedback page (please report any bugs on Bugzilla). For more information about this tool, check our project overview. We are now in our productization phase (support for more platforms, scalability, code refactoring, localization, metrics, mobile) and are aiming for a full release to 100% of the English Wikipedia by the end of October 2012, with other wiki projects starting later this year. This new tool was created in collaboration with the Wikipedia community and developed by Fabrice Florin, Matthias Mullie, Pau Giner, Ryan Kaldari, Roan Kattouw, Oliver Keyes, Chris McMahon, Benny Situ, Dario Taraborelli, Howie Fung and Terry Chay, in association with OmniTI.
Page Curation
This month, we deployed a 'pre-release' version of Page Curation on the English Wikipedia. This new product includes two main features: 1) the New Pages Feed, a dynamic list of new pages for review by community patrollers; and 2) the Curation Toolbar, an optional panel on article pages that enables editors to quickly review these pages. The Curation Toolbar provides a variety of tools that let users get page info, mark a page as reviewed, tag it, mark it for deletion, send WikiLove to page creators, or jump to the next page on the list. This month, we completed development on final features, such as the ability to send a personal note to page creators, as well as special logs, links and templates, as outlined in this help page and these project slides. We are now preparing for a full release of Page Curation on the English Wikipedia at the end of September 2012. Check out the current beta version on the English Wikipedia, as well as the latest version on Wikimedia Labs (confirmed editors can click "Review" to curate any article on the New Pages Feed). Please report any bugs on Bugzilla. Formerly called 'Page Triage', this new tool was designed in close collaboration with the Wikipedia community and developed by Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba, Terry Chay and Howie Fung.
Micro Design Improvements
This month saw the creation of the Micro Design Improvements team, an ad-hoc group of staffers who look for small but useful design improvements to make to MediaWiki. Vibha Bamba, Oliver Keyes and Munaf Assaf (with assistance from Howie Fung) worked on the design for our first feature, which simplifies the "edit" window. The team is very grateful to Terry Chay for securing technical assistance in the form of Rob Moen, who has agreed to donate his 20% time to this project. In the coming month, we plan to talk to the community about this feature, deploy it, and work on more of the items on our to-do list; if you have any thoughts about our current work or ideas for future projects, please leave us a note on the project talk page.
Editor engagement experiments
We deployed and ran the first iteration of post-edit feedback, testing whether various types of positive feedback after an edit is submitted increase the productivity and retention of Wikipedia editors. (The results will be published soon.) We are currently working on the next iteration of post-edit feedback and on a new experiment centered on the account creation process. We've also deployed click-tracking to the English Wikipedia community portal, the account creation page and the article edit form, and devised a tool for generating reports from the raw log data. Working with Asher Feldman, we've also architected an alternative data pipeline for event tracking and begun deploying it.
ResourceLoader
After the sprint in July, there was no notable progress, as the team was busy with other urgent projects. A community discussion began about where global gadgets will be hosted for access across the Wikimedia cluster, and about their licensing (gadgets have generally been covered by the content license, which is less suitable for code).
Echo (Notifications)
Andrew Garrett deployed Echo to MediaWiki.org, but it was temporarily turned off pending a fix for a bug; that bug has recently been fixed. Vibha Bamba is working on some of the UI backlog.
Flow Portal/Project information
Work on Flow will officially start in January. In the meantime, preparatory work will focus on database sharding.
2012 Wikimedia fundraiser
The fundraising team completed three very successful sprints, completing more work in each sprint than in some of the previous sprints combined (Sprint 7: Auditing and Reconciliation; Sprint 8: Amazon, and a bunch of other random stuff; and Sprint 9: Adyen, Amazon wrap-up, and Listeners). During the sprints, the team integrated with Amazon Payments, added features to CiviCRM to enable the settlement of donations in multiple currencies, added features (including the beginning of an API) and made bugfixes to CentralNotice, discovered and dealt with an issue in the global credit card processing system, and began integration with a new payment processor that will give the fundraising team access to additional payment methods around the world.
Internationalization and localization tools
The team continued to work on the Universal Language Selector (ULS): the display settings dialog was completed and can now show and set WebFonts, similarly to the WebFonts extension, which will be phased out once the ULS is deployed. The lists of languages were tweaked to emphasize those the user is likely to choose, based on their location and past selections.
Translation memory was deployed on all Wikimedia sites using the Translate extension, and CLDR (Common Locale Data Repository) plurals support was merged into the core master branch.
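CLDR plural rules map a number to a category such as "one", "few" or "many", and a translated message then supplies one form per category. As an illustration only, the function below encodes the CLDR integer rules for English and Russian; `plural_category` is a hypothetical name, decimals and all other locales are left out, and this is not the code that was merged into MediaWiki core.

```python
def plural_category(lang: str, n: int) -> str:
    """Return the CLDR plural category for a non-negative integer n.

    Sketch covering integer rules for English and Russian only; CLDR's
    decimal operands and the many other locales are deliberately omitted.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if lang == "en":
        # English: "one" for exactly 1, "other" for everything else.
        return "one" if n == 1 else "other"
    if lang == "ru":
        mod10, mod100 = n % 10, n % 100
        if mod10 == 1 and mod100 != 11:
            return "one"       # 1, 21, 31, 101, ...
        if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
            return "few"       # 2-4, 22-24, 32-34, ...
        return "many"          # 0, 5-20, 25-30, ... (all other integers)
    raise ValueError(f"no plural rules in this sketch for {lang!r}")
```

For example, `plural_category("ru", 21)` returns "one" while `plural_category("ru", 11)` returns "many", which is exactly the kind of distinction a simple singular/plural switch cannot express.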
User experience testing of the Translate extension is in progress. Initial analysis for i18n metrics was also completed and published. The team conducted its monthly office hours, a bug triage and a development showcase.
Milkshake
Development on Project Milkshake continued at a lower priority due to the focus on the Universal Language Selector this month. We are getting some basic building blocks together in our GitHub repositories.
Wiki Loves Monuments mobile application
The mobile team released three new betas of the WLM app and published the last one on Google Play. We finalized many new features, such as saving for later and showing the current location, and cleaned up data issues. The contest started on September 1st.
Wikipedia Zero
Partner data is now more configurable, and various additional partners are now in testing mode. A list of launches will follow.
MobileFrontend/J2ME app
Open Path delivered numerous new builds of the Wikipedia J2ME app this month. Patrick Reilly and the Global Development team did internal testing to validate that they were performing as expected. We're now feature-complete and spending cycles on making the app perform better on low memory devices. We expect to complete this project in a few weeks.
Wikipedia over SMS & USSD
Production hardware is now in place and running the latest builds via puppet configuration.
Kiwix
- Our work mostly focused on 0.9 RC2 (see CHANGELOG), which should be released soon after we port kiwix-serve to Microsoft Windows. Kiwix UI localization was improved thanks to the translatewiki.net Translation Rally; four new languages have been added. For the ZIM autobuild project, we have migrated the server to a data center in Zurich, Switzerland, and coding work is ongoing. We are planning our next projects and seeking volunteer help.
MediaWiki 1.20/Roadmap
Git conversion
Chad Horohoe spent a good amount of time fixing issues upstream, including two big improvements to the project listing page. He also cleaned up the Gerrit installation on Labs to more accurately mirror production, cleaning up the production setup along the way. Initial research was done into replication to GitHub. Finally, Gerrit 2.5, which brings a bunch of new features (like plugins) and fixes, is nearing release. The Labs instance of Gerrit is already running the release candidate. In September, we'll be upgrading to Gerrit 2.5 and getting repositories replicated out to GitHub.
Multimedia
In August we concentrated on testing on testwiki and found some issues that need addressing. The project is on hold for now, but we expect to resume in September. All Wikimedia sites are now using Swift as the primary storage mechanism for multimedia files such as images (both original images and image thumbnails). We continue to write images to our old NFS server as well, though we plan to turn this off in September. Some specialized extensions, such as Math and Timeline, still use the old NFS server. These will be migrated to Swift soon (tentatively in September).
Lua scripting
The Scribunto extension has been deployed to test2.wikipedia.org and www.mediawiki.org, and several editors are porting existing templates such as Cite over to Lua (see recent changes in the "Module:" namespace).
OAuth
Site performance and architecture
In addition to the Lua work, Tim Starling did some investigation of parallel parsing, but that project may go on the back burner until after Parsoid goes into production. Tim also wrote a new Redis-based client for session handling, which will be important for the Virginia data center migration.
Admin tools development
Chris Steipp added two major new features to the AbuseFilter extension: global rules and global throttling. Code review was done by Tim Starling, and the changesets were merged successfully. These features allow the creation of filters that apply to all Wikimedia wikis, which is effective for stopping cross-wiki spambots.
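Global throttling can be pictured as a rate limit whose counter is shared by every wiki. The sketch below assumes a sliding time window keyed by filter and actor (for example an IP address); `GlobalThrottle` and its `count` and `period` parameters are hypothetical and do not reproduce AbuseFilter's actual implementation or configuration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class GlobalThrottle:
    """Sliding-window throttle shared by every wiki (hypothetical sketch).

    The counter key omits the wiki name, so actions on different wikis
    count toward the same limit; this is what lets a global rule catch a
    spambot that spreads its edits across many wikis.
    """

    def __init__(self, count: int, period: float) -> None:
        self.count = count      # maximum actions allowed per window
        self.period = period    # window length in seconds
        self._hits: Dict[Tuple[int, str], Deque[float]] = defaultdict(deque)

    def hit(self, filter_id: int, actor: str, wiki: str, now: float) -> bool:
        """Record an action and return True if it should be throttled.

        `wiki` is accepted for clarity but deliberately not used in the key.
        """
        window = self._hits[(filter_id, actor)]
        # Drop timestamps that have fallen out of the sliding window.
        while window and now - window[0] >= self.period:
            window.popleft()
        window.append(now)
        return len(window) > self.count
```

This sketch records every attempt, including throttled ones, so an actor that keeps hammering stays throttled; counting only permitted actions would be an equally reasonable design.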
Jack Phoenix released the Phalanx extension and began working on making it suitable for deployment on Wikimedia servers. During the rest of 2012, the team will work through their roadmap: CentralAuth mass account locking; improving, stabilizing and reviewing Phalanx; and evaluating the effectiveness of the current CAPTCHA system and possible replacements for it.
Code review management
The analytics team released code review graphs, and Brian Wolff created a tool showing a view of unmerged patchsets and a "Wall of Shame" for authors with several patchsets requiring improvement. Both tools helped inform the discussion about the code review situation. Sumana Harihareswara encouraged authors to take steps to get their code reviewed faster, and actively requested reviews for many submissions.
Security auditing and response
QA and testing
This month saw an emphasis on hiring, with excellent candidates soon to be hired for all the positions closely related to QA. With AFTv5 in place in production, testing focus shifted to NewPagesFeed and the Page Curation Toolbar. Due to scheduling conflicts with holidays, vacations, meetings and other complications at this time of year, we decided not to hold an explicit community test event for NewPagesFeed/Curation, but test environments and a test plan will be available for those interested in exploring this new feature. Announcements for the QA Engineer and possibly the Mobile QA positions are expected to have been made by the time this report is published.
Beta cluster
MediaWiki core and its extensions are now updated automatically, and the beta cluster now always runs the very latest version published on the master branch of each Git repository.
Continuous integration
The TitleBlacklist extension is the first MediaWiki extension whose tests are now automatically run via Jenkins. The dashboard is at https://integration.wikimedia.org/ci/job/Ext-TitleBlacklist/ and build status is reported back to Gerrit.
Analytics/Reportcard
The team started preparations to move hosting from Labs to a dedicated server (stat1001), and is investigating how to package a Node.js app.
Analytics/Limn
Analytics/Kraken
Bug management
The Wikimedia Foundation is nearing the end of its hiring process for a new Bug Wrangler, who will lead triage activities and train volunteers to triage as well. In the interim, volunteers such as Krenair and Thehelpfulone have stepped in to partially fill the gap. Volunteer Matanya Moses is planning to lead an online bug triage meeting, focusing on unreviewed patches, on September 5th.
Summer of Code 2011/management
A wikitech-l discussion of new user account creation drew former GSoC student Akshay Agarwal out of the woodwork to complete work on his SignupAPI extension. WMF engineers are planning to collaborate with him this autumn. WMF engineers also plan to review student Salvatore Ingala's Gadgets work as they improve ResourceLoader this fall.
Summer of Code 2012/management
In the end, eight of Wikimedia's nine Summer of Code 2012 students passed, and each posted a wrap-up post on wikitech-l. Their achievements have already led to improvements in the Wikimedia Incubator, and improvements to Semantic MediaWiki and UploadWizard will reach users soon. Improvements to SVG translation, real-time editing collaboration, and other functionality are also progressing as the students clean up, merge, and iterate on their summer work.
Volunteer coordination and outreach
Sumana Harihareswara continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor newer contributors. She granted Developer access and Gerrit project ownership requests, and worked on planning for the upcoming Bangalore outreach event. Hiring for a volunteer engineering coordinator to work on volunteer coordination and outreach is almost finished. Community discussion topics included Git and Gerrit's difficulty, bug triages, new mailing lists, transparency and collaboration in feature design, MediaWiki releases and a potential community organization, GSoC's effectiveness, code review, and appreciation for each other.
Wikimedia engineering 20% policy
Sumana Harihareswara is coordinating WMF engineers' efforts to spend 20% of their work time on code review and other efforts benefiting the entire Wikimedia engineering community. Their highest priorities are fixing urgent new bugs that surface during deployments and addressing the Gerrit merge queue, especially for backlogged components such as Wikidata, UploadWizard and ProofreadPage. Some participants are concentrating on bug triage, documentation, and the extensions awaiting review for deployment. Some teams were exempt from the 20% policy in August because of pressing deadlines.
- The Wikidata project is funded and executed by Wikimedia Deutschland.
The team has been working further on getting the code-base ready for a first deployment. You can try the current status on the demo system. Work focused on diff, undo, migrating to using the Universal Language Selector, and providing useful edit summaries in recent changes and article history. They also published a draft for the export to RDF.
The team published a list of starter tasks to make it easier to contribute to Wikidata.
Joan Creus released pywikidata, a framework for Wikidata bots.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.