Talk:Wikimedia Engineering/2014-15 Goals/Q3

Proposed top priorities

As refined in the December 19 meeting. This is still draft language and will be refined with teams and project leads in early January.

Prepare for gradually bringing VisualEditor to wikis where it is currently opt-in.
- Creating the preconditions for a successful release -- strong metrics, clearly defined release criteria, carefully planned community outreach and engagement, an aggressive schedule of triages.
- Resolve key blockers -- performance, critical bugs, must-have functionality.
- Ensure VisualEditor is a great experience across devices.
- No deployment is expected to take place yet, but a stretch goal may be to run a first well-defined pilot.
- This may impact work by the frontend standards group and will certainly strongly impact services/Parsoid prioritization.
- Ori Livneh will likely lead this project, partnering with James Forrester who will continue to provide product and project management.

Test and validate micro-contributions to Wikidata, and pilot one additional micro-contribution feature on the mobile web.
- Testing Wikidata micro-contributions with logged-out users, using pre-populated tasks.
- Working with the team investigating the development of a Wikidata Query Service to ensure it meets the requirements of this effort, should it continue post-validation.
- Pilot a new micro-contribution feature on the mobile web. Article lists that can be easily built and shared are a strong contender.
- Maryana Pinchuk will lead this project, within the established mobile web team operating model.

Solidify the instrumentation and dashboarding pipeline and consistently apply instrumentation across key features.
- Strong emphasis on the VisualEditor release effort and full instrumentation of all edit funnels as a must-have deliverable.
- Clear documentation and support processes for feature instrumentation, including training/workshops.
- Identify and deploy supported solution (including future development) for dashboards that visualize user behavior data and funnels.
- A to-be-developed measure of additional must-have and stretch goals for cross-team adoption of instrumentation. (This is why this is a top priority -- almost every team will be touched by these efforts eventually, and the actual instrumentation work needs to be done by engineers within the teams.)
- Toby Negrin has nominated Dan Andreescu to lead this project, within the established analytics team operating model.

Beyond these three priorities, the following have been discussed:

SOA Auth: This work will likely begin, but we are not ready to declare it a top priority as we cannot give it sufficient dedicated leadership and support to push for very ambitious goals in Q3.

Wikidata Query Service: This work is ongoing. It is still a small team that's performing initial investigation and prototyping, and the team has requested more dedicated help both from a product management perspective, and potentially in terms of additional engineering resources. We may yet elevate this to an official top priority for Q3 (we can potentially do so mid-quarter, as well), but want to be careful to only do this if the level of support we can provide is consistent with the perceived importance of the project. If not, we will focus effort on establishing the preconditions for increased effort in this domain in Q4.

Intra-departmental focus areas for engineering and product/strategy. Whether or how to articulate these will be refined by the VPs in their respective areas.

Strong Candidates

A/B and multi-variate testing infrastructure. Create better foundations for testing, comparing and validating user experience changes.
- Why: Must-have to increase product development velocity (though should be driven by concrete product needs for Q3).
- Why maybe not: Effective instrumentation has become a more urgent need for product teams.
Fundraising tech refactor. Make fr-tech less of an island internally and ensure the team can add and rotate team members.
- Why: Fr-tech needs to staff up to support new integrations (e.g. mobile). A refactor has long been identified as a must-have to make that possible and to make the team more sustainable.
- Why maybe not: fr-tech is very busy with post-fundraiser cleanup in the first quarter; opportunities to collaborate around A/B testing/CentralNotice don't manifest until Q4.
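
One building block that the A/B testing candidate above would probably need is stable assignment of users to variants. The following is a minimal sketch, not a description of any existing Wikimedia infrastructure: the function name `assign_variant`, the experiment name, and the user token format are all hypothetical. It hashes the experiment name together with a user token so the same user always lands in the same bucket without any stored state.

```python
import hashlib


def assign_variant(experiment, user_token, variants=("control", "treatment")):
    """Deterministically map a user token to one variant of an experiment.

    All names here are illustrative assumptions. Including the experiment
    name in the hash means the same user can fall into different buckets
    in different experiments.
    """
    key = f"{experiment}:{user_token}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket index.
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]


if __name__ == "__main__":
    for token in ("token-1", "token-2", "token-3"):
        print(token, assign_variant("hypothetical-experiment", token))
```

Hashing instead of random assignment keeps the bucket stable across page views, which matters when comparing behavior over a whole session.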

Beta Cluster

Why

Must-have to improve product quality.
Make deployments less painful.
Improve our QA.
Deploying is still hard, and Beta Cluster is part of that.

What

Overarching goal: Have real-world metrics and monitoring in place
- To judge effectiveness of the below goals
Reconcile HHVM and Beta Cluster
- See: HHVM fcgi restart during scap runs cause 503s (and failed tests)
- Required crossover: MW Core, Ops
Dev code pipeline
- A "nightly" build tested on Beta Cluster would give us a higher degree of certainty (and a firmer commitment to test against) before deploying
- see also: True code pipeline
- Required crossover: MW Core (minimal, hopefully)
Beta Cluster/Prod reconciliation
- Identifying the rough edges between Prod and BC (we already know of many) and addressing them
- This is purposefully non-deterministic for "done-ness" at this point, but we would timebox the investigation portion to make it deterministic/do-able
- Required crossover: Ops, MW Core
Puppet code pipeline ("stretch-goal")
- Need a method to allow ops/others to test their code changes on Beta Cluster before prod deployment
- Why:
  - Will ensure "upstream" puppet changes don't kill Beta Cluster accidentally (and unknowingly)
  - Will allow for fuller testing of puppet changes
- Required crossover: Ops

Mobile: new contributions

Objective

Our objective this quarter is to release and test two new features aimed at engaging readers in new forms of contribution and curation, in order to validate the hypothesis that users who may not be interested in editing on mobile can and will add value to our projects in other ways.

WikiGrok is a test of a new reader contribution framework that does not involve text editing or input and aggregates multiple contributions to produce high-quality results. The goal this quarter is to launch and test this framework with readers, in order to determine which design/user experience produces high engagement and which aggregation metrics produce high quality results. Long-term, the goal of this feature and others that may use this framework is to engage many readers to contribute structured data, which helps grow the Wikidata project and, in turn, unlocks the ability to create new ways to read and edit Wikimedia content in the future.
Collections is a pilot of a new reader curation project. Its goal is to allow mobile users to create collections of articles and share these collections with other readers. The long-term goal is to use the data from this pilot to inform further work on giving users tools to create new browsable/shareable content by remixing and reusing existing content.

Who benefits

The Wikidata community will receive many new contributions to their project, and we will work together to ensure that these contributions are high quality.
Mobile and other internal/external development teams interested in reader contributions will benefit from a backend framework for aggregating many contributions, as well as the lessons we learn from testing different interfaces and workflows for contribution.
The Wikimedia Foundation will learn more about how users approach sharing our content, which is an important readership area that we have not extensively studied.
Mobile readers who are not interested in editing content will be empowered to contribute to our projects for the first time.
Wikipedia editors who are not interested in editing on mobile will be empowered to contribute in other ways.

How we measure success

WikiGrok: we will be testing along three primary variables (UX/UI, aggregation metrics, and persona) to determine which lead to the highest engagement-to-response-quality ratio. By the end of the quarter, we will determine which combination of the following factors produces the most engagement and the highest quality results:
- UX/UI: positioning of widget, workflow, and design
- Aggregation metrics: how many unique submissions need to be pooled per question
- Persona: casual engagement (one-time contribution) from many readers vs. repeated engagement (many contributions) from a smaller subset of power readers
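
To make the "aggregation metrics" variable concrete, here is a minimal sketch of one possible pooling rule: accept an answer only once enough unique users have responded and a large enough share of them agree. This is an illustration under stated assumptions, not the WikiGrok implementation; the function name and the `min_submissions` and `min_agreement` thresholds are hypothetical, and finding good values for them is exactly what the tests described above are meant to do.

```python
from collections import Counter


def aggregate_responses(responses, min_submissions=5, min_agreement=0.8):
    """Pool answers for one question and return the accepted answer, if any.

    responses: iterable of (user_id, answer) pairs. Each user counts once;
    a later answer from the same user replaces the earlier one.
    Returns None when the pool is too small or the answers are too split.
    The thresholds are hypothetical defaults, not measured values.
    """
    latest = {}
    for user_id, answer in responses:
        latest[user_id] = answer

    if len(latest) < min_submissions:
        return None

    answer, votes = Counter(latest.values()).most_common(1)[0]
    if votes / len(latest) >= min_agreement:
        return answer
    return None


if __name__ == "__main__":
    pool = [(f"reader-{i}", "yes") for i in range(5)] + [("reader-5", "no")]
    print(aggregate_responses(pool))
```

Counting unique users rather than raw submissions prevents a single reader from pushing an answer over the threshold alone.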

External dependencies

Research and Analytics to support A/B test instrumentation, testing plan, data QA, and data analysis
Wikidata Query Service team to ensure that the complex querying solution they build works for our use-cases now and in the future

Expected completion

Wikidata query service

Why

The two mobile teams (apps and web) are highly dependent on having the ability to serve content in a more modular way to readers and editors. Having an easy-to-query central repository of structured data that supplements Wikipedia articles (e.g. Wikidata) makes it possible to create entirely new ways of presenting content to users, but Wikidata currently lacks the infrastructure to fetch anything but very simple information at scale. In order to be able to continue building features like WikiGrok, create easy-to-edit mobile infoboxes from Wikidata, and continue to explore new ways of breaking up content for users on small screens, we need to build this service.

What

Product Instrumentation and Visualization

Why

Management wants us to be more data-driven in our feature development and assessments. We have a basic logging and visualization pipeline (Event Logging + Limn) that is functional, but knowledge of how to use it is inconsistent across the organization. We need to collaborate with the engineering teams to make sure they understand how to use this pipeline. In addition, we need to ensure that instrumentation is consistent across the organization and that the tools have the desired features and the necessary capacity, operability and reliability.

What

Set up training, documentation, consultation and office hours for Event Logging (Jan)
Review and harmonize schemas
Performance test and address system throughput, SPOFs, monitoring, etc.
Create/enhance beta system and event QA environment/best practices
Identify/implement new features (e.g. data primitives)
Establish visualization roadmap

Candidates

Phabricator for code review, phase out Gerrit.
- Why maybe not: This may be premature as we'll still be in the early days of using Phabricator as a PM tool, and may not yet fully understand the requirements. It may also contend for resources with critical test infrastructure work.
- Also, the team has been working full time at full speed with a high pressure for delivering, and now it's time to take it a bit easier, work on other things, and let Phabricator consolidate. The Code Review project may start, but slower, without being a top priority.--Qgil-WMF (talk) 07:33, 4 November 2014 (UTC)
Front-end standardization / UX standardization cont'd as a top priority.
- Why: We'll have only converted one or at best a few interfaces; if we don't want the interfaces to remain disjoint and instead get the benefit, we'll need to make a concerted push to roll it out to all of core and major extensions.
- Why maybe not: We're now establishing a lot of technical foundations and working parameters for the team; we may not need to continue it as a top priority to keep the momentum going.
Library infrastructure work cont'd.
- Why: So it doesn't immediately fall by the wayside after making some initial efforts.
UX Testing Environment (REFLEX)
- Why: Our production environment is not set up for running 10/100/1000 users through a battery of tasks without one user invalidating the tasks of subsequent users. A sandboxed environment that reflects the current production state of the sites, with or without modification and with usability tracking software added, is necessary for quantifiably measured qualitative analysis of our prototypes and production features. Ops, platform, analytics, and design research would need to work together on this. Technical details
Logging the user environment (FINCH)
- Why: Wikimedia Product and Design cannot currently make informed decisions about some aspects of our users' experience of our sites, such as access device, platform, and screen size. Critical information about connection speed, geolocation and technology availability is not available or not easily accessible to decision makers in the Product group.
MicroSurveys
- Why: Ability to quickly identify problems or positive sentiment around existing systems (whether static or in development). These would be brief, few-question surveys to gain quantifiable insight into the usability of current systems for contributors. Testing a mobile version has also been discussed. Good for getting information from both readers and contributors. The two main approaches are:
  1. Overall satisfaction (aka net promoter scores): One-click method for users to tell you whether they are generally satisfied. Good for gathering baseline data, good for comparing different products (e.g., people reading on Mobile apps vs Mobile Web), good for identifying the existence of "skunk in the doorway" problems (problems you didn't know to ask about, because you couldn't see the software from the user's perspective, i.e., that there was a serious problem in the user's path).
  2. Structured feedback: Click here to make a suggestion, click here to file a bug report, click here to say that it's working okay for you. This is still simple from the user's perspective, while being more informative from the product manager's perspective.
Localization cache do-over
- Why: It's currently the largest bottleneck to quick deployments.
Beta Popups and/or Echo Notifications for Beta Features
- Why: We need to build out better notification tools to ensure users are properly informed of upcoming changes. These could be used to announce when a feature will launch into Beta and when it will launch into Production, as a quick way to tell everyone that a change is imminent.

Testing overhaul
- Why: Our state of testing is shameful. It is slowing us down and prevents even our smartest developers from getting their changes merged. A few examples: operations/puppet has barely any tests, which puts a review burden on ops' shoulders. mediawiki/core tests require you to install it and set up the backends, and they are painfully slow. operations/mediawiki-config doesn't validate any site configurations; luckily we have a staging area to confirm them. A first step would be to have true unit tests for the most important projects (site config, puppet, mediawiki/core) and aim for good coverage over the course of 2015.

Progressive rollout of features
- Why: To deploy a feature to a reduced set of our user base without having to enable it via a beta feature or fully productionize it for all wikis. Example: we want ContentTranslation deployed only to Catalan Wikipedia, or only to logged-in users of Catalan Wikipedia, making sure no other users can enable the feature (as we know it is not ready for them).
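
The ContentTranslation example could be expressed as a per-feature rule that names the wikis and the audience allowed to see a feature. The sketch below is purely illustrative and does not reflect how MediaWiki configuration actually works: `RolloutRule`, `ROLLOUT`, and `is_feature_enabled` are hypothetical names, and `cawiki` is assumed here as the identifier for Catalan Wikipedia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutRule:
    """Hypothetical rule: which wikis get a feature, and for which users."""
    wikis: frozenset
    logged_in_only: bool = False


# Illustrative rollout table; features that are not listed are off everywhere.
ROLLOUT = {
    "ContentTranslation": RolloutRule(
        wikis=frozenset({"cawiki"}),  # assumed identifier for Catalan Wikipedia
        logged_in_only=True,
    ),
}


def is_feature_enabled(feature, wiki, logged_in):
    """Return True only if a rule explicitly allows this wiki and user."""
    rule = ROLLOUT.get(feature)
    if rule is None or wiki not in rule.wikis:
        return False
    return logged_in or not rule.logged_in_only


if __name__ == "__main__":
    print(is_feature_enabled("ContentTranslation", "cawiki", logged_in=True))
    print(is_feature_enabled("ContentTranslation", "enwiki", logged_in=True))
```

Defaulting to "off" for anything without an explicit rule matches the requirement that no other users can enable a feature that is not ready for them.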

Browser reports
- Why: to know what our users use to browse the desktop and mobile sites and plan features accordingly

Power user tooling
- Why: Our power users make a disproportionately large amount of our content. We should spend some time refining and improving the tools they use so that they can be the most effective they can possibly be.
- Draft proposal: User:Deskana (WMF)/Power user tools development team
Skin unification
- Why: Our users are interacting with two completely separate user interfaces when moving between mobile and desktop devices, and features designed for each of these platforms are not compatible with each other. By improving the skin system to be more modular while targeting both mobile and desktop, a single responsive skin can be converged on and more features can be made available to both targets.

General thoughts

Are some of these still too specific, too project-focused? E.g. could fr-tech refactor and library infrastructure work be collapsed into cross-organizational efforts to reduce technical debt, to break up monolithic code, to increase test coverage, etc.?--Erik Moeller (WMF) (talk) 08:02, 17 October 2014 (UTC)
- To me fr-tech is one of the few projects that absolutely needs to happen and always get shuffled to the background. Let's give them the resources they need to succeed given the critical need of those systems supporting everything else that we do Tfinc (talk) 18:45, 17 October 2014 (UTC)
- We should be careful about repaying technical debt purely for the sake of repaying technical debt. IMO it is preferable to tie new stuff to the paydown of debt, or at least figure out how to tie the paydown of technical debt to some concrete deliverable - this a) helps us make sure that we're paying off debt in some kind of priority order [eg focusing on the stuff that matters first] while b) not totally sacrificing creating things of value and c) helps ensure that we're paying off the debt in the right way [eg not doing a massive refactor, only to discover when the next new feature comes up we should have refactored a little differently] Awjrichards (WMF) (talk) 18:50, 31 October 2014 (UTC)
- I wonder if we could tie repaying the tech debt into the work of a new integration, which is also high priority for fr-tech next quarter in order to meet the fundraising goal this FY. AGomez (WMF) (talk) 19:31, 12 December 2014 (UTC)