Wikimedia Release Engineering Team/Group -1

From mediawiki.org

Group -1 is a multi-phase project that aims to create an environment for manual and automated integration and regression testing of MediaWiki changes before those changes progress through the Deployment train process to the Wikimedia movement's content wikis, such as Wikidata and English Wikipedia.

The project has its roots in discussions about T215217: deployment-prep (beta cluster): Code stewardship request and the needs of the Wikimedia Quality and Test Engineering Team. The work is currently associated with the 2024-2025 Annual Plan's Wiki Experiences (WE) WE6 objective's WE6.2 key result:

By the end of Q4, enhance an existing project and perform at least two experiments aimed at providing maintainable, targeted environments moving us towards safe, semi-continuous delivery.

Hypotheses

If we deploy MediaWiki multiple times per day into a contained area of production, we will create the most "production-like" environment for QTE staff to use for manual exploratory testing and automated regression checks prior to train deployment to major content wikis.

Implementation will progress via a series of smaller hypotheses which will eventually connect to realize the overarching hypothesis. This approach is being used to avoid attempting to track progress against a single "boil the ocean" hypothesis that could take a year or more to reach an easily measurable state.

[WE6.2.1] Publish pre-train single version containers

If we publish a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration at least once per day, we will uncover new constraints and establish a baseline of the wall-clock time needed to perform a build.

Step 0 towards the long-term goal of continuous delivery (CD) into production is being able to deliver faster than the current weekly train process. A daily process would be approximately 3-4 times faster than our current production delivery cadence. We currently envision our eventual capability goal as being able to deliver every 15 minutes. Setting the initial delivery interval roughly two orders of magnitude longer than that target (once per 1440 minutes versus once per 15 minutes, a factor of 96) will still expose us to a number of real-world constraints that are not addressed by current workflows. We expect to learn more about the challenges of continuing to accelerate the pace of delivery without tipping too quickly into extreme difficulty that could endanger our ability to use an iterative development model.
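
The interval comparison above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the constant and function names are made up for this example, and the only inputs taken from the text are the daily (1440 minute) interval and the 15 minute target.

```python
import math

# Intervals taken from the paragraph above.
DAILY_INTERVAL_MINUTES = 24 * 60   # once per day = 1440 minutes
TARGET_INTERVAL_MINUTES = 15       # envisioned eventual capability

def interval_ratio(initial_minutes: int, target_minutes: int) -> float:
    """How many times longer the initial interval is than the target interval."""
    return initial_minutes / target_minutes

ratio = interval_ratio(DAILY_INTERVAL_MINUTES, TARGET_INTERVAL_MINUTES)
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.0f}x, about {orders_of_magnitude:.2f} orders of magnitude")
```

A factor of 96 is just under 100, which is why the daily goal is described as roughly two orders of magnitude away from the 15 minute target.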

We are explicitly not constraining where this publishing workflow will be measured at this time. We expect the SRE groups who will need to be involved in deploying into a wikikube environment to be occupied by other goals in the initial months of FY24/25. We are not currently certain what new capabilities would need to be produced to target the current beta cluster shared environment, or whether doing so would be of long-term benefit to the goal, team, or projects. We do expect to deliver beyond a single-user development environment and will be able to provide more details as design progresses, narrowing the cone of uncertainty for the overall project.

This hypothesis has been declared successfully completed. See our 2024-10-31 progress report for the writeup of what we accomplished and the lessons learned along the way.

[WE6.2.6] Create design document for Group -1 deployment

If we gather feedback from QTE, SRE, and individuals with domain specific knowledge and use their feedback to write a design document for deploying and using the wmf/next OCI container, then we will reduce friction when we start deploying that container.

We have a build process that produces a wmf/next branch and a related OCI image daily. However, that image is not yet used anywhere. The next major implementation milestone is to deploy the image somewhere in production, with config to serve one or more wikis and edge routing to bring traffic to the deployment. This work will need input from a number of teams and individuals whose concerns and constraints will need to be addressed. Finding consensus on known technical and social questions before attempting to implement the deployment process and its related config should reduce conflicts and confusion for everyone.

Goals

Workflow goals

  • Enable testing of pre-train ("next") branches of MediaWiki, skins, and extensions in a stable environment where newly discovered defects are more likely to be the result of the next branches than problems with support services or configuration in the environment itself.

Technical goals

  • Done: Automate MediaWiki OCI image creation based on a timer or similar trigger.
  • Create an environment in the production network for running pre-train MediaWiki versions.
  • Automate MediaWiki OCI container deployment into the pre-train environment.
  • Done: Enable overriding any staged wikiversions.json when scap is determining which MediaWiki versions need to be included in a container (allow, but do not require, single version builds).
  • Done: Enable overriding an in-container wikiversions.json with a hard coded MediaWiki version inside of the container.
  • Done: Enable creation of MediaWiki containers from arbitrary staging directories so that a single deployment or CI server can be used to build as many variant containers as we find need for.

Out of scope

Workflow out of scope

  • Enabling testing of pre-release services and configuration not managed by scap is out of scope.

Technical out of scope

  • Replacing the weekly train progression with continuous delivery to all wikis is out of scope.
  • Building an image for every commit to a Train deployed repo/submodule is out of scope.
  • Keeping deployment-prep working in the face of production's migration to Kubernetes and containers as the MediaWiki deployment and runtime solution is out of scope.
  • Building a chain of images starting from public files only to produce images that can be used outside of production is out of scope.
  • Reimagining how configuration is delivered into MediaWiki containers is out of scope.
  • Runtime support for single version images beyond minimum functionality needed to support building and operating Group -1 directly is out of scope.

Related projects

There are a number of active, planned, and imagined projects that overlap with the Group -1 concept and implementation. When possible, we should avoid becoming a blocker to these projects. We should also avoid making systemic changes that will cause future headaches we can foresee today.

  • WE6.2.5 Move multiversion routing outside of the MediaWiki containers to unblock single version containers
  • WE5.4.2 PHP runtime upgrade process in a containerized world

Unknowns / open questions

  • Will QTE folks need any shell access or other special permissions to the containers running Group -1 wikis?
  • If we move existing wikis (testwiki, test2wiki, officewiki, mediawikiwiki, wikitech, etc.) to Group -1, will mwscript whatever --wiki=movedwiki work transparently, or will these wikis need to be addressed from a special place?

Reports
