Wikimedia Release Engineering Team/Checkin archive/2024-03-13
2024-03-13
Wins/winterrogation
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Mar 2024
- Nightly security patch failures updating phabricator tasks merged, ready to release
- Merged deploys-in-progress reset script
- Two repos have patches for git-fat → git-lfs
- scap: replaced canary swagger checks with test server httpbb checks
- Phorge integration with GitLab in its third round of review
- GitLab webhooks also still going, looks like it'll go through
- People like scap backport - more patches, fewer things typed into terminals.
- Security patch notification now working!
- GitLab webhooks have a more accurate regex for "Bug: TXX"
- Foreachwiki in beta
- Getting rid of the /srv/mediawiki/php symlink
- Upgraded GitLab k8s/cloud cluster to new k8s version and documented the process
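The "Bug: TXX" matching mentioned above can be illustrated with a minimal sketch. The actual webhook regex is not recorded in these notes; the pattern and the `find_task_ids` helper below are hypothetical and only show one way a stricter match might look (a trailer line on its own, with a numeric task ID).

```python
import re

# Hypothetical pattern, not the regex used by the real GitLab webhooks.
# Matches a trailer line such as "Bug: T12345" and rejects placeholders like "Bug: TXX".
BUG_TRAILER = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)


def find_task_ids(commit_message: str) -> list[str]:
    """Return task IDs found in 'Bug: T<number>' trailer lines of a commit message."""
    return BUG_TRAILER.findall(commit_message)
```

Anchoring the pattern to a whole line avoids picking up mentions of "Bug: T..." in the middle of a sentence.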
Stuff from last time
Vacations/Important dates
- https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2024
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off (page needs updating for Dayforce)
Mar 2024
- 29 Feb, 1st Mar, 4th Mar - 8th Mar - Antoine
- 14 Mar–14 May: Dan
- 29 Mar: Brennen, Jeena
Apr 2024
- Mon 22 Apr: Global holiday, all staff
- 26 Apr: Brennen (tentative)
- Fri 05 Apr–Fri 12 Apr: Tyler, eclipse viewing
May 2024
- Mon 27 May: Memorial Day (US staff with reqs)
Future
- A few days around July 4: Brennen
- 25 Aug - 03 Sep: Brennen
Train
- https://tools.wmflabs.org/versions/
- https://train-blockers.toolforge.org/
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
Rotation
- 3 Dec – 1.42.0-wmf.8 – No Train offsite
- 11 Dec – 1.42.0-wmf.9 – Brennen + Antoine (Jaime out)
- 18 Dec – 1.42.0-wmf.10 – Ahmon + Brennen (Jaime out)
- 25 Dec – 1.42.0-wmf.11 – No Train
- 1 Jan – 1.42.0-wmf.12 – Dan + Ahmon (Jaime out)
- 8 Jan – 1.42.0-wmf.13 – Jeena + Dan (Jaime out)
- 15 Jan – 1.42.0-wmf.14 – Jaime + Jeena
- 22 Jan – 1.42.0-wmf.15 – Antoine + Jaime
- 29 Jan – 1.42.0-wmf.16 – Ahmon + Antoine (Brennen out Wed–Fri)
- 05 Feb – 1.42.0-wmf.17 – Brennen + Ahmon
- 12 Feb – 1.42.0-wmf.18 – Brennen + Antoine (Friday)
- 19 Feb – 1.42.0-wmf.19 – Jeena + Brennen
- 26 Feb – 1.42.0-wmf.20 – Dan + Jeena
- 04 Mar – 1.42.0-wmf.21 – Jaime + Dan (Antoine out)
People for train: Ahmon, Antoine, Brennen, Jeena, Jaime
- 11 Mar – 1.42.0-wmf.22 – Antoine + Jaime (Dan out)
- 18 Mar – 1.42.0-wmf.23 – Ahmon + Antoine
- 25 Mar – 1.42.0-wmf.24 – Jeena + Ahmon
- 1 Apr – 1.42.0-wmf.25 – Jaime + Jeena
- 8 Apr – 1.42.0-wmf.26 – Antoine + Jaime
- 15 Apr – 1.42.0-wmf.27 – Ahmon + Antoine
- 22 Apr – 1.42.0-wmf.28 – Brennen + Ahmon (Global holiday Monday; Brennen out Friday)
Team Discussions
Annual planning
Meta page: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2024-2025/Goals
- How this works: Goals → Buckets → Objectives → KRs → Hypotheses
- Where our work fits: Infrastructure → WikiExperiences → WE6 Developer Services → WE6.2
- WE6:
Technical staff and volunteer developers have the tools they need to effectively support the Wikimedia projects
- WE6.2: By Q4, complete an intervention and run an experiment each aimed at providing maintainable, targeted environments to serve developers' high-priority testing needs
Experiment: the goal is we learn some things
Intervention: we make a thing based on stuff we learned
WE6.2 Long version
Developers and users depend on the Wikimedia Beta Cluster (beta) to catch bugs before they affect users. Over time, the uses of beta have grown and come into conflict: the uses are too diverse to fit in a single environment. We will perform one intervention and conduct one experiment each aimed at replacing a single high-priority testing need currently fulfilled by beta with a maintainable alternative environment that better serves each use case's needs.
Hypotheses-areas:
- Experiment: Group -1
- Intervention: Catalyst
Discussion of our hypothesis (alongside ServiceOps):
- Rollback faster
- Smaller, single-version images
- Wikiversions should be config rather than code (no deploy needed)
- Continuous deployment to test wikis
- ServiceOps open to the idea of testwikis being the victim here
- We don't know how caching works when it's updated every minute
- Social change here, working closely with developers to change expectations
- User interface challenges: ssh to server, lots of output to interpret, we can present things to be less-scary, web-ui would be really awesome
- What's scary about deploys now is what's happening in production and what do I have to do about it as a deployer?
- Logging and monitoring and alerts exposed in a way for developers to feel confident deploying themselves vs speeding up
- Something about making the summary of the state of production more visible
Framing that might make sense, post-discussion:
Hypothesis one: group -1
- Lots of work falling in ServiceOps, our work is building single-version images (+ wikiversion/mw/config work)
- Single version makes actual deployment faster
- Wikiversions outside of code means fewer deploys (makes deployment faster)
- Draft hypothesis: If we build a single version container image and experiment to move wiki-to-version routing outside of code deployment, we'll
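To make the "wikiversions as config rather than code" idea concrete, here is a rough sketch assuming the wiki-to-version mapping lives in a JSON document that is read at routing time. The payload, the registry name and the `image_for` helper are all hypothetical illustrations, not the team's design.

```python
import json

# Hypothetical config payload; in this sketch it would live outside the code deployment.
WIKIVERSIONS_JSON = '{"testwiki": "1.42.0-wmf.23", "enwiki": "1.42.0-wmf.22"}'


def load_routing(raw: str) -> dict[str, str]:
    """Parse the wiki-to-version mapping from its JSON form."""
    return json.loads(raw)


def image_for(wiki: str, routing: dict[str, str],
              registry: str = "registry.example/mediawiki") -> str:
    """Pick the single-version image tag for a wiki (the registry name is made up)."""
    return f"{registry}:{routing[wiki]}"


routing = load_routing(WIKIVERSIONS_JSON)
# Moving a wiki to another version is a data change only; no code is redeployed.
routing["enwiki"] = "1.42.0-wmf.23"
```

With single-version images, changing the mapping only changes which image serves a wiki, which is the sense in which fewer deploys would be needed.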
Hypothesis two: speeding time to deploy
- Lots of work in our team, little work in the ServiceOps space
- Making it less scary to deploy: rollback, web ui, giving deployers an easy way to see what's happening in production, making it obvious what to do about it