Wikimedia Release Engineering Team/Checkin archive/2022-06-08
2022-06-08
Inspiration Week 2022
- 5 weeks away
- July 11th: <https://office.wikimedia.org/wiki/Engineering/Inspiration_Week_2022>
- Please do it!
- I know you have ideas, let's lead here
ERC Update
- Still working
Team API
https://docs.google.com/document/d/1KoWCLyhHbekAf8OTmtDnCvNzssCnzEv479et9ljhhvs/edit#
Wins
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- June '22 edition
- GitLab Sprint summary by Brennen https://phabricator.wikimedia.org/phame/post/view/288/gitlab-a-thon/
- We have GitLab on new metal, and can probably enable GL Container Registry \o/
- We know more about git than we did in May
- Functional scap already self-installed in prod
- JWT presentation!
- Phab deployment has a runbook https://wikitech.wikimedia.org/wiki/Phabricator/Deployment
- scap scap
Let's keep this empty
Vacations/Important dates
- https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2022
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
June
- ~9-10 Jun: Brennen (probably ((maybe)))
- 9 Jun: Antoine
- 15-17 Jun: Dan
- 20 Jun: Juneteenth (observed) (U.S. Staff with Reqs)
- 22 Jun: Brennen out afternoon
- 20-30 Jun: Jaime
July
- 1 Jul: Jaime; Dan
- 4 Jul: US Independence day (U.S. Staff with Reqs)
- 25-29 Jul: Dancy out
- ~29 Jul: Brennen
August
- Antoine: some weeks
- 9 Aug: International Day of the World's Indigenous Peoples
- 12 Aug: Brennen
- 27-31 Aug: Brennen
September
- 5 Sept: U.S. Labor Day (U.S. Staff with Reqs)
- 1-6 Sept: Brennen
- ~14-18 Sept: Brennen
Train
- https://tools.wmflabs.org/versions/
- https://train-blockers.toolforge.org/
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- 2 May - wmf.10 - Antoine + Brennen
- 9 May - wmf.11 - Skipping for GitLab-a-thon
- 16 May - wmf.12 - Jaime + Antoine
- 23 May - wmf.13 - Ahmon + Jaime (Antoine out)
- 30 May - wmf.14 - Jeena + Ahmon
- 6 Jun - wmf.15 - Dan + Jeena (Brennen out)
- 13 Jun - wmf.16 - Brennen + Jeena (Dan out)
- 20 Jun - wmf.17 - Antoine + Brennen (Jaime out)
- 27 Jun - wmf.18 - Dan + Antoine (Jaime out)
- 4 Jul - wmf.19 - Jaime + Dan
- 11 Jul - wmf.20 - Ahmon + Jaime
- 18 Jul - wmf.21 - Jeena + Ahmon
- 25 Jul - wmf.22 - Brennen + Jeena
- 1 Aug - wmf.23 - Antoine + Brennen
- 8 Aug - wmf.24 - No train (Brennen out)
- 15 Aug - wmf.25
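The list above follows a simple pattern: the branch number goes up by one every Monday, and skipped weeks (9 May, 8 Aug) still consume a number. A minimal sketch of that arithmetic, assuming the pattern holds; `branch_for`, `ANCHOR_DATE` and `ANCHOR_BRANCH` are hypothetical names, not part of any existing tool:

```python
from datetime import date

# Anchor taken from the list above: the week of 2 May 2022 was wmf.10.
ANCHOR_DATE = date(2022, 5, 2)
ANCHOR_BRANCH = 10


def branch_for(monday: date) -> str:
    """Return the wmf.N label for the train week starting on `monday`.

    Assumes one branch number per calendar week, skipped weeks included.
    """
    if monday.weekday() != 0:
        raise ValueError("train weeks in the list start on a Monday")
    weeks = (monday - ANCHOR_DATE).days // 7
    return f"wmf.{ANCHOR_BRANCH + weeks}"


if __name__ == "__main__":
    print(branch_for(date(2022, 6, 13)))
```

This only reproduces the numbering in the list; it says nothing about who is on rotation or whether a train actually runs that week.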
Hiring Update
- We hired someone!
- Let's fix up our onboarding
- meetings are incoming
Sprint planning, planning
Simple mediawiki rollbacks are one command
The rollbacks we're considering here are rollbacks of prior `scap backport` runs. Rollbacks of wikiversions should be done by running deploy-promote w/ the desired state.
Current documentation: https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Rollback
    git revert $(git log -1 --format=%H -- wikiversions.json)
    scap sync-wikiversions 'Revert "group[0|1] wikis to [VERSION]"'
    git commit --amend
    git push origin HEAD:refs/for/master%topic=[VERSION],l=Code-Review+2
- ^ update docs for backport deployers ( https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers )
- Command exists in scap
- Command presents a history list from which the user selects an entry
- The history is of prior `scap prep auto` runs. scap prep auto keeps a history. You can view it by running "scap prep --history auto"
- I swear we could use git notes to track those, or maybe a local git tag (recursive reflog?)
- scap3 has .git/DEPLOY_HEAD and git tags like scap/sync/<date>/<increment>
- When selected, /srv/mediawiki-staging is restaged according to the selection and scap sync-world is executed.
- Command restores Git state on the deployment server
- scap prep --history auto
- Command does a full sync of the reverted state
- scap sync-world
- NOT IN SCOPE: Revert on Gerrit to be done by user
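The flow described in the list above (show history, let the user pick, restage, full sync) could be sketched roughly as follows. This is not scap code: `pick_entry` and `rollback_plan` are hypothetical helpers, the history entries are made-up labels, and the "empty reply selects the most recent prior state" default is an assumption. The plan is returned as text rather than executed, so no scap options beyond those named in the notes are implied.

```python
from typing import List, Optional


def pick_entry(history: List[str], answer: str) -> Optional[str]:
    """Map the user's reply to a history entry.

    `history` holds prior staged states, newest first (assumed ordering).
    An empty reply selects the newest prior state; a number selects by
    1-based position. Anything else returns None.
    """
    if not history:
        return None
    answer = answer.strip()
    if answer == "":
        return history[0]
    if answer.isdecimal() and 1 <= int(answer) <= len(history):
        return history[int(answer) - 1]
    return None


def rollback_plan(entry: str) -> List[str]:
    """Describe the two steps from the notes for the chosen entry."""
    return [
        f"restage /srv/mediawiki-staging to: {entry}",
        "scap sync-world",
    ]


if __name__ == "__main__":
    # Hypothetical history labels, newest first.
    history = ["state before backport B", "state before backport A"]
    chosen = pick_entry(history, "")
    if chosen is not None:
        for step in rollback_plan(chosen):
            print(step)
```

Reverting the change on Gerrit stays with the user, as noted above, so it does not appear in the plan.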
Current state
- Scap backport exists
- Stages Gerrit version numbers
- No rollback behavior
- scap prep auto subsystem has a mechanism for recording successful staging
- Jeena is working on: Scap backport doesn't do mwdebug currently
Pre-discussion
- If you sync and something goes wrong, it should rollback
- If you notice something is wrong later, you backport a NEW revert
- Where does a rollback differ from deploying a patch?
- MWDebug or scap stopping you from deploying
- Only for things using scap prep auto
- Is there ever a case where you finish a backport, and then later you want to undo?
- scap prep --history
- scap rollback: gives you a history, and you can just hit "enter"
Stage-train testwikis happens without human intervention
- New code is checked out and security patches are applied
- DONE (stage-train does all the magic)
- systemd timer (cron job?) for stage-train
- https://releases-jenkins.wikimedia.org/ gives a nice history of builds, which is handy when something explodes, but it is public and we probably don't want to hook it to the production deployment server
- find a way to notify folks of completion
- Probably need some alarms to be emitted on failure https://wikitech.wikimedia.org/wiki/Alertmanager ?
- Security patch explosion handling
- Don't we have a system to routinely test they still apply? (No, not for new branches)
- Interlock with other deployment operations.
- New version is sync'd to all MediaWiki servers & TestWiki runs new version
- Do we need a way to do this without flipping wikiversions? Or JFDI testwiki? I think we landed on JFDI but we could modify scap to be able to pre-stage a non-active mw version.
- That is "scap prep", isn't it? IIRC that takes a global lock and prevents scap backports, so the auto task has to be timed so that it does not overlap with the backport window (which is tied to CEST rather than PST/UTC). I guess 5am UTC will work, or maybe just after Jenkins has cut the branch
- agree on a set time, update deployment calendar
- Ensure the new branch has been cut (verify the Jenkins job succeeded? check the branch exists in all repos?)
- Handle skipping trains (due to holidays, team offsite, December deployment freeze) (maybe the timer script can check the deployment calendar repo to see whether a train should run).
- Alerts (for failed security patches, etc.)
- Sudo permissions to re-run service
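The pre-flight checks listed above (skipped trains, branch cut, interlock with other deployment operations) could be gathered into one gate that the timer script consults before running stage-train. A minimal sketch under those assumptions: `should_stage` and its parameters are hypothetical, and how each input is actually obtained (calendar repo, Jenkins, scap lock) is left open. The skipped dates are the two no-train weeks from the train list.

```python
from datetime import date
from typing import Set, Tuple


def should_stage(
    train_monday: date,
    skipped: Set[date],
    branch_cut: bool,
    deploy_in_progress: bool,
) -> Tuple[bool, str]:
    """Decide whether the timer should run stage-train this week.

    Returns (ok, reason); the reason can be passed to whatever alerting
    or notification ends up being used.
    """
    if train_monday in skipped:
        return False, "train skipped this week"
    if not branch_cut:
        return False, "new branch not cut yet"
    if deploy_in_progress:
        return False, "another deployment operation is running"
    return True, "ok to run stage-train"


# No-train weeks taken from the train list: 9 May and 8 Aug 2022.
SKIPPED = {date(2022, 5, 9), date(2022, 8, 8)}

if __name__ == "__main__":
    ok, reason = should_stage(date(2022, 6, 13), SKIPPED, True, False)
    print(ok, reason)
```

Returning a reason alongside the boolean keeps the "notify folks" and "alerts" items separate from the decision itself.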
Current state
- MWpresync account exists
- Cronjob that runs stage-train!
- sudo mwpresync still needed