Wikimedia Release Engineering Team/Monthly notable accomplishments
This page lists notable accomplishments for the month as we come up with them during our weekly team meetings.
24/25 Q1
Oct
- Gave a 45min presentation at Wikicon NA on cloud services/toolforge https://www.mediawiki.org/wiki/File:What%27s_new_with_Wikimedia_Cloud_Services,_WikiConNA_2024.pdf
- Dan found a bug in localization syncing
- Fixed l10n CDB file handling on secondary masters:
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/457
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1076019
- Phabricator incoming mail works again! Broken since Feb.
- Phab deploy renaming "wikitech accounts" -> "ldap accounts"
- Updated users in bitergia database
- SpiderPig demo
- Toolforge standards committee all through NDA!
- Volunteer NDA steps are reduced and clearer on the wiki docs
- Deleting branches via train-branch bot
- https://tools-static.wmflabs.org/jenkins-build-stats/
- Increased quota for integration -- https://phabricator.wikimedia.org/T376847
- Single version image built and published
- Bunch of pending deploy doc edits
- We disabled changing priority for Phab/Phorge users outside of Trusted Contributors & orgs (is this a win? time will tell.)
- Deployed Jenkins plugins patches for the annoying castor-save-workspace-cache aborted during postbuild https://phabricator.wikimedia.org/T352319
- Catalyst: (almost) all the checkboxes work!
- Antoine paired with Esuvat to fix the Catalyst/Patch demo integration in Gerrit UI https://phabricator.wikimedia.org/T374954
- Phab: disabled notifications for repository commits from diffusion (only mirrored repos)
- Upstream phorge Soon™ to be PHP8.4 compatible per static analysis
- WE6.2.1 Final Results™ drafted—branching a wmf/next branch—taking the head of everything that is deployed, running it through gate and submit, and building a docker image from that state + mwconfig + ssssecrets. All happening nightly at Midnight UTC. And now deploys to the restricted section of the container registry.
- WE6.2.3 SpiderPig—merged api server + web front-end
- https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/CWGSKXDQ5IQ7EQQXSXY2Q6C2NKRNJGRP/ - schedule-deployment.toolforge.org now allows per-window scheduling
- Integrating Wikifunctions helm chart with catalyst API
- Streaming logs from k8s inside PatchDemo; beta launch full steam ahead
- 2 MRs for PHP 8.1 stuff
Sep
- Fixed scap prep bug https://phabricator.wikimedia.org/T373425, https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/424
- GraphQL data queries for bitergia are published; important for determining users that are the same person across tools within bitergia https://www.mediawiki.org/wiki/User:AKlapper_(WMF)/Bitergia_data_quality_queries
- train branch cutting credentials are all good—unified credentials for gerrit api and git
- Use sqlite to store scap sync-world history (https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/423)
- Make automatic-branch-cut re-runnable: https://phabricator.wikimedia.org/T373709 (https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/115)
- Dan helped me stumble through adding an alert for PVC space in gitlab-cloud-runner (well, half of adding an alert, anyway)
- Antoine enabled error logging for everything to reveal hidden errors ( https://phabricator.wikimedia.org/T228838 )
- Antoine enhanced https://grafana.wikimedia.org/d/000000102/mediawiki-production-logging
- Wikifunctions has issues https://phabricator.wikimedia.org/T374241 / https://phabricator.wikimedia.org/T374231
- spiderpig data model and jobrunner merged
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1072786 gitlab: Sync people/wmde GitLab group w/ wmde LDAP group
- New phatality deployment, including bugs, downtime, and an eventual fix:
- Removed deprecated Greenballs plugin from releases-jenkins . There are many more to go.
- Prepared an initial gitlab CI for https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary
- Trying to get Volunteer NDAs signed for folks who will be the next Toolforge Standards Committee: https://phabricator.wikimedia.org/T374993
- Working on presentation about WMCS things for WikiCon NA the first week of October
- Progress on group-1
- wmf/next branch is cut nightly
- image build process via scap is blocked on lack of php dependencies on releases server and lack of mwscript
- Currently looking at containerizing the execution of l10n update script to side-step this problem
- WikimediaDebug update published
- Dan's new work laptop
- Containerizing mwscript
- PatchDemo catalyst backend supports all extensions/skins/etc
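The move to SQLite for `scap sync-world` history (merge request 423 above) could look roughly like the following. This is only a minimal sketch: the table name, columns, and helper functions are hypothetical and are not taken from scap's actual schema.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) a history database. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sync_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            timestamp REAL NOT NULL,
            username TEXT NOT NULL,
            message TEXT NOT NULL,
            completed INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def record_sync(conn, username, message, completed=True):
    """Append one sync entry and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO sync_history (timestamp, username, message, completed)"
            " VALUES (?, ?, ?, ?)",
            (time.time(), username, message, int(completed)),
        )
    return cur.lastrowid


def recent_syncs(conn, limit=10):
    """Return the newest entries first as (username, message, completed)."""
    rows = conn.execute(
        "SELECT username, message, completed FROM sync_history"
        " ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Compared with an append-only text log, a table like this makes it cheap to query the most recent entries or filter by user, which is presumably part of the appeal.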
Aug
- Fixed issue with deployment-deploy04 free space. Added a 40GB volume and copied /srv to it.
- Buildkit 0.15.1 release deployed
- Helped data-engineering Airflow DAGs with their Gitlab CI.
- Rewrote remainder of make-container-image stuff in Python: https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/99 \o/
- Scap invokes this repo during deployment
- Current status: Create a php7.4 image + debugging packages
- Future: php7.4 + php8.1
- Single version images: there's a change in mw-config to override wikiversions.json
- Updated train-dev to use debian:11 base image.
- Kicked off nomination process to reboot the Toolforge standards committee (https://phabricator.wikimedia.org/T370474)
- Moved a tool to toolforge build service
- Fixed links in patchdemo for catalyst wikis
- merged persistence for k8s patchdemo
- Added read-only flag for patchdemo
- Fixed a Phab code bug not checking user permissions creating a form
- Merged more Phorge upstream stuff to get bugfixes + features once we pull, e.g. logging errors for broken Herald rules. See some stuff as deps: https://phabricator.wikimedia.org/T370266 (when downstream tasks exist)
- Played with checking for active Phab accounts linked to locked WMF SUL accounts (TODO: other way round)
- Started working on a Kubernetes cluster for deployment-prep using OpenTofu and Magnum as provisioning tools. Lots of things to figure out still, but a proof of concept cluster was provisioned, destroyed, and provisioned again. https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu + deploymentpreps3
- Scap deploy with rewrite of build-image script
- Merged catalyst/patchdemo environment redirects
- Repos under https://gitlab.wikimedia.org/toolforge-repos/ are now indexed by codesearch as part of the "wmcs" collection. https://codesearch.wmcloud.org/wmcs/?q=mwclient
- Images built by Kokkuri on DO runners are now usable from WMCS runners. This is being used in the tech spike on creating a deployment-prep Kubernetes cluster using Magnum to build and then run an image containing OpenTofu and other tools needed for gitops automation of the process. https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/blob/main/.gitlab-ci.yml
- 8 folks have been nominated to reboot the membership of the Toolforge Standards Committee. Bryan will be working in the coming weeks to get them all vetted by the Toolforge admins and to facilitate them signing an NDA with the Foundation. https://wikitech.wikimedia.org/wiki/Help_talk:Toolforge/Toolforge_standards_committee#August_2024_committee_nominations
- train dev fixin'
- Fixed mw-web deployment in train-dev: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1060464
- https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1064124 mw-debug/mw-web: Reduce CPU requests/limits for train-dev
- Renewed the DEPLOY_TOKEN in gitlab-cloud-runners project.
- System needed other than Ahmon's reminders :)
- Upgraded buildkitd to 0.15.2 in all the places.
- https://gitlab.wikimedia.org/repos/releng/buildkit/-/merge_requests/69 (README.md: Initial notes on handling new releases)
- Wrapped up build-images stuff w/ help from Jeena and Scott French
- T372921: scap deploy blank checks bug fixed.
- https://phabricator.wikimedia.org/T361724 scap should check if it is running within a tmux/screen
- Better remote build context support in Kokkuri
- A handy `.kokkuri:remote-context` mixin
- Kokkuri can now resolve the frontend ref ("syntax" line in .pipeline/blubber.yaml) from a remote build context (via the GitLab API)
- New releases-jenkins job to cut wmf/next is ready \o/
- Played with upstream Phorge doc tool (Diviner), wrote a dozen upstream patches to fix 404s of methods in search results, PHP 8 exceptions (unit tests for phorge :(( ), some PhpDoc cleanup, etc.
- Calm train last week
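For the T361724 item above (scap checking whether it runs inside tmux/screen), one simple approach is to inspect the environment: tmux sets `TMUX` and GNU screen sets `STY` in the sessions they start. The helper name below is hypothetical, and this is a sketch of the idea, not necessarily how scap implements the check.

```python
import os


def in_terminal_multiplexer(environ=None):
    """Return "tmux", "screen", or None based on environment variables."""
    env = os.environ if environ is None else environ
    if env.get("TMUX"):
        return "tmux"
    if env.get("STY"):
        return "screen"
    return None
```

A caller could warn before starting a long-running deployment when this returns `None`. Note that the variables are not necessarily propagated through a nested ssh session, so the check is only a heuristic.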
July
- "[WE6.2.1] Publish pre-train single version containers" is now on Phab <https://phabricator.wikimedia.org/T369115>
- Submitted a WMCS themed talk proposal for WikiCon North America (<https://wikiconference.org/wiki/Submissions:2024/What%27s_new_with_Wikimedia_Cloud_Services>)
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/367 kubernetes: Make k8s deployment failures fatal
- https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/101 Allow empty requirements list to enable Python builder
- Cleaned up scap, kokkuri and mediawiki/services/machinetranslation
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/368 Prompt for log message if not supplied on command line
- Deployed gitlab-runner v17.0.0 on gitlab-cloud-runners.
- https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1047158 mw-web: Add traindev environment — testing canary deployments should work!
- https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/76 enable mw-web canary / main k8s deployment
- Fixed train-dev for 1.43.0-wmf.11
- Blubber v1.0.0 has been published
- Native BuildKit LLB (Low-Level Build) instructions (no more reliance on Dockerfile)
- Refactored to support all `docker build` and `docker buildx build` options
- Supports attestations stored alongside images in the registry, provenance and SBOM
- Looks like phorge 2024.19 stable release merges cleanly, as does phorge/master
- Settled on a squash commit template: https://gitlab.wikimedia.org/repos/releng/gitlab-settings/-/merge_requests/64
- Enabled image diff in Gerrit: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/902211/1/static/images/project-logos/dkwikimedia.png
- Further refinement in upstream code would allow some more tuning, screenshots at https://phabricator.wikimedia.org/T341291#9939660
- Diff is based on Resemble.js library, demo at http://rsmbl.github.io/Resemble.js/
- MediaWiki train!!!! Win win!
- Started building out an on-wiki space to keep track of group-1 things: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Group_-1
- https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/100: Change kokkuri to use v2.0.3 (newest buildctl)
- https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/107 Fix duplicate pipelines when pushing merge request
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/377 sync: Refactor deployment stages
- thx to swfrench for reviews (nice)
- Dropped logstash_checker.py from puppet (nice :))
- Filed https://phabricator.wikimedia.org/T369532: Update ldap-sync-bot token
- We have a director
- Met with Guillaume re: archiva migration, publishing doesn't seem to be working but we'll debug that this afternoon.
- Discussed gitlab-behind-loadbalancing ideas with Collab Services, they may be dissuaded from changing SSH remotes
- Blubber can now build directly from Git URLs. This makes the provenance more "complete"
- Adding support for remote build contexts to Kokkuri. In the GitLab CI context this means passing the MR Git URL directly to BuildKit
- Rephrased the GitLab account approval banner because confusion: https://phabricator.wikimedia.org/T369698
- Updated the GitLab approval form to remove the "specific reason" for GitLab
- Continued with writing random Phab downstream and Phorge upstream tech debt patches and bug fixes
- Talked with folks who like quarterly Phab account metrics about lies, damned lies, and statistics (thanks Tyler)
- https://phabricator.wikimedia.org/T369862 Upgrade to buildkit v0.15.0
- ldap-sync-bot token has been renewed.
- Bragged (blogged) about latest Phabricator improvements in https://phabricator.wikimedia.org/J316
- Phab/Phorge up to date with upstream stable (next step: rolling release) oooh
- invented https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Group_-1/Progress_reports/2024-07-11 to put weekly reports on-wiki
- dzahn got Gerrit on nftables, no obvious breakage
- Sorta got Maven publishing working at https://gitlab.wikimedia.org/repos/maven/maven-test-project
- Open question: Any reason to care about just using a single project for all kinds of packages?
- https://phabricator.wikimedia.org/T367322#9987048
- Swapping out VMs for CI/Integration-project --- cumin done! deb building in progress.
- Explored running old-timey patchdemo in k8s alongside catalyst
- Another routine Phab/Phorge deploy
- Fewer dependencies on EOL Debian Buster
- Merged modifications for repos/releng/release (make-container-image), operations/mediawiki-config (MWMultiversion.php), and scap to support FORCE_MW_VERSION, to support single version container images.
- Andre's "scap train" mods deployed.
- scap changes to better support alternate stage_dir:
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/390 cli.py: Change default path of mediawiki sync lockfile
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/389 Make default history_log follow stage_dir
- Southparkfan hammering away at beta
- streaming logs from Catalyst environments
- It seems like the new deployment box is pretty much working by now.
- Jaime had to fix git flag placement for git 2.20 -> git 2.30
- Jaime had to fix scap deploy for heterogeneous python versions within the cluster
- Merged changes in catalyst for MediaWiki helm charts to spin up new MediaWiki instances with the PatchDemo provisioning scripts
- Progress on merging the PatchDemos
- Split Puppet 5 and 7 compiler output since some hosts no longer support v5 and that was confusing SREs (screenshots: https://phabricator.wikimedia.org/T371407#10028859 )
- Added a "(diff)" link to the notification that https://schedule-deployment.toolforge.org/ gives you after adding a new backport to the schedule. phab:T367948
- git.wikimedia.org is finally dead (a win insofar as maybe we never have to talk about it again)
- Quarterly phabricator queries updated
- Upstream opengraph diff was merged for phab, so that link previews may start working in slack at some point
- Adding milestone description copying to lessen suffering
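The scap change above that prompts for a log message when none is supplied on the command line (merge request 368) follows a common CLI pattern. The sketch below illustrates that pattern only: the program name, argument layout, and `get_log_message` helper are assumptions, not scap's real interface.

```python
import argparse


def get_log_message(argv, ask=input):
    """Return the message from argv, prompting until one is given."""
    parser = argparse.ArgumentParser(prog="sync-example")  # hypothetical name
    parser.add_argument("message", nargs="*", help="log message")
    args = parser.parse_args(argv)
    message = " ".join(args.message).strip()
    while not message:
        # `ask` defaults to input(); it is injectable so the prompt is testable.
        message = ask("Log message: ").strip()
    return message
```

For example, `get_log_message(["fix", "typo"])` returns the joined message without prompting, while `get_log_message([])` keeps asking until the operator types something non-empty.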
23/24 Q4
[edit]June
- "Wikimedia Deployment Scheduler" tool is now linked from Gerrit changes and helps add things to backport windows. https://schedule-deployment.toolforge.org/ More at https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/VT44HXYMEYUNDVIGGLII7XZZTNCXA52S/
- Improved httpbb failed header check output: https://gerrit.wikimedia.org/r/c/operations/software/httpbb/+/1037156
- scap sync-world: k8s: image build errors are now fatal. https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/329
- Reggie manifest cleaner deployed (results in one week)
- Restrict some scap subcommands to deploy servers only https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/331 (plus followup from Jaime)
- Yesterday's Phab deploy mitigated some Phab vandalism vectors
- GitLab version 17 upgrades—deprecated endpoints in our tools https://phabricator.wikimedia.org/T365675
- Fixed https://phabricator.wikimedia.org/T364309: deployment: fix-staging-perms fails to finish [ set umask on more scap commands ]
- Fixed https://phabricator.wikimedia.org/T366217: unable to create revert commit from scap
- Fixed https://phabricator.wikimedia.org/T366844: Don't just append names with "and" (scap)
- Fixed https://phabricator.wikimedia.org/T366856: `UNIQUE constraint failed: blob.name` during manifest upload to Reggie
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1041746: logstash_checker.py: Add --time option merged but soon to be obsolete
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/360 kubernetes: Always use replicas:1 in traindev
- Deployed buildkitd 0.14.1 to staging and prod gitlab-cloud-runners, and trusted runners (https://phabricator.wikimedia.org/T367352) \o/
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/358 Move logstash checker code into scap, and behavior changes. \o/
- Merged and deployed with scap 4.89.0.
- Closed https://phabricator.wikimedia.org/T159991 (If aborting a scap due to test canary error rate, output some errors for reference)
- Closed https://phabricator.wikimedia.org/T183999 (Scap canary has a shifting baseline)
- Closed https://phabricator.wikimedia.org/T367131 (Did retrying canary checks do anything?)
- Phab
- AFAIK done with Phab custom downstream TechDebt cleanup, no more random breakage (sorry brennen)
- Antivandalism DB query code way way more performant - https://phabricator.wikimedia.org/T366811 \o/
- Misc upstream Phorge work, e.g. showing "this is a dup" in comment field: https://we.phorge.it/F2231629 and "Allow collapsing/expanding workboard column content by clicking its header" for mobile in https://we.phorge.it/D25672, awaiting review
- Wrote PoC/WIP patch to ignore milestones in a Herald rule "none" condition in https://phabricator.wikimedia.org/T144041 (requested by numerous WMF teams over the last years - currently you need to update your team's backlog funneling Herald rule every time you create a new milestone project)
- gitlab-settings/configure-projects stuff runs on a timer on GitLab prod box \o/ (laptop no longer in prod)
- Actually turned on Phorge integration on GitLab \o/!!
- Can deploy a mediawiki with patches to kubernetes from patchdemo!!!
- Feature flagged catalyst on patchdemo
- Working on BDD tests for Catalyst
- PatchDemo bootstrap scripts in k8s
- Buster deprecations in WMCS
- Blubber experimental/native-llb branch seems stable enough to merge
- Thinking a 1.0.0 release is in order
- Docker-ui package integration work
- Gerrit on 3.10 as of today
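The T366844 item above ('Don't just append names with "and"') is about how scap renders a list of names. Assuming the goal is a readable, comma-separated list with a single final "and", a helper along these lines would do it. The function name and exact output format are hypothetical, not scap's code.

```python
def join_names(names):
    """Join names as "a", "a and b", or "a, b and c"."""
    names = list(names)
    if not names:
        return ""
    if len(names) == 1:
        return names[0]
    # Commas between all but the last pair, then a single "and".
    return ", ".join(names[:-1]) + " and " + names[-1]
```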
May
- scap k8s deployment progress reporting
- scap release-scripts/perform-release rewritten in Python, and added wait for the tag pipeline.
- Jaime is a PHP expert now, successfully running patchdemo
- Patches upstream for Phorge viewing reports while not logged in (https://we.phorge.it/D25608) etc
- buildkitd upgraded to v0.13.2
- Started upstreaming process for frontend restrictions.
- https://github.com/moby/buildkit/pull/4899
- scap clean improvements
- First changes to Catalyst Patchdemo
- Scap3 broken symlink up for review
- Skins available in the catalyst environment
- Upstream buildkit mod merged: https://github.com/moby/buildkit/pull/4899
- Moved wmf buildkit helm chart to its own repo for easier maintenance: https://gitlab.wikimedia.org/repos/releng/buildkit-chart/
- integration/config: jjb-diff improvement (don't assume stdout wants ansi)
- Docker gc config for CI: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1031045
- docker-hub-mirror upstream bug workarounds
- Phab: made good progress removing tech debt in Phabricator, all deployed thanks to Brennen: https://phabricator.wikimedia.org/maniphest/query/cGaRtbNWQSd1/#R . Disabled ~12 ancient Herald rules. Hackathon. Phorge upstream stuff. etc.
- Hackathon a good time generally
- Wikibugs got initial gitlab integration during the hackathon and has a couple of improvements since. Next step is wiring up a bot to configure more webhooks so the bot can see CI runs and code review comments. https://www.mediawiki.org/wiki/Wikibugs
- Contint1002 is now running on bullseye along with python2 zuul, but this is the LAST TIME! (thanks dzahn)
- Hack found for Wikibugs network instability issue talking to https://gitlab-webhooks.toolforge.org. Bypassing Kubernetes ingress by talking directly to the service makes things much more stable for long-lived connections.
- Bryan working with Eoghan to get secrets provisioned for adding GitLab account block/unblock to Wikitech block cascade.
- Upgrading SyntaxHighlight to work with the newest Pygments is stalled because the new version needs Python 3.8+. Prod, test, and default dev environments are all currently Buster with Python 3.7. <https://phabricator.wikimedia.org/T364249>
- Jelto unblocked Wikibugs tests from calling Phabricator by creating a GitLab shared runner that can be used by projects in /toolforge-repos/
- Deployed protection for https://phabricator.wikimedia.org/T282893 (Various CI jobs failing after "mkdir: cannot create directory ‘log’: Permission denied"). That revealed a few places where a root:root cache or log directory was previously being auto-created by docker. Added fixes for that. Plus a fix for codehealth checks from Tyler
- Blubber Python builder: Always use a virtualenv https://phabricator.wikimedia.org/T357548 . blubber/buildkit 0.23.0 released
- docker-gc resiliency improvement deployed.
- Simplified gitlab-trusted-runner projects.json (removed project-ids)
- Fixed problem recently discovered w/ gitlab-mentions-bot... it starts getting email notifications for MRs that it has made a note on. These emails go to releng. Fixed.
- Andre's 1st TRAINNNNN \o/
- Backfilled bugzilla tickets in phabricator to fix stats after 10 years - https://phabricator.wikimedia.org/T107254
- Phabricator OGP previews upstream patch - https://we.phorge.it/D25668
- SRE Collaboration Services has a dedicated IRC channel now irc://irc.libera.chat/wikimedia-sre-collab
- https://phabricator.wikimedia.org/T313624
- Reproduced the issue locally and identified that it occurs when the keyholder key is either not specified in scap.cfg or is missing from /etc/keyholder.d. According to OpenSSH behavior, if no specific key is provided, it tries all authentication methods up to the MaxAuthTries limit. Since these configurations are on the target and not modifiable, increasing MaxAuthTries is not a viable solution.
- To resolve this, I updated the code to abort the program and prompt the user for a rollback if the key is missing.
- Patchdemo checkbox with ooui https://patchdemo.catalyst-qte.wmcloud.org/
- GitLab account blocking/unblocking has been integrated with Wikitech. Blocking a Developer account on Wikitech now also blocks the user's associated accounts in Cloud VPS & Toolforge, Gerrit, GitLab, and potentially Phabricator. The Phabricator block depends on the user having previously linked their Developer account with Phabricator. Unblocking a Developer account on Wikitech reverses the associated account blocks as well. This makes disabling a Developer account a lightweight and reversible process, which in turn makes it easier to use a "block first; investigate more deeply later" approach to combating abuse. https://phabricator.wikimedia.org/T316418
- commit-message-validator v2.1.0 (<https://www.mediawiki.org/wiki/Commit-message-validator>) now supports two optional trailing spaces after `Bug: Tnnnn` and `Change-Id: Ixxxx` footers to improve support for GitLab markdown rendering of merge request descriptions which can become commit messages when doing non-fast-forward merges. https://phabricator.wikimedia.org/T351253
- Used new Blubber Python builder to resolve https://phabricator.wikimedia.org/T346226 https://gerrit.wikimedia.org/r/c/mediawiki/services/machinetranslation/+/1035009 (CI: Revive use of tox for tests)
- toolforge webservice logs -f: Don't spam user w/ stack trace on control-C (https://phabricator.wikimedia.org/T361437) https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/39
- pvc-cleaner: Added connection retries to reduce Pod restart events.
- Seen while rolling the train to group0: Finished sync-prod-k8s (duration: 03m 03s) (less than half of time prior to today)
- Finalized T359643: Get rid of the /srv/mediawiki/php symbolic link by removing the symlink from operations/mediawiki-config
- Finally!
- Lots of positive feedback about scap backport recently.
- Config file and speedup (thanks Bryan) for gitlab-settings/configure-projects/
- Learned that puppet compiler button on gerrit is neat
- Undeployed Blubberoid!
- Phabricator deployment of cleanup: - https://phabricator.wikimedia.org/maniphest/?ids=366075,364720,361997,230590,228507,140448#R
- Rotting list of Gerrit repos in Phabricator
- 2fa + Oauth doesn't make you re-auth with mediawiki
- Email pre-filled when you Oauth with MediaWiki in Phab
- Button for running the puppet compiler in Gerrit
- Content Transform team successfully used Quibble to debug https://doc.wikimedia.org/quibble/
- Catalyst API Generates user tokens
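The commit-message-validator v2.1.0 change listed above accepts two optional trailing spaces after `Bug:` and `Change-Id:` footers; in Markdown, two trailing spaces force a hard line break, which keeps the footers on separate lines when GitLab renders a merge request description. The regular expression below is a hypothetical illustration of such a rule, assuming "optional" means either zero or exactly two spaces. The real validator's patterns may differ.

```python
import re

# Hypothetical pattern: a footer followed by zero or exactly two trailing
# spaces. A Gerrit Change-Id is "I" plus 40 hexadecimal characters.
FOOTER_RE = re.compile(r"(Bug: T\d+|Change-Id: I[0-9a-f]{40})(?: {2})?")


def is_valid_footer(line):
    """Return True for a Bug/Change-Id footer with 0 or 2 trailing spaces."""
    return FOOTER_RE.fullmatch(line) is not None
```

Under this assumption a single trailing space is still rejected, since it is more likely a typo than an intentional Markdown line break.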
April
- new puppetserver in devtools!
- phorge inbound mail patches upstream
- Phorge PoC patch to thwart webcrawlers—every line number is a separate page
- Phabricator 6 mo stale ticket pings
- Hiding non-canonical diffusion links WIP
- Phab deploy!
- Default search hints to advanced search: https://phabricator.wikimedia.org/search/query/advanced/
- Gerrit 3.7 upgrade had a slow down ( https://phabricator.wikimedia.org/T355529 ) Upstream fixed it. I have upgraded on Monday morning and... it works!
- Scap user interactions consolidated into two functions
- /srv/mediawiki/php symlink usage is dead \o/
- Train-dev gerrit auto-upgrade
- Scap handles pending rollback state from helm
- SpiderPig demo
- Patchdemo is running in k8s
- Phab/Phorge translations are going again, ish
- scap release process: Added checks to avoid surprises
- scap backport: Improved behavior on bad change number input.
- train-dev: Up-to-date docker compose plugin is required now
- scap backport: support deployable non-production branch backport \o/
- phabricator upstream patches for error log explosions
- phabricator -- patch for restricting visibility of priority field
- catalyst deploying extensions with a MediaWiki environment
23/24 Q3
Mar
- Train: Nightly security patch failures updating phabricator tasks merged, ready to release
- GitLab CI: Merged deploys-in-progress reset script
- Scap3: Two repos have patches for git-fat → git-lfs
- scap: replaced canary swagger checks with test server httpbb checks
- Phorge integration with GitLab in its third round of review
- GitLab webhooks also still going, looks like it'll go through
- People like scap backport - more patches, fewer things typed into terminals.
- Train: Security patch notification now working!
- GitLab webhooks have a more accurate regex for "Bug: TXX"
- Working towards getting rid of the /srv/mediawiki/php symlink
- Upgraded GitLab k8s/cloud cluster to new k8s version and documented the process
- Phab deploy is out (but stuff is broken (not terribly (probably)))
- scap backport now works for non-extension submodules
- gitlab cloud runner dependencies
- scap backport -2 fix merged, need to release
- scap web demo
- catalyst project has a patchdemo vm
- Jenkins is upgraded to the latest version
- Logstash dashboard for Phab errors is nice and clean
Feb
- Phabricator dump script works again (but also probably isn't necessary?)
- extended docpub to run multiple doc build jobs
- Sandeep able to run train-dev environment
- Gerrit 3.8 upgrade prepped
- buildkitd security upgrade on gitlab-runners.
- logstash_checker.py updated to check mw-on-k8s canaries. Merged this morning. Working on scap part today.
- python2 wheels for zuul2 for bullseye - unlock ability to upgrade Debian on contint boxes
- Patch for scap backport branch about to be live! (paired w/Jeena)
- Prepped new Phorge release: https://phabricator.wikimedia.org/T358610
- Deployed new scap with support for canary checks and git-lfs fixes
- In progress: updating security issues with patch issues
Jan
- Gerrit train-dev fixed!
- bd808's account approver thing seems working, maybe?
- Requested extra Phab/Phorge hardware as backup
- Deployed persistent volume cleaner
- Docker images with pyenv with multiple versions of python
- Worked with upstream phorge to hide the audit application (the post-merge review application for source code)
- Webhook payloads for GitLab merge request changes—tracked down a bug in upstream
- phab1004 distro upgrade is done
- We know we can run in codfw if we have to (it sucks though, let's not)
- scap clean now works better
- PVC story nearing completion
- Helm pending release story, in progress, testing needed
- Sandeep a very happy linux user
- We're on Gerrit 3.7
- Added GitLab support to Git/Reviewers mediawiki.org page.
- scap stage-train working! We're deleting old versions.
23/24 Q2
Dec
- https://extloc.toolforge.org/ is live and works
- Fixed cas3 → openid providers in gitlab
- Differential is dead!
- Andre is now a "blessed committer" in upstream Phorge and can +2 other folks' patches
- Not my win, but bd808 has a gitlab account approver bot just about working
- https://phabricator.wikimedia.org/T351478 in progress!
- Jenkins job builder jobs into gitlab-ci.yml jobs
- Systemd-managed dockerfiles for zuul
Nov
- PoC for zuul delegating to GitLab pipelines works better than expected, surfacing in the UI works better than expected
- More catalyst changes: https://gitlab.wikimedia.org/repos/qte/catalyst
- Diffusion repository exploration is done!
- Leaked pod cleanup script—pods leak when restarts or updates of runners happen
- Jaime's first Catalyst patch! Spins up k3s + MediaWiki + Vector
- Less noisy docpub alerts
- Cindy's patches to Catalyst
- Zuul's gating functions alongside GitLab-ci.yml files (no gitlab clone, instead rsync from executor)
- Jeena helm chart templating
- Pod anti-affinity for buildkitd pods + increased buildkit volume size: https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/291 and https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/295
- Script to modify persistent volume claims: https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/294
- Deployed gitlab-pod-cleaner: https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/297
- Zuul upstream commit is moving along: https://review.opendev.org/c/zuul/zuul/+/899425
- Catalyst builds and exposes
- Phabricator deploy yesterday
- Zuul running in dev-tools! Blubberized upstream docker files! https://zuul-devtools.wmcloud.org/tenants
- 0 open patches in differential
- puppet catalog compiler better integration with Gerrit: https://people.wikimedia.org/~hashar/gerrit-pcc.webm
- Made this train thing: https://tools-static.wmflabs.org/train-stats/
- Jenkins bug! ~ 2 days from start to merged by upstream https://github.com/jenkinsci/parameterized-trigger-plugin/pull/363
- Uninstalled applications we don't use in phab—one more repo in WMDE
- Went through 2012–2014 untouched repos in Gerrit
- Archived mwdumper https://www.mediawiki.org/wiki/Manual:MWDumper
- Ancient gerrit repo tickets filed for project cleanup!
Oct
- Bitergia user data is now a webapp and we can use it
- Project Catalyst is underway -- https://wikitech.wikimedia.org/wiki/Catalyst
- RelEngers can downtime things for phabricator deploys without SRE (although cookbooks are less fiddly)
- Ran train with Andre!
- Regular Phab deploy!
- libs/metrics-platform moved over
- Clarified language on backports: https://phabricator.wikimedia.org/T344409
- Investigated hidden repositories in GitLab
- Prettified puppet compiler output
- fresnel updates for dependencies
- Jaime working on project catalyst
- Gerritlab revised branch naming landed
23/24 Q1
Sep
- Image published for Blubber that is native LLB, no dockerfile anymore
- implications
- dockerfile is unnecessary since no one sees the dockerfile—we can customize each llb instruction and what it displays to the users: a name that corresponds to the blubber.yaml config
- now we have the ability to create our own instructions
- dockerfile2llb gone! No more external helper images that haven't been maintained just to copy files around—no more cross-platform compatibility/emulation issues
- llb gets new stuff first—ex: diffop/mergeop https://www.docker.com/blog/mergediff-building-dags-more-efficiently-and-elegantly/
- implications
- Phorge working on the scap3∞ deployment environment
- Landed 3 upstream phorge patches; 1 is one we've had for years that blocks some tasks rendering (T284397)
- Patch for T&S outputs the MediaWiki SUL account along with the phab username (T344303)
- Wrote a plugin for tox to keep supporting [tox:jenkins] CI config with tox v4 https://gerrit.wikimedia.org/g/integration/tox-jenkins-override unblocking part of the migration from tox v3.
- Moved Civi CRM CI from Stretch to Bullseye and to php 7.4 (aligning with prod). Paired with Ejegg from FR-tech.
- Ahmon refactoring GitLab nodepools, buildkitd persistent volume, and containerd debugging
- Dan deep debugging of source code for containerd
- GerritLab uses the git credential helper
- Trusted runners now not running untrusted jobs
- Authentication working in phorge dev environment
- Phabricator housekeeping for open tickets assigned for more than two years
- Phabricator logstash dashboard with filters
- We've got a weekly Phorge deploy window, of sorts (and can ask for other things)
- Workaround for security patches touching l10n—fixes bug!
- Tox v4 migration in progress
- Added phorge to the scap3 development environment!
- Fixed a logspam-watch bug (SIGPIPE)
Aug
- Developer Satisfaction Survey got presented
- Gerrit repo archiving script for GitLab migrations \o/
- Gerritlab adoption
- JWT auth changes
- T272693 - reviewed non-standard phabricator policies
- Downstream phabricator patches for php8 + logspam
- Upstream phorge patches for logspam
- Overwrote feed transaction default query in conduit (T344232#9092848)
- Scap3 can now be configured to disable the service on secondary hosts: https://phabricator.wikimedia.org/T343447
- Kokkuri is now using the new gitlab id tokens: https://phabricator.wikimedia.org/T337474
- We're on Phorge!
- Gitlab CI-built kask container image deployed today. (https://phabricator.wikimedia.org/T335691)
- Gitlab local hacks in progress
- Ahmon passed his CKA! Read Kubernetes in action
- Merged 3 fixes to Phorge upstream for phab logspam
- 🎉 Delayed announcement: Jeena's back, and she's a senior software engineer
- Blubber refactor ripping out dockerfile passing acceptance tests—straight to llb
- Added another pool to our DO cloud runner pool—memory-optimized
- Refactored the patch to tune-down staging substantially
- Now there are 4 runner-controller runners running + 4 nodes ready to go
- GerritLab commits merged to speed up sending patches and do the right thing given GitLab's weirdness
- scap backport bugfix
Jul
- git::clone puppet resource updated
- LDAP group sync to GitLab
- Git blame on stack traces within Phatality
- Buildkitd allowed image list deployed
- Onboarding Andre Klapper, all sorts of new permissions: phab-root, contint-root, gerrit-root, gerritadmin
- Bunch of GitLab accounts created: ~200 accounts for the mediawiki/* namespace
- Assisted in GitLab migrations, notably datahub: https://gitlab.wikimedia.org/repos/data-engineering/datahub
22/23–Q4
April
- Mr. Widget doesn't seem to have broken again.
- Job to test train branch cut on a daily basis
- Successfully debugged an obscure buildkitd -> registry interaction
- Multi-arch image support pre-req!
- Still need access to logs for future debugging/troubleshooting. See https://phabricator.wikimedia.org/T322579
- A plan exists for Phorge migration
- Abstract Wikipedia showed up asking for help with a GitLab migration
- Jelto deployed the privileged buildkitd commit
- Moving scap backport tests, win in progress
- Aphlict on a new box---nothing exploded, nobody yelling
- Jenkins releases configuration fully automated
- scap train
May
- draft hypotheses are drafted
- Andre on team (nascently)
- Local working phorge install—legalpad is the only area of divergence
- Killing the dockerfile in blubber—adding functional tests
- Some work under Migrate mediawiki/ namespace from Gerrit to GitLab
- Updates to https://www.mediawiki.org/wiki/GitLab/Hosting_a_project_on_GitLab and a script for importing users to groups for GitLab
- Gerrit security update
- Jeena got tests to pass for scap backport
- Doxygen no more deployment of gh-pages branch (saves us 1GB of junk :D)
- GitLab spam mitigation
- Nascent plan for MediaWiki under GitLab
- Phorge is on devtools
- doc publishing on GitLab is working \o/
- Added Patch Demo support to Gerrit (Patch Demo is the tool used to spin up wikis): https://phabricator.wikimedia.org/T332474#8874936
- scap backport integration tests!
- Pushed dozens of upstream patches into Phorge: https://we.phorge.it/differential/query/m1kEaCStjf4Z/#R
Jun
- Blubber acceptance tests
- docpub in Jenkins
- Antivandalism patch deployed! (one down; one to go)
- Learned that we needed to restart php
- git::clone changes in puppet for specifying a tag
- git::clone upstream changes now changes the origin
- WMCS instance caches for NPM via "npm cache verify" to GC the cache
- buildctl --wait
- dev-images image rebuilding
- train backport on Saturday
22/23-Q3
Jan
- Folks clamoring to use pipeline
- Dan & Ahmon jumping in to help devs
- Chad moved!
- Instance-wide runners!
- First non-us/non-ci repo deployed via GitLab Pipeline: https://docker-registry.wikimedia.org/repos/data-engineering/mediawiki-event-enrichment/tags/
- No more repos on diffusion!
- Kokkuri's python now!
Feb
- Buildkit upstream patch to make client connections more robust in case of loss
- Smooth wmf.22 train
- docker-pkg build --list \o/
- Isolating build containers in buildkit in privilege mode
- Mr. Widget should be deployable, I think (as soon as secrets are actually stashed)
- Pushed up patches for JWT (hopefully the final ones :D)
- Deployed production releases-jenkins using scap3!
- Patched scap to be smarter about interrupted helm deploys!
- See Slack://WikiLove thread re: scap backport
- "90% reduction in time spent in existential dread" ← going on a slide deck somewhere!
- fixed scap backport for dependencies
Mar
- New scap self-install in production
- Learned too much about iptables
- Moved gitlab-cloud-runner Helm stuff to Terraform :)
- Thundering herd testing passed—k8s can handle 100 concurrent jobs
- Phab release prepped
- docker-gc is blubberized and kokkurized
- found and deleted docker image based on obsolete debian version (stretch)
- made terraform plan run before merge
- Tentative optimism for CKA
- Monte's having success using the phab api to build different views of tasks
- Mr Widget deployed
- gitlab-cloud-runner stress tests successful!
- Dockerhub mirror admission controller
- Reggie JWT auth enabled in gitlab cloud runners
- Gitlab cloud runners ready to be made available instancewide
- We made a staging cluster
- Docker-gc repo using Kokkuri
- Gerrit progress bars
22/23-Q2
Oct
- Blubber's gitlab ci file is enough to move the project over
- Setup in gitlab ci: jwt auth and buildkit
- New features in blubber: builders improvement from Jaime + feature for running variant from Jaime
- Jaime + Antoine upgraded scap in the dev-tools project
- Team likes each other
- Train feels less stressful lately
- GitLab runners in K8s now running buildkit with caching—nothing to a k8s cluster
- Scap builds docs on GitLab
Nov
- Scap repo is fully moved over to GitLab
- Internship opportunities!
- Dan's daughter can now pedal a bike
- Critical systems list
- Antoine fixed mixed-case usernames in Gerrit
- Gerrit upgrade
- Scap3 dev env: https://gitlab.wikimedia.org/repos/releng/scap3-dev
- Phabricator is now hosted on a new box at phab1004 and deployed with scap
- registry-based caching
- reggie is in use and working
- Autoscaling
- Kokkuri
Dec
- Antoine replaced Docker with Podman
- https://wikitech.wikimedia.org/wiki/Gitlab/Phabricator_integration
- Provision Horizontal Pod Autoscaler (HPA) for GitLab cloud runners https://phabricator.wikimedia.org/T323164
- certmanager for DO k8s registry
- MW-on-k8s routing traffic Soon™—our part works \o/ woo
- Dan's GitLab CI presentation to tech-all!
22/23-Q1
July
- We don't specifically have any reason to think our GitLab instance has been owned, necessarily.
- Small merges for mwpresync
- GitLab runner config management changes merged!
- Increased the team knowledge of scap3
- Pending major update for GitLab
- Scap-o-scap installed in beta! \o/
August
- Train-blockers toolforge scrapes from phab \o/
- Nagged GitLab into updating their FAQ: https://gitlab.com/gitlab-org/gitlab/-/issues/363212#note_1066797431
- Clare used scap backport for real
- Phabricator (probably) deploys from scap 3
- Beta exists still
- Chad re-earning t-shirt
- Upgraded Gerrit from 3.4.4 to 3.4.5
- Scap-backport improvements, seeing increased use
- Renewed GitLab relationship!
- Moved Gerrit replica server!
- Yet another successful train, automatic edition this time!
- Team reviews are fast!
- Gitlab JWT STUFF MERGEDDDDDD \o/
Sept
- GitLab meeting with Bryan at GitLab—ultimate is free if we want it, license compliance thing (https://docs.gitlab.com/ee/user/compliance/license_compliance/#license-compliance)
- Stage-train automatic mode ran all the way through!
- MW-to-k8s deploy via scap
- Blocker/resource conversation
- GitLab trusted runners testing can progress now that we can hit the internet
- Successfully built an image that fetched node and python packages from the internet
- sooo close to deploying phab with scap
- Tyler fixed the toolforge script generating the Deployment page (got broken in May https://wikitech.wikimedia.org/wiki/Special:Contributions/DeploymentCalendarTool ).
- Antoine's first Go patch \o/ \o/
- Build images on GitLab trusted runners!
- https://phabricator.wikimedia.org/phame/post/view/297/scap_backport_makes_deployments_easy/
- Finished migrating the last two services to PipelineLib
- Antoine can share his screen in firefox! :D
- Blubber and scap review!
- One command scap release—a 10,000% increase in productivity vs 1 year ago
21/22-Q4
April
- Jaime deployed the train
- Jaime rolled back the train
- GPG keysigning
- Fixed bug in proxy balancing in scap
- Scap deploy-promote
- Scap 4.7.0 fully out; 4.7.1 going out this week!
- Scap 4.7.1 fixes cross-datacenter pulls!
- New Phatality deploy
- Scap backport
May
- Our long, grinding efforts at deployment training are finally starting to result in more people doing deploys (well, ok, they've resulted in Clare doing deploys, anyway) \o/
- Rolled back train five times
- Deployment tooling just kind of sucks less than it used to
- Merged scap backport
- scap stage-train \o/
- Finally got rid of the generic service-pipeline-* jobs and migrated remaining 23 projects to use bespoke `.pipeline/config.yaml` based jobs
- gitlab-a-thon
- We found a whooooole lot of blockers
- Dan being a good open-source citizen: https://github.com/moby/buildkit/pull/2868
- JWT implements oauth2
- Could be used to authorize push access to namespace based on project path
- Root access on phabricator
- Updated Jenkins for Security—which broke Jenkins for a while
- I think I finally remember how to use a standalone puppetmaster
- ERC going well and DEI moving ahead
- Dan got changes to buildkit merged upstream
- Seems like we're pretty close to how auth will work for publishing images from GL
- serviceops are plodding ahead on GitLab physical machines
- CI for blubber in gitlab
- update scap backport to work with new zuul plugin
- new tests for scap backport
- scap tests run without deprecation warnings (for stretch, buster, and bullseye)
- Giuseppe plans to enable always-restart-php-fpm on Thursday.
- Docs for GitLab are somewhat less crappy than they were a week ago
- Upgraded Gerrit in train-dev to match production
- Hired Backfill
- Phabricator deployment runbook https://wikitech.wikimedia.org/wiki/Phabricator/Deployment
June
- GitLab Sprint summary by Brennen https://phabricator.wikimedia.org/phame/post/view/288/gitlab-a-thon/
- We have GitLab on new metal, and can probably enable GL Container Registry \o/
- We know more about git than we did in May
- Functional scap already self-installed in prod
- JWT presentation!
- Phab deployment has a runbook https://wikitech.wikimedia.org/wiki/Phabricator/Deployment
- scap scap
- successfully used it
- ITCs getting done
- somebody noticed phab
- scap backport revert
- scap rollback
- scap pushing commits with a shared ssh key
- Chad has some equipment: monitor and docking station required
- Dan's rested
- uneventful train
- Antoine got rest api in gerrit working for jsonschema
21/22-Q3
January
- New release of Java Gearman plugin
- Gerrit upgrade! (3.3.6 → 3.3.9)
- Ahmon debugging of contint1001
- mwcli coolest tool honorable mention
- Production benchmarks of yaml parsing for MW
- New test environment progress for GitLab
- Plan for trusted runners coming together
- Scap prep auto working in beta (on the way to prod)
- Gerrit 3.3→3.4 prework to solve javascript incompat
- CI-agents upgrade prep for stretch -> bullseye
- MWCLI progress and GoLand for IntelliJ
February
- logspam-watch is fast now because Ahmon is good at computers and can tolerate modifying Perl
- Wrapping up MediaWiki settings loader expedition caching—abstractions for MediaWiki config settings—they can declare how they're cached and how long (with probabilistic early expiry to help with lock contention)
- Mukunda has freed himself from train :D
- Jaime joined, already has a scap patchset
- Things we didn't do but benefit from: New GitLab test instance in devtools project: https://gitlab.devtools.wmcloud.org/ --> same puppet as prod!
- Tyler's DigitalOcean exploration spike
- Scap prep auto! is neat <3
- Antoine's qemu learnings!
- moving train-dev to helm3
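A rough illustration of the "probabilistic early expiry" idea mentioned in the settings-loader item above. This is only a sketch of the general technique, not the MediaWiki settings loader code: each reader occasionally decides to regenerate a cached value shortly before it expires, so that readers do not all pile onto the lock at the exact expiry moment. The function name, the `beta` knob, and the injectable `now`/`rng` parameters are assumptions made for this example.

```python
import math
import random
import time


def should_refresh_early(expiry, compute_seconds, beta=1.0, now=None, rng=random.random):
    """Decide whether this caller should regenerate a cached value early.

    expiry          -- absolute time (seconds) at which the cached value expires
    compute_seconds -- roughly how long regenerating the value takes
    beta            -- hypothetical tuning knob; values > 1 refresh earlier
    now, rng        -- injectable for testing (assumptions of this sketch)
    """
    now = time.time() if now is None else now
    # rng() is in [0, 1), so u is in (0, 1] and -log(u) is a non-negative,
    # exponentially distributed number: usually small, occasionally large.
    u = 1.0 - rng()
    early_by = compute_seconds * beta * -math.log(u)
    # Refresh if "now plus a random head start" already reaches the expiry.
    return now + early_by >= expiry
```

The closer a value is to expiry, and the more expensive it is to rebuild, the more likely a given reader is to refresh it, while most readers keep serving the cached copy.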
March
- Scap 4.4.1 released (includes container image building stuff) \o/
- Brennen talked publicly and was not shamed by it
- Onboarding Jaime!
- Dan's blubber demo!
- docker build can use a blubber file directly now
- Supports bd808 and developer tooling
- Opens up for a more flexible release model
- Job post posted
- scap backport exits if change is not mergeable
- Trainsperiment was instructive AND SUCCESSFUL!
- Got a working mw container image deployed from deploy1002 woohoo 🐳🎉
- Fixed check-new-errors script (extremely tiny win) \o/
- Jaime's first scap release!
- I don't have to press enter any more! (added -n)
- Train schedule worked!
- deploy-promote upstreamed to scap!
21/22-Q2
December
- Greg now director of FR tech!😅
- Performance scap improvements (beta-scap-sync-world takes less time than ever)
- Scap backtrace much cleaner (fewer of them for common error situations)
- Several small Phabricator improvements including:
- improvements to in-progress status for workboards
- anti-vandalism feature to prevent merging more than 5 tasks in a single transaction.
- https://phabricator.wikimedia.org/T298063
- https://phabricator.wikimedia.org/T288956
- https://phabricator.wikimedia.org/T295934
- https://phabricator.wikimedia.org/T297249
- Test wiki that has NO LocalSettings.php — just uses yaml
- Pipelinelib for helm3
- Scap backport validation
November
- Deploy MediaWiki manually in Train-Dev to k8s!
- Pipeline supports copying files out of containers as published artifacts in Jenkins!
- Deployed Tuesday with a script thing!
- Scap 4.0.3 release!
- record time for scap release
- Worked out a path forward for GitLab runner architecture with SRE; moved projects to top-level group with runners: https://gitlab.wikimedia.org/repos \o/
- https://gitlab.wikimedia.org/thcipriani/bacon-stats#-bacon-window-stats
- Data³ dashboard new stuff!
October
- Scap 4 release!! Now with more Python
- GitLab upgrades
- Gerrit added to train-dev unblocks scap backport dev
- Data^3d is functional and ready for demos
- More people using train dev
- Antoine uses chrome^Hium |not sure that is a win|
- Gerrit upgrade to 3.3.6 (fixes some minor ui glitches)
- Client side errors are blocking the train
- GitLab is open to all
- dashboards getting close to demo-worthy? http://173.17.185.55:8001/-/dashboards/project-metrics?project=PHID-PROJ-uier7rukzszoewbhj7ja
21/22-Q1
September
- Upgraded GitLab to 14.x release
- Migrated dev-images repo to GitLab
- GitLab usernames fixed
- Productive collaboration with an upstream https://github.com/rclement/datasette-dashboards/issues/9
- Successfully set up gitlab ci on ddd: https://gitlab.wikimedia.org/releng/ddd/-/pipelines/665
August
- Started dev-images to buster
- Gerrit 3.3
- Successful php_fpm_always_restart: true test (https://phabricator.wikimedia.org/T266055)
- GitLab soft launch
- migrated mw-cli to gitlab, got docker-in-docker integration tests working (thanks addshore)
- Finished dev-images to buster
- Merged workboard metrics code!
- Reviewed on GitLab
- GitLab code review experience ftw
- A successful (?) interaction with GitLab upstream
- GitLab upstream merge request in progress
- Node 14 patch updated
- Emacs installed on releasesXXXX servers
- Mukunda learned how to extend datasette with ddd/phab functionality
July
- Projects exist on GitLab
- Gerrit upgrade pairing
- Published local dev cli
20/21-Q4
April
- Scap 3.17.1 tagged
- GitLab Ansible code review
- Deployment trainings
May
- Quibble 0.0.47
- Jenkins upgrade to latest LTS
- Released new upstream Jenkins Gearman plugin
- Wikitech Gerrit docs updated
- data³ used successfully to extract train blocker stats from Phabricator
- Added transaction metadata to Phabricator task transactions api so that tools can get more detailed transaction details required for the train blockers analysis.
- Quibble weekly meetings
- gitlab.wikimedia.org is running (still needs cas registration)
- Documented the process for adding languages to phabricator, as well as maintaining the translation strings from translatewiki. All of this is now documented in the README for the phabricator translation repo. That change can be seen here: https://phabricator.wikimedia.org/rPHTR0de9c13ef996326a99d6320f4c26669901f3aff4
June
- Knowledge transfer on Gerrit deployments
- Running gitlab.wikimedia.org, real use now
- Giuseppe reports: curl -H 'Host: en.wikipedia.org' https://staging.svc.eqiad.wmnet:4444/wiki/Main_Page works
- Automatic notification of security patch application failures. One real use so far.
20/21-Q3
January
- Update dev images to split apache and php containers for local dev
- Gerrit security bug discovery and deployed fix by Antoine
- In sync with Gerrit upstream war (Java compiled code)
- Target releases for apt packages in blubber deployed so wuvi can use npm
February
- PipelineLib fully working on releases-jenkins.wikimedia.org
- Rust introduction talk (not strictly RelEng business)
- logspam-watch
- Minimum hits consolidation feature
- Error histograms, at-a-glance status indicators (emoji, it's emoji), improved UTF-8 handling and terminal resizing
- Gearman plugin deployed. Merged bunch of pending changes + a fork from GoodData company which adds support for Pipeline jobs
March
- PipelineLib fully working on releases-jenkins.wikimedia.org
- Credentials added to pipelinelib
- S&F contractors underway with production GitLab configuration
- Terrible script for finding status of production errors on logstash dashboard
- Ability to deploy phatality updates
- scap apply-patches much improved
20/21-Q2
October
- GitLab consultation
November
- Gerrit security upgrade
- Gerrit grafana dashboard
- Created pipelinelib-experimental cloud project for working on pipelinelib
- Scap 3.16.0 release (tagged, waiting on SRE now)
- logspam-watch improvements
- apparently scap apply-patches may possibly work in some circumstances
- Upstream fix for shallow cloning in git: https://github.com/git/git/commit/fb3d1a083f776f02caa514cad8b232d8b974641f
December
- Scap 3.16.0 released and deployed
- Dropped scap plugins from mw-config
- unconditional restart on deploy for opcache corruption deployed
- https://doc-stage.wmcloud.org/ , staging area for doc.wikimedia.org. Next prod then update related docs.
- Scap source formatted with Black now
- Runnable runbook blog
20/21-Q1
July
- CI now supports REL1_35 branches (and ignores REL1_33).
- Eliminate elasticsearch dependency from Phabricator search engine
- Cassandra Docker image
- Jenkins node Docker image cleanup & re-onlining after disk space recovers
- Collection of disk space stats on Jenkins workers
- Credentials and environment variables in PipelineLib
- Blubber now correctly supports multi-stage artifact copies
August
- Reduced the number of non-failure FAILURE messages in CI
- After 9 months, Aphlict is finally back.
- Scap version 3.15.0 released (in git, if not as .deb yet)
September
- Scap 3.15.0 was deployed to all servers
- We have trained another person to conduct the train
- New phabricator metrics / stats in the project reports (currently deployed to cloud, prod coming soon)
- image promotion in CI
- Tiny incremental improvements to logspam-watch (it now shows seconds)
- GitLab consultation well underway
- Released Quibble 0.0.45 https://doc.wikimedia.org/quibble/changelog.html
- Local development mailing list and updates page
19/20-Q4
April
- Docker images published on buster-based contint2001 (as part of general temporary switch-over from contint1001 to 2001 for buster migration)
- Composer is now authenticated with github
- Dropped basic PHP 7.1 testing from CI
- Published Kubernetes migration tutorial
- Phabricator milestone columns can now be moved on workboards
- Phabricator workboards can be sorted by most recent activity.
- Tech talk on PGP basics
- "Cache of wmf-config/InitialiseSettings often 1 step behind" fixed! - task T236104
May
- The release train branch cut is now an automatic job
- Wikimedia Portals build and WDQS data release jobs moved to docker
- The Continuous Integration instances on WMCS have been fully migrated off Jessie! T236576
- Scap 3.14.0 released (by releng) and deployed (by serviceops)
- Documentation for setting up a local dev environment for Phabricator: https://www.mediawiki.org/wiki/Phabricator/Local_Dev_Environment
- CI server (contint) migrated to buster!
June
- Scap plugins will move from mediawiki-config to scap git repository with the next release.
- Deployment script added to deployment-charts for deploying to k8s
- MediaWiki branch cuts are fully automated, at last!!!!
- TMH job runner works in MediaWiki-Docker
- Interactive logspam-watch
- Gerrit 3.2.2
19/20-Q3
January
February
- Production releases of Parsoid/PHP now also go through final pre-production tests
- Scap release 3.13.0
- Local development MediaWiki docker environment has shipped and been announced - https://lists.wikimedia.org/pipermail/wikitech-l/2020-February/093109.html / https://www.mediawiki.org/wiki/Docker
March
- scap has its first integration test
- MediaWiki tarball / Wikimedia production are now PHP 7.4-compatible.
- All extension and skin repos are now being tested against PHP 7.4.
- Analytics Refinery release job isolated into a Docker container.
19/20–Q2
December
- PHP 7.4 testing was available in CI the first "business day" after 7.4.0 was released.
- Revived "This week in logspam" email
- Auto DBLists
- PGP Key repo
- Production config now has pre-merge diff reports, e.g.: https://integration.wikimedia.org/ci/job/operations-mw-config-php72-composer-diffConfig-docker/86/console
November
- branch.py for cutting the branch for train
- logspam-watch for tailing logfiles
October
- Dev images are now automatically created as part of postmerge via the pipeline for:
- Parsoid
- Soon: RestBASE
- (different from RESTbase? ;-))
- Selenium documentation updated https://www.mediawiki.org/wiki/Selenium/Node.js
- Quibble 0.0.36 released https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092658.html
- Quibble 0.0.37 released https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092660.html
- Quibble 0.0.38 & 0.0.39 released for mediawiki/tools/api-testing
- Introducing Phatality - Streamlined error reporting from Kibana to Phabricator https://phabricator.wikimedia.org/phame/post/view/177/introducing_phatality/
- HHVM removed from CI and MediaWiki.
- Gerrit is on gerrit1001 now
- … and so is most of the code review. ;-) :)
- Unforked Jenkins Job Builder
19/20-Q1
September
- Scap 3.12.1-1 released/deployed
- Refactored Zuul layout to use per-branch pipelines
- quibble -c lets you run arbitrary code against a working MediaWiki install
- The phabricator "Report Error Code" form (https://phabricator.wikimedia.org/maniphest/task/edit/form/46/ ) has been updated with separate fields for the stack trace and error code/request id.
- T232608 Delete selenium-daily-beta-EXTENSION Jenkins jobs that are broken more than 30 days
- Write cached config to JSON as well as serialised PHP https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/533592/ (first step towards a saner config)
- MediaWiki PHP support target modernised from 7.0+ to 7.2+ for 1.34 onwards. https://phabricator.wikimedia.org/T228342
- Quibble 0.0.35 release
- 1.34.0-wmf.24 branch cut was done /mostly/ with branch.py instead of make-wmf-branch.php (some small bugs remain to work out but it's very close)
- Creating accounts was broken on beta cluster since 2019-09-08. It was fixed today (2019-09-25). https://phabricator.wikimedia.org/T232796
- Phatality extension for Kibana deployed to production and used for reporting production errors into Phabricator.
- Train blocker tasks created for 1.35.0-wmf.1—1.35.0-wmf.25
- Dev images are now automatically created as part of postmerge via the pipeline for MediaWiki
August
- Read only "gerrit-replica" active, handling 10% of all traffic (read from phab)
- https://time.releng.team ¯\_(ツ)_/¯
- Scap 3.12.0-1 in production
July
- Migrated all generic CI jobs from PHP 7.0 to PHP 7.2 https://phabricator.wikimedia.org/T225457
- Three new folks have been spun up on and have successfully run the Train, by end-of-month
- it-phabricator plugin updated; fixes errors in All-Users repo in Gerrit
- Completed first book club iteration: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
- Unit vs Integration test split announcement: https://phabricator.wikimedia.org/phame/post/view/169/changes_and_improvements_to_phpunit_testing_in_mediawiki/
- Gerrit 2.15.14 deployed
- Contint1001 now storing docker images on separate partition
- Blubber 0.8.0 deployed - https://lists.wikimedia.org/pipermail/wikitech-l/2019-July/092344.html
- Deployment Pipeline docs published on Wikitech - https://wikitech.wikimedia.org/wiki/Deployment_pipeline
18/19-Q4
June
- Speculative CI meta-architecture published within WMF for feedback (two versions)
- Old image versions automatically removed from jenkins agents when /var/lib/docker space > 80%
- scap 3.10.0 cut
- Jenkins build timings reports: https://people.wikimedia.org/~dduvall/jenkins/
- Helped Kask team sketch an outline of its architecture (https://www.mediawiki.org/wiki/Kask)
- Fatal Monitor with marker lines for deployments: https://logstash.wikimedia.org/app/kibana#/dashboard/77cc3e90-aa27-11e7-9109-51bd3197f7a9?_g=()
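For the image-cleanup item above (old image versions removed from Jenkins agents once /var/lib/docker passes 80%), the logic could look roughly like the sketch below. This is not the script that runs on the agents: the function names, the `(name, tag, created)` tuple shape, and the "keep only the newest version per image" policy are all assumptions for illustration. Only the path and the 80% threshold come from the note.

```python
import shutil


def over_threshold(path="/var/lib/docker", threshold=0.80, disk_usage=shutil.disk_usage):
    """Return True when the filesystem holding `path` is more than `threshold` full."""
    usage = disk_usage(path)
    return usage.used / usage.total > threshold


def images_to_remove(images, keep_latest=1):
    """Pick old versions to delete, keeping the newest `keep_latest` per image name.

    `images` is a list of (name, tag, created_timestamp) tuples; this shape is
    hypothetical and chosen only to keep the example self-contained.
    """
    by_name = {}
    for image in images:
        by_name.setdefault(image[0], []).append(image)

    stale = []
    for versions in by_name.values():
        # Newest first, then everything past the first `keep_latest` is stale.
        versions.sort(key=lambda image: image[2], reverse=True)
        stale.extend(versions[keep_latest:])
    return stale
```

A cleanup job would call `over_threshold()` first and only compute and remove the stale list when it returns True, so agents with plenty of space keep their image cache warm.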
May
- Repository-hosted CI/CD pipeline configurations now supported (.pipeline/config.yaml) - https://phabricator.wikimedia.org/T210267
- Train notes published on branch cut
- Codehealth pipeline beta - https://phabricator.wikimedia.org/phame/live/1/post/160/introducing_the_codehealth_pipeline_beta/
- Some baseline local development images published
April
- Phabricator vandalism rollback tool completed 🎉 (blog post? 😉)
- Upgrade Zuul to 2.5.1-wmf6 (which unblocks the Gerrit upgrade to 2.16) - https://phabricator.wikimedia.org/T208426
- Team offsite in Chicago
18/19-Q3
March
- CI tooling future WG started, blogged
- GerritBot comments on patches going through the pipeline (with fancy badges and the like)
- Train deploy notes are now automatically generated on branch push
- Scap 3.9.2-1 released in production
- Phabricator upgrade: https://phabricator.wikimedia.org/phame/post/view/147/projects_forms_and_subtypes_oh_my/
- Published the ISOSTWG results and recommendation on officewiki and announced: https://office.wikimedia.org/wiki/Internal_Support_for_Open_Source_Tools_Working_Group
- swat tags now show up in the deployment schedule (via lua magic)
- Blog post: https://phabricator.wikimedia.org/phame/post/view/152/help_my_ci_job_fails_with_exit_status_-11/
- CI future WG report: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Report
- Blog post: https://phabricator.wikimedia.org/phame/post/view/155/quibble_hibernated_it_is_time_to_flourish/
- Published a CLI tool to roll back vandalism in phabricator.
Feb
- blubber uses blubberoid.wikimedia.org in the pipeline and pipeline is almost there for end-to-end functionality (can't yet deploy to production, but nearly can)
- scap development back on gerrit -- new contributors
- local-charts repo created
- docker SIG announced/setup
- Developer satisfaction survey results https://www.mediawiki.org/wiki/Developer_Satisfaction
- Scap 3.9.0-1 released in production
- Deployed wmf.18
- Updated Phabricator to 2019-02-20 release, blog posted detailing some changes: https://phabricator.wikimedia.org/phame/post/view/145/phab_phebruary/