Wikimedia Release Engineering Team/Checkin archive/20180820
2018-08-20
Vacations/Important dates
- August 13-24: Greg vacation
- August 23-24 (Thursday-Friday): Željko vacation
- August ~: Antoine
- August 29-31: Dan vacation
- September a week or so - Antoine
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
- July 02 - wmf.11 - Zeljko - no train, Fourth of July
- July 09 - wmf.12 - Zeljko
- July 16 - wmf.13 - Zeljko
- July 23 - wmf.14 - Zeljko
- July 30 - wmf.15 - Mukunda
- Aug 06 - wmf.16 - Mukunda
- Aug 13 - wmf.17 - Mukunda (No train - Wednesday is a holiday)
- Aug 20 - wmf.18 - Tyler <----
- Aug 27 - wmf.19 - Dan
- Sep 03 - wmf.20 - Tyler
- Sep 10 - wmf.21 - Dan
- Sep 17 - wmf.22 - Zeljko
- Sep 24 - wmf.23 - Zeljko
- Oct 01 - wmf.24 - Antoine
- Oct 08 - wmf.25 - Antoine
- Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
- Oct 22 - wmf.1 - Mukunda
SoS
- July 04 - Dan
- July 11 - Antoine
- July 18 - Antoine
- July 25 - Tyler
- Aug 01 - Tyler
- Aug 08 - Zeljko
- Aug 15 - Dan (probably not SoS because it's a WMF holiday?)
- Aug 22 - Zeljko <---- (Željko can go to SoS for the next few weeks since he has done 1 SoS so far)
- Aug 29 - Mukunda
- Sep 05 - Tyler
- Sep 12 - Tyler
- Sep 19 - Dan
- Sep 26 - Dan
- Oct 03 - Zeljko
- Oct 10 - Zeljko
- Oct 17 - Antoine
- Oct 24 - Antoine
- Oct 31 - Mukunda
Team Business
First Offsite
- Waiting for confirmation from Travel, but I was told that no more offsites can be scheduled next to TechConf in Portland in October, so it will be the week of Nov 5th.
Needs attention
- Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616
- Status: Mark H and Amanda reached out to me, I asked for a meeting with Mark H.
- Re-evaluate use of "Dependent Pipeline" in Zuul for gate-and-submit - https://phabricator.wikimedia.org/T94322
- ^ for antoine
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
- Already added the Code Health Metric working group info to the SoS etherpad.
This week
Release Engineering
- Blocked by:
- Feedback needed (on how problems could have been prevented) from many people/teams on a recent MediaWiki train related incident report.
- 1.32.0-wmf.13, 9 blockers, feedback needed for 8 of them: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180717-Train
- Aaron Schulz (Performance), Adam Wight (Scoring Platform), Bartosz Dziewoński (Contributors), Brad Jorsch (MediaWiki Platform), C. Scott Ananian (Contributors), Daniel Kinzler (Wikimedia Deutschland), Timo Tijhof (Performance), Prateek Saxena (Audiences Design)
- Blocking:
- MediaWiki 1.29 final release and EOL; was due in June: https://phabricator.wikimedia.org/T197669 (w/ Security)
- Updates
- New general purpose CI job that builds and runs test containers via Blubber/Docker based on config provided in each project (think `.travis.yml` file)
- Read more about Blubber here: https://wikitech.wikimedia.org/wiki/Blubber
- See recent builds at https://integration.wikimedia.org/ci/blue/organizations/jenkins/blubber-test/activity
- Gives developers one major benefit of the CD pipeline work now: control over their pre-merge and gating tests without having to touch integration/config
- Only scheduled to run for a few repos at the moment, but will eventually be expanded to many more projects (we need to tune CI infra around it first)
- Looking for more participants to join the Code Health Metrics working group. This group's purpose is to define and later implement a set of core metrics that we will use to assess the health of our code base. More info: https://www.mediawiki.org/wiki/Code_Health_Group/projects/Code_Health_Metrics
Last week
Last week didn't happen due to holiday
- Blocked by:
- Blocking:
- Feedback needed from various teams (too many to name each one) on two recent MediaWiki train related incident reports. Specifically, how problems could have been prevented.
- 1.32.0-wmf.13, 9 blockers, feedback needed for all of them: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180717-Train
- 1.32.0-wmf.14, 6 blockers, feedback needed for 2 of them: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180724-Train
- Updates
- Blubber test
- Code health working group -- join up!
- Quarterly cross-dependencies
Train status and happenings
Past week status updates
- All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4
Quarterly Goals for Q1
Pipeline: Move verify stage from Minikube to CI k8s namespace in production context
- Done?
- Dan made a blubberd for fun (`curl -s --data-binary @blubber.yaml http://tools.wmflabs.org/blubber/test`)
- Evaluate strategy for Docker/CI capacity https://phabricator.wikimedia.org/T202160
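The `curl` call to blubberd noted above can also be made from a script. The following is a minimal sketch, assuming only what the command itself shows: the raw contents of `blubber.yaml` are sent as a POST body to the URL in the notes. The function names `build_request` and `post_config` are hypothetical, and nothing here is confirmed about what the service returns.

```python
import urllib.request

# URL copied from the curl command in the notes.
BLUBBERD_URL = "http://tools.wmflabs.org/blubber/test"


def build_request(config: bytes, url: str = BLUBBERD_URL) -> urllib.request.Request:
    """Build a POST request whose body is the raw config bytes."""
    return urllib.request.Request(
        url,
        data=config,
        method="POST",
        # curl --data-binary sends this content type unless told otherwise.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def post_config(path: str = "blubber.yaml", url: str = BLUBBERD_URL) -> str:
    """Read the config file and return the service's response body as text."""
    with open(path, "rb") as fh:
        request = build_request(fh.read(), url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `post_config()` from a directory containing `blubber.yaml` would mirror the one-liner; splitting out `build_request` keeps the request construction checkable without network access.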
Code Health
- T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
- Perform existing Stewardship review process for Q1 cycle.
- T199254 - Add test evaluation to post mortem review process.
- Added test evaluation to PM template. Need to wiki-ize it now.
- Review existing e2e test coverage.
- Define prioritization scheme.
- Prioritize e2e testing gaps.
- T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
- T199259 - Platform and Search Platform teams are using TDM PoC
- T199262 - Identify key Tech Debt areas
- T199263 - Put in place Tech Debt management process for PEP
- T199261 - Define base Code Health metric set.
- Base Workgroup defined (Kumal, Petr, Guillaume, me). TechCom to participate as reviewers.
Developer Productivity
- Make a hire to create the capacity needed for this program.
- Write and share a survey to measure developer satisfaction and areas for investment. - task T197635
Other work
Selenium
- Q1 goals task: T198389 Q1 Selenium framework improvements
- T179188 Video recording for Selenium tests in Node.js
- T193157 Quibble does not install ffmpeg - comments from Antoine at https://gerrit.wikimedia.org/r/c/integration/quibble/+/451645
Gerrit
- Kaldari is able to log in again! https://phabricator.wikimedia.org/T197083
- \o/ nice!
Phabricator
- Antivandalism
- I intend to publish the source for phabricator-antivandalism after I move some parameters to configuration values so that the "secret sauce" isn't in the source. https://phabricator.wikimedia.org/T202080
- I fixed a couple of other false positives last week, need to deploy the code this week.
Jenkins
QA
- Had discussion with Audiences team (EMs and QA folks) regarding QA Career path.
Standup!
Antoine
- What I plan to do this week
- More Nodepool/Quibble migrations :/
- Write a document about running fewer tests
- What I'm blocked on
- Mail backlog
- Other?
- Are we migrating Differential repos to Gerrit?
- Mukunda: Harbormaster/Nodepool job is only used for Scap
Dan
- What I plan to do this week
- Refactor service-pipeline job using integration/pipelinelib
- Trying out larger CI instances with more Jenkins executors
- What I'm blocked on
- Other?
Greg
- What I plan to do this week
- Be on Vacation
- What I'm blocked on
- Other?
Jean-Rene
- What I plan to do this week
- T199261 - Define base Code Health metric set.
- Organize/set up WG kickoff
- complete Group wiki page and add it to SoS section
- T199254 - Add test evaluation to post mortem review process
- Wikiize PM template
- Perform existing Stewardship review process for Q1 cycle.
- Kickoff Q1 review cycle
- What I'm blocked on
- Other?
Mukunda
- What I plan to do this week
- Deploy updates to phabricator-antivandalism
- Develop a plan/schedule for upcoming work
- Phabricator wishlist stuff
- Look at swat workflow changes \o/
- What I'm blocked on
- Other?
Tyler
- What I plan to do this week
- review blubberoid
- releng.team https
- Move dist/pipeline -> .pipeline
- Review paladox work on gerrit avatars
- Deploy depool for nodes where disk is > 95% full
- What I'm blocked on
- Other?
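The depool item in Tyler's list turns on a simple threshold: a node is a candidate when its disk is more than 95% full. A minimal sketch of that check follows, assuming "full" means used bytes divided by total bytes on a single filesystem. The helper names are hypothetical, and this is not the team's actual depool tooling.

```python
import shutil

# Threshold from the notes: depool when disk is more than 95% full.
THRESHOLD = 0.95


def disk_fraction_used(path: str = "/") -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_depool(fraction_used: float, threshold: float = THRESHOLD) -> bool:
    """True only when usage is strictly above the threshold."""
    return fraction_used > threshold
```

A node-side check would then be `should_depool(disk_fraction_used("/"))`. Keeping the comparison separate from the measurement makes the threshold easy to test with fixed values.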
Zeljko
- What I plan to do this week
- T179188 Video recording for Selenium tests in Node.js
- T193157 Quibble does not install ffmpeg - will merge and deploy today with Tyler https://gerrit.wikimedia.org/r/c/integration/quibble/+/451645
- Retrospective - Train Conducting
- Review edits for two recent train incident reports (.13 and .14) to see if more feedback is needed
- https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/TEC13:_Code_Health/Goals#Outcome_2_/_Output_2.2
- What I'm blocked on
- Other?
Grooming
Team Kanban Board Review and Triage
- Closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...