Wikimedia Release Engineering Team/Checkin archive/20180917
2018-09-17
Vacations/Important dates
- Mid-September to mid-October - Antoine to take some weeks/days off or work part time
- October 5th (Friday) - Željko at a conference (https://2018.webcampzg.org/)
- October 8th - Holiday (Indigenous Peoples' Day, Independence Day - Željko)
- November 1 (Thursday) - Holiday (All Saints' Day - Željko)
- November 9th - Holiday (Veterans Day)
- November 22+23 - Holidays (Thanksgiving)
- Week of December 3rd - Team offsite
- December 24-28 - Holidays (Christmas)
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
- July 02 - wmf.11 - Zeljko - no train, Fourth of July
- July 09 - wmf.12 - Zeljko
- July 16 - wmf.13 - Zeljko
- July 23 - wmf.14 - Zeljko
- July 30 - wmf.15 - Mukunda
- Aug 06 - wmf.16 - Mukunda
- Aug 13 - wmf.17 - Mukunda (No train - Wednesday is a holiday)
- Aug 20 - wmf.18 - Tyler
- Aug 27 - wmf.19 - Dan && Antoine lurking over the shoulders
- Sep 03 - wmf.20 - Antoine
- Sep 10 - wmf.21 - Antoine (No train due to DC switchover)
- Sep 17 - wmf.22 - Antoine <----
- Sep 24 - wmf.23 - Zeljko (only one week for me? -- Željko)
- Oct 01 - wmf.24 - Dan
- Oct 08 - wmf.25 - Dan (No train due to DC switchover)
- Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
- Oct 22 - wmf.1 - Mukunda
SoS
- July 04 - Dan
- July 11 - Antoine
- July 18 - Antoine
- July 25 - Tyler
- Aug 01 - Tyler
- Aug 08 - Zeljko
- Aug 15 - Dan (No SoS this week)
- Aug 22 - Zeljko
- Aug 29 - Zeljko
- Sep 05 - Tyler / Željko
- Sep 12 - Tyler / Željko
- Sep 19 - Dan / Željko <----
- Sep 26 - Zeljko
- Oct 03 - Zeljko
- Oct 10 - Zeljko
- Oct 17 - Zeljko
- Oct 24 - Zeljko
- Oct 31 - Zeljko
Team Business
Hiring
- Software Engineer position is open; reviewing applications and hiring for now
First Offsite
Details:
- Week of December 3rd
- At the Queen Mary hotel in Long Beach
- Deb T will be facilitating
Topics!
Development plans
- Due at the end of the month
- We'll review on Wednesday the 26th
Needs attention
- 2018-09-10 -- Gerrit Privacy Policy & CoC patch
- https://phabricator.wikimedia.org/T196835
- 2018-09-17 -- Patches for new UI:
- (ops/puppet) Replace polygerrit theme in repo: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458523/
- (gerrit) Remove from repo: https://gerrit.wikimedia.org/r/#/c/operations/software/gerrit/+/458524/
- (ops/puppet) Add footer link for new UI: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458833/
- (ops/puppet) Add footer link for old UI: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/460914/
- All applied to: http://gerrit.tylercipriani.com:8080
- 2018-09-10 -- Run mediawiki::maintenance scripts in Beta Cluster
- https://phabricator.wikimedia.org/T125976
- Tyler to create instance
- 2018-09-17 - not done
- [Ops] Use of mwdebug2XXX for mediawiki deployers during codfw switch
Google Code-in?
- https://lists.wikimedia.org/pipermail/wikitech-l/2018-September/090799.html
- Interest? We need small/easy-ish tasks that you're willing to help someone think through and review.
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
This week
Release Engineering
- Blocked by:
- [WMCS] Increased quotas for vcpu and memory in integration project: https://phabricator.wikimedia.org/T204373
- Blocking:
- Updates:
- Train Health: no train last week due to DC switchover, train continues this week
- Log Health:
- Code Health:
- Code Health Metrics Working Group Kickoff last week
- Code Health Metrics Working Group meeting this week - further discuss/define the workgroup's scope and next steps
Last week
Release Engineering
- Blocked by:
- DBA (in support of Reedy): https://phabricator.wikimedia.org/T174802 (EducationProgram db dump in prep of removing the extension)
- Blocking:
- Language RelEng to review: https://gerrit.wikimedia.org/r/450508
- Updates:
- Train:
- we had a UBN! backport needed on Thursday ( https://phabricator.wikimedia.org/T203566 )
- This has been thoroughly documented in https://phabricator.wikimedia.org/T156541 and it is a regularly recurring problem which causes production breakage every time the structure of a class is changed in an incompatible way. We can do better!
- Log Health:
- Exception thrown for failure to save settings appears ~ 1000 times/day: https://phabricator.wikimedia.org/T202149 (Note: add to SoS Callouts)
Train status and happenings
Past week status updates
- All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4
Quarterly Goals for Q1
TEC 1
Output 1.1
- Determine the procedure and requirements for an automated MediaWiki branch cut
RAW NOTES OMG
- Generate a deployment, and reporting
- automate branch cut
- automate change log upload
- reupdate it upon backports
- add all people with commits in the current train as subscribers to the weekly train task
- many people would always be on there, though?
- deployment metrics
- # of commits/committers
- on schedule/rollbacks
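The commit/committer counts and the subscriber idea from the raw notes above could be sketched roughly as follows. This is only an illustration: the `Commit` record, the function names, and the `always_subscribed` parameter are hypothetical, and nothing here calls Phabricator or Gerrit. The `always_subscribed` set is one possible answer to the "many people would always be on there" concern.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One commit included in a train (hypothetical record shape)."""
    sha: str
    author: str


def train_metrics(commits):
    """Return commit count, committer count, and per-author commit counts."""
    per_author = Counter(c.author for c in commits)
    return {
        "commits": len(commits),
        "committers": len(per_author),
        "per_author": dict(per_author),
    }


def proposed_subscribers(commits, always_subscribed=frozenset()):
    """Authors in this train who are not already on every train task."""
    return sorted({c.author for c in commits} - set(always_subscribed))


# Made-up sample data, for illustration only.
sample = [
    Commit("a1", "alice"),
    Commit("b2", "bob"),
    Commit("c3", "alice"),
]
metrics = train_metrics(sample)
subscribers = proposed_subscribers(sample, always_subscribed={"bob"})
print(metrics["commits"], metrics["committers"], subscribers)
```

How the commit list would be obtained for a given wmf.XX branch is left open, as it is in the notes.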
Pipeline: Move verify stage from Minikube to CI k8s namespace in production context
- Output 3.1
- Zotero v2
- graphoid
- blubberoid
- Develop set of metrics to assess incident reports/post mortems. (NB: see the killer spreadsheet)
PUNT:
- Determine how to gather the Code Health metrics in a programmatic way
- Q3: Create a deployments report with metrics from the Code Health Group.
Code Health
- T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
- reached out to Mark/Faidon to talk about proposal.
- Perform existing Stewardship review process for Q1 cycle.
- nothing to review this Q
- T199254 - Add test evaluation to post mortem review process.
- Review existing e2e test coverage.
- Define prioritization scheme.
- Zeljko and I talked about the prioritization scheme.
- Prioritize e2e testing gaps.
- T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
- T199259 - Platform and Search Platform teams are using TDM PoC
- on hold
- T199262 - Identify key Tech Debt areas
- on hold
- T199263 - Put in place Tech Debt management process for PEP
- on hold
- T199261 - Define base Code Health metric set.
- held workgroup kickoff meeting. Additional async discussions took place as well. Meeting again this week.
Developer Productivity
- Make a hire to create the capacity needed for this program.
- Write and share a survey to measure developer satisfaction and areas for investment. - task T197635
Other work
Selenium
- Q1 goals task: T198389 Q1 Selenium framework improvements
- T179188 Video recording for Selenium tests in Node.js
- Patch in final review. Timo said code is fine but videos don't work. I've checked and videos work. :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
- T199113 All repositories with Selenium tests should use wdio-mediawiki
- 3 out of 13 repos remaining (AdvancedSearch, TwoColConflict, WikibaseLexeme), at legendary 80%, so about 20% of code (or 80% of effort) still TODO
- T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
- 4 out of 13 repos failing (31%), all errors are due to repos not using wdio-mediawiki
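The repository counts above can be checked with a few lines of arithmetic. This is a small sketch using only the numbers from the notes; the constant and function names are made up for illustration. By these counts, 10 of 13 repos migrated is closer to 77% than the "legendary 80%".

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to a whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total)


TOTAL_REPOS = 13    # repositories with Selenium tests (from the notes)
NOT_MIGRATED = 3    # not yet using wdio-mediawiki
FAILING_DAILY = 4   # failing in the daily beta cluster run

migrated = percent(TOTAL_REPOS - NOT_MIGRATED, TOTAL_REPOS)
failing = percent(FAILING_DAILY, TOTAL_REPOS)
print(f"migrated: {migrated}%, failing daily: {failing}%")
```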
Gerrit
- Merging accounts in notedb feature request (of interest, not us directly): https://bugs.chromium.org/p/gerrit/issues/detail?id=9716
Phabricator
- Working on a phab blog post about task types and custom fields. Will ask for editorial review before posting.
Jenkins
- Planning migration of m1.medium instances to bigram instances, based on stats from last week showing improvements with the latter
QA
Standup!
Antoine
- What I plan to do this week
- What I'm blocked on
- Other?
Dan
- What I plan to do this week
- Get integration/{pipelinelib,config} patches merged (finish up https://phabricator.wikimedia.org/T196940)
- Continue futzing with an integration-prometheus to collect Jenkins build stats
- Migrate m1.medium instances to bigram instances and continue comparing stats
- Configure some bigram instances with 6 executors and compare stats with those w/ 4
- What I'm blocked on
- Quota increase for integration project https://phabricator.wikimedia.org/T204373
- Other?
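One way the 4-executor versus 6-executor comparison in Dan's plan might be summarized, once build durations have been collected, is sketched below. The duration values are placeholders and not real measurements, and the function names are hypothetical. The notes do not say which statistics are being compared, so mean and median build duration are an assumption.

```python
from statistics import mean, median


def summarize(durations):
    """Basic stats for a list of build durations in seconds."""
    return {
        "builds": len(durations),
        "mean": mean(durations),
        "median": median(durations),
    }


def relative_change(baseline, candidate):
    """Fractional change from baseline; negative means candidate is lower."""
    return (candidate - baseline) / baseline


# Placeholder numbers only; substitute durations exported from Jenkins.
sample = {
    "4 executors": [300, 320, 310, 330],
    "6 executors": [280, 300, 290, 310],
}
summary = {name: summarize(values) for name, values in sample.items()}
change = relative_change(
    summary["4 executors"]["mean"], summary["6 executors"]["mean"]
)
print(f"mean duration change: {change:+.1%}")
```

Per-build duration alone can mislead when executor counts differ, so total throughput per instance would be worth tracking alongside it.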
Greg
- What I plan to do this week
- read and do manager development task
- get everyone's dev plan in place (mostly)
- get our Q2 goals ready
- What I'm blocked on
- Other?
Jean-Rene
- What I plan to do this week
- Review Queue refresh/ROO work
- Code Health Metrics
- Code Coverage report
- What I'm blocked on
- Other?
- New laptop seems to no longer be captured in the web of enterprise monitoring
Mukunda
- What I plan to do this week
- Lots of writing this week
- Phab custom fields blog post
- scap swat documentation
- Finish development plan
- What I'm blocked on
- Other?
Tyler
- What I plan to do this week
- CoC gerrit and such
- Make deployment-mwmaint (try to)
- reviews
- keyholder
- pipeline
- development plan stuffs
- maintenance script still has some timeouts
- What I'm blocked on
- Other?
Zeljko
- What I plan to do this week
- T179188 Video recording for Selenium tests in Node.js
- Patch in final review. Timo said code is fine but videos don't work. I've checked and videos work. :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
- T199113 All repositories with Selenium tests should use wdio-mediawiki
- 3 out of 13 repos remaining (AdvancedSearch, TwoColConflict, WikibaseLexeme), at legendary 80%, so about 20% of code (or 80% of effort) still TODO
- T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
- 4 out of 13 repos failing (31%), all errors are due to repos not using wdio-mediawiki
- Code Health Q1
- Review existing e2e test coverage.
- Define prioritization scheme.
- Prioritize e2e testing gaps.
- What I'm blocked on
- T179188 Video recording for Selenium tests in Node.js
- Patch in final review. Timo said code is fine but videos don't work. I've checked and videos work. :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
- Other?
- My 2006 Kia managed to pass yearly technical review this week :flexing biceps:
- Slightly reduced availability this and the next week, visiting doctors for another muscle injury :|
Grooming
Team Kanban Board Review and Triage
- Closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...