Wikimedia Release Engineering Team/Checkin archive/20180910
2018-09-10
Vacations/Important dates
- Mid-September to mid-October - Antoine to take off some weeks/days/part time
- October 5th (Friday) - Željko on a conference (https://2018.webcampzg.org/ )
- October 8th - Holiday (Indigenous Peoples' Day, Independence Day - Željko)
- November 1 (Thursday) - Holiday (All Saints' Day - Željko)
- November 9th - Holiday (Veterans Day)
- November 22+23 - Holidays (Thanksgiving)
- Week of December 3rd - Team offsite
- December 24-28 - Holidays (Christmas)
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
- July 02 - wmf.11 - Zeljko - no train, Fourth of July
- July 09 - wmf.12 - Zeljko
- July 16 - wmf.13 - Zeljko
- July 23 - wmf.14 - Zeljko
- July 30 - wmf.15 - Mukunda
- Aug 06 - wmf.16 - Mukunda
- Aug 13 - wmf.17 - Mukunda (No train - Wednesday is a holiday)
- Aug 20 - wmf.18 - Tyler
- Aug 27 - wmf.19 - Dan && Antoine lurking over the shoulders
- Sep 03 - wmf.20 - Antoine
- Sep 10 - wmf.21 - Antoine (No train due to DC switchover) <----
- Sep 17 - wmf.22 - Antoine
- Sep 24 - wmf.23 - Zeljko (only one week for me? -- Željko)
- Oct 01 - wmf.24 - Dan
- Oct 08 - wmf.25 - Dan (No train due to DC switchover)
- Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
- Oct 22 - wmf.1 - Mukunda
SoS
- July 04 - Dan
- July 11 - Antoine
- July 18 - Antoine
- July 25 - Tyler
- Aug 01 - Tyler
- Aug 08 - Zeljko
- Aug 15 - Dan (No SoS this week)
- Aug 22 - Zeljko
- Aug 29 - Zeljko
- Sep 05 - Tyler / Željko
- Sep 12 - Tyler / Željko <----
- Sep 19 - Dan
- Sep 26 - Dan
- Oct 03 - Zeljko
- Oct 10 - Zeljko
- Oct 17 - Antoine
- Oct 24 - Antoine
- Oct 31 - Mukunda
Team Business
Hiring
- Accepted!
- October 8th start day
- Time to review https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Onboarding (Greg did some on Friday)
- Train the first week?
- Software Engineer position open and reviewing/hiring for now
First Offsite
Details:
- Week of December 3rd
- At the Queen Mary hotel in Long Beach
- Deb T will be facilitating
Topics!
Needs attention
- Gerrit Privacy Policy & CoC patch
- Run mediawiki::maintenance scripts in Beta Cluster
- https://phabricator.wikimedia.org/T125976
- Tyler to create instance
- Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018
- https://phabricator.wikimedia.org/T125618
- legoktm poked thcipriani about it in IRC
- add to SoS for DBA review of Reedy's proposal on the subtask
- eqiad row D switch upgrade (email with Greg and Mukunda on thread)
- m3 db (phabricator) affected
- either week of Sept 17 or 24th
- mukunda to reply :)
Google Code In ?
- https://lists.wikimedia.org/pipermail/wikitech-l/2018-September/090799.html
- interest? Need small/easy-ish tasks that you're willing to help someone think through and review.
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
This week
Release Engineering
- Blocked by:
- DBA (in support of Reedy): https://phabricator.wikimedia.org/T174802 (EducationProgram db dump in prep of removing the extension)
- Blocking:
- you tell us :)
- Updates:
- Train:
- we had a UBN! backport needed on Thursday ( https://phabricator.wikimedia.org/T203566 )
- This has been thoroughly documented in https://phabricator.wikimedia.org/T156541. It is a recurring problem that breaks production every time the structure of a class is changed in an incompatible way. We can do better!
- Log Health:
- Exception thrown for failure to save settings appears ~ 1000 times/day: https://phabricator.wikimedia.org/T202149 (Note: add to SoS Callouts)
Last week
Release Engineering
- Blocked by:
- Noise from https://phabricator.wikimedia.org/T201082 during Train deployment (not really blocked but distracted)
- Blocking:
- Updates
- Train:
- 1.32.0-wmf.20 at group 1, no problems
- on European time this week
- No train next week, DC switchover
- Log Health:
- Exception thrown for failure to save settings appears ~ 1000 times/day: https://phabricator.wikimedia.org/T202149
- labtestweb2001 is sending updates to a read-only db host: db2037: https://phabricator.wikimedia.org/T201082
- ErrorException from line EducationProgram PHP Notice: Undefined variable: retValue: https://phabricator.wikimedia.org/T203577
Train status and happenings
Past week status updates
- All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4
Quarterly Goals for Q1
Pipeline: Move verify stage from Minikube to CI k8s namespace in production context
- Done
Code Health
- T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
- talked to Tyler a bit about this. Also need to hook up with SRE (and other stakeholders). This appears to be tightly coupled with the review queue process.
- Perform existing Stewardship review process for Q1 cycle.
- no reviews requested at the moment. Corey requested to meet with me today to discuss finding homes for some of the platform code.
- T199254 - Add test evaluation to post mortem review process.
- Review existing e2e test coverage.
- Define prioritization scheme.
- Prioritize e2e testing gaps.
- T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
- worked on creating a template for the first monthly report. Actually thinking that this will be part of a broader Code Health monthly newsletter.
- T199259 - Platform and Search Platform teams are using TDM PoC
- T199262 - Identify key Tech Debt areas
- T199263 - Put in place Tech Debt management process for PEP
- T199261 - Define base Code Health metric set.
- scheduled WG kickoff meeting (tomorrow)
Developer Productivity
- Make a hire to create the capacity needed for this program.
- Write and share a survey to measure developer satisfaction and areas for investment. - task T197635
Other work
Selenium
- Q1 goals task: T198389 Q1 Selenium framework improvements
- T179188 Video recording for Selenium tests in Node.js
- patch working, Timo requested some changes https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
- T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
- almost done for repos that had nodepool jobs, all jobs green, patch in final review https://gerrit.wikimedia.org/r/c/integration/config/+/443931
- working on repos that did not have jobs https://gerrit.wikimedia.org/r/c/integration/config/+/457882
- T185011 Create selenium-daily-beta-MediaWiki daily Jenkins job
- job green because it's running only tests that pass on beta https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-MediaWiki/
- patch in review https://gerrit.wikimedia.org/r/c/integration/config/+/457881
- I have to investigate which tests could run on beta with little/some/much refactoring
Gerrit
Phabricator
- Nothing significant to note this week.
Jenkins
- mediawiki-quibble docker jobs fail due to full disk
- https://phabricator.wikimedia.org/T202457
- running containers eating into /
- Docker devicemapper
- Statsd publisher -- still going? still needed?
- Either needs to be expanded to collect more data or ditched
- Doesn't allow for metadata so it's harder to narrow down stats to particular segments
- In the end, it was a lot easier to pull stats from the Jenkins JSON API, but Jenkins only keeps 30 days' worth of build data around
- Can we simply extend the Jenkins retention period?
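For reference, pulling build stats from the Jenkins JSON API could look roughly like the sketch below. It assumes the standard `/api/json` endpoint with a `tree` filter; the helper names `builds_api_url` and `summarize_builds` and the sample payload are hypothetical, made up for illustration, and are not the stats tooling discussed above. It only sees builds Jenkins still retains, so it does not address the 30-day limit.

```python
from collections import Counter
from urllib.parse import quote


def builds_api_url(base_url, job_name):
    """Build the JSON API URL that lists a job's retained builds.

    Hypothetical helper: only the /api/json endpoint and its `tree`
    filter are Jenkins features, the rest is illustrative.
    """
    tree = "builds[number,result,duration,timestamp]"
    return f"{base_url.rstrip('/')}/job/{quote(job_name)}/api/json?tree={tree}"


def summarize_builds(payload):
    """Count results and average the duration (ms) of finished builds.

    Builds that are still running report a null result, so they are
    counted in the total but left out of the result counts and average.
    """
    builds = payload.get("builds", [])
    finished = [b for b in builds if b.get("result") is not None]
    results = Counter(b["result"] for b in finished)
    if finished:
        avg_duration = sum(b["duration"] for b in finished) / len(finished)
    else:
        avg_duration = 0.0
    return {
        "total": len(builds),
        "results": dict(results),
        "avg_duration_ms": avg_duration,
    }


# Made-up payload in the shape the API returns for the tree filter above.
sample_payload = {
    "builds": [
        {"number": 3, "result": None, "duration": 0, "timestamp": 1536570000000},
        {"number": 2, "result": "FAILURE", "duration": 200, "timestamp": 1536560000000},
        {"number": 1, "result": "SUCCESS", "duration": 100, "timestamp": 1536550000000},
    ]
}

print(builds_api_url("https://integration.wikimedia.org/ci/", "selenium-daily-beta-MediaWiki"))
print(summarize_builds(sample_payload))
```

In practice the payload would come from an HTTP GET on the URL returned by `builds_api_url`. Segmenting by node type or other metadata would need extra fields in the `tree` filter.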
QA
- all quiet - no additional discussions have occurred since the initial barrage.
Standup!
Antoine
- What I plan to do this week
- disk space cleanup patch
- new version of chromium in debian
- What I'm blocked on
- articleplaceholder QUnit test fixes
- donationinterface composer-merge bug
- Other?
- Last train:
- database corruption from parser-cache output
Dan
- What I plan to do this week
- Helping with CI disk-full problems – https://phabricator.wikimedia.org/T202457
- Rolling back stats publisher (wah wah)
- Starting up new bigmem instance
- Looking at JSON API based stats for tmpfs change and node types
- What I'm blocked on
- Disk full problems in CI
- Other?
Greg
- What I plan to do this week
- Hiring/resume review/getting more applicants
- Development Plans - mine, yous'ins
- read the first chapter of https://www.worldcat.org/title/leadership-pipeline-how-to-build-the-leadership-powered-company/oclc/47009595 (everything is a pipeline)
- make a thing based on that ^ that's due, uh, soon, this week?
- Staging catchup?
- What I'm blocked on
- time, I'm late in doing a required manager training :(
- Other?
Jean-Rene
- What I plan to do this week
- Continue work on ROO/Review Queue
- Continue work on Code Coverage/Code Health report
- Code Health Metrics workgroup is spinning up
- Talk with Corey about platform code ownership
- What I'm blocked on
- Got new laptop (yay!). But it's got a firmware PW :-( and as a result can't migrate my current laptop's configuration.
- WTF?!
- Other?
Mukunda
- What I plan to do this week
- Work on developer productivity survey.
- Finish phabricator support for elasticsearch 6.
- Finish testing the phabricator spam-revert tool.
- Pairing with Tyler (task TBD, probably catch up on scap and keyholder stuff)
- Create a personal workboard in phab.
- What I'm blocked on
- Other?
Tyler
- What I plan to do this week
- Fix eval.jit=1 via scap (I'm evidently running a newer sudo version than prod)
- keyholder patch review
- get list of services running in beta (cumin?)
- maintenance-disconnect-full-disks bring recovered nodes back online (somehow?)
- Add instance for mwmaint1001/2001 https://phabricator.wikimedia.org/T125976
- What I'm blocked on
- Other?
- tinker w/zotero v2 if there's time
- review ext/ContentTranslation -> gatedextension https://gerrit.wikimedia.org/r/c/integration/config/+/450508/
Zeljko
- What I plan to do this week
- T179188 Video recording for Selenium tests in Node.js
- patch working, Timo requested some changes https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
- T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
- almost done for repos that had nodepool jobs, all jobs green, patch in final review https://gerrit.wikimedia.org/r/c/integration/config/+/443931
- working on repos that did not have jobs https://gerrit.wikimedia.org/r/c/integration/config/+/457882
- T185011 Create selenium-daily-beta-MediaWiki daily Jenkins job
- job green because it's running only tests that pass on beta https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-MediaWiki/
- patch in review https://gerrit.wikimedia.org/r/c/integration/config/+/457881
- I have to investigate which tests could run on beta with little/some/much refactoring
- What I'm blocked on
- Other?
Grooming
Team Kanban Board Review and Triage
- closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...