Wikimedia Release Engineering Team/Checkin archive/20160718
2016-07-18
Special Guest - Rachel Farrand!
Team offsite planning!
Spreadsheets!
- Timing: https://docs.google.com/spreadsheets/d/1slYNnWJOAoNGK0Hn7wtvvShD2_ORO07I0fWNokuMry8/edit#gid=0
- Location: https://docs.google.com/spreadsheets/d/1_8KXdObI8tw033n4L245KoE1izgsdxp3h0BnZwGqk4s/edit#gid=0
Notes:
- Rachel will begin working on hotel/venue options in Chicago and DC \o/
Special Guest - Andrew with CI questions
- Need a good metric to watch for the impact of Labs changes on CI
- Respawn may be causing DNS issues; can we increase the wait time there?
- What metrics do we have?
https://grafana.wikimedia.org/dashboard/db/releng-kpis
https://grafana.wikimedia.org/dashboard/db/releng-zuul
Vacations/Important dates
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
- July 25 - August 15: Željko on vacation. Will have laptop with me; reachable via phone.
- July 30 - August 21: Antoine on vacation. At home the first week.
- August 1-5: Mukunda on vacation: concert & relaxation
...
- January 9-11: Dev Summit
- January 12-13: All Hands
Team Business
Rotating positions and absences
Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/u/blockers
weeks of July 11 and 18
- Train: Chad
- SoS: Tyler
- Out:
- Tyler - July 14+15 (Thur+Fri)
- Mukunda - July 15
- Chad - July 15
weeks of July 25 and Aug 1
- Train: Tyler
- SoS: Mukunda / Tyler
- Out:
- Zeljko: July 25 - Aug 15
- Antoine: July 30 - Aug 21
- Mukunda: Aug 1-5
Time spent spreadsheet
- FYQ1 (July-Sept 2016): https://docs.google.com/spreadsheets/d/1IrwGPdTDZ6H8x9Mf5dmCYlkK4hZ8sbUSLODEM4cFc4g/edit#gid=0
- Fixed the "unallocated" field: columns had been added but the formula hadn't been updated (it should be 1-N9 now, not L9)
Actions from last meeting
- Upgrade MariaDB in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 10 https://phabricator.wikimedia.org/T138778
- TODO: Greg. What is the priority? Check with Jaime. We have other priorities.
- Done: commented/asked on task.
- SWAT deploy next steps:
- Done: Zeljko to do an 8am Pacific SWAT deploy with Tyler
- Done: after that, update docs
- NEXT: stalled pending finding people to do the SWAT window while Antoine and Zeljko are on vacation
Scrum of Scrums
- https://phabricator.wikimedia.org/project/board/64/
- Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R
This week
- Blocking
- Blocked
- Updates
- Zuul upgraded this week, should address a bunch of issues
Last week
- Blocking
- None
- Blocked
- None
- Updates
- New gerrit update needs testing: https://gerrit-new.wikimedia.org/r/
- wmf.9 was reverted, wmf.10 will get pushed to group0 and group1 today on a short schedule
- Retrospective to come https://wikitech.wikimedia.org/wiki/Incident_documentation/20160712-EchoCentralAuth
Other Team Business
- European SWAT deploys next steps (task T137970)
- stalled until after Antoine and Zeljko's vacations, unless 2 other trained SWATers step forward
- TechDebt Analysis
- https://docs.google.com/spreadsheets/d/1Kxj9p4fKVNo2h23yAQVoOGg77dZ4FLxeXuYrH-1CrPA/edit#gid=0
- Greg hasn't had time to review the sheets
- Antoine and Zeljko paired on filling parts out; do others want to do that as well? It helps :)
- Andrew interrupts with nodepool questions
- New labvirt nodes coming online today, please be alert to weird behavior
- Labs Ops would like to see metrics about testing performance:
- Benefit from increasing # of concurrent nodes
- Cost/benefit from changing rate of node recreation
Q1 goal/project check-in
Phase out Ubuntu Precise
keyresult tasks:
- Replace primary production Continuous Integration host (gallium) - task T95757
- Meeting with Chase on Thursday was skipped
- Faidon will respond this week with his thoughts, we're waiting on him
- Upgrade Phabricator database servers to Maria10/Jessie - task T138460
- waiting on Jaime to failover m3-master
- Upgrade Beta Cluster database servers to Maria10/Jessie - task T138778
- waiting on Jaime to prioritize
Reduce Technical Debt
Perform a technical debt analysis of software and services maintained by WMF Release Engineering - task T138225
Streamline deployments (long-lived branches)
keyresult task:
- Convert our production deployment strategy to use long-lived branches - task T89945
project view: https://phabricator.wikimedia.org/project/view/2117/
- reorganized/repurposed other meetings to work on this
- time this past week was mostly spent fixing Phabricator (task graphs; oh boy, do we like tracking tasks)
Non-Quarterly goal work
CI Scaling/Nodepool
- CI Outage last week: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160706-CI-Outage
- Follow-ups:
- https://phabricator.wikimedia.org/T139771 - "Identify metric (or metrics) that gives a useful indication of user-perceived (Wikimedia developer) service of CI"
Browser tests
- working on survey report
- core browser tests found a bug last week (they work!)
- language screenshots are now running in CI+Sauce; will upload to Commons using the existing Ruby gem
Differential migration
Differential weekly (https://etherpad.wikimedia.org/p/diffuerential-weekly ) TODOs:
- Mukunda had questions for Antoine re Puppet (keys into the private store: production or other? For the CI image builder)
- Update documentation on creating/renaming of repos in Diffusion
- Update task with discussion about ACLs?
- Announce plan to migrate MW-Vagrant to Differential
- https://phabricator.wikimedia.org/T131419#2439362
- Outstanding patches should be either merged, abandoned, or migrated to Differential revisions.
- semi-related TODO: file task re upgrading MW-Vagrant guests to Jessie
Beta Cluster
- "deployment-fluorine becomes unresponsive frequently" - https://phabricator.wikimedia.org/T140313
- From Matt (who's trying to diagnose login issues): "Happened again. I worked around it by rebooting in wikitech, but shouldn't keep happening."
Other
- Figure out how to help Jaime with the DB schema inconsistencies issue:
- https://phabricator.wikimedia.org/T132416 and https://phabricator.wikimedia.org/T104459 (see also: https://www.mediawiki.org/wiki/Development_policy#Database_patches )
- What can we do in CI to help prevent, mostly?
- Chad will lick this cookie :)
- "Consider alternative processes for Unbreak Now bugs, especially those which cross-cut components" - https://phabricator.wikimedia.org/T140207#2456573
- If you have opinions on this, please reply. I plan to stay engaged.
People status updates
Antoine
Last week
- Gerrit upgrade / Zuul upgrade
- Target host to replace gallium
- Sync up with Tyler for CI / gallium phase out
- Moaar maintenance
- Offsite site/date
This week
Chad
Last week
- Gerrit. Gerrit. Gerrit.
This week
- Moar gerrit. Train. Choo choo.
Dan
Last week
- Getting back
This week
Mukunda
Last week
- Phabricator upgrade on Wednesday
- The upgrade introduced a new task dependency graph which is awesome but also introduced a major performance issue on tracking tasks
- I've been working on a blog post about recent Phabricator stuff, including the above-mentioned task graph stuff: https://etherpad.wikimedia.org/p/phabricatorphacilityworkblogpost
- Figure out where to start on the long lived branches project
This week
- Get the merge-wmf-branch script cleaned up and shared with the team for feedback
- Brainstorm improvements / other ideas around branch merging / cherry-picking
Tyler
This week
- MW Canary work
Last week
- SWAT training/documentation
- Task wrangling
This week
Željko
Last week
- finishing migration of browsertests* Jenkins jobs to selenium* jobs https://phabricator.wikimedia.org/T128190
- Analyze (and share analysis of) the browser testing feedback survey https://phabricator.wikimedia.org/T139247
- Run language screenshots script for VisualEditor in Jenkins https://phabricator.wikimedia.org/T139613
- Figure out what to do with Firefox + Selenium https://phabricator.wikimedia.org/T137561
- SWAT training
This week
- trying to do the first SWAT (depending on https://phabricator.wikimedia.org/T140264 MediaWiki deployment shell access request for zfilipin)
- Analyze (and share analysis of) the browser testing feedback survey https://phabricator.wikimedia.org/T139247
- Run language screenshots script for VisualEditor in Jenkins https://phabricator.wikimedia.org/T139613