Wikimedia Release Engineering Team/Checkin archive/20160307
2016-03-07
Vacations/Important dates
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
- March 11th - draft Q4 (April 1st - June 30th) goals due
- March 11th - Željko - conference
- March 14th - Antoine can't make it to weekly team meeting
- March 25th Friday - Tyler
- March 28th - Antoine && Željko - local holiday (Easter Monday)
- March 31st - April 3rd : Hackathon in Israel
- April 1st - Q4 goals published
- April - Antoine: holidays one of the two first weeks
- May 6th Friday - Antoine
- May 9-Mid June-ish?: Greg - paternity leave - exact dates TBD
- May 17-(?): Dan - paternity leave :D
- Late May - draft Q1 (July 1st - Sept 30th) due
- May 30: US HOLIDAY - Memorial Day
- June 15-24: Chad - Vegas/EDC
- June 22nd - 28th : Wikimania in Italy
- July 1st - Q1 goals published
- July 1st – Annual Plan, Budget, Risks Document and FAQ are posted
- August: Antoine - France holiday - because French. :)
- January 2017 : Dev Summit + All Hands (presumably)
Team Business
Rotating positions
Train conductor
Week of ...
- Mar 7: Mukunda
- Mar 14: Mukunda
- Mar 21: Tyler - Code freeze, due to the eqiad -> codfw switch over (announcement:
- So we need to make sure Mar 14th week is super stable.
- Mar 28: Tyler
Scrum of Scrums representative
(bad time for EU folks) Dan, Tyler, Chad, Mukunda
Week of ...
- Mar 7: Chad
- Mar 14: Chad
- Mar 21: Mukunda
CI point person
- reassess later
Actions from last meeting
- TODO - No One Yet: investigate carbon aggregation of stats >1 month old behavior
- ACTION: Antoine to create a task
- Overdue
- ACTION: Antoine to create a task
New vs Maint time spent
Scrum of Scrums
- https://phabricator.wikimedia.org/project/board/64/
- Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R
The only new item was from Chris Steipp, the TOC issue: https://phabricator.wikimedia.org/T124356
For this week:
- scap adoption shout out
- link to the adoption milestone https://phabricator.wikimedia.org/project/view/1824/
Other Team Business
Annual Planning
- Spreadsheet (team only) - https://docs.google.com/spreadsheets/d/1GBokh9zeO5vflAAZLjMuagV4FeFQHCFrApjs_KXNZ7o/edit#gid=0
- Planning worksheet: https://docs.google.com/spreadsheets/d/1ZsB0RCoZD3a6qKsX-qkCpA3HK81mNrZYI3GXeiuzzI0/edit#gid=0
Q4 Goals
What we said for next fiscal: https://docs.google.com/spreadsheets/d/1ZsB0RCoZD3a6qKsX-qkCpA3HK81mNrZYI3GXeiuzzI0/edit#gid=0
Phabricator maintenance
Scap decrease in time
Differential increase
- things we have: debian packages
- things we need:
- MW Core (need to define CI reality and actually integrate CI into Differential)
- Ops Puppet
Browser test creation change (the matrix building)
- defining and enforcing test ownership responsibilities
Not pulling from your repo (including MW Core) unless your tests are green, period. Want it to be deployed? Fix your tests. You own your code and tests.
- First pass is to only block on what we already block on (ie: voting tests in Jenkins)
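A minimal sketch of the "first pass" rule above, assuming each Jenkins job result can be reduced to a name, a voting flag, and a pass/fail flag. The `JobResult` type and both function names are hypothetical, made up for illustration; this is not an existing CI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    """Hypothetical summary of one CI job run for a repo."""
    name: str
    voting: bool   # True if the job is a voting job in Jenkins
    passed: bool


def blocking_failures(results):
    """Names of voting jobs that failed; non-voting failures never block."""
    return [r.name for r in results if r.voting and not r.passed]


def can_pull(results):
    """First pass: pull from the repo only if every voting job is green."""
    return not blocking_failures(results)
```

Under this sketch a red non-voting browser test job would not block a pull yet; tightening the rule later would mean flipping jobs to voting rather than changing the gate.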
TODO: Chad or Tyler to send the "no more Trebuchet for new services, kthx" email to Ops
TODO: make a timeline
- get a list of repos from ops/puppet
- order by last deploy change, descending
- schedule x repos per week over the quarter
tin$ find /srv/deployment -maxdepth 2 -wholename '*/*/*/*/*'|wc -l
58
tin$
Should be a list of everything: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/role/deployment.yaml#L1
Which is 40 repos.
Grepping operations/puppet for 'provider.*trebuchet' gives 30 (the truth is somewhere in between?)
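The timeline TODO above (list repos, order by last deploy change descending, schedule x repos per week) could be sketched roughly as follows. This assumes the repo list and last-deploy dates have already been collected by hand; `build_timeline` and its parameters are hypothetical names, and x is derived here by spreading the repos evenly over the given number of weeks.

```python
from datetime import date, timedelta
from math import ceil


def build_timeline(repos, start, weeks):
    """Spread repos over `weeks` weekly slots beginning at `start`.

    repos: list of (name, last_deploy_date) tuples.
    Returns a list of (week_start, [repo names]) in schedule order.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    # Order by last deploy change, descending (most recently deployed first).
    ordered = sorted(repos, key=lambda r: r[1], reverse=True)
    if not ordered:
        return []
    per_week = ceil(len(ordered) / weeks)  # the "x repos per week"
    return [
        (start + timedelta(weeks=i // per_week),
         [name for name, _ in ordered[i:i + per_week]])
        for i in range(0, len(ordered), per_week)
    ]
```

With somewhere between 30 and 58 repos and roughly 13 weeks in a quarter, this would come out at about 3 to 5 repos per week.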
browser tests discussion
- when things start failing there are long gaps before diagnosis and then fixing
- people assume it's just an issue with CI or the tests themselves
- how to put a little bit of pressure on people to diagnose/fix failed tests
- integrate diagnosis of tests before train would put the pressure on people
- if we do this we need a way to correlate failures and changes in code
- if we had a deploy dashboard showing when the failures started, the commits in between, and the test status
- we could see if we're going to be in a good place before the train
- can offer the pre-merge voting browser test job
- give warning of 2 weeks
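One way to picture the "correlate failures and changes in code" point above: given the commit history and the last green and first red test runs, list the commits in between as suspects. This is only a sketch; `suspect_commits` is a hypothetical helper and the commit ids would come from whatever the dashboard records.

```python
def suspect_commits(history, last_green, first_red):
    """Commits that landed after the last green run, up to the first red one.

    history: commit ids ordered oldest first.
    last_green: commit id of the last run where the tests passed.
    first_red: commit id of the first run where the tests failed.
    """
    start = history.index(last_green)
    end = history.index(first_red)
    if end <= start:
        raise ValueError("first_red must come after last_green in history")
    # Everything after last_green, through first_red inclusive, is a candidate.
    return history[start + 1:end + 1]
```

The shorter the gap between test runs, the shorter this suspect list gets, which is part of the case for the pre-merge voting job.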
Sun Mon Tues Wed Thur Fri Sat
g1 g2 g0
Sun Mon Tues Wed Thur Fri Sat
g0 g1 g2
Antoine: deploy to G0, run all browser tests against them. If any is red: DEPLOY FREEZE
Q3 goal/project check-in
Reduce CI Wait time
- KPI: https://grafana.wikimedia.org/dashboard/db/releng-kpis?panelId=2&fullscreen
- Migrate remaining CI jobs to Nodepool - task T119138
- php composer (Zend and HHVM) - task T119139
- as many miscellaneous jobs as possible - task T119140
- Migrate Jenkins to Jessie - task T124121
Antoine:
- Lots of reviews
- Lurking on the daily browser tests refactoring
- Nodepool had corrupted files
- Hiera for Nodepool instances is badly configured
- Nodepool upgrade on March 7th at 20:00 UTC to speed up deletion (faster pool replenishment, might grow pool as well)
Consolidate deploy tools
- Migrate MediaWiki to scap3 - task T114313
- Q2 Quarterly Goal hold over: Migrate all Service team owned services and MW deploys to scap3 - https://phabricator.wikimedia.org/T109926
Differential Migration
- https://etherpad.wikimedia.org/p/diffuerential-weekly
- Integrate Differential with our Continuous Integration infrastructure - task T31
- build debian packages from differential: https://integration.wikimedia.org/ci/job/beta-build-deb/
- Shepherd the RFC - task T119908
- Garner early adopter projects (goal: 1 project per WMF "team")