Wikimedia Release Engineering Team/Checkin archive/20190529
2019-05-29
Vacations/Important dates
- May 28th-31st - thcipriani - family in town
- May 31st - Lars, Ascension (timeshifted from Thursday)
- May 30th-31st - Antoine, Feast of the Ascension
- June 6-7 - Brennen, Apogaea
- June 10th - Antoine, Pentecost -- see https://en.wikipedia.org/wiki/Eastertide for Antoine/France Easter holidays
- June 10 – July 21 - Dan leave (6 weeks, then additional leave later)
- June 19 (Juneteenth) - US Staff - on a Wednesday!?
- June 20 - Željko, Corpus Christi
- June 25 - Željko, Statehood Day
- July 4 (US Independence Day) - US Staff
- July 22 - August 9 - Željko vacation
- July 22 - Lars, Midsummer
- August 7–19 - James off (inc. Wikimania)
- August 12 - September 8 - Dan leave
- August 12 (Glorious Twelfth) - US Staff
- August ??? - ??? - Antoine
- August 14–18 - Wikimania
- Attending: James, ? …
- August 15 - Željko, Assumption of Mary
- August 25 - September 4 - Brennen vacation
- September 2 (Labor Day) - US Staff
- October 14 (Indigenous Peoples' Day) - US Staff
- November 11 (Veterans' Day) - US Staff
- November 28–29 (Thanksgiving) - US Staff
- December 6 - Lars, Finnish Independence Day
- December 25–31 (Christmas) - US Staff
- December 25-26 - Lars, Christmas
- 2020 January 1 (New Year's Day) - US Staff, Lars
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
- May 27 - wmf.7 - Zeljko 😢
- June 03 - wmf.8 - Zeljko 😭
- June 10 - wmf.9 - No Train (SRE Summit)
- June 17 - wmf.10 - Mukunda (but Juneteenth on the Wednesday?)
- June 24 - wmf.11 - Mukunda
- July 1 - wmf.12 - No train (Fourth of July)
- July 8 - wmf.13 - Tyler
- July 15 - wmf.14 - Tyler
- July 22 - wmf.15 - Antoine
- July 29 - wmf.16 - Antoine
- Aug 5 - wmf.17 - one of Mukunda/Tyler (Antoine and Zeljko on vacation)
- Aug 12 - wmf.18 - No Train (Wikimania) 😳 Last year we discussed not having train during Wikimania https://wikitech.wikimedia.org/wiki/Incident_documentation/20180717-Train
- Aug 19 - wmf.19 - Zeljko (after Wikimania) 😱
- Aug 26 - wmf.20 - Zeljko
SoS
- Zeljko 4eva! :)
Team Business
Timespent spreadsheet
- For the avoidance of doubt: fill out the sheet week number for the previous week
- link to week starting May 20: https://docs.google.com/spreadsheets/d/1urCLNQXeEi1DOR8Iu0qW0yPt-glxX1laqlMovbGyCW0/edit#gid=837028493
Book club
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
- Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
- Next: June 14th, chapters 10+11 (9am Pacific)
Spring Offsite
Follow-ups:
- Greg: email mark about capex request for next year for pipeline
- I'm actually not sure what this is about/what the ask is, help?!
- "staging" pipeline?
- Production access?
- CapEx budget now locked.
- ????: re Integration environments: establish SLAs between the teams for what is their responsibility and ours, what is the working relationship
- I think there's something more here that needs to be fleshed out, see the relevant section here: https://docs.google.com/document/d/1Y-cYrPKT0dvN2oj0hScIjRjkM2zWL5NY9xMYfMuC2Do/edit?ts=5c9cd50b#heading=h.vbm26ktfhprv
- Greg: flesh out/say more on this
- 2019-05-13: not yet...
- Mukunda: talk with Timo and Filippo about our prioritized list of feature requests for LMM
- Note: Gergo confirmed that SRE is going to work on Sentry in Q1/Q2 (from a conversation with Faidon and Filippo)
- See: https://docs.google.com/document/d/1Y-cYrPKT0dvN2oj0hScIjRjkM2zWL5NY9xMYfMuC2Do/edit?ts=5c9cd50b#heading=h.ra3pbkbq71i4
- ...
- Greg: announce that RelEng is backup only for SWAT (removing people’s names so they aren’t pinged every time on IRC) and we’ll start working on automating the train
- Still need to do Q4 goals...table this “doing” until Q1?
- Greg will send a signed email if someone writes it up ;)
- Željko will write the e-mail this week - done
- Greg to follow-up...
Fall Offsite + TechConf19
- Travel travel travel!
- Two short trips, or one long?
- https://docs.google.com/forms/d/e/1FAIpQLScxVG8xz_CCacGusirtxrz2dfGKVvHZ5jes4attCh0BtdVcjw/viewform?usp=sf_link
Monthly reflection on accomplishments - May '19 edition
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Add as you have them!
- Phabricator vandalism rollback tool completed 🎉 (blog post? 😉)
- Upgrade Zuul to 2.5.1-wmf6 (which unblocks the Gerrit upgrade to 2.16) - https://phabricator.wikimedia.org/T208426
- Team offsite in Chicago
- Repository-hosted CI/CD pipeline configurations now supported (.pipeline/config.yaml) - https://phabricator.wikimedia.org/T210267
- Train notes published on branch cut
- Codehealth pipeline beta - https://phabricator.wikimedia.org/phame/live/1/post/160/introducing_the_codehealth_pipeline_beta/
Annual Planning
- ...
Annual Reviews
Overview: https://office.wikimedia.org/wiki/FY_2018-19_Annual_Review_and_Retrospective
- Note: there is a workshop you can attend to get advice: https://office.wikimedia.org/wiki/FY_2018-19_Annual_Review_and_Retrospective#Sprints_&_trainings_-_support_from_T&C
Deadlines
Everyone:
- Starting now: You and I discuss who your peer reviewers should be
- April 26th: Enter your peer reviewers into Namely (please run them by me first)
- May 17th: Deadline to complete self-reviews, peer reviews, and reviews of your manager.
- May 20th: I start reviewing the peer reviews and writing my feedback on you.
Non SafeGuard (aka US Employees):
- June 14th: Deadline for managers to complete all 1:1 meetings with direct reports and provide written feedback in Namely.
SafeGuard:
- June 14th - Managers of those employed by Safeguard submit their reviews to HR for submission to Safeguard
- July 12th - Deadline to have a 1:1 and share final manager review with direct report in Namely
Incoming/Needs attention
- REL1_33 branching for extensions: https://phabricator.wikimedia.org/T220653
- Reedy said he'll move forward with rc0 announcement soon.
- Mukunda tried to run the script but it ran into trouble. Will retry manually.
- Switching on HTTP Auth again still seems blocked. Barricade should help with this; review when Tyler gets back.
- Talking to Pipeline project documenter.
- Lars – per e-mail, can we open up your arch document on-wiki soon?
- Lars: yes, opening it up this week
- Excellent, thank you.
- Brennen – per e-mail, I'm not so sure what documentation input can happen yet?
- Brennen will follow up to mail
- Thanks!
- CI Node 10 migration – let's JFDI? https://phabricator.wikimedia.org/T222406 Will need to pair with a CI expert (hashar?)
- James and Antoine to pair next week.
Scrum of Scrums
Incoming from last week
- Blocking:
Outgoing this week (wrong section heading level is on purpose for copy/pasting into Scrum of Scrums etherpad)
Release Engineering
- Blocked by:
- Core Platform Team (low priority): https://phabricator.wikimedia.org/T205361 is blocking undeployment of CodeReview.
- SRE Traffic Team (low priority): https://phabricator.wikimedia.org/T213769 is blocking undeployment of Wikipedia Zero.
- Blocking:
- Parsing - Can RelEng team take a look at https://phabricator.wikimedia.org/T221872 ? We seem to be babysitting merges a lot more than we would like to because of having to "recheck" patches frequently.
- Fundraising Tech - Need to update Fundraising Tech CiviCRM tests to PHP7: https://phabricator.wikimedia.org/T223348
- Updates:
- Train Health
- Last week: 1.34.0-wmf.6 - https://phabricator.wikimedia.org/T220731
- This week: 1.34.0-wmf.7 - https://phabricator.wikimedia.org/T220732
- Next week: 1.34.0-wmf.8 - https://phabricator.wikimedia.org/T220733
- Code Health
- Log Health
Callouts
- Release Engineering
Train status and happenings
- Need to fix scap clean :\
- thcipriani has a crappy fix in mind until http tokens in gerrit are back
- Any idea when HTTP tokens will come back? Weeks? Months? Never? :-(
- ~Weeks
- 2019-05-06: cleaned up stuff last week on deploy hosts, just not the gerrit branches
- 2019-05-13: …
- 1.33 branch cut for extensions is blocked (except tarball ones, which James did manually)
- 2019-05-06: Mukunda to do it this week
- Greg: email Cindy re process of this release
- 2019-05-13: We talked on Thursday. Mukunda will review hexmode's work, Cindy will email Greg with plan of action re timeline.
- Don't restart Gerrit during scap prep :)
Quarterly Goals for Q4
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q4
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Undeploy the CodeReview extension.
- WHO: James, need help from CPT
- James will ping CPT about this this week (April 8th)
- … and again w/c 15 April.
- … and again w/c 6 May (in SoS).
- … and again w/c 27 May (in SoS).
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Setup 1-3 of the CI WG options (Zuul v3, Argo, GitLab)
- WHO:
- Focus on a couple noteworthy repos: e.g.,
- core
- extensions
- ops/puppet
- Maybe setup in serial, i.e., a week per evaluation
- Questions:
- RelEng/Extended working group?
- At least in the WG eval it was good to have non-familiar people
- But maybe for the setup of the options it would be beneficial to have people experienced with the current setup.
- Folks outside the original working group to join-in to setup options; people TBD
- Do we need a rubric before we do this prototyping? (yes)
- DONE lars to work on rubric week of 2019-04-01
- See email 2019-04-08
- CI arch doc in team google drive now, open for feedback
- 2019-05-06: Feedback from Android. Working on an arch document. Do in Q1?
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Instrument Quibble for data collection
- WHO: Mukunda, Antoine
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Create a graph of where time is spent and make a prioritized list of improvements.
- WHO: Mukunda, Antoine
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Prepare the Deployment Pipeline for changes to our CI tooling.
- WHO: ???, ???
- Blocked by not having new CI tooling yet
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOAL: Create a .pipeline/config.yaml standard to give users more control over how their tests are run in the pipeline and allow the easy saving of artifacts at pipeline completion. (RelEng)
- WHO: Dan, Tyler, ???
- Implementation is working, but in testing a Blubber .pipeline/config.yaml there are some glaring deficiencies
- Re: https://gerrit.wikimedia.org/r/c/blubber/+/511784/6/.pipeline/config.yaml
- A high degree of repetition/duplication. Could use some sort of includes functionality (à la Blubber's variants) and/or a `defaults` section up top for `chart`, `blubberfile`, etc.
- A configuration validation system with human-readable errors.
- Long term concerns about Groovy implementation include:
- Dependencies on Jenkins and many plugins
- Groovy CPS is a huge pain to debug and it's rarely clear that CPS is the issue when things go awry; Instead, the code just executes in unexpected ways.
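One way the `defaults` idea and human-readable validation noted above could look, as a rough Python sketch. The `defaults`/`stages` layout, the required keys, and the function names are all hypothetical illustrations; this is not the actual .pipeline/config.yaml schema or the pipelinelib implementation.

```python
from copy import deepcopy

# Hypothetical: keys every stage is assumed to need after defaults are applied.
REQUIRED_STAGE_KEYS = ("blubberfile", "chart")


def apply_defaults(config):
    """Merge the top-level `defaults` mapping into each stage.

    Values set on a stage override the defaults, so shared settings
    only have to be written once.
    """
    defaults = config.get("defaults", {})
    merged = {}
    for name, stage in config.get("stages", {}).items():
        combined = deepcopy(defaults)
        combined.update(stage or {})
        merged[name] = combined
    return merged


def validate(stages):
    """Return a list of human-readable error strings (empty if valid)."""
    errors = []
    for name, stage in stages.items():
        for key in REQUIRED_STAGE_KEYS:
            if key not in stage:
                errors.append(f"stage '{name}': missing required key '{key}'")
    return errors


# Hypothetical parsed config; in practice this would come from a YAML file.
config = {
    "defaults": {"blubberfile": "blubber.yaml", "chart": "example-chart"},
    "stages": {
        "test": {},
        "production": {"chart": "prod-chart"},
    },
}

stages = apply_defaults(config)
for problem in validate(stages):
    print(problem)
```

Returning every problem as a plain sentence that names the stage and key, rather than raising on the first failure, is one way to get errors a repository owner can act on without reading the pipeline code.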
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOALS:
- Adopt more services into Deployment pipeline - task T212801
- Wikidata Termbox SSR, Kask for Session Storage Service, cpjobqueue (stretch), ORES (stretch)
- WHO: Dan, Tyler, Lars
There are tasks: https://phabricator.wikimedia.org/T220403
- changeprop
- In progress ORES
- cf: Dan's comments
- Wikidata Termbox SSR
- Kask for Session Storage Service
- cpjobqueue (stretch)
TEC12 (DevProd): Outcome 1 / Output 1.1
- GOAL: Provide an "Official" Docker base image for local development of MediaWiki based on the production tooling.
- WHO: Jeena, Brennen
- https://phabricator.wikimedia.org/T212449
- Done for MediaWiki, for some values of "done" and "MediaWiki". Production-likeness needs considerable work.
TEC13 (Code Health): Outcome 1 / Output 3
- GOALs: Presentation/session(s) at the Wikimedia Hackathon on the current state of Code Health projects (technical debt and code stewardship)
- WHO: JR
- Done? T216630 Code Health Metrics working group, CI codehealth pipeline, and SonarQube
- Also had a session for Code Review and Cyclical Dependencies.
- Discussed Code Stewardship as part of the Code Review session
TEC13 (Code Health): Outcome 1 / Output 1.1
- GOAL:
- Publish a re-imagination of the Review Queue process.
- Develop and implement metrics around task and code-review responsiveness
- WHO: Greg, JR (and Andre)
- Code Review workgroup is being formed. Currently 24 people have shown interest to be involved.
TEC13 (Code Health): Outcome 4 / Output 4.2
- GOALs:
- Expand SonarQube reporting into CI infrastructure
- Perform SonarQube analysis on all extensions
- Engage user communities in direct feedback solicitation
- WHO: JR, Zeljko, Code Health Metrics
- We had a session at the Hackathon to solicit feedback. Very positive reception. People seem genuinely interested in these efforts.
Other non-goal work
Release MW 1.33
Selenium
- T223774 The first Selenium test for WikibaseCirrusSearch - started at the hackathon, have to finish it
Gerrit
Phabricator
Jenkins
QA/Code Health
SCAP
[edit]Standup!
Antoine
- What I plan to do this week
- Caused an outage on CA wikinews
- What I'm blocked on
- Other?
- Out Thurs/Fri this week
Brennen
- What I plan to do this week
- Fix Ubuntu installs of local-charts
- Test Kosta's patch for xhprof with xdebug image
- Make helm lint happy with local-charts
- Follow up on James's intro to tech writer
- What I'm blocked on
- Nada
- Other?
- Off-grid June 6-9
Dan
- What I plan to do this week
- Follow up on the response from Analytics re: long-term event logging and analysis and set up a meeting
- Continue to debug and iron out issues in pipelinelib :(
- What I'm blocked on
- Other?
Greg
- What I plan to do this week
- Annual planning follow-up
- Annual Reviews
- TechConf19
- What I'm blocked on
- Other?
- SRE summit week of Jun 9
James
- What I plan to do this week
- Production config conversion to static files: https://phabricator.wikimedia.org/T223602
- Node 10 CI stuff, apparently. :-)
- Pipeline documentation
- Three of the eight remaining non-static prod extensions (Cirrus, Collection, FlaggedRevs) now ready-ish to migrate. Helping chivvy along the others.
- What I'm blocked on
- Extension undeployment stuff, as before.
- Other?
- Whatever blows up.
Jean-Rene
- What I plan to do this week
- Code Stewardship reviews candidates (Timo suggested 3 more)
- Quality and Test Engineering team brainstorming
- Code Review workgroup setup/planning
- What I'm blocked on
- Other?
Jeena
- What I plan to do this week
- What I'm blocked on
- Other?
- Out this week
Lars
- What I plan to do this week
- WMF Golang security symposium
- Update CI arch doc based on feedback
- Discuss things brought up based on CI arch doc
- Solicit feedback from outside team on CI arch doc
- What I'm blocked on
- Other?
- Taking Friday off (timeshifted from Thursday, which is a public holiday).
Mukunda
- What I plan to do this week
- Work on branching and tarballs
- Write a draft email re: grafana and sentry
- Try to figure out the status of barricade and be ready for gerrit changes next week
- Review hexmode's code
- What I'm blocked on
- Other?
Tyler
- What I plan to do this week
- What I'm blocked on
- Other?
- Out (most of) this week
Zeljko
- What I plan to do this week
- T220732 1.34.0-wmf.7 deployment blockers
- T223774 The first Selenium test for WikibaseCirrusSearch
- Blog post about Wikimedia Hackathon
- What I'm blocked on
- Other?
- Martin Urbanec is doing a great job during EU SWAT
Grooming
Team Kanban Board Review and Triage
- Closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...