Wikimedia Release Engineering Team/Checkin archive/20180326
2018-03-19
Vacations/Important dates
- Mar 26-29 (week since WMF holiday Fri): thcipriani vacation
- Mar 30 (Fri): WMF Holiday
- April 2: Željko (Holidays in Croatia - Easter Monday)
- Apr 3-13: Greg vacation
- April 16 (Mon): WMF Holiday
- May 1: Željko (Holidays in Croatia - Labor Day / May Day)
- May 14-17: Team offsite in Barcelona
- May 18-21: Wikimedia Hackathon in Barcelona
- May 21 (Mon): Tech-Mgt F2F
- May 31: Željko (Holidays in Croatia - Corpus Christi)
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
- Feb 19 - wmf.22 - Mukunda
- Feb 26 - wmf.23 - Tyler
- Mar 05 - wmf.24 - Tyler
- Mar 12 - wmf.25 - Chad
- Mar 19 - wmf.26 - Chad
- Mar 26 - wmf.27 - Mukunda <----
- Apr 02 - wmf.28 - Mukunda
- Apr 09 - wmf.29 - Tyler
- Apr 16 - wmf.30 - Tyler
SoS
- Feb 19 - Chad
- Feb 26 - Mukunda
- Mar 05 - Mukunda
- Mar 12 - Tyler
- Mar 19 - Tyler
- Mar 26 - Chad <----
- Apr 02 - Chad
- Apr 09 - Mukunda
- Apr 16 - Mukunda
Team Business
Updates
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
This week
Release Engineering
- Blocking
- Blocked
- Updates
Last week
Release Engineering
- Blocking
- Blocked
- Updates
- Minor Gerrit upgrade planned for this week (2.14.6 -> 2.14.7)
- Analysis of the last year's worth of incident reports started last week
- Scap 3.7.7 should be rolled out to production this week
- Quarterly goal dependency update:
- Continue improving the ways that users can download articles of interest for later consumption
- Reading Web: Tech Ops/RelEng (work is currently blocked on https://phabricator.wikimedia.org/T187821 which is part of a larger epic https://phabricator.wikimedia.org/T181084)
- Talked about in team meeting Monday
- is there a task?
Train status and happenings
Past week status updates
- All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q3
Quarterly Goals
Program 1: Outcome 5: Milestone 1: Develop and migrate to a JavaScript-based browser testing stack
- Due: End of this quarter
- What: Specific improvements to the now canonical framework, see: task T182421, notably:
- Upgrade webdriverIO to version 4.9
- Investigate replacing nodemw with mwbot
- Video recording for Selenium tests in Node.js
- Task: task T182421
Priority: high
- T179188 Video recording for Selenium tests in Node.js - in progress - will do this week
- T180144 Upgrade WebdriverIO to 4.12.0 - resolved
- T181284 Replace nodemw with mwbot - in progress - almost done, updating documentation
- T190426 Refactor AdvancedSearch browser tests which use nodemw module - in progress - helping AdvancedSearch team
Priority: normal
- T164721 Run Selenium tests in CI for extensions - not started - CI changing
- T180125 Refactor mediawiki-core-qunit-selenium-jessie Jenkins job so qunit/karma and webdriverio are invoked via npm script - not started
- T180482 Create mediawiki-core-qunit-selenium-composer-jessie - not started
- T182692 Document differences between Ruby and Node.js Selenium frameworks - not started - not hard to do, will do this week
- T185011 Create selenium-MediaWiki-jessie daily Jenkins job - in progress
- T188740 Retrospective for T139740 Port Selenium tests from Ruby to Node.js - resolved
Priority: low
- T182412 Investigate if WebdriverIO `sync: false` would be useful to us and document how to use it - in progress - it could be useful for some tests, documentation pending, will do this week
- T182691 Selenium tests should be easier to run - in progress - blocked by upstream bug
- T183160 Sample code in Node.js for repositories that still have Selenium+Ruby tests - not started
- T183162 Patches in Gerrit deleting Selenium+Ruby tests for repositories that still have them - not started
- T185094 Update page object pattern in Selenium tests - in progress - done, but probably will not be implemented, discussion with upstream to revert to previous recommendation is probably the best thing to do
- T187859 Move one Selenium tests from mediawiki/core to mediawiki/skins/Vector - in progress - blocked on understanding how it breaks Minerva
- T188742 Should selenium-EXTENSION-jessie run for all repositories with Selenium tests? - not started - have to contact repository owners
Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure
- Goals
- Draft requirements for a Kubernetes based solution for CI - task T183513
- Migrate MediaWiki PHPUnit tests to Shipyard (docker-based CI) (~40% of Nodepool usage) - task T183512
- Will be worked on after the long tail task T187797
- Unify production and CI docker image build process - task T177276
- Done 01/15
Program 3: Outcome 1: Objective 2: Identify and find stewards for high-priority/high use code segment orphans
- Due: End of quarter
- task T174091
Pivoted on the stewardship review process. Working with delegates prior to engaging with Toby and Victoria. Scheduled standing review monthly with Toby and Victoria
Program 3: Outcome 2: Objective 2: Define and implement a process to regularly address technical debt across the Foundation
- Due: End of quarter
- task T174095
Worked on the technical debt avoidance framework.
Program 3: Outcome 2: Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s))
- Due: End of next quarter
- task T174096
No activity
Program 6: Outcome 2: Objective 2: Set up a continuous integration and deployment pipeline
- Due: End of this quarter
- Keyword: SSD
- phab project: https://phabricator.wikimedia.org/project/view/2453/
- Goal:
- Verify basic functionality of 'production' deployment and image (initially targeting mathoid):
- Functional PoC within integration in the deployment-pipeline
- Deploy to isolated k8s
thcipriani update
This is a severely long bit of notes about what I did last week so that you all can pick up where I left off...hopefully
- We are sooo close to getting the PoC working, I was trying to build an image that worked that I could then puppetize
- I ended up blocked on a few things, some of which were resolved over the weekend.
- https://phabricator.wikimedia.org/T190584 upgrade docker agents to stretch
- Needed for changes in systemd mostly (minikube shells out to it and says "Stopped" on Jessie)
- Subtasks are mostly resolved
Creating a new minikube agent
1. Create a new machine in horizon named like: integration-slave-k8s-10XX
2. ssh to machine (have to wait a bit, puppet needs to run there)
3. Fix weird self hosted puppet issues (see https://www.mediawiki.org/wiki/Continuous_integration/Docker#Jenkins_Agent_Creation )
   - sudo rm -fR /var/lib/puppet/ssl
   - sudo mkdir -p /var/lib/puppet/client/ssl/certs
   - sudo puppet agent -tv
   - sudo cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs
   - sudo puppet agent -tv
4. Apply the role role::ci::slave::labs::docker to the instance via horizon
5. sudo puppet agent -tv (this was failing last week see https://phabricator.wikimedia.org/T190584 )
6. Setup minikube:
    sudo apt-get install -y helm minikube kubernetes-client
    export MINIKUBE_WANTUPDATENOTIFICATION=false
    export MINIKUBE_WANTREPORTERRORPROMPT=false
    export MINIKUBE_HOME=$HOME
    export CHANGE_MINIKUBE_NONE_USER=true
    mkdir $HOME/.kube || true
    touch $HOME/.kube/config
    export KUBECONFIG=$HOME/.kube/config
    sudo -E minikube start --vm-driver none --bootstrapper=localkube
7. Clone all necessary repos
    git clone https://gerrit.wikimedia.org/r/operations/deployment-charts
    git clone https://gerrit.wikimedia.org/r/mediawiki/services/mathoid
8. Build mathoid image
    cd mathoid
    blubber dist/pipeline/blubber.yaml production | docker build -t mathoid -f - .
9. Setup helm/tiller
    This is where I got stuck :( See: https://phabricator.wikimedia.org/T190589
10. helm install && helm test? Maybe? Didn't get this far :(
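For whoever picks this up, steps 6 through 8 above could be collected into one script so they are easier to repeat on a fresh agent. This is only a sketch, not something that was run as part of the PoC: the `run` helper, the `DRY_RUN` variable and the function names are made up for illustration, and the script defaults to printing the commands instead of executing them. Steps 9 and 10 are left out on purpose, since the helm/tiller setup is still unresolved (T190589).

```shell
#!/bin/sh
# Sketch of steps 6-8 from the notes as one script.
# Dry-run by default: commands are printed, not executed.
# Run with DRY_RUN=0 on a real agent to actually execute them.
set -e
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

setup_minikube() {
    run sudo apt-get install -y helm minikube kubernetes-client
    # Environment from step 6; harmless to export even in dry-run mode.
    export MINIKUBE_WANTUPDATENOTIFICATION=false
    export MINIKUBE_WANTREPORTERRORPROMPT=false
    export MINIKUBE_HOME="$HOME"
    export CHANGE_MINIKUBE_NONE_USER=true
    export KUBECONFIG="$HOME/.kube/config"
    # mkdir -p stands in for "mkdir ... || true" from the notes.
    run mkdir -p "$HOME/.kube"
    run touch "$KUBECONFIG"
    run sudo -E minikube start --vm-driver none --bootstrapper=localkube
}

clone_repos() {
    run git clone https://gerrit.wikimedia.org/r/operations/deployment-charts
    run git clone https://gerrit.wikimedia.org/r/mediawiki/services/mathoid
}

build_mathoid_image() {
    # The cd and the pipe have to share one shell, hence sh -c.
    run sh -c 'cd mathoid && blubber dist/pipeline/blubber.yaml production | docker build -t mathoid -f - .'
}

setup_minikube
clone_repos
build_mathoid_image
```

Running it as-is prints each command prefixed with `+`, which makes it easy to compare against the steps above before trying `DRY_RUN=0` on an actual integration-slave-k8s instance.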
Quarterly non-goal "Work"
Program 1: Outcome 1: Objective 1: Scap (Tech Debt Sprint FY201718-Q2)
- Worked with awight on git-lfs + scap
Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure
- https://phabricator.wikimedia.org/T189660
- Fixed the phabricator-jessie-diffs job. Thanks to Antoine for identifying the problem.
- Also improved the logging on failures so jenkins-bot will now comment with more useful info.
Program 1: Outcome 6: Milestone 1: Maintain Gerrit
Program 1: Outcome 6: Milestone 2: Maintain Phabricator
- Streamline logspam workflows by adding some integration with phabricator
- Store git-lfs (and other phab uploads) in swift: task T182085
- Finally got back into this during the second half of the week.
- Found out that there is already a swift cluster in deployment-prep and started configuring phab.wmflabs.org to work with this shared swift cluster.
Other work
- Selenium retrospective took place last week. See: https://phabricator.wikimedia.org/phame/post/view/88/selenium_tests_in_node.js_project_retrospective/
- Post Mortem on 20180129-MediaWiki Incident. See: https://etherpad.wikimedia.org/p/postmortem-20180129-MediaWiki_Incident
- Code Health Group Meeting. See: https://etherpad.wikimedia.org/p/codehealthgroup-20180321
Standup!
Antoine
- What I plan to do this week
- Demo of quibble right now
- Add experimental job to CI for mediawiki/core that would run some subset of phpunit/qunit/composer test/npm test and webdriver.io
- What I'm blocked on
- Patch for MediaWiki
- https://gerrit.wikimedia.org/r/#/c/421500/ //Let built-in web server handle .php requests//
- https://gerrit.wikimedia.org/r/#/c/419605/ //Let install.php detect and inject extensions// + backports
- Could use a PHP 7.0 roadmap. Does anyone know who is in charge?
- Other?
- mediawiki/core suite fails on sqlite or when LANG is different from C.
- I didn't know there were other LANGs ;-)
Chad
- What I plan to do this week
- abusefilter private logs / data pruning
- gerrit missing branch thingie? I hate git
- helm helm helm
- MW general release planning?
- What I'm blocked on
- Other?
Dan
- What I plan to do this week
- Integrate new Blubber release into pipeline script
- Publish a common policy file for Blubber to integration.wikimedia.org
- Refactor scap's CI jobs to use blubber
- Start working on composer support in Blubber
- What I'm blocked on
- Re-review from Antoine on https://phabricator.wikimedia.org/D993
- Other?
- thcipriani: see update Program 6 Outcome 2 Objective 2 for where I left off last week...
Greg
- What I plan to do this week
- MW Release meeting as well
- talking with Mark&Faidon re 'staging' tomorrow
- apparently another budget [urgent] review item
- Q4 team goals
- SWAT changes
- What I'm blocked on
- Other?
Jean-Rene
- What I plan to do this week
- Finish up Q3 goal work re Technical Debt process
- Q3 Stewardship review
- What I'm blocked on
- Other?
Mukunda
- What I plan to do this week
- Swift, Swift and train
- more Swift
- What I'm blocked on
- n/a
- Other?
Tyler
- What I plan to do this week
- Vacation
- What I'm blocked on
- Blocked? Baby I'm on vacation!
- Other?
- <3 you all -- have a good week (I posted an update in program 6)
Zeljko
- What I plan to do this week
- Should I move tasks marked not started to T182986 Selenium framework improvements?
- Greg: yeah, I think so
- T179188 Video recording for Selenium tests in Node.js
- T190426 Refactor AdvancedSearch browser tests which use nodemw module
- T182692 Document differences between Ruby and Node.js Selenium frameworks
- T188740 Retrospective for T139740 Port Selenium tests from Ruby to Node.js
- T185011 Create selenium-MediaWiki-jessie daily Jenkins job
- T182412 Investigate if WebdriverIO `sync: false` would be useful to us and document how to use it
- What I'm blocked on
- T182691 Selenium tests should be easier to run - blocked by upstream or a new idea
- T185094 Update page object pattern in Selenium tests - waiting to see if Timo will explain to upstream that they are doing it wrong
- T187859 Move one Selenium tests from mediawiki/core to mediawiki/skins/Vector - blocked on understanding how it breaks Minerva
- Other?
- T190039 - CirrusSearch smoke selenium tests cause failures of mediawiki-core-qunit-selenium-jessie job for extensions - CI fixed
- Will there be Q4 Selenium framework improvements?
- Ordered Kinesis Advantage2 <3
Grooming
Team Kanban Board Review and Triage
- Closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...