Deployment tooling/Cabal/2015-06-15

June 15th

Big goal for next quarter (discuss, change, etc):

** deploy services! (maybe pick one to be a focus: RESTBase?)
*** current RESTBase deployment workflow: https://wikitech.wikimedia.org/wiki/RESTBase
*** general service deployment workflow: https://wikitech.wikimedia.org/wiki/User:Mobrovac/Service_Deployment
** should allow batches (specify via config or at runtime)
** should run checks (what do those look like?)
** should roll back
** should get running in deployment-prep by quarter's end (no explicit dependencies on ops; ops feedback throughout [obvs])
** keep deployment cabal group running as a means of sanity checks
** RelEng code: Tyler, Chad, Mukunda; Dan to facilitate discussion
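To make the batch / check / roll back goals above concrete, here is a rough sketch of how the three could fit together. Nothing here is an agreed design: `deploy_in_batches` and the `deploy`, `check` and `rollback` callables are hypothetical names, and what a check actually looks like is still the open question noted above.

```python
from typing import Callable, Iterable, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def deploy_in_batches(
    hosts: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],
    check: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy batch by batch; if any check fails, roll back every host touched.

    Returns True when all hosts passed their checks, False after a rollback.
    Assumption: the callables are supplied by the caller (hypothetical hooks).
    """
    touched: List[str] = []
    for group in batches(hosts, batch_size):
        for host in group:
            touched.append(host)
            deploy(host)
        if not all(check(host) for host in group):
            # Undo in reverse order so the most recent hosts are reverted first.
            for host in reversed(touched):
                rollback(host)
            return False
    return True
```

The batch size is just a parameter here, so it could come from config or be passed at runtime. Exceptions raised by `deploy` are deliberately not handled in this sketch.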

Run down features from scap/trebuchet tickets that we may want to move
Consolidate meta-tickets

- https://phabricator.wikimedia.org/T101022

Modularity

Transport mechanism
Version to deploy
- meaningful tags
- list different deploy versions
Signaling restart
- `service` command
- HUP
Testing at the end
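As an illustration of the "signaling restart" item, the two mechanisms listed (the `service` command and HUP) could sit behind one small function so the rest of the tool does not care which is used. This is only a sketch: `restart_command` and `signal_restart` are made-up names, and whether HUP means "reload" or "restart" depends on the individual service.

```python
import os
import signal
import subprocess
from typing import List, Union


def restart_command(service_name: str) -> List[str]:
    """Build the argv for restarting a service via the `service` command."""
    return ["service", service_name, "restart"]


def signal_restart(method: str, target: Union[str, int]) -> None:
    """Restart or reload a service using the chosen mechanism.

    method == "service": `target` is the service name.
    method == "hup":     `target` is the PID to send SIGHUP to.
    """
    if method == "service":
        subprocess.run(restart_command(str(target)), check=True)
    elif method == "hup":
        pid = int(target)
        if pid <= 0:
            # PIDs <= 0 address process groups; refuse them in this sketch.
            raise ValueError("pid must be a positive integer")
        os.kill(pid, signal.SIGHUP)
    else:
        raise ValueError(f"unknown restart method: {method!r}")
```

Which method a given service uses could then be a per-service config value.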

Versioning

Services use semantic versioning, but not for deployments.
There is a task for making MediaWiki follow semantic versioning as well.
It would be nice to use a standard versioning scheme and some naming conventions for deployment tags, rather than the long numeric deployment numbers we have in trebuchet.
For Phabricator I use a date-based deployment tag like release/2015-06-10/1, where the /1 is a revision number; for hotfixes you just increment that number.
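A small sketch of how the date-based tag scheme could be generated automatically. The `release/<date>/<revision>` format comes from the example above; the function name `next_deploy_tag`, and the assumption that revisions start at 1 and are counted per date, are illustrative only.

```python
import datetime
import re
from typing import Iterable

# Matches tags such as release/2015-06-10/1
TAG_PATTERN = re.compile(r"^release/(\d{4}-\d{2}-\d{2})/(\d+)$")


def next_deploy_tag(existing_tags: Iterable[str], day: datetime.date) -> str:
    """Return the next release/<date>/<revision> tag for the given day.

    Tags for other days, or tags that do not follow the scheme, are ignored.
    """
    prefix = day.isoformat()
    revisions = [
        int(match.group(2))
        for match in map(TAG_PATTERN.match, existing_tags)
        if match and match.group(1) == prefix
    ]
    return f"release/{prefix}/{max(revisions, default=0) + 1}"
```

In practice the existing tags would come from the repository (for example, the output of `git tag`), which is left out here.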

Concerns

SSH for each host
Public key deploy
Sudoers roles
- troubleshooting deploys requiring escalation
- service user needs read/write (possibly)

- https://phabricator.wikimedia.org/T101024

Interface

Tmux—lotsa feedback
Ability to abort at any point
Watching logs/backend
start from alternative interface, attach if problems
locking mechanism per repo (possibly global, not necessarily)
single point of updates, multiple consumers (e.g. redis consumed by web page and by commandline)
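The last two items (a per-repo lock, and one stream of updates read by several consumers) might look roughly like the sketch below. It makes several assumptions: the lock is a plain lock file on the deploy host, and `StatusFeed` is an in-memory stand-in for something like redis pub/sub. `repo_lock` and `StatusFeed` are hypothetical names.

```python
import os
import queue
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def repo_lock(repo: str, lock_dir: str) -> Iterator[str]:
    """Hold an exclusive lock file for one repo; fail fast if already held."""
    path = os.path.join(lock_dir, repo.replace("/", "_") + ".lock")
    # O_EXCL makes creation atomic: a second deployer gets FileExistsError.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield path
    finally:
        os.close(fd)
        os.remove(path)


class StatusFeed:
    """Single point of updates; every subscriber receives every message."""

    def __init__(self) -> None:
        self._subscribers: List["queue.Queue[str]"] = []

    def subscribe(self) -> "queue.Queue[str]":
        """Register a consumer (e.g. a web page or a command-line client)."""
        q: "queue.Queue[str]" = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, message: str) -> None:
        """Fan the update out to all current subscribers."""
        for q in self._subscribers:
            q.put(message)
```

A global lock would be the same thing with a fixed name instead of the repo name. A crashed deploy would leave a stale lock file behind, which this sketch does not handle.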

TODO

1. Conversation with ops about ssh shared user (mwdeploy, whatever)
2. Regroup with RelEng to figure out timelines
3. Granularity of ssh control