Deployment tooling/Cabal/2015-06-15
June 15th
- Big goal for next quarter (discuss, change, etc):
** deploy services! (maybe pick one to be a focus, restbase?)
*** current RESTBase deployment workflow: https://wikitech.wikimedia.org/wiki/RESTBase
*** general service deployment workflow: https://wikitech.wikimedia.org/wiki/User:Mobrovac/Service_Deployment
** should allow batches (specify via config or at runtime)
** should run checks (what do those look like?)
** should roll back
** should get running in deployment-prep by quarter's end (no explicit dependencies on ops; ops feedback throughout [obvs])
** keep deployment cabal group running as a means of sanity checks
** RelEng code: Tyler, Chad, Mukunda; Dan to facilitate discussion
- Run down features from scap/trebuchet tickets that we may want to move
- Consolidate meta-tickets
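As a way to make the batch, check, and roll-back goals above concrete, here is a minimal Python sketch. Everything in it is hypothetical: `deploy_in_batches`, `batches`, and the `deploy`/`check`/`rollback` callables are illustrative names, not part of scap or trebuchet, and what a real check looks like is still the open question noted above.

```python
from typing import Callable, Iterable, List


def batches(hosts: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def deploy_in_batches(
    hosts: List[str],
    batch_size: int,
    deploy: Callable[[str], None],
    check: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy batch by batch; roll back every touched host if a check fails."""
    touched: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            touched.append(host)
            deploy(host)
        # Checks run once per batch, after every host in the batch is deployed.
        if not all(check(host) for host in batch):
            # Undo in reverse order so the most recent hosts are restored first.
            for host in reversed(touched):
                rollback(host)
            return False
    return True
```

The batch size is a plain argument here, so it could come from a config file or from a command-line flag at runtime.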
Modularity
- Transport mechanism
- Version to deploy
** meaningful tags
** list different deploy versions
- Signaling restart
** `service` command
** HUP
- Testing at the end
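The two restart-signaling options listed above could be kept behind separate, swappable functions. This is a sketch only: the function names are made up for illustration, and whether a given service actually reloads on SIGHUP is an assumption that would need checking per service.

```python
import os
import signal
import subprocess
from typing import List


def restart_command(service_name: str) -> List[str]:
    """Build the argv for a full restart through the `service` wrapper."""
    return ["service", service_name, "restart"]


def restart_with_service(service_name: str) -> None:
    """Restart via `service`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(restart_command(service_name), check=True)


def reload_with_hup(pid: int) -> None:
    """Send SIGHUP to one process (assumes the daemon treats HUP as reload)."""
    os.kill(pid, signal.SIGHUP)
```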
Versioning
- Services use semantic versioning, but not for deployments.
- There is a task for making MediaWiki follow semantic versioning as well.
- It would be nice to use a standard versioning scheme and some naming conventions for deployment tags, rather than the long numeric deployment numbers we have in Trebuchet.
- For Phabricator I use a date-based deployment tag like release/2015-06-10/1, where the /1 is a revision number; for hotfixes you just increment the revision.
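To illustrate the date-based scheme, the sketch below computes the next tag from a list of existing ones. The `release/<date>/<revision>` shape comes from the example above; `next_release_tag` and `TAG_RE` are hypothetical names, and the list of existing tags is assumed to come from wherever tags are stored (for example the output of `git tag`).

```python
import datetime
import re
from typing import Iterable

# Matches tags shaped like release/2015-06-10/1.
TAG_RE = re.compile(r"^release/(\d{4}-\d{2}-\d{2})/(\d+)$")


def next_release_tag(existing: Iterable[str], day: datetime.date) -> str:
    """Return the next release/<date>/<revision> tag for `day`."""
    prefix = day.isoformat()
    revisions = [
        int(m.group(2))
        for m in map(TAG_RE.match, existing)
        if m and m.group(1) == prefix
    ]
    # The first deploy of the day gets revision 1; hotfixes increment it.
    return f"release/{prefix}/{max(revisions, default=0) + 1}"
```

Tags that do not match the pattern, or that belong to another day, are ignored when picking the revision.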
Concerns
- SSH for each host
- Public key deploy
- Sudoers roles
- troubleshooting deploys requiring escalation
- service user needs read/write (possibly)
Interface
- Tmux: lotsa feedback
- Ability to abort at any point
- Watching logs/backend
- start from alternative interface, attach if problems
- locking mechanism per repo (possibly global, not necessarily)
- single point of updates, multiple consumers (e.g. redis consumed by web page and by commandline)
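One simple way to get a per-repo lock is an exclusive lock file, sketched below. This is an assumption about how the lock might work, not a decision from the meeting: `repo_lock`, `DeployLocked`, and the `lock_dir` layout are hypothetical, and a lock file only protects deploys started from the same host.

```python
import os
from contextlib import contextmanager


class DeployLocked(Exception):
    """Raised when another deploy already holds the lock for a repo."""


@contextmanager
def repo_lock(repo: str, lock_dir: str):
    """Hold an exclusive lock file for `repo` for the duration of the block."""
    path = os.path.join(lock_dir, repo.replace("/", "_") + ".lock")
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DeployLocked(f"{repo} is already being deployed") from None
    try:
        # Record the owner's PID to help with troubleshooting stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield path
    finally:
        os.close(fd)
        os.unlink(path)
```

Using one fixed name for every repo (for example `repo_lock("global", lock_dir)`) would turn the same helper into a global lock.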
TODO
1. Conversation with ops about ssh shared user (mwdeploy, whatever)
2. Regroup with RelEng to figure out timelines
3. Granularity of ssh control