2020-06-24

Always

[1]
Archive
Task tree: https://phabricator.wikimedia.org/T198901

TODOs from last time

https://www.mediawiki.org/wiki/User:Jdforrester_(WMF)/MW-in-Containers_thoughts
- TODO: review for next meeting; real time discussion, maybe
TODO: thcipriani to file a task for Kask tests https://phabricator.wikimedia.org/T224041#6229281
TODO: Will to follow up with naïké on cross-team collaboration
TODO: Will to encourage more conversation on talk page from CPT
TODO: Will to communicate with Product on the changes to find out impacts for their side

General

Máté email from Alex: next steps?
- Anything more to forward? Waiting on them to say they are ready.
ORES K8s and COW - https://phabricator.wikimedia.org/T182331
- Takes advantage of CoW on a single machine
- Containerization might mean that we significantly increase memory usage
- This might add complexity and reduce ORES capacity, e.g., for sudden bursts of requests for uncommonly used wikis
- Alex: losing excess capacity is not a big loss (more efficiency); adding capacity in k8s is easier
  - ORES relying on CoW will cause migration pain in future: the kernel does not break user space, but memory allocation behavior may change
  - Tricks like the ones apertium is using: loading and dropping models on demand
  - Actual RSS 210GB, with CoW 56GB. We have a lot of uwsgi and celery running. Currently, we have tuned the software to the hardware. If we need more capacity now, we need to add equipment (weeks to months). With K8s, room for growth.
  - Other benefits: standardized deployments, easier rollbacks (no mess with scap handling virtualenvs), automatic garbage collection of images. Also a lot more control of CI -- less work inside of puppet. More self-serve.
- Aaron: we could say that we want to move *new* models to k8s -- want to ensure that we're using time efficiently
  - We have a backpressure system to determine whether or not we're running out of capacity (via queue and redis)
  - We need, though, to be able to handle random bots crawling uncommonly requested wikis -- how long does that scaling take?
    - Alex: seconds.
    - Aaron: if dynamic scaling is fast then that could give us more capacity that changes a lot -- how much is already built and how much would we have to build ourselves for this scaling?
    - Alex: we can help the scaler by feeding it RpS to help it work better, but most components are there. We're not there yet; we're working with a contractor; it's on our roadmap. We should have that before end of September (hedging a bit).
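For reference, the RpS-driven scaling discussed above can be sketched with the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas * current metric / target metric). This is only an illustration, not the planned ORES setup: the function name, the per-pod RpS target, and the replica bounds are assumptions, and the real autoscaler also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_rps_per_pod: float,
    target_rps_per_pod: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Estimate the replica count from a requests-per-second metric.

    Uses ceil(current_replicas * current / target), then clamps the result
    to [min_replicas, max_replicas]. All parameter names and default bounds
    are hypothetical values chosen for this sketch.
    """
    if target_rps_per_pod <= 0:
        raise ValueError("target_rps_per_pod must be positive")
    raw = math.ceil(current_replicas * current_rps_per_pod / target_rps_per_pod)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # A burst pushes each of 4 pods to 30 RpS against a target of 20 RpS.
    print(desired_replicas(4, 30.0, 20.0))
```

A crawler burst on an uncommonly requested wiki would raise the observed per-pod RpS, and the result is how many pods the scaler would ask for, capped at the configured maximum.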
Aaron: we're worried about being trailblazers. Multi-container deployment in particular.
- Tyler: PipelineLib exists. Not being used by other repos. No reason it shouldn't work. Re. deployment, there is not a lot of magic there that could fall apart (Helm and K8s -- not a trailblazer there).
- Alex: Only trailblazing is the creation of two different container images from the same repo.

RelEng

Jeena working on Kask tests
- https://phabricator.wikimedia.org/T224041
- Cassandra container
- Jeena: long response from eevans -- sounds like he didn't want to use service-checker -- but I wasn't sure of the reasoning there. Most of the extra work to get those go tests running is work I've already done. Only thing that needs to be done is adding deletion ability to PipelineLib.
- Will: CPT internally has an integration test tool -- this was to be used for monitoring as well -- we're not ready to migrate away from service-checker, which I think was most of the lament.
- Jeena: is he saying that service-checker doesn't fulfill all needs?
- Will: We have a replacement, but it's lacking features
- Jeena: Should I wait for a response on go tests?
- Will: I will poke for clarification.

Deletion of PipelineLib images
- Important to make sure that we aren't deleting images being used in production :)
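The safety check above amounts to filtering deletion candidates against whatever is currently deployed. A minimal sketch, assuming both lists are already available as image references; the function name and the example references are hypothetical, and how PipelineLib would actually obtain the in-production list is not covered here.

```python
from typing import Iterable, List


def deletable_images(candidates: Iterable[str], in_production: Iterable[str]) -> List[str]:
    """Return the candidates that are safe to delete.

    An image is kept whenever its reference appears in the in-production
    list. Input order of the candidates is preserved.
    """
    protected = set(in_production)
    return [ref for ref in candidates if ref not in protected]


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    candidates = ["example/kask:test-1", "example/kask:test-2", "example/kask:v1"]
    in_production = ["example/kask:v1"]
    print(deletable_images(candidates, in_production))
```

Comparing by exact reference is the conservative choice only if tags are immutable; if tags can be re-pointed, comparing digests instead would be safer.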

Serviceops

Working on getting chartmuseum (https://chartmuseum.com/) into production. This will make developing charts easier, as we will no longer need to run "helm package" and "helm repo index" manually.

CPT

Planning to start on MW on k8s in Q1
- Some discussion around configuration generation (maybe as a step as part of the pipeline)
- Agreement to start more topics on MW-in-containers thoughts page
- TODO: thcipriani to reach out to wmde about k8s/pipeline/mediawiki in k8s pipeline
- TODO: alexandros to start topic on mw-in-containers talk page about logos in mediawiki production images

As Always