Continuous integration/Docker
Support for Docker containers was added to Wikimedia CI in August 2017. These containers run from a cluster of persistent Jenkins agents in Wikimedia Cloud Services. We hope that in the future, these images will run from a Kubernetes cluster instead.
Overview
Each Docker image is designed to be self-sufficient: it expects nothing on disk beforehand and leaves nothing on disk afterward. Containers and their filesystem are destroyed after each run (except for the designated logs directory, which is preserved as the Jenkins build artefact for a few days). The behaviour of a container may only vary based on environment variables provided by Jenkins (ZUUL_URL, ZUUL_REF, etc.).
Administrative tasks must be handled solely by Jenkins, outside the container.
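Conceptually, a job run amounts to something like the following sketch (the image name and Zuul values are placeholders, not an actual job configuration):
$ docker run --rm \
    --env ZUUL_URL=git://contint2001.wikimedia.org \
    --env ZUUL_PROJECT=mediawiki/core \
    --env ZUUL_REF=refs/zuul/master/Zexample \
    --volume "$(pwd)"/log:/var/lib/jenkins/log \
    docker-registry.wikimedia.org/releng/node20:latest
The --rm flag discards the container and its filesystem when the run finishes; only the mounted log directory survives.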
The Docker images for CI are published under the releng namespace at https://docker-registry.wikimedia.org/.
The source code for each Dockerfile resides in the integration/config.git repository.
Build images locally
We use docker-pkg to build Docker images, with Jinja for additional templating.
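Each image has its own directory under dockerfiles/ in integration/config, containing a Jinja-templated Dockerfile.template and a changelog that records its versions. For example, to look at one of the images referenced later on this page:
$ ls dockerfiles/node10-test/
You should see at least a Dockerfile.template and a changelog; the changelog determines which version docker-pkg will build.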
Installing docker-pkg
docker-pkg is a python3 program that can be cloned from operations/docker-images/docker-pkg:
$ git clone https://gerrit.wikimedia.org/r/operations/docker-images/docker-pkg
$ cd docker-pkg
It can then be installed in a virtual environment, for example with:
$ virtualenv venv
$ ./venv/bin/pip install .
or alternatively with pipx:
$ pipx install -e .
Also clone the integration/config repository:
$ git clone https://gerrit.wikimedia.org/r/integration/config
At this point, the docker-pkg command should be available in your terminal.
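To confirm the installation worked, print the built-in help (use ./venv/bin/docker-pkg if you installed into the virtual environment and have not activated it):
$ docker-pkg --help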
Download existing images
You can significantly speed up the building of images by first downloading the latest existing version of the image you are interested in changing (or, if creating a new image, its parent image). Building all images from scratch may take several hours and consume ±40 GB of disk space.
The following command will download the latest versions of WMF CI's Docker images from docker-registry.wikimedia.org. Note that this naturally skips images you have previously pulled or built already (source):
$ cd integration/config
$ ack -o -h -s 'docker-registry.*:[.\d]+' jjb/ | sort | uniq | xargs -n1 docker pull
Alternatively, you can download individual images like so:
$ docker pull docker-registry.wikimedia.org/releng/node20:latest
If the image in question has a debug script, you can also run that, which will download the image as needed.
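For example, with the debug-image helper described later on this page (node20 is only an example image name):
$ cd path/to/integration/config/dockerfiles
$ ./debug-image node20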
Build the images
To build one or more images, run docker-pkg build with the --select option. The --select option takes a glob that is matched against the full reference name of the image (URL and version).
$ docker-pkg -c dockerfiles/config.yaml build --select '*example:*' dockerfiles/
This will scan the dockerfiles/ folder. For each image, it will find the last version tag in its changelog and, if that version is not present in your local Docker registry, start building it from the Dockerfile.
For example, --select '*node10-test:*' would build dockerfiles/node10-test/Dockerfile.template as represented by docker-registry.wikimedia.org/releng/node10-test:0.3.0.
You can also use this to build a large number of related images; for example, --select '*quibble*' (note the absence of a colon) would rebuild all images containing "quibble" in their name that have new versions in your working copy.
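Following the same pattern as the command above, that would be:
$ docker-pkg -c dockerfiles/config.yaml build --select '*quibble*' dockerfiles/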
Alternatively, to build the entire catalog, omit the --select option like so:
$ cd path/to/integration/config
$ docker-pkg -c dockerfiles/config.yaml build dockerfiles
By default docker-pkg does not use caching of image layers. If you want to use cached layers for quicker rebuilding of images, use the --use-cache option.
Example output:
== Step 0: scanning dockerfiles ==
Will build the following images:
* docker-registry.wikimedia.org/releng/ci-stretch:0.1.0
* docker-registry.wikimedia.org/releng/operations-puppet:0.1.0
* docker-registry.wikimedia.org/releng/ci-jessie:0.3.0
== Step 1: building images ==
=> Building image docker-registry.wikimedia.org/releng/ci-stretch:0.1.0
=> Building image docker-registry.wikimedia.org/releng/operations-puppet:0.1.0
=> Building image docker-registry.wikimedia.org/releng/ci-jessie:0.3.0
== Step 2: publishing ==
NOT publishing images as we have no auth setup
== Build done! ==
You can see the logs at ./docker-pkg-build.log
Troubleshooting
Could not find a suitable TLS CA certificate bundle
The following error is known to affect macOS: gerrit:500417
ERROR - Could not load image in integration/config/dockerfiles/…: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt (builder.py:244)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 228, in cert_verify
    "invalid path: {}".format(cert_loc))
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt
To work around this issue, run the following in your terminal window before running a docker-pkg command:
$ export REQUESTS_CA_BUNDLE=
$ docker-pkg …
No images are built
If you don't see any output between "Step 1: building images" and "Step 2: publishing", it means docker-pkg did not find any images, or did not find images with unbuilt newer versions. Review the following:
- "git status" in integration/config should show a change to the changelog file for the image you are updating.
- Make sure that the name used in the changelog file is correct, and matches your intended image name.
- Look for any errors in "docker-pkg-build.log".
- Make sure that you ran "docker-pkg -c dockerfiles/config.yaml build dockerfiles" and not "docker-pkg -c dockerfiles/config.yaml build dockerfiles/path-to-image"; docker-pkg will figure out which images to build by detecting modifications to the changelog.
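For example, the first and third checks above can be run from your integration/config checkout like this:
$ git status dockerfiles/
$ tail docker-pkg-build.log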
Manifest not found
When adding a new version of an image, and also incrementing versions of dependent images, you may encounter the following error:
Build failed: manifest for example/my-parent-image:0.4.0 not found
This happens because docker-pkg (by default) pulls parent images from the registry and only rebuilds the child locally. If you are updating a parent from 0.3.0 to 0.4.0 and also incrementing the child images' versions, the parent will build fine, but the children will then fail to build because the new parent version only exists locally and not yet in the registry.
To mitigate this, pass --no-pull, like so:
$ docker-pkg --no-pull -c dockerfiles/config.yaml build dockerfiles
Adjusting an image
Sometimes you need to edit an image, e.g. to add a new dependency or to update an existing one.
To do this, make your changes to the image's dockerfiles/ImageName/Dockerfile.template file, and then run the following command:
$ docker-pkg -c dockerfiles/config.yaml --info update \
    --reason "Briefly explain your change" \
    --version NewImageVersion \
    ImageName \
    dockerfiles/
This will add a properly-formatted entry to the changelog of the image you're changing and of all dependent images. You can then build the images locally to check that they build correctly, and use the debug-image command to check that the image works as intended. Once you're happy with your fix, bundle the changes into a git commit for review and merging.
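You can review the generated changelog entries (for the image and its dependent images) before committing, for example:
$ git diff dockerfiles/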
Manage local images
List local images:
$ docker images
Remove local images from wikimedia.org (source):
$ docker rmi $(docker images --format '{{.Repository}}:{{.Tag}}' | grep 'wikimedia.org')
Deploy images
Deploying a change to CI Dockerfiles requires shell access to the Docker registry on contint1001.wikimedia.org (shell group: contint-docker). Ask the Release Engineering team for help.
The change to the integration/config repository should first be merged in Gerrit.
After that, deploy it to the CI infrastructure. To do this, run ./fab deploy_docker in the integration/config directory. This connects to the contint1001.wikimedia.org server and instructs it to build newer versions of the Docker images in integration/config.
Testing new images
Test an image locally
Use the steps below to test a Docker image locally. This can be an unpublished image you've built locally with docker-pkg, or one that was pulled from the wikimedia.org registry.
Note that the examples below use registry URLs as image names, but these refer to the images you have locally (either built or pulled); they do not need to have been deployed or uploaded yet. You can list your local images with the docker images command.
$ cd my-gerrit-project
$ mkdir -m 777 cache log
$ docker run \
    --rm --tty \
    --volume /"$(pwd)"/log://var/lib/jenkins/log \
    --volume /"$(pwd)"/cache://cache \
    --volume /"$(pwd)"://src \
    docker-registry.wikimedia.org/releng/node10-test:0.3.0
Debug an image locally
The debug-image script can be used to run a RelEng Docker image locally.
$ cd integration-config/dockerfiles
$ ./debug-image node10-test
nobody@bfee2a999b20:/src$
The default behaviour for docker run is to start the container and execute the entrypoint/cmd specified in the Dockerfile. To inspect the container instead, specify -i to make it interactive, and override --entrypoint to a shell (such as /bin/bash). For example:
$ cd my-gerrit-project/
$ docker run \
    --rm --tty \
    --interactive --entrypoint /bin/bash \
    docker-registry.wikimedia.org/releng/node10-test:0.3.0
nobody@5f4cdb0ab167:/src$
nobody@5f4cdb0ab167:/src$ env
CHROMIUM_FLAGS=--no-sandbox
XDG_CACHE_HOME=/cache
Test an image in CI
Once the new image is pushed to the Wikimedia Docker registry, it should be tested on one of the integration-agent-docker-100x machines. As of August 2017 there are 4 such machines: integration-agent-docker-100[1:4].
To test:
- ssh to one of the integration-agent-docker machines and su to the jenkins-deploy user.
you@laptop:~$ ssh integration-agent-docker-1004
you@integration-agent-docker:~$ sudo su - jenkins-deploy
- Create a new directory and an environment file that contains the information passed from Jenkins in the form of ZUUL_* variables.
jenkins-deploy@integration-agent-docker:~$ mkdir docker-test && cd docker-test
jenkins-deploy@integration-agent-docker:docker-test$ printf "ZUUL_PROJECT=operations/puppet\nZUUL_URL=git://contint2001.wikimedia.org\nZUUL_REF=refs/zuul/production/Ze59ae894f02248d9888835dbaa14dfdf\nZUUL_COMMIT=045fcb14e9fd7885957d900b9a97c883fc5cd26d\n" > .env
- Run the new Docker image with the environment file and ensure that it runs correctly.
jenkins-deploy@integration-agent-docker:docker-test$ mkdir log
jenkins-deploy@integration-agent-docker:docker-test$ docker run --rm -it --env-file .env --volume "$(pwd)"/log:/var/lib/jenkins/log contint/operations-puppet
- If everything is working as anticipated, update JJB with the Dockerfile version that has been pushed to the Wikimedia Docker registry.
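To find which job definitions reference the image, you can search the jjb/ directory of integration/config, for example (operations-puppet is just the example image used above):
$ grep -rn 'operations-puppet' jjb/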
Jenkins Agent
To create an additional Jenkins node that can run Docker-based Jenkins jobs, follow these steps:
- Create a new VM instance in Horizon with a name following the pattern 'integration-agent-docker-100X'.
- Wait for the first puppet run to complete and log in.
- Run the following to finish switching to the integration puppet master:
sudo rm -fR /var/lib/puppet/ssl
sudo mkdir -p /var/lib/puppet/client/ssl/certs
sudo puppet agent -tv
sudo cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs
sudo puppet agent -tv
- Add the 'role::ci::slave::labs::docker' class to the instance in horizon
  - For larger instance types (m1.xlarge and bigram) specify true for the docker_lvm_volume parameter.
- Run a final puppet update: 'sudo puppet agent -tv'
- Pull an initial set of docker images onto the host (using latest tags) to avoid doing this in test runs:
sudo docker pull docker-registry.wikimedia.org/releng/castor:latest
sudo docker pull docker-registry.wikimedia.org/releng/quibble-stretch:latest
sudo docker pull docker-registry.wikimedia.org/wikimedia-stretch:latest
- Add the agent in the Jenkins UI.