Kask
Kask is an opaque key-value data store with a RESTful (HTTP) interface. It uses Apache Cassandra for persistence, making it suitable for very large or high-volume data sets and for applications that require geographically aware master-master replication and high availability.
Some of its features include:
- Support for Transport Layer Security (TLS) to secure communications end to end, providing both encryption and authentication with public-key cryptography
- Expiration of values through the use of a service-wide time-to-live (TTL)
- A simplified consistency model: reads and writes use Cassandra's data-center-local quorum, while deletes block for a quorum in each data center
API
Operations
get
URL | /sessions/v1/{key}
---|---
Method | GET
Params | None
Data | None
Success | Example:
HTTP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 27
Date: Mon, 22 Oct 2018 22:07:59 GMT
sample value
Error | Errors are JSON objects conforming to RFC 7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Example | $ curl http://api.example.org/sessions/v1/test_key
Notes |
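The same request can also be issued from a script. What follows is a minimal Python sketch using only the standard library; it is not an official client. The helper names (key_url, parse_problem, get_value) are hypothetical, api.example.org is the placeholder host from the example above, and percent-encoding the key is an assumption about how keys containing reserved characters should be sent. A 404 is mapped to None on the basis that values retrieved after expiry result in a 404.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def key_url(base_url, key):
    """Return the URL for key under /sessions/v1/ (key is percent-encoded)."""
    # Assumption: reserved characters in a key should be percent-encoded.
    return base_url.rstrip("/") + "/sessions/v1/" + urllib.parse.quote(key, safe="")


def parse_problem(body):
    """Decode an RFC 7807 (application/problem+json) error body into a dict."""
    return json.loads(body.decode("utf-8"))


def get_value(base_url, key):
    """Return the stored bytes for key, or None when the service answers 404."""
    request = urllib.request.Request(key_url(base_url, key), method="GET")
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        # Other failures are re-raised; err.read() holds the problem document.
        raise
```

A caller would use get_value("http://api.example.org", "test_key") and treat None as "no value stored". For other errors, passing err.read() to parse_problem gives access to the standard RFC 7807 members such as title, status, and detail.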
set
URL | /sessions/v1/{key}
---|---
Method | POST
Params | None
Data | The body of the request is the value represented as arbitrary bytes, using a content-type of application/octet-stream. Example: sample value
Success | Example:
HTTP/1.0 201 CREATED
Content-Type: application/octet-stream
Content-Length: 0
Date: Tue, 23 Oct 2018 19:40:40 GMT
Error | Errors are JSON objects conforming to RFC 7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Example | $ curl -X POST -H 'Content-Type: application/octet-stream' -d 'sample value' \
http://api.example.org/sessions/v1/test_key
Notes | This operation assigns a value to key. The response does not distinguish between a request that created a new value and one that overwrote an existing one.
Values persist until they expire under a TTL dictated by service configuration; this single TTL applies to all stored sessions. Values retrieved after expiry result in a 404 (see above).
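The semantics described in the notes (one TTL for every value, overwrites indistinguishable from creates, a 404 after expiry) can be pictured with a toy in-memory model. This is an illustrative sketch only and says nothing about how Kask or Cassandra implement expiry. The class name TTLStore and its methods are hypothetical, and two behaviours are assumptions rather than statements from this page: that an overwrite restarts the TTL, and that a key which was never set also yields a 404.

```python
import time


class TTLStore:
    """Toy in-memory model of a store with one service-wide TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock   # injectable so expiry can be exercised in tests
        self._items = {}     # key -> (value, time written)

    def set(self, key, value):
        """Store value; creating and overwriting return the same status."""
        # Assumption: an overwrite restarts the TTL for that key.
        self._items[key] = (value, self.clock())
        return 201

    def get(self, key):
        """Return (status, value); expired or unknown keys yield 404."""
        entry = self._items.get(key)
        if entry is None:
            return 404, None
        value, written = entry
        if self.clock() - written >= self.ttl_seconds:
            del self._items[key]
            return 404, None
        return 200, value
```

Passing a fake clock (for example, a function that returns a variable you advance by hand) makes it possible to step past the TTL without sleeping.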
delete
URL | /sessions/v1/{key}
---|---
Method | DELETE
Params | None
Data | None
Success | Example:
HTTP/1.0 204 NO CONTENT
Content-Type: application/octet-stream
Content-Length: 0
Date: Tue, 23 Oct 2018 19:40:40 GMT
Error | Errors are JSON objects conforming to RFC 7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Example | $ curl -X DELETE http://api.example.org/sessions/v1/test_key
Notes | This operation deletes the value associated with key, if it exists. The return status does not distinguish between a delete of a value that was not present (a no-op) and one that removed an existing value.
When operated in a multi-datacenter environment, a successful return guarantees that subsequent |
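The two write-side operations, set and delete, differ only in method, body, and expected status, so a client can build both with one helper. The following is a minimal Python sketch using the standard library. The function names build_request and send are hypothetical, api.example.org is the placeholder host from the examples, and the key is assumed to be URL-safe already.

```python
import urllib.request


def build_request(method, base_url, key, value=None):
    """Build (but do not send) a POST or DELETE request for key."""
    if method not in ("POST", "DELETE"):
        raise ValueError("method must be POST or DELETE")
    headers = {}
    if method == "POST":
        if not isinstance(value, bytes):
            raise TypeError("POST requires the value as bytes")
        # The set operation sends the value as arbitrary bytes.
        headers["Content-Type"] = "application/octet-stream"
    elif value is not None:
        raise ValueError("DELETE takes no body")
    # Assumption: key is already URL-safe.
    url = base_url.rstrip("/") + "/sessions/v1/" + key
    return urllib.request.Request(url, data=value, headers=headers, method=method)


def send(request):
    """Send a request built above and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Going by the tables above, send should return 201 for a successful set and 204 for a successful delete. For 4xx and 5xx responses urlopen raises urllib.error.HTTPError, whose body is the problem+json document.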
Developing
Dependency Management
The libraries an application depends on are as much a part of the final product as the code we write ourselves, and yet it is all too common for us to choose them indiscriminately, retrieve them from untrusted sources, and treat them (and the entire graph of transitive dependencies) as black boxes. Often this pattern is deeply ingrained in our tools and the culture surrounding them. Case in point: Kask is written in Go, where traditionally little emphasis has been placed on release management; applications import external dependencies by referencing their remote Git repository, typically the HEAD of the master branch, with a result that is statically compiled (requiring recompilation to link against any updated dependencies). This "run the latest of everything and hope for the best" mentality is antithetical to quality software. It makes reproducibility prohibitively difficult, and the complete lack of environmental stability makes tracking defects (including those impacting security) and their interactions intractable.
Tooling notwithstanding, proper dependency management is difficult and labor-intensive. It requires that each node in the dependency graph be release-managed, and that compatibility between nodes be established to properly inform the edges. Change of any kind is as likely to introduce new bugs as it is to fix existing ones, and changes that alter existing functionality or introduce new functionality disproportionately so. Sound judgement is required to balance the value of an update against its risks. When changes are made, careful testing is needed to ensure continued compatibility and to flag any new regressions. This is a tremendous amount of work. Fortunately, there is an alternative to doing it ourselves.
Debian is a Linux distribution founded in 1993, with a long-standing reputation for quality control. Software that is packaged for Debian has been carefully curated. Packagers ensure that an active and responsive upstream exists, but accept responsibility for the duration of a release if an upstream becomes unwilling or unable to address issues. Care is taken to select the most appropriate version for release, and its transitive dependencies are satisfied by dependency relationships with other packages. Changes to a package during a stable release are made only on an as-needed basis (crippling bugs, security vulnerabilities, etc.), and are as minimally invasive as possible. Additionally, PGP signatures are used to establish a strong chain of trust between the developers who upload packages and the machines where they are ultimately installed. It would be difficult to overstate the amount of software life-cycle management work that goes into a distribution like Debian, work we do not have to do if we satisfy our dependencies using packaged software.
TL;DR: Kask's code dependencies are sourced entirely from what is available in Debian GNU/Linux (Stretch/9.8 at the time of writing).
Setup
Clone Kask's source code repository. For example:
$ git clone https://gerrit.wikimedia.org/r/mediawiki/services/kask && cd kask
Builds at the Wikimedia Foundation are created using a Docker image generated by Blubber; using Blubber with Kask's deployment pipeline configuration is the easiest way to create a container for development use. Prebuilt, statically linked Blubber binaries for most platforms can be obtained from the Blubber download page.
Blubber outputs a Dockerfile based on Kask's pipeline configuration, and docker build will create the corresponding image.
$ blubber .pipeline/blubber.yaml build | docker build --tag kask-dev -
...
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kask-dev latest 5dec96f34114 About a minute ago 875MB
docker-registry.wikimedia.org/golang 1.11.5-1 58a912d05a49 2 months ago 344MB
Building
The following can be copied to a file (buildenv.sh, for example) and invoked as a script to issue commands inside the development container.
#!/bin/sh
# Wraps docker-run to issue commands inside the development container.
# Usage: buildenv.sh make
# Usage: buildenv.sh make unit-test
# Usage: buildenv.sh ./kask --config config.test.yaml
#
# Options passed to docker run (comments cannot follow a line-continuation
# backslash, so they are listed here instead of inline):
#   -u          Avoid permissions issues; use your UID inside the container
#   -e GOPATH   Use the Debian-based dependencies installed inside the container
#   -e GOCACHE  See https://github.com/golang/go/issues/26280
set -e
docker run \
    --rm \
    --name kask-dev \
    -u "$(id -u)" \
    -e GOPATH=/usr/share/gocode \
    -e GOCACHE=/tmp/gocache \
    -v "$(pwd)":/kask \
    -w /kask \
    kask-dev \
    "$@"
Releasing
To release a new version of Kask, create an annotated tag, and push it to Gerrit.
user@host ~$ VERSION="v1.0.0"
user@host ~$ git tag -am "$VERSION release" $VERSION master
user@host ~$ git push gerrit $VERSION
Running
The Wikimedia Foundation runs Kask in production using Kubernetes; the easiest way to get the service up and running is to use a Wikimedia Foundation Docker image.
Setup
Since the Foundation's registry does not implement the latest tag, the first step is to browse the list of available image tags and select an appropriate one. We'll use 2019-05-10-162420-production as the tag in the following examples. Once you've selected a Docker image, use docker pull to retrieve a copy locally, and docker images to verify success.
$ IMAGE_TAG=2019-05-10-162420-production
$ docker pull docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:$IMAGE_TAG
2019-05-10-162420-production: Pulling from wikimedia/mediawiki-services-kask
5c86276767f3: Pull complete
a413e562d2b8: Pull complete
648d537effeb: Pull complete
864e75c6ef22: Pull complete
49e16a850e4b: Pull complete
Digest: sha256:2d7f3118b6e091233e62760f37e44e590fd986f227516a1b54689507b433a41b
Status: Downloaded newer image for docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:2019-05-10-162420-production
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker-registry.wikimedia.../mediawiki-services-kask 2019-05-10-162420-production 73515b6aa2d3 13 days ago 110MB
It may prove useful to create an alias for the chosen tag, both to have something more descriptive, and to have a stable reference when starting containers. This step is entirely optional though.
$ docker tag docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:$IMAGE_TAG kask
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker-registry.wikimedia.../mediawiki-services-kask 2019-05-10-162420-production 73515b6aa2d3 13 days ago 110MB
kask latest 73515b6aa2d3 13 days ago 110MB
Starting a container
The container expects Kask's configuration file to exist as /etc/mediawiki-services-kask/config.yaml. To accomplish this, we'll mount a local directory containing the configuration (as config.yaml) inside the container as /etc/mediawiki-services-kask. The following assumes config.yaml is in the current working directory.
$ docker run --rm --name kask -v "$(pwd)":/etc/mediawiki-services-kask kask