To keep our services separate from the data they operate on, we store that data in separate GitHub repositories. This article demonstrates how we make sure that our operational services always have the latest data available.
Our use case
Imagine the following scenario: You have a service that is responsible for rendering templates. Let’s assume these templates are written in Markdown and rendered by the service into HTML.
The service is responsible for replacing variables with their respective values and converting Markdown into HTML.
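To make these two responsibilities concrete, here is a minimal Python sketch of such a rendering step. It is not the actual templater-service code: the function name render, the $name placeholder syntax, and the tiny Markdown subset (headings and paragraphs only) are assumptions made purely for illustration. A real service would hand the Markdown conversion to a full Markdown library.

```python
import html
from string import Template


def render(template_text: str, variables: dict[str, str]) -> str:
    """Replace $variables, then convert a tiny Markdown subset to HTML."""
    # Step 1: variable substitution. The "$name" placeholder syntax is an
    # assumption; the article does not say which syntax the service uses.
    substituted = Template(template_text).substitute(variables)

    # Step 2: toy Markdown conversion (level-1 headings and paragraphs only).
    html_lines = []
    for line in substituted.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate blocks
        if stripped.startswith("# "):
            html_lines.append(f"<h1>{html.escape(stripped[2:])}</h1>")
        else:
            html_lines.append(f"<p>{html.escape(stripped)}</p>")
    return "\n".join(html_lines)


print(render("# Hello $name\n\nSee you on $day.", {"name": "Ada", "day": "Monday"}))
# prints:
# <h1>Hello Ada</h1>
# <p>See you on Monday.</p>
```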
The templates themselves will change a lot more frequently than the service itself. Additionally, we want non-developers to be able to edit the template content without the risk of accidentally changing code.
Making it work locally
One way to achieve this is to separate these responsibilities into two distinct projects which are represented by two distinct git repositories.
Let’s call the service repository templater-service and the template repository templater-templates.
For the sake of this article, it is irrelevant what the templater-service does internally or in which language it is written.
We simply assume it’s packaged as a Docker container.
The templater-templates repository has a very simple structure:
+ templater-templates
  + folder1
    + foo.md
  + folder2
    + folder2b
      + bar.md
When we ask our templater-service to render the template folder1/foo.md, it looks up the corresponding file in a directory that has been configured as the root directory for all templates.
The templater-service reads this root directory from an environment variable (let’s call that variable TEMPLATE_ROOT).
It then evaluates the content read from that file and ultimately generates the HTML output.
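The lookup described above could be sketched in Python as follows. The function names resolve_template and load_template are hypothetical, and the guard against template names that escape the root directory is our own addition, not something the article states the service does.

```python
import os
import tempfile
from pathlib import Path


def resolve_template(name: str) -> Path:
    """Map a template name such as "folder1/foo.md" to a file below TEMPLATE_ROOT."""
    root = Path(os.environ["TEMPLATE_ROOT"]).resolve()
    candidate = (root / name).resolve()
    # Refuse names like "../secrets.txt" that would escape the template root.
    if not candidate.is_relative_to(root):
        raise ValueError(f"template {name!r} is outside of {root}")
    return candidate


def load_template(name: str) -> str:
    """Read the raw template text for the given template name."""
    return resolve_template(name).read_text(encoding="utf-8")


# Usage example with a throwaway directory standing in for the checkout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "folder1").mkdir()
    (Path(tmp) / "folder1" / "foo.md").write_text("# Hello", encoding="utf-8")
    os.environ["TEMPLATE_ROOT"] = tmp
    print(load_template("folder1/foo.md"))  # prints "# Hello"
```

Because the service only ever sees a directory path, it does not matter whether that directory is a local checkout, a mounted volume, or something else.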
When testing this locally, we can achieve this with a Docker command like this:
$ docker run -it \
    -e TEMPLATE_ROOT="/templates" \
    -v "/Users/christian/Development/projects/templater-templates:/templates" \
    betterdoc/templater-service bash
With this docker command-line configuration we
- Set the environment variable TEMPLATE_ROOT to /templates. This becomes the directory that the templater-service uses as the root for looking up templates.
- Mount the directory /Users/christian/Development/projects/templater-templates from the local machine and make it available within the container at /templates.
Now, the service itself has access to the templates without having to know that the directory it uses contains the contents of a git repository checked out to the local file system.
Let’s test this inside the container:
templater@8f0fad931189:/$ echo $TEMPLATE_ROOT
/templates
templater@8f0fad931189:/$ tree /templates
/templates
|-- folder1
|   `-- foo.md
`-- folder2
    `-- folder2b
        `-- bar.md
It works - but only on my machine. Since we don’t run production services on my machine, I need a way to make it work “for real”.
Making it work in Kubernetes
We use Kubernetes as our container orchestration engine to run all our services.
The deployment of our templater-service looks something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
This configuration is sufficient to run the service within a Kubernetes cluster.
It does not, however, give the service access to the templates from the templater-templates repository.
Luckily, the Kubernetes team provides a specialized Docker image for exactly this scenario: checking out content from a git repository and attaching it to a running container.
The configuration requires a bit of plumbing, so let’s go over the details step by step.
First, we need to define the container within the deployment.
Following the Sidecar pattern, we will call this container our “git-sync sidecar container”:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      securityContext:
        fsGroup: 65533 # to make the git-sync SSH key readable
      volumes:
        - name: templater-templates-ssh
          secret:
            secretName: templater-templates-ssh
            defaultMode: 0400
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: git-sync-sidecar
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=-1"
            - "-root=/templates"
            - "-dest=from-github"
            - "-wait=60"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
          securityContext:
            runAsUser: 65533 # git-sync-user
The configuration for the new container is more complicated than the one for the service container.
The args section configures the sidecar container: it tells it which repository to fetch the data from and which directory inside the sidecar container to check the data out to.
In this example we’re using the directory /templates (the -root argument), with from-github (the -dest argument) as the name under which the checkout is made available.
It also tells the container to perform the synchronization every 60 seconds.
When accessing the git repository and trying to fetch data, we need to authenticate ourselves, as we don’t want to use a publicly accessible repository.
That’s what the volumes and the volumeMounts sections do.
The volumes entry defines a Kubernetes volume that can be shared between multiple containers (for more, see the Kubernetes documentation on volumes).
It defines a read-only volume whose files are populated directly from a Kubernetes secret.
The volumeMounts section attaches this volume to the git-sync-sidecar container at the directory /etc/git-secret.
Details are available at: https://github.com/kubernetes/git-sync/blob/master/docs/ssh.md
This is what the Kubernetes secret templater-templates-ssh looks like:
apiVersion: v1
kind: Secret
metadata:
  name: templater-templates-ssh
type: Opaque
stringData:
  ssh: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAACFwAAAAdzc2gtcn
    ...
    fVqHeV9JA4kAAAAIZ2l0LXN5bmMBAgM=
    -----END OPENSSH PRIVATE KEY-----
  known_hosts: "github.com ssh-rsa AAA...QQQ=="
The ssh property of the secret will be mounted as the file /etc/git-secret/ssh and the known_hosts property will be mounted as the file /etc/git-secret/known_hosts.
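If you prefer to generate such a secret from files on disk instead of pasting the key into a manifest by hand, a small helper along these lines may be useful. This is only a sketch: the function name build_ssh_secret and the placeholder values are hypothetical. It relies on the fact that kubectl accepts JSON manifests as well as YAML, and that each key under stringData becomes one file in the mounted volume.

```python
import json


def build_ssh_secret(name: str, private_key: str, known_hosts: str) -> dict:
    """Build a Secret manifest whose keys match the file names git-sync reads."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # Each key here shows up as a file below the mount path
        # (/etc/git-secret in this article).
        "stringData": {"ssh": private_key, "known_hosts": known_hosts},
    }


# Placeholder values only; in practice these would be read from files
# that never get committed to a repository.
manifest = build_ssh_secret(
    "templater-templates-ssh",
    "<contents of the private key file>",
    "<known_hosts line for github.com>",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f like any other manifest.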
When we apply the deployment to the Kubernetes cluster, a pod is created that contains two containers: the service and the git-sync-sidecar.
Looking at the logs of the git-sync-sidecar container, we can see that it’s running correctly and performing the clone operation:
2020-08-07T14:22:43.657970529Z INFO: detected pid 1, running init handler
2020-08-07T14:22:43.664540978Z I0807 14:22:43.663391 10 main.go:321] "level"=0 "msg"="starting up" "args"=["/git-sync","-repo=git@github.com:betterdoc-org/templater-templates.git","-branch=master","-depth=1","-max-sync-failures=-1","-wait=60","-root=/templates","-dest=from-github","-ssh=true"] "pid"=10
2020-08-07T14:22:43.664560412Z I0807 14:22:43.663762 10 main.go:574] "level"=0 "msg"="cloning repo" "origin"="git@github.com:betterdoc-org/templater-templates.git" "path"="/templates"
2020-08-07T14:22:44.844997956Z I0807 14:22:44.844869 10 main.go:480] "level"=0 "msg"="syncing git" "hash"="098d3a5ee2ea5a17a5f75dba7398194504f288f0" "rev"="HEAD"
2020-08-07T14:22:45.957778467Z I0807 14:22:45.957649 10 main.go:501] "level"=0 "msg"="adding worktree" "branch"="origin/master" "path"="/templates/rev-098d3a5ee2ea5a17a5f75dba7398194504f288f0"
2020-08-07T14:22:45.962202554Z I0807 14:22:45.962090 10 main.go:524] "level"=0 "msg"="reset worktree to hash" "hash"="098d3a5ee2ea5a17a5f75dba7398194504f288f0" "path"="/templates/rev-098d3a5ee2ea5a17a5f75dba7398194504f288f0"
2020-08-07T14:22:45.962229217Z I0807 14:22:45.962131 10 main.go:528] "level"=0 "msg"="updating submodules"
The synchronization is working, but a central piece is still missing: so far, the synchronization is only happening within the git-sync-sidecar container.
Our service running in the service container doesn’t have access to the files checked out by the sidecar container yet, so we need a way to share data between these two containers.
The Kubernetes way of doing this is to define another volume and mount it into both containers. This is what the configuration looks like after setting up this connection:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      securityContext:
        fsGroup: 65533 # to make the git-sync SSH key readable
      volumes:
        - name: templater-templates-ssh
          secret:
            secretName: templater-templates-ssh
            defaultMode: 0400
        - name: templater-templates
          emptyDir: {}
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
          volumeMounts:
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          env:
            - name: TEMPLATE_ROOT
              value: "/templates/from-github/"
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: git-sync-sidecar
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=-1"
            - "-root=/templates"
            - "-dest=from-github"
            - "-wait=60"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          securityContext:
            runAsUser: 65533 # git-sync-user
The templater-templates volume is defined within the volumes section and is referenced within the volumeMounts section of both our service and our git-sync-sidecar container.
We have also added the environment variable TEMPLATE_ROOT to the service container, pointing to the location within the volume where the templates are checked out.
Don’t let the suffix from-github confuse you: it has been configured as the target within the git-sync-sidecar container and will be set up as a symbolic link that always references the latest version checked out from the git repository (see https://github.com/kubernetes/git-sync for details).
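Why a link rather than a plain directory? Switching a link from one revision directory to the next can be done in a single step, so a reader never sees a half-updated checkout. The following Python sketch illustrates that general technique on a POSIX file system. It is not git-sync’s actual code, and the function name publish and the rev-aaa/rev-bbb directory names are made up for the example.

```python
import os
import tempfile
from pathlib import Path


def publish(root: Path, link_name: str, target_dir_name: str) -> None:
    """Atomically point root/link_name at root/target_dir_name."""
    tmp_link = root / f".tmp-{link_name}"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()  # clean up a leftover from an interrupted run
    # Relative target, like "from-github -> rev-..." in a directory listing.
    os.symlink(target_dir_name, tmp_link)
    # Renaming over the old link replaces it in one step on POSIX systems.
    os.replace(tmp_link, root / link_name)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "rev-aaa").mkdir()
    (root / "rev-bbb").mkdir()
    publish(root, "from-github", "rev-aaa")  # first checkout
    publish(root, "from-github", "rev-bbb")  # a newer revision arrives
    print(os.readlink(root / "from-github"))  # prints "rev-bbb"
```

For the service this means TEMPLATE_ROOT can stay fixed at the link while the directory behind it changes with every sync.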
Now, our service can correctly access the templates fetched by the sidecar container. Let’s open a shell in the service container and verify that the files are where we expect them to be:
$ kubectl exec -it deployment/templater-service --container service -- bash
templater@templater-service-59c946c5f6-xbwgj:/$ ls -l /templates
total 0
lrwxrwxrwx 1 65533 65533 44 Aug 7 13:53 from-github -> rev-c9a6fc1c725ff49fda06e97ea551049973a9d7b4
drwxr-sr-x 4 65533 65533 103 Aug 7 13:53 rev-c9a6fc1c725ff49fda06e97ea551049973a9d7b4
templater@templater-service-59c946c5f6-xbwgj:/$ tree /templates/from-github
/templates/from-github
|-- folder1
|   `-- foo.md
`-- folder2
    `-- folder2b
        `-- bar.md
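The same check can be done from code, for example in a smoke test of the service. The sketch below walks the directory behind the link and returns the template names relative to it. The function name list_templates and the rev-0000000 directory are hypothetical, and the example builds a throwaway copy of the layout shown above instead of touching a real volume.

```python
import os
import tempfile
from pathlib import Path


def list_templates(root: Path) -> list[str]:
    """Return all Markdown templates below root as sorted, relative POSIX paths."""
    resolved = root.resolve()  # follow the link to the current revision directory
    return sorted(
        path.relative_to(resolved).as_posix()
        for path in resolved.rglob("*.md")
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp)
    checkout = volume / "rev-0000000"  # placeholder revision directory name
    (checkout / "folder1").mkdir(parents=True)
    (checkout / "folder2" / "folder2b").mkdir(parents=True)
    (checkout / "folder1" / "foo.md").write_text("foo", encoding="utf-8")
    (checkout / "folder2" / "folder2b" / "bar.md").write_text("bar", encoding="utf-8")
    os.symlink(checkout.name, volume / "from-github")
    print(list_templates(volume / "from-github"))
    # prints ['folder1/foo.md', 'folder2/folder2b/bar.md']
```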
Accounting for potential errors
The setup as described above works as intended. The templates are fetched from the GitHub repository by the sidecar container and can be read by the service.
But there is one thing that may go wrong (and we know from Murphy’s Law that everything that can go wrong will go wrong).
What happens if both containers start up but the sidecar container fails to fetch the content from the GitHub repository? From the standpoint of the sidecar container nothing special happens: it simply waits for 60 seconds and then tries again. Hopefully the issue will be resolved by then and it will be able to fetch the content. Otherwise it waits another 60 seconds and tries again.
But our service has a bigger problem: it is up and running and Kubernetes is happily routing requests to it. Those requests, however, cannot be fulfilled correctly, because without any templates the service cannot do anything useful.
From a pure infrastructure perspective all lights are green, yet the service cannot render a single template.
So what we actually want is for Kubernetes to route requests to our service only after fetching the content from the GitHub repository has succeeded.
Kubernetes shouldn’t declare our pod ready until this has happened.
One way of achieving this is to employ a Kubernetes init container.
Kubernetes ensures that all init containers have run to completion before the actual containers (in our case the service and the git-sync-sidecar containers) are started.
Using this mechanism, we can add an init container that also performs a GitHub checkout (similar to the git-sync-sidecar container; in fact we use the exact same Docker image for it) but exits successfully only if the checkout succeeds:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      securityContext:
        fsGroup: 65533 # to make the git-sync SSH key readable
      volumes:
        - name: templater-templates-ssh
          secret:
            secretName: templater-templates-ssh
            defaultMode: 0400
        - name: templater-templates
          emptyDir: {}
      initContainers:
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: template-repository-setup
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=6"
            - "-wait=10" # max-sync-failures=6 * wait=10 = 60 seconds maximum runtime for the init container
            - "-one-time=true" # exit after the first sync (we just want to initialize the file system)
            - "-root=/templates"
            - "-dest=from-github"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          securityContext:
            runAsUser: 65533 # git-sync-user
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
          volumeMounts:
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          env:
            - name: TEMPLATE_ROOT
              value: "/templates/from-github/"
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: git-sync-sidecar
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=-1"
            - "-root=/templates"
            - "-dest=from-github"
            - "-wait=60"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          securityContext:
            runAsUser: 65533 # git-sync-user
This is our final configuration!
This might look quite extensive for our simple use case, but it ensures that when Kubernetes acknowledges our pod as ready and starts routing requests to it, the service has all the data it needs: the templates checked out from the GitHub repository, even if the first checkout attempts failed.
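To illustrate the retry budget noted in the init container’s arguments (6 failures with 10 seconds in between, so roughly 60 seconds of waiting), here is a small Python sketch of a bounded retry loop. It only models the idea and is not taken from git-sync: the function name sync_with_retries, its parameters, and the exact counting of failures are assumptions. A value of -1 for max_failures stands for “never give up”, mirroring how the sidecar is configured.

```python
import time
from typing import Callable


def sync_with_retries(
    sync_once: Callable[[], None],
    max_failures: int,
    wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call sync_once until it succeeds or the failure budget is used up."""
    failures = 0
    while True:
        try:
            sync_once()
            return True  # first successful sync: the "init container" is done
        except Exception:
            if max_failures != -1 and failures >= max_failures:
                return False  # give up, so the caller can exit non-zero
            failures += 1
            sleep(wait_seconds)


def always_failing_sync() -> None:
    raise RuntimeError("repository not reachable")


waits: list[float] = []  # record the waits instead of really sleeping
ok = sync_with_retries(always_failing_sync, max_failures=6, wait_seconds=10, sleep=waits.append)
print(ok, sum(waits))  # prints "False 60"
```

When an init container ends with a failure, the pod’s regular containers are not started and the pod does not become ready, which is exactly the behaviour we are after.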