Accessing a GitHub repository within Kubernetes via a sidecar container | BetterDoc Product Development Blog

To separate our services from the data they operate on, we store that data in separate GitHub repositories. This article demonstrates how we make sure that our operational services always have the latest data available.

Our use case

Imagine the following scenario: You have a service that is responsible for rendering templates. Let’s assume these templates are written in Markdown and rendered by the service into HTML.

The service is responsible for replacing variables with their respective values and converting Markdown into HTML.
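To make the variable replacement step a bit more concrete, here is a minimal sketch. It is an illustration only: the `$name` placeholder syntax, the `render_variables` helper and the sample template are assumptions, not the actual templater-service, and the Markdown-to-HTML conversion is left out entirely.

```python
from string import Template


def render_variables(template_text: str, values: dict) -> str:
    """Replace $name placeholders with their values (hypothetical helper)."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(values)


if __name__ == "__main__":
    print(render_variables("# Hello $name", {"name": "Ada"}))
```

Leaving unknown placeholders in place is a design choice of this sketch; a real service might prefer to fail loudly on missing values.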

The templates themselves will change a lot more frequently than the service itself. Additionally, we want non-developers to be able to edit the template content without the risk of accidentally changing code.

Making it work locally

One way to achieve this is to separate these responsibilities into two distinct projects which are represented by two distinct git repositories.

Let’s call the service repository templater-service and the template repository templater-templates. For the sake of this article it’s completely irrelevant what the templater-service does internally or in which language it is written. We simply assume it’s packaged as a Docker container.

The templater-templates repository has a very simple structure:

+ templater-templates
  + folder1
    + foo.md
  + folder2
    + folder2b
      + bar.md

When we ask our templater-service to render the template folder1/foo.md, it looks up the corresponding file in a directory that has been configured as the root directory for all templates. The templater-service reads this root directory from an environment variable (let’s call that variable TEMPLATE_ROOT). It will then evaluate the content read from that file and ultimately generate the HTML output.
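The lookup described above can be sketched as follows. This is a hedged illustration, not the real service code: the helper name `resolve_template` and the check against names that escape the root directory are assumptions. Only the TEMPLATE_ROOT variable and the folder1/foo.md style of template names come from the setup described here.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_template(name: str, root: Optional[str] = None) -> Path:
    """Map a template name such as 'folder1/foo.md' to a file below the root."""
    # Fall back to the TEMPLATE_ROOT environment variable if no root is given.
    root_dir = Path(root or os.environ["TEMPLATE_ROOT"]).resolve()
    candidate = (root_dir / name).resolve()
    # Reject names that escape the root directory (e.g. '../secret.md').
    if root_dir not in candidate.parents:
        raise ValueError(f"template {name!r} is outside {root_dir}")
    if not candidate.is_file():
        raise FileNotFoundError(candidate)
    return candidate
```

Because the function only depends on a directory path, it doesn’t care whether that directory is a local folder, a Docker mount or something else.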

When testing this locally, we can achieve this with a Docker command like this:

$ docker run -it \
  -e TEMPLATE_ROOT="/templates" \
  -v "/Users/christian/Development/projects/templater-templates:/templates" \
  betterdoc/templater-service bash

With this docker command line configuration we:

Set the environment variable TEMPLATE_ROOT to /templates. This becomes the directory which the templater-service will use as root for looking up the templates.
Mount the directory /Users/christian/Development/projects/templater-templates from the local machine and make it available within the container at /templates.

Now, the service itself has access to the templates without having to know that the directory it uses contains the contents of a git repository checked out to the local file system.

Let’s test this inside the container:

templater@8f0fad931189:/$ echo $TEMPLATE_ROOT
/templates
templater@8f0fad931189:/$ tree /templates
/templates
|-- folder1
|   `-- foo.md
`-- folder2
    `-- folder2b
        `-- bar.md

It works - but only on my machine. Knowing that we’re not running production services on my machine, I need a way to make it work “for real”.

Making it work in Kubernetes

We’re using Kubernetes as our container orchestration engine to run all our services.

The deployment of our templater-service looks something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP

This configuration is sufficient to run the service within a Kubernetes cluster. It doesn’t, however, give the service access to the templates from the templater-templates repository.

Luckily, the Kubernetes team provides a specialized Docker image for exactly this scenario: checking out content from a git repository and attaching it to a running container.

The configuration requires a bit of plumbing so let’s go over the details step by step.

First, we need to define the container within the deployment. Following the Sidecar pattern we will call this container our “git-sync sidecar container”:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      securityContext:
        fsGroup: 65533 # to make the git-sync SSH key readable
      volumes:
        - name: templater-templates-ssh
          secret:
            secretName: templater-templates-ssh
            defaultMode: 0400
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: git-sync-sidecar
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=-1"
            - "-root=/templates"
            - "-dest=from-github"
            - "-wait=60"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
          securityContext:
            runAsUser: 65533 # git-sync-user

The configuration for the new container is more complicated than the one for the service container.

The args section configures the sidecar container: it tells it which repository to fetch the data from and which directory inside the sidecar container to check the data out into. In this example the root directory is /templates, and the current checkout is made available below it as from-github. It also tells the container to perform the synchronization every 60 seconds.

When accessing the git repository and trying to fetch data, we need to authenticate ourselves, as we don’t want to use a publicly accessible repository.

That’s what the volumes and the volumeMounts sections are for. The volumes entry defines a Kubernetes volume that can be shared between multiple containers (for more about Kubernetes volumes see the Kubernetes documentation on volumes). In this case it is a volume whose files are populated directly from a Kubernetes secret. The volumeMounts section attaches this volume read-only to the git-sync-sidecar container at the directory /etc/git-secret.

Details are available at: https://github.com/kubernetes/git-sync/blob/master/docs/ssh.md

This is what the Kubernetes secret templater-templates-ssh looks like:

apiVersion: v1
kind: Secret
metadata:
  name: templater-templates-ssh
type: Opaque
stringData:
  ssh: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAACFwAAAAdzc2gtcn
    ...
    fVqHeV9JA4kAAAAIZ2l0LXN5bmMBAgM=
    -----END OPENSSH PRIVATE KEY-----
  known_hosts: "github.com ssh-rsa AAA...QQQ=="

The ssh property of the secret will be mounted as file /etc/git-secret/ssh and the known_hosts property will be mounted as file /etc/git-secret/known_hosts.

When we apply the deployment to the Kubernetes cluster, a pod will be created that contains two containers: the service and the git-sync-sidecar.

Looking into the logfiles of the git-sync-sidecar container we can see that it’s running correctly and performing the clone operation:

2020-08-07T14:22:43.657970529Z INFO: detected pid 1, running init handler
2020-08-07T14:22:43.664540978Z I0807 14:22:43.663391      10 main.go:321]  "level"=0 "msg"="starting up"  "args"=["/git-sync","-repo=git@github.com:betterdoc-org/templater-templates.git","-branch=master","-depth=1","-max-sync-failures=-1","-wait=60","-root=/templates","-dest=from-github","-ssh=true"] "pid"=10
2020-08-07T14:22:43.664560412Z I0807 14:22:43.663762      10 main.go:574]  "level"=0 "msg"="cloning repo"  "origin"="git@github.com:betterdoc-org/templater-templates.git" "path"="/templates"
2020-08-07T14:22:44.844997956Z I0807 14:22:44.844869      10 main.go:480]  "level"=0 "msg"="syncing git"  "hash"="098d3a5ee2ea5a17a5f75dba7398194504f288f0" "rev"="HEAD"
2020-08-07T14:22:45.957778467Z I0807 14:22:45.957649      10 main.go:501]  "level"=0 "msg"="adding worktree"  "branch"="origin/master" "path"="/templates/rev-098d3a5ee2ea5a17a5f75dba7398194504f288f0"
2020-08-07T14:22:45.962202554Z I0807 14:22:45.962090      10 main.go:524]  "level"=0 "msg"="reset worktree to hash"  "hash"="098d3a5ee2ea5a17a5f75dba7398194504f288f0" "path"="/templates/rev-098d3a5ee2ea5a17a5f75dba7398194504f288f0"
2020-08-07T14:22:45.962229217Z I0807 14:22:45.962131      10 main.go:528]  "level"=0 "msg"="updating submodules"

The synchronization is working, but a central piece is still missing: so far, the synchronization is only happening within the git-sync-sidecar container. Our service running in the service container doesn’t have access to the files checked out by the sidecar container yet, so we need a way to share data between these two containers.

The Kubernetes way of doing this is to define another volume and mount it into both containers. This is what the configuration looks like after setting up this connection:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      securityContext:
        fsGroup: 65533 # to make the git-sync SSH key readable
      volumes:
        - name: templater-templates-ssh
          secret:
            secretName: templater-templates-ssh
            defaultMode: 0400
        - name: templater-templates
          emptyDir: {}
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
          volumeMounts:
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          env:
            - name: TEMPLATE_ROOT
              value: "/templates/from-github/"
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: git-sync-sidecar
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=-1"
            - "-root=/templates"
            - "-dest=from-github"
            - "-wait=60"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          securityContext:
            runAsUser: 65533 # git-sync-user

The templater-templates volume is defined within the volumes section and referenced within the volumeMounts sections of both our service and our git-sync-sidecar container. We have also added the environment variable TEMPLATE_ROOT to the service container, pointing to the location within the volume into which the templates are checked out.

Don’t let the suffix from-github confuse you: it has been configured as the target (-dest) within the git-sync-sidecar container and will be set up as a symbolic link that always references the latest version checked out from the git repository (see https://github.com/kubernetes/git-sync for details).
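Why a link rather than a plain directory? A link can be switched to a new revision in a single step, so readers never see a half-updated checkout. The following sketch shows the general technique under stated assumptions: it is not git-sync’s actual implementation, and the helper name `publish_revision` and the temporary link name are made up for illustration.

```python
import os
from pathlib import Path


def publish_revision(root: Path, link_name: str, rev_dir_name: str) -> None:
    """Point root/link_name at root/rev_dir_name without a gap for readers."""
    tmp_link = root / f".{link_name}.tmp"
    # Clean up a leftover temporary link from an interrupted earlier run.
    if tmp_link.is_symlink():
        tmp_link.unlink()
    # Relative target, so the link works wherever the volume is mounted.
    os.symlink(rev_dir_name, tmp_link)
    # Renaming over the old link replaces it atomically on POSIX file systems.
    os.replace(tmp_link, root / link_name)
```

For the service this means it should open files through the link on every request instead of resolving the link once at startup and remembering the target.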

Now, our service can correctly access the templates fetched by the sidecar container. Let’s open a shell in the service container and verify that the files are where we expect them to be:

$ kubectl exec -it deployment/templater-service --container service -- bash

templater@templater-service-59c946c5f6-xbwgj:/$ ls -l /templates
total 0
lrwxrwxrwx 1 65533 65533  44 Aug  7 13:53 from-github -> rev-c9a6fc1c725ff49fda06e97ea551049973a9d7b4
drwxr-sr-x 4 65533 65533 103 Aug  7 13:53 rev-c9a6fc1c725ff49fda06e97ea551049973a9d7b4

templater@templater-service-59c946c5f6-xbwgj:/$ tree /templates/from-github
/templates/from-github
|-- folder1
|   `-- foo.md
`-- folder2
    `-- folder2b
        `-- bar.md

Accounting for potential errors

The setup as described above works as intended. The templates are fetched from the GitHub repository by the sidecar container and can be read by the service.

But there is one thing that may go wrong (and we know from Murphy’s Law that everything that can go wrong will go wrong).

What happens if both containers start up but the sidecar container fails to fetch the content from the GitHub repository? From the standpoint of the sidecar container nothing special happens: it will simply wait for 60 seconds and then try again. Hopefully the issue will be resolved by then and it will be able to fetch the content. Otherwise it will wait another 60 seconds and try again.
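The retry behaviour can be modelled with a few lines of code. This is a simplified model, not git-sync’s code: the function name, the `one_time` switch (which lets the loop end after the first success) and the injectable `sleep` are assumptions for illustration. The `max_failures` parameter mirrors our reading of the -max-sync-failures flag, where -1 means "never give up".

```python
import time
from typing import Callable


def run_sync_loop(
    sync_once: Callable[[], None],
    wait_seconds: float,
    max_failures: int = -1,
    one_time: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call sync_once every wait_seconds, tolerating a budget of failures."""
    failures = 0
    while True:
        try:
            sync_once()
        except Exception as err:
            # Give up once the failure budget is used up (-1 = unlimited).
            if max_failures != -1 and failures >= max_failures:
                raise RuntimeError("too many sync failures") from err
            failures += 1
            sleep(wait_seconds)
            continue
        failures = 0  # a successful sync resets the budget
        if one_time:
            return
        sleep(wait_seconds)
```

With wait_seconds=60 and max_failures=-1 the loop keeps retrying forever, which matches the behaviour of our sidecar described above.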

But our service has a bigger problem: it is up and running, and Kubernetes is happily routing requests to it. From a pure infrastructure perspective all lights are green, but those requests cannot be fulfilled correctly, because without any templates the service cannot do anything useful.

So what we actually want is Kubernetes to only route requests to our service after fetching the content from the GitHub repository has succeeded. Kubernetes shouldn’t declare our pod ready until this has happened.
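The condition "the content has been fetched" can be written down as a simple file system check. The sketch below is only meant to make that condition explicit: the helper name `templates_available` and the rule that at least one .md file must exist are assumptions, not part of our actual setup.

```python
from pathlib import Path


def templates_available(template_root: str) -> bool:
    """Return True once the checkout exists and holds at least one template."""
    root = Path(template_root)
    # A missing or dangling link (no successful sync yet) is not a directory.
    if not root.is_dir():
        return False
    # Look for Markdown templates anywhere below the root directory.
    return any(root.rglob("*.md"))
```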

One way of achieving this is to employ a Kubernetes init container. Kubernetes ensures that all init containers have run to completion before the actual containers (in our case the service and the git-sync-sidecar containers) are started.

Using this mechanism, we can add an init container that also performs a GitHub checkout (similar to the git-sync-sidecar container; in fact we will use the exact same Docker image for it) but exits successfully only if the checkout succeeds:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: templater-service
  labels:
    app: templater-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templater-service
  template:
    metadata:
      labels:
        app: templater-service
    spec:
      securityContext:
        fsGroup: 65533 # to make the git-sync SSH key readable
      volumes:
        - name: templater-templates-ssh
          secret:
            secretName: templater-templates-ssh
            defaultMode: 0400
        - name: templater-templates
          emptyDir: {}
      initContainers:
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: template-repository-setup
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=6"
            - "-wait=10" # max-sync-failures=6 * wait=10 = 60 seconds maximum runtime for the init container
            - "-one-time=true" # exit after the first sync (we just want to initialize the file system)
            - "-root=/templates"
            - "-dest=from-github"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          securityContext:
            runAsUser: 65533 # git-sync-user
      containers:
        - image: betterdoc/templater-service
          name: service
          ports:
            - containerPort: 80
              protocol: TCP
          volumeMounts:
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          env:
            - name: TEMPLATE_ROOT
              value: "/templates/from-github/"
        - image: k8s.gcr.io/git-sync:v3.1.6
          name: git-sync-sidecar
          args:
            - "-repo=git@github.com:betterdoc-org/templater-templates.git"
            - "-branch=master"
            - "-depth=1"
            - "-max-sync-failures=-1"
            - "-root=/templates"
            - "-dest=from-github"
            - "-wait=60"
            - "-ssh=true"
          volumeMounts:
            - name: templater-templates-ssh
              mountPath: /etc/git-secret
              readOnly: true
            - name: templater-templates
              mountPath: "/templates"
              readOnly: false
          securityContext:
            runAsUser: 65533 # git-sync-user

This is our final configuration!

This might look quite extensive for our simple use case, but it ensures that when Kubernetes acknowledges our pod as ready and starts routing requests to it, the service has all the data it needs: the templates checked out from the GitHub repository - even if the first checkout attempts failed.