what & why

The CKS (Certified Kubernetes Security Specialist) certification is a great resource for learning how to secure a Kubernetes cluster. It covers a lot of topics, from the cluster side (admission controllers, webhooks, audit) to the app side (Pod Security Policies) and the supply chain (image scanning). Another great resource is the Kubernetes Hardening Guidance by the NSA & CISA.

But some of the concepts defined in both these resources are very case-specific, and require a lot of time, tools & effort to set up. In some environments, it might be infeasible to deploy each and every one of them. That doesn’t mean we should skip basic security-minded steps when deploying to k8s. I won’t cover the cluster side (audit, tools like Falco, or admission controllers), only how you can improve the security of your front-facing app by adding a few lines here and there.

how

Let’s say we have a Python app that exposes an API. We have a basic Dockerfile for it, and a simple deploy.yaml spec file containing our Deployment. They are what you could call ‘typical’ of what can be found online when looking for a Dockerfile or Deployment template:

Dockerfile:

FROM python:3-alpine
COPY requirements.txt .
RUN pip3 install -r requirements.txt

WORKDIR /app
COPY src/ ./

ENV PYTHONUNBUFFERED 1
CMD python3 app.py

deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry/api:latest
        ports:
        - containerPort: 5000

Let’s secure things up:

non root user

The first recommendation is to run our containers as non-root users. For that, we’ll first add a few lines to our Dockerfile:

...
RUN addgroup -S app && adduser -H -D -S app -G app -s /dev/null
USER app:app
WORKDIR /app

Note that we are using an Alpine-based image, so the exact command might vary on other distros, but the goal is the same.

By creating a user ‘app’ and using it to run our app, we avoid giving the app far more permissions than it needs. We can check that by exec’ing into the running container:

(before)

/app # id
uid=0(root) gid=0(root) groups=1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)
/app # ps
PID   USER     TIME  COMMAND
    1 root      0:00 {flask-api.py} /usr/local/bin/python3 ./flask-api.py
    7 root      0:00 sh
   14 root      0:00 ps

(now)

/app $ id
uid=101(app) gid=101(app)
/app $ ps
PID   USER     TIME  COMMAND
    1 app       0:00 {flask-api.py} /usr/local/bin/python3 ./flask-api.py
    7 app       0:00 sh
   14 app       0:00 ps

Right, so now we are not running as root on the container side, but what about the host? Which user are we mapped to on the host side?

When running the default image:

k8s-worker > ps -aux | grep flask
root     2266529  1.8  0.1  28168 23752 ?        Ss   17:31   0:00 /usr/local/bin/python3 ./flask-api.py

We are actually mapped to the root user! That’s not the most secure setup: if an attacker somehow gains control of the pod and is able to escape, they will land on the host as root.

And if we use the new Dockerfile with the USER directive:

ps -aux | grep flask
syslog   2267104  2.8  0.1  28168 23812 ?        Ss   17:32   0:00 /usr/local/bin/python3 ./flask-api.py

What? Why are we mapped to the syslog user? A quick check of /etc/passwd shows us why:

cat /etc/passwd | grep 101
syslog:x:101:107::/home/syslog:/usr/sbin/nologin

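This is because when we created the user app in the Dockerfile, adduser assigned it the uid 101, which is the same as the syslog user on our host. Containers share the host kernel, and unless user namespaces are in use, the kernel only sees the numeric uid; each side then resolves it to a name using its own /etc/passwd.

To avoid clashing with a potential user with the same uid on the host, we can use a higher uid, roughly in the 40000-60000 range.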
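
The same numeric uid can resolve to different names depending on which /etc/passwd is consulted. Here is a minimal Python sketch of that lookup, using the standard pwd module. The helper name name_for_uid is hypothetical and only there for illustration:

```python
import pwd


def name_for_uid(uid):
    """Return the user name the local user database gives for uid, or None."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No entry for this uid in the local user database.
        return None


# Run once on the host and once inside the container:
# the same uid can resolve to a different name (or to none at all).
for uid in (0, 101, 60096):
    print(uid, name_for_uid(uid))
```

On the host from this article, uid 101 would resolve to syslog, while inside the container it resolves to app.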

runAsUser

To fix that, we will tweak our deploy.yaml:

      containers:
      - name: api
        image: registry/api:latest
        securityContext:
          runAsUser: 60096
          runAsGroup: 60096

Now, from the host side, the process will appear as uid 60096, which isn’t mapped to a predefined user (unless a user with the same uid exists on the host, obviously). Note that runAsUser overrides the USER directive of the image.
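
As an extra safety net, the app itself could refuse to start when it finds itself running as root, for example if someone deploys the image without the securityContext. This is only a sketch: the function ensure_not_root is hypothetical and not part of the original app.

```python
import os


def ensure_not_root(uid=None):
    """Raise if the given (or current effective) uid is 0, i.e. root."""
    if uid is None:
        uid = os.geteuid()
    if uid == 0:
        raise RuntimeError("refusing to run as root; check USER / runAsUser")
    return uid
```

Calling ensure_not_root() at the top of app.py, before the API starts listening, would make a misconfigured deployment fail fast instead of silently running as uid 0.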

readOnlyRootFilesystem

Another great addition to the Deployment spec is to set the root filesystem as read-only. This will block any attempt to modify the filesystem of the container, like installing binaries or modifying configuration in /etc.

      containers:
      - name: api
        image: registry/api:latest
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 60096
          runAsGroup: 60096

Now if we try to change something, for example with apk, we’ll be greeted by an error message, preventing any change on the root fs:

ERROR: Unable to lock database: Read-only file system
ERROR: Failed to open apk database: Read-only file system

If our app needs to be able to write to a given folder (like a /tmp/cache folder for caching), we can add a volumeMount in the deployment, backed by an ‘emptyDir’ volume. This will allow the app to write to this folder, while retaining the read-only root fs:

      containers:
      - name: api
        image: registry/api:latest
        volumeMounts:
        - mountPath: /tmp/cache
          name: cache-volume
      volumes:
      - name: cache-volume
        emptyDir: {}
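
On the app side, a write outside the mounted folder now fails with an OSError, so it can be worth handling that case explicitly. Here is a sketch assuming the /tmp/cache mount above. The helper write_cache is hypothetical, and treating an unwritable cache as a soft failure is a design choice, not a requirement:

```python
import errno
import os

CACHE_DIR = "/tmp/cache"  # matches the mountPath of the emptyDir volume


def write_cache(name, data, cache_dir=CACHE_DIR):
    """Write bytes to cache_dir/name; return the path, or None if not writable."""
    path = os.path.join(cache_dir, name)
    try:
        with open(path, "wb") as fh:
            fh.write(data)
    except OSError as exc:
        # EROFS: read-only filesystem, EACCES: no permission, ENOENT: folder missing
        if exc.errno in (errno.EROFS, errno.EACCES, errno.ENOENT):
            return None
        raise
    return path
```

With this, a missing or misconfigured volume degrades to "no cache" rather than crashing the API.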

automountServiceAccountToken

If the pod is not going to communicate with the Kubernetes API, we can avoid mounting the service account’s token in the pod (which by default will be mounted in /var/run/secrets/kubernetes.io/serviceaccount/token):

    spec:
      automountServiceAccountToken: false
      containers:
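
A quick way to confirm the setting took effect is to check from inside the container whether the token file exists. Here is a small Python sketch using the default path mentioned above (the function name is made up for this example):

```python
import os

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def service_account_token_mounted(path=TOKEN_PATH):
    """Return True if a service account token file is present at path."""
    return os.path.isfile(path)
```

With automountServiceAccountToken: false, this should return False when run inside the pod.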

resources

While we’re on the deployment, it’s also a good idea to set some resource limits on the pod. This will prevent the pod from consuming all the resources of the host, which, even when it comes from a genuine mistake, can result in outages or other disruptions:

      containers:
      - name: api
        image: registry/api:latest
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 100m
            memory: 64Mi

requests tells k8s the requirements of the pod, i.e. what we can expect the pod to consume. This helps the scheduler place it on an appropriate node.

limits actually stops the container from consuming more than what is specified, either by throttling for the CPU, or by an OOM kill for the memory.
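
The units can be confusing at first: the m suffix on CPU means millicores, and Mi is a power-of-two mebibyte. The conversion below is a simplified sketch that only handles the suffixes used in this article (Kubernetes accepts more), with helper names of my own choosing:

```python
def cpu_to_cores(quantity):
    """Convert a CPU quantity such as '500m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # 'm' means millicores
    return float(quantity)


BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


print(cpu_to_cores("500m"), memory_to_bytes("128Mi"))  # 0.5 134217728
```

So the limits above cap the container at half a core and 128 * 1024 * 1024 bytes of memory.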

SHA tagging

Another recommendation would be to use SHA tagging on the base image of our Dockerfile. This serves two purposes:

  • making builds reproducible. Otherwise, since the tag of the base image can be moved to a newer image, a rebuild may break our setup even though our Dockerfile hasn’t changed.
  • mitigating supply chain attacks: if the base image is compromised (a threat actor injects or modifies content in it), our image is affected as well the next time we build, because we pull whatever the tag currently points to.

To do so, we simply append the SHA256 digest of the image we want to pin after the tag:

FROM python:3.9-alpine3.15@sha256:f2aeefbeb3846b146a8ad9b995af469d249272af804b309318e2c72c9ca035b0

We can do the same in the deployment. Instead of using registry/api:latest, we use a versioning scheme (like semver or calver) and combine it with the SHA of the image; this ensures that the image used in the deployment will not change due to upstream changes.
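
To keep unpinned references from slipping back in, a small check can flag any image string that does not end with a full sha256 digest. This is a sketch only: the regex covers the name:tag@sha256:... form used in this article, and is_pinned is a hypothetical helper, not an existing tool:

```python
import re

# A full sha256 digest is 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image):
    """Return True if the image reference ends with a full sha256 digest."""
    return DIGEST_RE.search(image) is not None
```

Such a check could run in CI over the image fields of the manifests and over the FROM lines of the Dockerfile.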

Results

Final versions of the Dockerfile and deploy.yaml would look like this:

Dockerfile

FROM python:3.9-alpine3.15@sha256:f2aeefbeb3846b146a8ad9b995af469d249272af804b309318e2c72c9ca035b0
COPY requirements.txt .
RUN pip3 install -r requirements.txt

RUN addgroup -S app && adduser -H -D -S app -G app -s /dev/null
USER app:app
WORKDIR /app
COPY src/ ./

ENV PYTHONUNBUFFERED=1
CMD ["python3","app.py"]

deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      automountServiceAccountToken: false
      containers:
      - name: api
        image: registry/api:v1.0.3@sha256:f469d249272af80...
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 60096
          runAsGroup: 60096
        ports:
        - containerPort: 5000
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 100m
            memory: 64Mi
        volumeMounts:
        - mountPath: /tmp/cache
          name: cache-volume
      volumes:
      - name: cache-volume
        emptyDir: {}

These are simple yet effective ways to secure your app on k8s. If you want to go deeper, I would suggest checking more advanced topics like host-side security (using AppArmor profiles for example), Falco, or supply chain security (Open Policy Agent, admission controllers...). These are all topics that are covered by the CKS, with plenty of information available online.

Further resources