End-to-end tests for applications in Kubernetes

This article introduces a small library that eases end-to-end testing of applications in Kubernetes environments.

An overview of what already existed

When typing “kubernetes e2e” or “kubernetes end to end” in Google, the first results I got were about testing a K8s cluster or component. This is what the project’s team uses to test the development of Kubernetes itself. It is not what I wanted: my goal was to test an application I packaged for K8s, not K8s itself.

Terratest is another solution I found. It shares the same goal, but looking at this project made me realize I did not want a solution involving advanced programming. We have DevOps profiles that can develop and maintain operational aspects, but they are not numerous and most of them hardly know the Go language. All the team members learned kubectl and Helm commands easily. A scripting solution would be better: it would avoid choosing a programming language (Go, Java…) and thus a lot of arguing and reinventing.

Since we mostly had Helm packages, used by several internal projects, I tried to focus on Helm. I immediately found the solution used by the official Helm project. There are interesting parts, such as the linting and version checks. This can be convenient if you set up an internal Helm repository and want every chart to follow the same rules. There are also commands to check an installation. Anyway, this is tailored for a collection of Helm charts and still not adapted to what I wanted.

I then found the unit test plug-in for Helm. The principle is to create YAML files that contain sets of tests, and to run them against your chart. This is an interesting solution, but it mostly tests the templating of your chart, not the applicative behavior.

Testing an application in a Kubernetes environment means being able to deploy it, adapt the topology (scale replicas), verify everything works, execute scenarios and check assertions at various stages. The solution that best fit this requirement was EUFT. This small project relies on BATS (Bash Automated Testing System), a framework for writing and running unit tests as scripts. EUFT is in fact a solution to deploy a small K8s cluster and run BATS tests inside it. Examples of tests are available in this repository. I also found out afterwards that HashiCorp was using the same technique for some of their Helm packages.

While I liked the principle of BATS, the tests used by EUFT and HashiCorp are a little complex to maintain; not everyone in our project is a script god. Besides, we do not want to deploy a K8s cluster in our tests: we want to use an existing one, with the same settings as our production cluster. This is important because of permissions and network policies. Running e2e tests in an ephemeral K8s installation is too limited. However, EUFT gave me a direction, since I had not found anything else.

The DETIK library

I was not really inspired for a name…
DETIK stands for « DevOps End-to-End Testing In Kubernetes ». The idea is to write tests as scripts, run them with BATS, and have a simple syntax, almost in natural language, to write assertions about resources in Kubernetes. With kubectl or Helm commands, a little scripting knowledge (Bash, Ruby, Python, whatever…) and this library, anyone should be able to write applicative tests and maintain them with very little effort.

In addition to performing actions on the cluster, I also wanted to support the execution of scenarios. Scenarios can imply topology adaptations, but also user actions. BATS can integrate with many solutions, such as Selenium or Cypress for end-user scenarios, or Gatling for performance tests. With all these tools, it becomes possible to test an application from end-to-end in a K8s environment.


The following example is taken from the Git repository.
It shows the tests for a Helm package. A part of the syntax comes from BATS.

#!/usr/bin/env bats

# An example of tests for a Helm package
# that deploys Drupal and Varnish
# instances in a K8s cluster.

load "/home/testing/lib/detik.bash"

function setup() {
	# Shared variables for the tests (the version is an example value)
	pck_version="1.0.0"
}

function verify_helm() {
	helm template ../drupal | kubectl apply --dry-run -f -
}

@test "verify the linting of the chart" {

	run helm lint ../drupal
	[ "$status" -eq 0 ]
}

@test "verify the deployment of the chart in dry-run mode" {

	run verify_helm
	[ "$status" -eq 0 ]
}

@test "package the project" {

	run helm -d /tmp package ../drupal
	# Verifying the file was created is enough
	[ -f /tmp/drupal-${pck_version}.tgz ]
}

@test "verify a real deployment" {

	[ -f /tmp/drupal-${pck_version}.tgz ]

	run helm install --name my-test \
		--set varnish.ingressHost=varnish.test.local \
		--set db.ip= \
		--set db.port=44320 \
		--tiller-namespace my-test-namespace \
		/tmp/drupal-${pck_version}.tgz

	[ "$status" -eq 0 ]
	sleep 10

	# PODs
	run verify "there is 1 pod named 'my-test-drupal'"
	[ "$status" -eq 0 ]

	run verify "there is 1 pod named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	# Postgres specifics
	run verify "there is 1 service named 'my-test-postgres'"
	[ "$status" -eq 0 ]

	run verify "there is 1 ep named 'my-test-postgres'"
	[ "$status" -eq 0 ]

	run verify \
		"'.subsets[*].ports[*].port' is '44320' " \
		"for endpoints named 'my-test-postgres'"
	[ "$status" -eq 0 ]

	run verify \
		"'.subsets[*].addresses[*].ip' is '' " \
		"for endpoints named 'my-test-postgres'"
	[ "$status" -eq 0 ]

	# Services
	run verify "there is 1 service named 'my-test-drupal'"
	[ "$status" -eq 0 ]

	run verify "there is 1 service named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	run verify "'port' is '80' for services named 'my-test-drupal'"
	[ "$status" -eq 0 ]

	run verify "'port' is '80' for services named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	# Deployments
	run verify "there is 1 deployment named 'my-test-drupal'"
	[ "$status" -eq 0 ]

	run verify "there is 1 deployment named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	# Ingress
	run verify "there is 1 ingress named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	run verify \
		"'.spec.rules[*].host' is 'varnish.test.local' " \
		"for ingress named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	run verify \
		"'.spec.rules[*].http.paths[*].backend.serviceName' " \
		"is 'my-test-varnish' for ingress named 'my-test-varnish'"
	[ "$status" -eq 0 ]

	# PODs should be started
	run try "at most 5 times every 30s " \
		"to get pods named 'my-test-drupal' " \
		"and verify that 'status' is 'running'"
	[ "$status" -eq 0 ]

	run try "at most 5 times every 30s " \
		"to get pods named 'my-test-varnish' " \
		"and verify that 'status' is 'running'"
	[ "$status" -eq 0 ]

	# Indicate to other tests that the deployment succeeded
	echo "started" > tests.status.tmp
}

@test "verify the deployed application" {

	if [[ ! -f tests.status.tmp ]]; then
		skip "The application was not correctly deployed..."
	fi

	rm -rf /tmp/drupal.html
	curl -sL http://varnish.test.local -o /tmp/drupal.html
	[ -f /tmp/drupal.html ]

	grep -q "<title>Choose language | Drupal</title>" /tmp/drupal.html
	grep -q "Set up database" /tmp/drupal.html
	grep -q "Install site" /tmp/drupal.html
	grep -q "Save and continue" /tmp/drupal.html
}

@test "verify the undeployment" {

	run helm del --purge my-test --tiller-namespace my-test-namespace
	[ "$status" -eq 0 ]
	[ "$output" == "release \"my-test\" deleted" ]

	run verify "there is 0 service named 'my-test'"
	[ "$status" -eq 0 ]

	run verify "there is 0 deployment named 'my-test'"
	[ "$status" -eq 0 ]

	sleep 60
	run verify "there is 0 pod named 'my-test'"
	[ "$status" -eq 0 ]
}

@test "clean the test environment" {
	rm -rf tests.status.tmp
}

These tests include the linting of the chart, a dry-run deployment, but also a real deployment with a basic topology. After deploying it, we verify assertions on K8s resources. Once the application (a simple Drupal) is started, we get the content of the web site and make sure it contains some expected words and sentences. We could replace this part with a Selenium scenario.

Executing the bats my-test-file.bats command starts the execution.
A successful run shows the following output:

bats my-test-file.bats

✓ 1 verify the linting of the chart
✓ 2 verify the deployment of the chart in dry-run mode
✓ 3 package the project
✓ 4 verify a real deployment
✓ 5 verify the deployed application
✓ 6 verify the undeployment
✓ 7 clean the test environment

The command "bats my-test-file.bats" exited with 0.

Errors appear like below.


✗ 1 verify the linting of the chart
    (in test file my-test-file.bats, line 14)
     `[ "$status" -eq 0 ]' failed


Library Principles

Assertions are used to generate kubectl queries.
The output is extracted and compared to the values given as parameters.
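To illustrate, here is a minimal, self-contained sketch (hypothetical, not the library’s actual code) of what an assertion such as verify "there is 1 pod named 'my-test-drupal'" boils down to: run a client query, count the matching lines and compare the count with the expected number. The client output is faked here so the snippet can run anywhere.

```shell
# Hypothetical sketch of how an assertion becomes a query and a comparison.
expected=1

# In real life, the query would be something like:
#   kubectl get pods -o custom-columns=NAME:.metadata.name
# We fake the client output to keep the example self-contained.
query_output="my-test-drupal
my-test-varnish"

# Count the lines matching the expected resource name.
count=$(printf '%s\n' "$query_output" | grep -c "my-test-drupal")
if [ "$count" -eq "$expected" ]; then
	echo "Assertion passed"
else
	echo "Assertion failed"
fi
```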

There are in fact very few queries.
However, they work with all kinds of Kubernetes resources. That includes native K8s objects (PODs, services…) but also OpenShift elements (routes, templates…) and custom resources (e.g. the upcoming Helm v3 objects).

Queries can be run with kubectl or with oc (the OpenShift client).
You only have to specify the client name in the DETIK_CLIENT_NAME variable (and make sure the client is available in the path).

With this, you can verify pre- and post-conditions when using a Kubernetes client, Helm or even operators.
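As a sketch (assuming only what is said above about the DETIK_CLIENT_NAME variable), switching clients could look like this at the top of a test file:

```shell
# Choose the client the library should use to build its queries.
# "kubectl" is the assumed default in this sketch; "oc" targets OpenShift.
DETIK_CLIENT_NAME="oc"

# The library would then prefix its queries with this client name.
client="${DETIK_CLIENT_NAME:-kubectl}"
echo "Queries will be run with: $client"
```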


The library is available as a single file.
It can be downloaded from this GitHub repository. The syntax is documented in the README of the project.

A Dockerfile is provided as a basis in the project.
It embeds a kubectl client, a Helm client, BATS and the DETIK library. Depending on your cluster configuration, you might want to add other items (e.g. to log into your cluster).

Continuous Integration

The project is documented and explains how to execute (and write) such tests on your own machine. But the real value of such tests lies in running them in the last stages of an automated pipeline.

Here is a simple Jenkinsfile (for a Jenkins pipeline).

def label = "${env.JOB_NAME}.${env.BUILD_NUMBER}".replace('-', '_').replace('/', '_')
podTemplate(label: label, containers: [
		containerTemplate(
			name: 'jnlp',
			image: 'jnlp-slave-alpine:3.27-1-alpine'),
		containerTemplate(
			name: 'detik',
			image: 'detik:LATEST',
			ttyEnabled: true,
			alwaysPullImage: true,
			envVars: [
				envVar(key: 'http_proxy', value: 'http://proxy.local:3128'),
				envVar(key: 'https_proxy', value: 'http://proxy.local:3128'),
				envVar(key: 'TILLER_NAMESPACE', value: 'my-test-namespace')
			])
]) {

	node(label) {
		container(name: 'jnlp') {
			stage('Checkout') {
				checkout scm
			}
		}

		container(name: 'detik') {
			stage('Login') {
				withCredentials([usernamePassword(
						credentialsId: 'k8s-credentials',
						passwordVariable: 'K8S_PASSWORD',
						usernameVariable: 'K8S_USERNAME')]) {

					echo 'log into the cluster...'
					// TODO: it depends on your cluster configuration
				}
			}

			stage('Build and Test') {
				sh 'bats tests/main.bats'
			}
		}
	}
}

It can easily be adapted for Travis or GitLab CI.
You will find more examples on GitHub.

Using Graylog for Centralized Logs in K8s platforms and Permissions Management

This article explains how to centralize logs from a Kubernetes cluster and manage permissions and partitioning of project logs with Graylog (instead of ELK). Graylog indeed allows building a multi-tenant platform to manage logs. Let’s take a look at it.

Reminders about logging in Kubernetes

As stated in the Kubernetes documentation, there are 3 options to centralize logs in Kubernetes environments.

The first one is about letting applications directly output their traces to other systems (e.g. databases). This approach always works, even outside Docker. However, it requires more work than the other solutions. Not all the applications have the right log appenders. It can also become complex with heterogeneous software (consider something less trivial than N-tier applications). Finally, log appenders must be implemented carefully: they should handle network failures without impacting or blocking the application that uses them, while using as few resources as possible. So, although it is a possible option, it is not the first choice in general.

Applications output their logs directly in a central store

The second solution is specific to Kubernetes: it consists in having a side-car container that embeds a logging agent. This agent consumes the logs of the application it completes and sends them to a store (e.g. a database or a queue). This approach is better because any application can output logs to a file (that can be consumed by the agent) and because the application and the agent have their own resources (they run in the same POD, but in different containers). Side-car containers also give any project the possibility to collect logs without depending on the K8s infrastructure and its configuration. However, if all the projects of an organization use this approach, then half of the running containers will be collecting agents. Even though log agents can use few resources (depending on the retained solution), this is a waste of resources. Besides, it represents additional work for the project (more YAML manifests, more Docker images, more stuff to upgrade, a potential log store to administrate…). A global log collector would be better.

Collector in side-car containers

That’s the third option: centralized logging. Rather than having the projects deal with the collection of logs, the infrastructure can set it up directly. The idea is that each K8s minion has a single log agent that collects the logs of all the containers running on the node. This is possible because all the container logs (no matter whether the containers were started by Kubernetes or with the Docker command) are put into the same location. What kubectl logs does is read the Docker logs, filter the entries by POD / container, and display them. This approach is the best one in terms of performance. What is difficult is managing permissions: how do you guarantee a given team will only access its own logs? Not all the organizations need this. Small ones, in particular, have few projects and can restrict access to the logging platform, rather than doing it IN the platform. Anyway, beyond performance, centralized logging makes this feature directly available to all the projects. They do not have to deal with log exploitation and can focus on the applicative part.
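To see why node-level collection can attribute entries to PODs: on many setups, the kubelet exposes container log files (or symlinks) whose names embed the POD, namespace and container. The file name below is a hypothetical example (exact paths and naming vary with the container runtime); the parsing is shown self-contained:

```shell
# Example file name, as found under /var/log/containers/ on many nodes
# (hypothetical values; naming can vary with your setup).
f="kubernetes-dashboard-6f4cfc5d87-xrz5k_test1_kubernetes-dashboard-6964c18a2672.log"

# The name encodes <pod>_<namespace>_<container>-<id>.log, which is what
# lets a single node-level agent attribute every entry to its POD.
base="${f%.log}"
pod="${base%%_*}"
rest="${base#*_}"
namespace="${rest%%_*}"
echo "pod=$pod namespace=$namespace"
```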

Centralized logging

Centralized Logging in K8s

Centralized logging in K8s consists in having a daemon set for a logging agent that dispatches Docker logs to one or several stores. The most famous solution is ELK (Elastic Search, Logstash and Kibana). Logstash is considered to be greedy in resources, and many alternatives exist (Filebeat, Fluentd, Fluent Bit…). The daemon agent collects the logs and sends them to Elastic Search. Dashboards are managed in Kibana.

Things become less convenient when it comes to partitioning data and dashboards. Elastic Search has the notion of index, and indexes can be associated with permissions. So, there is no trouble here. But Kibana, in its current version, does not support anything equivalent: all the dashboards can be accessed by anyone. Even if you manage to define permissions in Elastic Search, a user would see all the dashboards in Kibana, even though many could be empty (due to invalid permissions on the ES indexes). Some suggest using NGINX as a front-end for Kibana to manage authentication and permissions. It seems to be what Red Hat did in OpenShift (as it offers user permissions with ELK). What I present here is an alternative to ELK that both scales and manages user permissions, and is fully open source. It relies on Graylog.

Here is what Graylog web sites says: « Graylog is a leading centralized log management solution built to open standards for capturing, storing, and enabling real-time analysis of terabytes of machine data. We deliver a better user experience by making analysis ridiculously fast, efficient, cost-effective, and flexible. »

I heard about this solution while working on another topic with a client who had attended a conference a few weeks before. And indeed, Graylog is the solution used by OVH’s commercial « Log as a Service » offer (in its data platform products). This article explains how to configure it. It is assumed you already have a Kubernetes installation (otherwise, you can use Minikube). To make things convenient, I document how to run things locally.


Graylog is a Java server that uses Elastic Search to store log entries.
It also relies on MongoDB, to store metadata (Graylog users, permissions, dashboards, etc).

HA architecture for Graylog

What is important is that only Graylog interacts with the logging agents. There is no Kibana to install. Graylog manages the storage in Elastic Search, the dashboards and user permissions. Elastic Search should not be accessed directly. Graylog provides a web console and a REST API. So, everything feasible in the console can be done with a REST client.

Deploying Graylog, MongoDB and Elastic Search

Obviously, a production-grade deployment would require a highly-available cluster, for ES, MongoDB and Graylog. But for this article, a local installation is enough. A docker-compose file was written to start everything. As ES requires a specific configuration of the host, here is the sequence to start it:

sudo sysctl -w vm.max_map_count=262144
docker-compose -f compose-graylog.yml up

You can then log into Graylog’s web console at http://localhost:9000 with admin/admin as credentials. Those who want to create a highly available installation can take a look at Graylog’s web site.

Deploying the Collecting Agent in K8s

As discussed before, there are many options to collect logs.
I chose Fluent Bit, which is developed by the same team as Fluentd, but is more performant and has a very low footprint. It also has fewer plug-ins than Fluentd, but those available are enough.

What we need to do is get the Docker logs, find for each entry which POD the container is associated with, enrich the log entry with K8s metadata and forward it to our store. Indeed, Docker logs are not aware of Kubernetes metadata, so we use a Fluent Bit plug-in to get K8s metadata. I saved on GitHub all the configuration to create the logging agent. It gets log entries, adds Kubernetes metadata and then filters or transforms entries before sending them to our store.

The message format we use is GELF (a normalized JSON format supported by many log platforms). Notice there is a GELF plug-in for Fluent Bit. However, I encountered issues with it. As it is not documented (but available in the code), I guess it is not considered mature yet. Instead, I used the HTTP output plug-in and built a GELF message by hand. Here is what a message looks like before it is sent to Graylog (values taken from a Minikube test):

{
  "version": "1.1",
  "host": "minikube",
  "short_message": "2019/01/13 17:27:34 Metric client health check failed...",
  "_timestamp": "2019-01-13T17:27:34.567260271Z",
  "_stream": "stdout",
  "_k8s_pod_name": "kubernetes-dashboard-6f4cfc5d87-xrz5k",
  "_k8s_namespace_name": "test1",
  "_k8s_container_name": "kubernetes-dashboard"
}

Eventually, we need a service account to access the K8s API.
Indeed, to resolve which POD a container is associated with, the fluent-bit-k8s-metadata plug-in needs to query the K8s API, so it requires access rights for this.

You can find the files in this Git repository. The service account and daemon set are quite usual. What really matters is the configmap file. It contains all the configuration for Fluent Bit: we read the Docker logs (inputs), add K8s metadata, build a GELF message (filters) and send it to Graylog (output). Take a look at the Fluent Bit documentation for additional information.

Configuring Graylog

There are many notions and features in Graylog.
Only a few of them are necessary to manage user permissions for a K8s cluster. First, we consider that every project lives in its own K8s namespace. Whether there are several versions of the project in the same cluster (e.g. dev, pre-prod, prod) or whether they live in different clusters does not matter. What is important is to identify a routing property in the GELF message: when Fluent Bit sends a GELF message, we need a property (or a set of properties) that indicates which project (and which environment) it is associated with. In the configmap stored on GitHub, we consider it is the _k8s_namespace property.

Now, we can focus on Graylog concepts.
We need…

An input

An input is a listener to receive GELF messages.
You can create one by using the System > Inputs menu. In this example, we create a global one for GELF HTTP (port 12201). There are many options in the creation dialog, including the use of SSL certificates to secure the connection.

Screenshot of the inputs management in Graylog


An index

Graylog indices are abstractions of Elastic indexes. They designate where log entries will be stored. You can associate sharding properties (logical partition of the data), retention delay, replica number (how many instances for every shard) and other settings with a given index. Every project should have its own index: this allows separating the logs of different projects. Use the System > Indices menu to manage them.

Indices in Graylog's web console

A project in production will have its own index, with a longer retention delay and several replicas, while a development one will have a shorter retention and a single replica (it is not a big issue if these logs are lost).


A stream

A stream is a routing rule. Streams can be defined in the Streams menu. When a (GELF) message is received by the input, Graylog tries to match it against the streams. If a match is found, the message is redirected to the associated index.

Creating a stream in Graylog

When you create a stream for a project, make sure to check the Remove matches from ‘All messages’ stream option. This way, the log entry will only be present in a single stream. Otherwise, it will be present in both the specific stream and the default (global) one.

The stream needs a single rule, with an exact match on the K8s namespace (in our example).
Again, this information is contained in the GELF message. Notice that the field is _k8s_namespace in the GELF message, but Graylog only displays k8s_namespace in the proposals. The initial underscore is in fact present, even if not displayed.

Creating a rule for a stream

Do not forget to start the stream once it is complete.

Graylog streams


A dashboard

Graylog’s web console allows building and displaying dashboards.
Make sure to restrict a dashboard to a given stream (and thus index). Like for the streams, there should be one dashboard per namespace. Using the K8s namespace as a prefix is a good option.

Dashboards are defined directly in Graylog

Graylog provides several widgets…
Take a look at the documentation for further details.

Sample dashboard in Graylog


Roles

Graylog allows defining roles.
A role is a simple name, coupled with permissions (roles are groups of permissions). You can thus allow a given role to access (read) or modify (write) streams and dashboards. For a project, we need read permissions on the stream and write permissions on the dashboard. This way, users with this role will be able to view dashboards with their data, and potentially modify them if they want.

Roles and users can be managed in the System > Authentication menu.

Managing access to streams

Managing access to dashboards

The list of roles


Users

Apart from the global administrators, all the users should be attached to roles.
These roles define which projects they can access. You can consider them as groups. When a user logs in, Graylog’s web console displays the right things, based on their permissions.

Creating a user in Graylog

There are two predefined roles: admin and viewer.
Any user must have one of these two roles, and may have other ones as well. When users log in, unless they are administrators, they only have access to what their roles cover.

The user only sees the stream for his project

Clicking the stream allows to search for log entries.

Log entries associated with the stream

Notice that there are many authentication mechanisms available in Graylog, including LDAP.


Graylog uses MongoDB to store metadata (streams, dashboards, roles, etc.) and Elastic Search to store log entries. We define an input in Graylog to receive GELF messages on an HTTP(S) end-point. These messages are sent by Fluent Bit in the cluster.

When such a message is received, the k8s_namespace_name property is checked against all the streams.
When one matches this namespace, the message is redirected to a specific Graylog index (which is an abstraction of ES indexes). Only the corresponding streams and dashboards will be able to show this entry.

How permissions and isolation are managed in Graylog

Eventually, only the users with the right role will be able to read data from a given stream, and access and manage the dashboards associated with it. Logs are not mixed amongst projects. Isolation is guaranteed and permissions are managed through Graylog.

In short: 1 project in an environment = 1 K8s namespace = 1 Graylog index = 1 Graylog stream = 1 Graylog role = 1 Graylog dashboard. This makes things pretty simple. You can obviously make it more complex, if you want…

Testing Graylog

You can send sample requests to Graylog’s API.

# Found on Graylog's web site
curl -X POST -H 'Content-Type: application/json' -d '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }' http://localhost:12201/gelf

This one is a little more complex.

# Home made
curl -X POST -H 'Content-Type: application/json' -d '{"short_message":"2019/01/13 17:27:34 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.","_stream":"stdout","_timestamp":"2019-01-13T17:27:34.567260271Z","_k8s_pod_name":"kubernetes-dashboard-6f4cfc5d87-xrz5k","_k8s_namespace_name":"test1","_k8s_pod_id":"af8d3a86-fe23-11e8-b7f0-080027482556","_k8s_labels":{},"host":"minikube","_k8s_container_name":"kubernetes-dashboard","_docker_id":"6964c18a267280f0bbd452b531f7b17fcb214f1de14e88cd9befdc6cb192784f","version":"1.1"}' http://localhost:12201/gelf

Feel free to invent other ones…

Automating stuff

Every feature of Graylog’s web console is available in the REST API.
It means everything can be automated: every time a namespace is created in K8s, all the Graylog resources could be created directly, and project users could immediately access their logs and edit their dashboards.
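As an illustration, a provisioning hook could post a role definition to the API. This is a hypothetical sketch: the payload follows the permission model described above (read on the stream, write on the dashboard), but the IDs, names and exact endpoint must be checked against your Graylog version. The curl call is commented out so the snippet stays side-effect free.

```shell
# Hypothetical payload for creating a project role through the REST API.
STREAM_ID="5c3a0000000000000000a001"      # example stream ID
DASHBOARD_ID="5c3a0000000000000000b002"   # example dashboard ID

payload=$(cat <<EOF
{
  "name": "test1-role",
  "description": "Role for the test1 namespace",
  "permissions": [
    "streams:read:${STREAM_ID}",
    "dashboards:edit:${DASHBOARD_ID}"
  ]
}
EOF
)

# In a real hook (endpoint and credentials to adapt):
# curl -u admin:admin -H 'Content-Type: application/json' \
#      -X POST http://localhost:9000/api/roles -d "$payload"
echo "$payload"
```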

Check Graylog’s web site for more details about the API. When you run Graylog, you can also access the Swagger definition of the API at http://localhost:9000/api/api-browser/

Going further

The resources in this article use Graylog 2.5.
The next major version (3.x) brings new features and improvements, in particular for dashboards. There should be a new feature allowing to create dashboards associated with several streams at the same time (which is not possible in version 2.5, a dashboard being associated with a single stream – and so a single index). That would allow transverse teams to have dashboards that span several projects.

See https://www.graylog.org/products/latestversion for more details.

Hints for Tests

If you do local tests with the provided compose file, you can purge the logs by stopping the compose stack and deleting the ES container (docker rm graylogdec2018_elasticsearch_1), then restarting the stack.

If you remove the MongoDB container, make sure to reindex the ES indexes, or delete the Elastic container too.

Handshake failure with Maven

I experienced an annoying issue while building a project:
I got a javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure message while downloading resources from https://publicsuffix.org/list/effective_tld_names.dat.

I followed the instructions from this forum and installed the JCE libraries in my JDK’s security directory (I am using Oracle JDK 8). Unfortunately, Maven kept on failing with this host. I finally upgraded to a more recent version of Maven (from 3.2.2 to 3.6.0) and it worked.

I hope this will help someone if the case occurs once again.

Shared Responsibilities in Jenkins Pipelines for Docker Images

This article explains how to implement the pipeline described in this article (and summed up in the diagram below) for Docker images with Jenkins (I used version 2.145 for my tests).

Pipeline stages are defined in separate Git repositories

We assume Jenkins runs on nodes. The pipeline can be adapted to run in containers, but running tests on Docker images might then be harder. Solutions like Docker in Docker and, most of all, Docker outside of Docker (sharing the host’s Docker daemon inside a container to create siblings) may work. However, running tests of Docker containers is easier when the Jenkins agent runs directly on a node. And Docker outside of Docker is not considered a safe practice.

Let’s see what we need for Jenkins:

  • A Jenkins master, with the Pipeline plug-in.
  • Jenkins agents that run on (virtual) machines.
  • At least one agent node with a Docker daemon installed. It will be used to build images and run tests.
  • The same agent node should have access to a Clair instance (through Clair-Scanner) or to a Dagda install.

Here is the Jenkins topology with Clair.
In this diagram, the Clair server would be managed by the security team/role. Not every project should have its own instance of Clair; that would make no sense. It is something that should be common.

Jenkins topology with a specialized node agent (Docker + access to Clair)

The topology would be almost the same with Dagda.
To integrate Dagda in a CI process, you have two options. Dagda being a Python script, either you create a web service to invoke during the build process (that will execute the Python script), or you deploy its database on a separate VM and you execute Dagda as a container on the Jenkins slave. This container connects to the remote database. That would be the best option to get quick scans. For the record, Dagda uses MongoDB instead of PostgreSQL.

It is also important to secure the Docker daemon on the node.
As I do not want to spend too much time on it, here are some links:


With Jenkins, the convention is to define pipelines in a file named Jenkinsfile.
To make ours simple, we provide an all-in-one package. Projects only have to provide parameters and their own tests for their Docker image. The following sample gives an idea about what such a project repository would look like…

- image/
    - Dockerfile
    - ...
- tests/
    - execute-tests.sh
    - ...
- Jenkinsfile

The project repository contains resources to build the image, resources to test it, and a Jenkinsfile, that describes the build process for Jenkins. Here is the content of this file.

allInOne(
    imageName: 'my-image-name',
    imageVersion: '1.0.0'
)

allInOne is not something Jenkins recognizes by default.
In fact, it refers to a Jenkins shared library we need to define. There will be 3 of them: one for the composite pipeline (« allInOne »), one for the security team and one for the governance team. Each shared library is defined in its own Git repository, each one having its own permissions. This way, the security team can be sure only it can access/commit security stuff. Same thing for the Software governance team. And they can all have read access to the shared library that aggregates the various stages.

Shared library for the composite pipeline

This is the shared library used for the « allInOne » tag.
Jenkins allows defining three things in a shared library: helpers, a pipeline or a step. Defining stages would have been better, but this is not possible. Our allInOne library will thus declare a pipeline with parameters. Here is the main code:

// In vars/allInOne.groovy (shared library that defines the generic pipeline)
def call(Map config) {
	node {
		def timeStamp = Calendar.getInstance().getTime().format('yyyyMMdd-HHmmss', TimeZone.getTimeZone('Europe/Paris'))
		def buildId = "${config.imageVersion}-${timeStamp}"

		stage('Checkout') {
			echo "Checking out the sources..."
			checkout scm
		}

		stage('Build Image') {
			// Enforce the shape of the repository and assume the Dockerfile is always under image/
			sh "docker build -t \"${config.imageName}:${buildId}\" image/"
		}

		stage('Project tests') {
			def scriptFileContent = libraryResource( 'com/linagora/execute-project-tests.sh' )
			sh scriptFileContent
		}

		stage('Security checks') {
			echo "Checking security..."
			securityInspection( "${config.imageName}", "${buildId}" )
		}

		stage('Software Governance') {
			echo "Handling Software checks..."
			softwareCheck( "${config.imageName}", "${buildId}" )
		}

		stage('Promotion') {
			echo "Promoting the local image to a trusted repository..."
			def scriptFileContent = libraryResource( 'com/linagora/promote-image.sh' )
			sh scriptFileContent
		}
	}
}
All the stages are sequential. If one fails, everything stops. The securityInspection and softwareCheck steps are defined below as global shared libraries. The promote-image.sh and execute-project-tests.sh scripts are provided as resources of the shared library. You can find the whole project online, on GitHub.

Notice the build stage is very basic.
In this example, we do not handle build parameters. This part can easily be adapted.

Project tests

How can a project specify tests for its Docker image?
Well, this is not complicated. I already faced the issue with a previous project. The idea is the following:

  1. Have a script that launches tests.
  2. This script instantiates the image in detached mode, with a shared volume.
  3. We then execute another script inside the container, by using docker exec.
  4. This second script is in charge of verifying assertions inside the container. Example: verify some process is running, verify a given file was created, etc. When an assertion fails, a message is written inside a file, located in the shared volume. When the container terminates, this file will remain available on the host system.
  5. Once the second script has completed, the first one can verify the content of the error file. If there are errors, it simply fails the build.
  6. It is also possible to verify some assertions about the container from the outside (e.g. try to ping or reach a given port).
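The steps above can be sketched as a minimal shell script. The script and file names (run-assertions.sh, errors.txt) are illustrative, not a convention the pipeline mandates:

```shell
# launch_tests starts the image in detached mode with a shared volume,
# runs the assertion script inside the container with "docker exec",
# then inspects the error file left on the host (steps 1 to 5 above).
launch_tests() {
    local image="$1" work_dir cid
    work_dir="$(mktemp -d)"
    cp run-assertions.sh "$work_dir/"
    cid="$(docker run -d -v "$work_dir:/shared" "$image")"

    # The inner script appends one line per failed assertion
    # into /shared/errors.txt (step 4).
    docker exec "$cid" /bin/sh /shared/run-assertions.sh || true
    docker rm -f "$cid" > /dev/null

    report_errors "$work_dir/errors.txt"
}

# Checking the error file does not require Docker: fail when the
# file exists and is not empty (step 5).
report_errors() {
    if [ -s "$1" ]; then
        cat "$1" >&2
        return 1
    fi
    echo "All assertions passed."
}
```

A non-zero return code from launch_tests is enough for the pipeline step (`sh`) to fail the build.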

Here is another repository that illustrates this strategy.
It was used to test RPM and Debian packages in Docker containers. The script that launches the tests and verifies the result is here. Scripts that verify assertions in the containers are located in this directory.

This kind of test works for functional aspects, but it can also be applied to security checks (e.g. permissions on files) or governance ones. The difference here is that these tests lie in the same Git repository as the Dockerfile.

To run them, the allInOne pipeline must verify that a test script exists and launch it. A naming convention is enough.
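Such a convention check can be sketched as follows. The script name comes from the repository layout shown earlier; the function name is mine:

```shell
# run_project_tests only launches the tests when the conventional
# script exists at the repository root and is executable.
run_project_tests() {
    if [ -x "./execute-tests.sh" ]; then
        ./execute-tests.sh
    else
        echo "No executable execute-tests.sh found, skipping project tests."
    fi
}
```

This way, projects without tests still go through the pipeline, while projects that provide execute-tests.sh get them executed automatically.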

Security checks

Security checks are maintained by the security team/role as a shared library, hosted in its own Git repository. Some checks can be performed with a script (e.g. file permissions). But for Docker images, there are also solutions that inspect images and detect vulnerabilities.

Clair, from CoreOS, is one of them.
This project uses a database of known vulnerabilities and scans images against it. It then provides a dashboard, indicating which CVEs were found and in which images.

Dagda is another solution.
Although it is more recent, it provides more features than Clair. It works the same way, but also uses ClamAV to search for viruses, trojans and so on. What takes time for both Dagda and Clair is updating the database of known vulnerabilities. This is why the database cannot be deployed on the fly during the build process. It must exist before pipeline executions.

Both solutions can also be used to regularly audit Docker images. But since the focus of these articles is on delivery pipelines, you will have to dig into this part yourself. What is interesting is to integrate these tools in the delivery pipeline, as a validation step. To keep the example simple, I only include Clair. Clair is great, but using it from the command line is not easy. The best option is to use Clair Scanner as a complement.

And here is our shared library:

// In vars/securityInspection.groovy (shared library for the security role)
def call(String imageName, String buildId) {

	// We assume clair-scanner is available in the path
	def host = sh(returnStdout: true, script: 'hostname -i').trim()
	sh "clair-scanner -c <CLAIR_SERVER_URL> --ip ${host} -t High ${imageName}:${buildId}"
}

Here, Clair will scan the given image. If vulnerabilities are found with a severity higher than or equal to High, clair-scanner returns a non-zero exit code and thus fails the build.

If you use Dagda instead of Clair, you simply run a Python script. The installation is a little different, but the pipeline step remains simple. You can also add custom scripts to perform additional verifications (just add new steps in the pipeline).

In addition, one could also use Anchore, an open source solution to perform static analysis and check custom policies against Docker images. I found it after writing this article, so I just mention it here.

Software Governance

Software governance can be managed in the same way as the previous stages.
Since it depends on the organization itself, I have no generic tool to suggest. Instead, I assume there is a REST end-point somewhere that can be contacted during the build. The goal is to extract information and send it to a remote web service that will store it, and optionally trigger an alert or a build failure in case of exotic findings.

So, here is an example of an associated shared library:

// In vars/softwareCheck.groovy (shared library for the Software Governance role)
def call(String imageName, String buildId) {

	def scriptFileContent = libraryResource( 'com/linagora/analyze-dockerfile.sh' )
	sh scriptFileContent
	sh "echo 'imageName: ${imageName}' >> /tmp/gov.results.txt"
	sh "echo 'imageVersion: ${buildId}' >> /tmp/gov.results.txt"
	sh 'curl --data-binary "@/tmp/gov.results.txt" -X POST...'
	sh 'rm -f /tmp/gov.results.txt'
}
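Outside of Jenkins, the same report could be assembled and posted with a plain shell script. The end-point URL below is purely hypothetical:

```shell
# build_report writes the governance results file the same way the
# softwareCheck step does: one "key: value" line per property.
build_report() {
    local image_name="$1" build_id="$2" out_file="$3"
    {
        echo "imageName: $image_name"
        echo "imageVersion: $build_id"
    } > "$out_file"
}

# Hypothetical usage: assemble the report, then post it to the
# governance end-point (URL is an assumption for this sketch).
# build_report "my-image-name" "1.0.0-20180101-120000" /tmp/gov.results.txt
# curl --data-binary "@/tmp/gov.results.txt" -X POST "https://governance.example.com/api/results"
```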

Pipeline for existing Docker images

The initial pipeline includes a stage to build an image.
However, a project team may want to use an existing image. Community images benefit from broad feedback and many contributors. How can we guarantee a project can safely use one inside an organization with its own guidelines?

Well, our generic pipeline, with project tests, security checks and governance, perfectly fits such a use case. The guidelines should be enforced in the automated pipeline. The only difference is that we do not build an image, we use an existing one from the outside. So, let’s adapt our allInOne shared library to cover such a scenario.

// In vars/allInOne.groovy (shared library that defines the generic pipeline, upgraded to support existing images)
def call(Map config) {

	node {
		def timeStamp = Calendar.getInstance().getTime().format('yyyyMMdd-HHmmss', TimeZone.getTimeZone('Europe/Paris'))
		def buildId = "${config.imageVersion}-${timeStamp}"

		// Always checkout the sources, as they may include tests
		stage('Checkout') {
			echo "Checking out the sources..."
			checkout scm
		}

		if (config.existing == true) {
			stage('Docker pull') {
				// An existing image is referenced by its published version, without a timestamp
				buildId = "${config.imageVersion}"
				sh "docker pull ${config.imageName}:${buildId}"
			}
		}

		if (config.existing != true) {
			stage('Build Image') {
				// Enforce the shape of the repository and assume the Dockerfile is always under image/
				sh "docker build -t ${config.imageName}:${buildId} image/"
			}
		}

		stage('Project tests') {
			def scriptFileContent = libraryResource( 'com/linagora/execute-project-tests.sh' )
			sh scriptFileContent
		}

		stage('Security checks') {
			echo "Checking security..."
			securityInspection( "${config.imageName}", "${buildId}" )
		}

		stage('Software Governance') {
			echo "Handling Software checks..."
			softwareCheck( "${config.imageName}", "${buildId}" )
		}

		stage('Promotion') {
			echo "Promoting the local image to a trusted repository..."
			def scriptFileContent = libraryResource( 'com/linagora/promote-image.sh' )
			sh scriptFileContent
		}
	}
}

If we use an existing image, we simply pull it. Otherwise, we build it. The other parts of the pipeline remain the same.
A project using an existing image would then declare its Jenkinsfile as…

allInOne(
    imageName: 'my-image-name',
    imageVersion: '1.0.0',
    existing: true
)

Integration in Jenkins

The simplest solution is to define our three shared libraries as global shared libraries. Besides, the shared libraries above all expose global variables, which avoids import declarations in our Jenkinsfiles. To do so, go into the Jenkins administration, then the system configuration, and find the pipeline section. The shared library will be loaded on the fly from a Git repository, in every job. It is never cached. For security reasons, the (Jenkins) user that pulls the repository should only have read access.

Here is a screenshot that shows how a shared library is defined in Jenkins.

Jenkins Administration

You can add as many as necessary. This article has defined three shared libraries, so you would add three into Jenkins. It is possible to pin versions for shared libraries, but I do not think it is necessary for global ones. If stages had to differ between projects, you would define different composite pipelines. And the stages are managed on the fly, in the same fashion as the 12-factor principles.
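For the record, if you did want to pin a version, a Jenkinsfile can load a specific revision of a shared library explicitly with the @Library annotation. The library name and version tag below are illustrative:

    // Load revision 1.0 of the shared library instead of the default branch
    @Library('allInOne@1.0') _

    allInOne(
        imageName: 'my-image-name',
        imageVersion: '1.0.0'
    )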


Two questions arise at this point.

  • Where does this pipeline lead?
  • How to prevent a project from by-passing these checks?

The answer to the first question influences the second one.
In my opinion, this pipeline promotes the built image into a trusted Docker registry, which can then be used in Kubernetes environments. You cannot test this image in a K8s cluster before (or if you can, it must be a sandbox cluster). Once this is clear, the second answer becomes obvious. A project cannot by-pass this pipeline because otherwise, it cannot import its Docker images into the trusted registry. The allInOne shared library is the only part that can access the trusted registry. It cannot be done from anywhere else: the credentials are kept secret, and a single user (Jenkins) should have write permission to push an image into the trusted registry. All the other users have read-only access.
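As an illustration of what promote-image.sh could look like, here is a hedged sketch. The registry host and function names are assumptions, not the actual content of the script:

```shell
# The trusted registry host is an assumption for this sketch.
TRUSTED_REGISTRY="${TRUSTED_REGISTRY:-registry.internal.example.com:5000}"

# trusted_name computes the image reference inside the trusted registry.
trusted_name() {
    echo "${TRUSTED_REGISTRY}/$1:$2"
}

# promote retags the locally built image and pushes it; only the
# Jenkins user is supposed to hold the credentials for this push.
promote() {
    local target
    target="$(trusted_name "$1" "$2")"
    docker tag "$1:$2" "$target"
    docker push "$target"
}
```

Keeping the push logic (and its credentials) inside the shared library resources is precisely what prevents projects from promoting images on their own.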


This article has shown how to use Jenkins shared libraries to build a validation process for Docker images, a process that falls under different responsibilities. I skipped some details so as not to lose readers, but I believe the main aspects are all explained here.

The next article is about Helm packages: how to verify quality criteria (e.g. linting), test them and so on. All of this with a Jenkins pipeline and the same distribution of responsibilities.