Dynamic Configuration Discovery in Grafana
A few of my colleagues have written posts recently on the Prometheus stack so I thought I’d get in on the action.
In this post I’ll walk through how Grafana uses sidecar containers to dynamically discover datasources and dashboards declared as ConfigMaps in Kubernetes to allow easy and extensible configuration for cluster operators.
Let’s dive in!
For the following examples I used the Prometheus Operator Helm chart, which installs the operator stack (Prometheus, Grafana, Alertmanager, CRDs, RBAC resources, etc.), including some default dashboards and alerts as defined by the awesome kube-prometheus project.
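If you want to follow along, an install along these lines should give you a comparable setup. This is a sketch that assumes Helm 2 / Tiller (which matches the heritage labels you'll see later), the stable chart repository, and a monitoring namespace; adjust for your own environment:

helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring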
Datasources
Grafana is a tool for visualizing data and can be connected to multiple query sources. How Grafana connects to backend datasources (URL, credentials, etc.) is configured through one or more configuration files that Grafana reads during provisioning. The chart / operator makes this even easier by giving us a neat way of defining our datasource configuration as Kubernetes ConfigMaps and using a sidecar to drop it into the correct directory (provisioning/datasources) for Grafana to use.
If you're following along at home, make sure you've enabled these features in the chart's values.yaml:
grafana:
  sidecar:
    datasources:
      enabled: true
      label: grafana_datasource
The label key grafana_datasource is the default, but if you change it, make a note of the new value as we'll need it later when creating our ConfigMap.
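A quick way to sanity-check which ConfigMaps carry that label (assuming the chart was installed into a monitoring namespace):

kubectl get configmaps -n monitoring -l grafana_datasource=1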
Now let’s take a look at the Grafana deployment itself:
# Source: prometheus-operator/charts/grafana/templates/deployment.yaml
# ...
- name: grafana
  image: "grafana/grafana:6.0.0"
  imagePullPolicy: IfNotPresent
  volumeMounts:
    - name: config
      mountPath: "/etc/grafana/grafana.ini"
      subPath: grafana.ini
    - name: sc-datasources-volume
      mountPath: "/etc/grafana/provisioning/datasources"
# ...
We're mounting the sc-datasources-volume to the provisioning/datasources directory in the Grafana container. That volume is declared further down the deployment as an emptyDir:
volumes:
  - name: sc-datasources-volume
    emptyDir: {}
Now for the neat part: an initContainer defined as part of the Grafana pod:
initContainers:
  - name: grafana-sc-datasources
    image: "kiwigrid/k8s-sidecar:0.0.11"
    imagePullPolicy: IfNotPresent
    env:
      - name: METHOD
        value: LIST
      - name: LABEL
        value: "grafana_datasource"
      - name: FOLDER
        value: "/etc/grafana/provisioning/datasources"
    volumeMounts:
      - name: sc-datasources-volume
        mountPath: "/etc/grafana/provisioning/datasources"
This runs an image called kiwigrid/k8s-sidecar and sets some env variables. Let's take a look at what the sidecar does. It's actually a pretty cool (but simple, less than 200 lines of Python!) utility that talks to the Kubernetes API and watches a configurable set of namespaces for ConfigMaps carrying a specific label (the grafana_datasource label we set earlier in values.yaml). It then grabs the data in those ConfigMaps and writes it to corresponding files in the FOLDER that we pass in.
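The essence of the LIST mode boils down to something like the following. This is a simplified sketch of the idea rather than the sidecar's actual code; the function name, namespace, and defaults are illustrative:

# Simplified sketch: list ConfigMaps carrying a label and write each data
# entry out as a file that Grafana can read during provisioning.
import os
from kubernetes import client, config

LABEL = os.getenv("LABEL", "grafana_datasource")
FOLDER = os.getenv("FOLDER", "/etc/grafana/provisioning/datasources")

def list_configmaps(namespace="monitoring"):  # namespace is illustrative
    config.load_incluster_config()  # authenticate with the pod's service account
    v1 = client.CoreV1Api()
    for cm in v1.list_namespaced_config_map(namespace).items:
        if LABEL not in (cm.metadata.labels or {}):
            continue
        # Every key in .data becomes a file in the shared volume
        for filename, content in (cm.data or {}).items():
            with open(os.path.join(FOLDER, filename), "w") as f:
                f.write(content)

if __name__ == "__main__":
    list_configmaps()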
Now for the final piece, let’s take a look at the default ConfigMap generated as part of the operator / chart:
# Source: prometheus-operator/templates/grafana/configmaps-datasources.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-operator-grafana-datasource
  labels:
    grafana_datasource: "1"
    app: prometheus-operator-grafana
    chart: prometheus-operator-4.3.1
    release: "prometheus-operator"
    heritage: "Tiller"
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-operator-prometheus:9090/
        access: proxy
        isDefault: true
We can see our grafana_datasource label, and the contents of the ConfigMap specify that Grafana should query Prometheus on its internal Kubernetes service URL / port. Lastly, if we look at the logs for the initContainer we can see it working and discovering the ConfigMap:
Starting config map collector
Config for cluster api loaded...
Working on configmap monitoring/prometheus-operator-etcd
Working on configmap monitoring/prometheus-operator-grafana
Working on configmap monitoring/prometheus-operator-grafana-config-dashboards
Working on configmap monitoring/prometheus-operator-grafana-datasource
Configmap with label found
Working on configmap monitoring/prometheus-operator-k8s-cluster-rsrc-use
Working on configmap monitoring/prometheus-operator-k8s-coredns
Note: Grafana only checks its provisioning/datasources directory on startup, which is why the sidecar is implemented as an initContainer in this case rather than a long-running watcher. This can lead to race conditions where the datasource ConfigMap(s) may not have been applied before the initContainer runs, leaving the datasources section in Grafana empty.
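With that caveat in mind, adding a datasource of your own is just a matter of applying another ConfigMap carrying the same label. Here's a minimal sketch; the ConfigMap name and the second Prometheus service URL are hypothetical:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-extra-datasource          # hypothetical name
  namespace: monitoring              # assumes the chart's namespace
  labels:
    grafana_datasource: "1"          # must match sidecar.datasources.label
data:
  extra-datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: Prometheus-Staging     # hypothetical second Prometheus
        type: prometheus
        url: http://prometheus-staging:9090/
        access: proxy

Because Grafana only reads the provisioning directory at startup, you'll need to restart the Grafana pod (or apply the ConfigMap before installing the chart) for it to be picked up.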
Dashboards
Grafana is super flexible and there is almost no limit to the variety of dashboards that you can create. Dashboards in Grafana are defined as JSON and Grafana looks for them by reading a provider configuration file:
# Source: prometheus-operator/charts/grafana/templates/configmap-dashboard-provider.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: grafana
    chart: grafana-2.2.0
    release: prometheus-operator
    heritage: Tiller
  name: prometheus-operator-grafana-config-dashboards
data:
  provider.yaml: |-
    apiVersion: 1
    providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        options:
          path: /tmp/dashboards
We can see that /tmp/dashboards is where we need to put our dashboard definitions for Grafana to pick them up, and Grafana again uses the sidecar model to allow us to define dashboards as Kubernetes ConfigMap objects. Make sure the dashboard sidecar is enabled in the chart's values.yaml:
grafana:
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
Now let’s look at the relevant section of the Grafana deployment:
containers:
  - name: grafana-sc-dashboard
    image: "kiwigrid/k8s-sidecar:0.0.11"
    imagePullPolicy: IfNotPresent
    env:
      - name: LABEL
        value: "grafana_dashboard"
      - name: FOLDER
        value: "/tmp/dashboards"
    volumeMounts:
      - name: sc-dashboard-volume
        mountPath: "/tmp/dashboards"
There are a few changes from the sidecar configuration used for datasources. Firstly, we are not using an initContainer, but defining another container that runs indefinitely alongside the actual Grafana application container. The reason is that, unlike datasources, Grafana checks for new dashboards regularly, so our sidecar constantly polls for new ConfigMaps and writes them to the /tmp/dashboards directory. To make the sidecar behave like a daemon in this regard (rather than a one-shot process as before), we omit the METHOD environment variable from its definition. A quick look at the code shows how this affects the behaviour:
# ...
k8s_method = os.getenv("METHOD")
if k8s_method == "LIST":
    listConfigmaps(label, targetFolder, url, method, payload, namespace, folderAnnotation)
else:
    while True:
        try:
            watchForChanges(label, targetFolder, url, method, payload, namespace, folderAnnotation)
        # ...
Secondly, we change the LABEL that is watched (grafana_dashboard instead of grafana_datasource). Finally, we specify the dashboard directory (/tmp/dashboards) rather than the provisioning/datasources directory.
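As with datasources, the directory the sidecar writes into is shared with the Grafana container via a volume; in the chart it is declared as an emptyDir, along the lines of the datasources volume we saw earlier:

volumes:
  - name: sc-dashboard-volume
    emptyDir: {}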
The chart / operator includes a number of pre-built dashboards that can be found in the templates/grafana/dashboards directory. Here's a snippet from the nodes.yaml dashboard:
# Source: prometheus-operator/templates/grafana/dashboards/nodes.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-operator-nodes
  labels:
    grafana_dashboard: "1"
    app: prometheus-operator-grafana
    chart: prometheus-operator-4.3.1
    release: "prometheus-operator"
    heritage: "Tiller"
data:
  nodes.json: |-
    {
      "__inputs": [
      ],
      "__requires": [
      ],
      "annotations": {
        "list": [
        ]
      },
      "editable": false,
# ...
A quick look at the logs for the grafana-sc-dashboard container shows the sidecar doing its work:
Starting config map collector
Config for cluster api loaded...
Working on configmap monitoring/prometheus-operator-statefulset
Configmap with label found
File in configmap statefulset.json ADDED
Working on configmap monitoring/prometheus-operator-grafana-datasource
Working on configmap monitoring/prometheus-operator-k8s-resources-namespace
Configmap with label found
File in configmap k8s-resources-namespace.json ADDED
Working on configmap monitoring/prometheus-operator-grafana
Working on configmap monitoring/prometheus-operator-grafana-config-dashboards
Working on configmap monitoring/prometheus-operator-k8s-coredns
Configmap with label found
File in configmap k8s-coredns.json ADDED
Working on configmap monitoring/prometheus-operator-k8s-resources-cluster
Configmap with label found
File in configmap k8s-resources-cluster.json ADDED
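Adding a dashboard of your own follows the same pattern: create a ConfigMap carrying the grafana_dashboard label whose data contains the dashboard JSON, and the sidecar will write it into /tmp/dashboards, where Grafana picks it up without a restart. Here's a minimal sketch; the ConfigMap name and the dashboard contents are illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-dashboard          # hypothetical name
  namespace: monitoring              # assumes the chart's namespace
  labels:
    grafana_dashboard: "1"           # must match sidecar.dashboards.label
data:
  my-custom-dashboard.json: |-
    {
      "title": "My Custom Dashboard",
      "editable": true,
      "panels": [],
      "schemaVersion": 16
    }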
I hope this has been a useful introduction to how dynamic configuration discovery works in Grafana for datasources and dashboards. Feel free to share this post, and reach out to me on Twitter if you have questions, comments, or suggestions for future posts!