Monitoring Open Policy Agent/Gatekeeper violations with kube-state-metrics

Jun 19, 2023 · Manuel Rüger · 7 min read

This article explains how kube-state-metrics can be used to monitor Open Policy Agent's Gatekeeper policy audit violations, what to watch out for, and where kube-state-metrics' Custom Resource State might evolve.

Introduction

Open Policy Agent is a policy engine that allows you to enforce policies on various systems. With Gatekeeper, it provides a service that uses an admission controller webhook to enforce policies on Kubernetes resources. This means that whenever a Kubernetes resource changes through a request to the Kubernetes API, a rule-based decision is made to allow or deny the action. To organize and manage these rules, Gatekeeper makes use of Kubernetes' Custom Resource Definitions (CRDs). Gatekeeper does not only check its policies at admission time; it also provides an audit service that can regularly check existing resources in the Kubernetes cluster.

Kube-state-metrics is a microservice that exposes the state of objects in the Kubernetes API as Prometheus metrics. It queries the API and serves the gathered information on an HTTP metrics endpoint. Recently, it gained support for Custom Resource State, a flexible way to query and extract user-defined information from the Kubernetes API.

Exposing Violations as Prometheus Metrics

Custom Resource Definitions in Gatekeeper

Gatekeeper uses two custom resource definitions to manage policies. A ConstraintTemplate (group: templates.gatekeeper.sh) contains the rule definition. A corresponding constraint (group: constraints.gatekeeper.sh) allows setting enforcement actions and stores information from the audit.

Here is an example of what this looks like, based on the Open Policy Agent docs:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
  ...
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }

This policy checks for specific metadata labels on an object. If the required labels are missing from the object, it returns a violation.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-gatekeeper-label
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]

This constraint limits the policy enforcement to Kubernetes Namespace objects and requires a label with the key gatekeeper to be set. Its enforcement action is set to warn, which means Gatekeeper will emit a warning when a resource that violates the policy is applied.

If we run

kubectl get k8srequiredlabels.constraints.gatekeeper.sh namespaces-must-have-gatekeeper-label -o yaml

we receive this output:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-gatekeeper-label
  ...
status:
  totalViolations: 1
  violations:
    - enforcementAction: warn
      kind: Namespace
      message: 'you must provide labels: ["gatekeeper"]'
      name: kube-system
    ...

If you enable Gatekeeper to audit your objects regularly, this object will include, besides the total number of violations, a report from the latest audit run in the status key. Unfortunately, Gatekeeper does not expose this level of detail via a Prometheus exporter itself. Separate, specialized exporters such as opa-scorecard have been built in the past. In the next section, we will see how kube-state-metrics, as a generic exporter, can provide the same level of information.

Custom Resource State in kube-state-metrics

If you already run kube-state-metrics in your cluster, you can now generate metrics on policy violations with Custom Resource State. And if you don't run kube-state-metrics yet, this might be a good opportunity to install it and get better insights into your cluster.

Before we can expose Custom Resource State metrics, we need to define them in a configuration. kube-state-metrics supports configuring Custom Resource State via command-line arguments or a config file. In this example, we assume you use a config file, so you can include the following snippets in it.
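For reference, recent kube-state-metrics releases accept the Custom Resource State configuration either from a file or inline on the command line. The file path below is a placeholder; mount your config wherever convenient:

```shell
# Point kube-state-metrics at a mounted configuration file
# (example path, adjust to your deployment):
kube-state-metrics --custom-resource-state-config-file=/etc/customresourcestate/config.yaml

# Alternatively, pass the YAML inline:
kube-state-metrics --custom-resource-state-config='kind: CustomResourceStateMetrics
spec:
  resources: []'
```

Keep in mind that kube-state-metrics also needs RBAC permissions to list and watch the constraint resources it should collect.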

Add the following paragraph to your kube-state-metrics’ Custom Resource State configuration:

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: constraints.gatekeeper.sh
        kind: "K8sRequiredLabels"
        version: "v1beta1"
      metrics:
        - name: "gatekeeper_violations_total"
          help: "Number of violations"
          each:
            type: Gauge
            gauge:
              path: [status, totalViolations]

        - name: "gatekeeper_violation_info"
          help: "Information about the detected violation"
          each:
            type: Info
            info:
              path: [status, violations]
              labelsFromPath:
                enforcement_action: [enforcementAction]
                violating_kind: [kind]
                violating_message: [message]
                violating_name: [name]
                violating_namespace: [namespace]

and the following metric series will be exported:

# HELP kube_customresource_gatekeeper_violations_total Number of violations
# TYPE kube_customresource_gatekeeper_violations_total gauge
kube_customresource_gatekeeper_violations_total{customresource_group="constraints.gatekeeper.sh",customresource_kind="k8srequiredlabels",customresource_version="v1beta1"} 35
# HELP kube_customresource_gatekeeper_violation_info Information about the detected violation
# TYPE kube_customresource_gatekeeper_violation_info gauge
kube_customresource_gatekeeper_violation_info{customresource_group="constraints.gatekeeper.sh",customresource_kind="k8srequiredlabels",customresource_version="v1beta1",enforcement_action="warn",violating_kind="Namespace",violating_message="you must provide labels: [\"gatekeeper\"]",violating_name="kube-system"} 1
...

These metric series can now be ingested into Prometheus or a similar TSDB and be used for alerting or visualizing violations on a dashboard.
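As an illustration, an alerting rule on these series could look like the following. This sketch uses the prometheus-operator's PrometheusRule format; the threshold, duration, and severity label are placeholders you should adapt:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gatekeeper-violations
spec:
  groups:
    - name: gatekeeper
      rules:
        - alert: GatekeeperViolationsDetected
          # Fire if the audit reports any violation for 15 minutes
          expr: sum(kube_customresource_gatekeeper_violations_total) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Gatekeeper audit reports policy violations"
```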

Collecting metrics from all constraints via wildcard matching

With kube-state-metrics v2.9.2 and later, another helpful Custom Resource State feature is included: kube-state-metrics now supports wildcards for the kind as well as the version keys. This allows collecting every violation from every constraint CRD created by Gatekeeper with the following configuration:

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: "constraints.gatekeeper.sh"
        kind: "*"
        version: "v1beta1"
      metrics:
        - name: "gatekeeper_violations_total"
          help: "Number of violations"
          each:
            type: Gauge
            gauge:
              path: [status, totalViolations]
        - name: "gatekeeper_violation_info"
          help: "Information about the detected violation"
          each:
            type: Info
            info:
              path: [status, violations]
              labelsFromPath:
                enforcement_action: [enforcementAction]
                violating_kind: [kind]
                violating_message: [message]
                violating_name: [name]
                violating_namespace: [namespace]

As you might have already spotted, this time the namespace is included in the labelsFromPath map. Objects violating a policy can be namespaced or cluster-scoped. If the specified path (in this case, the namespace key) does not exist, no label is added to the metric series. If the path exists, the label appears in the metric series.
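This optional label can be used in queries; the metric name below follows from the gatekeeper_violation_info entry in the configuration above (kube-state-metrics prefixes it with kube_customresource_). Series that lack the label match the empty string in PromQL:

```promql
# Violations on cluster-scoped objects (no violating_namespace label set)
count(kube_customresource_gatekeeper_violation_info{violating_namespace=""})

# Violations on namespaced objects, grouped by namespace
count by (violating_namespace) (kube_customresource_gatekeeper_violation_info{violating_namespace!=""})
```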

Now add a new constraint to your cluster, this time requiring a "team" label on Deployments:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployments-must-have-team-label
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["team"]

This will generate the following metrics:

# HELP kube_customresource_gatekeeper_violations_total Number of violations
# TYPE kube_customresource_gatekeeper_violations_total gauge
kube_customresource_gatekeeper_violations_total{customresource_group="constraints.gatekeeper.sh",customresource_kind="k8srequiredlabels",customresource_version="v1beta1",name="namespaces-must-have-gatekeeper-label"} 35
kube_customresource_gatekeeper_violations_total{customresource_group="constraints.gatekeeper.sh",customresource_kind="k8srequiredlabels",customresource_version="v1beta1",name="deployments-must-have-team-label"} 39
# HELP kube_customresource_gatekeeper_violation_info Information about the detected violation
# TYPE kube_customresource_gatekeeper_violation_info gauge
kube_customresource_gatekeeper_violation_info{customresource_group="constraints.gatekeeper.sh",customresource_kind="k8srequiredlabels",customresource_version="v1beta1",enforcement_action="warn",violating_kind="Namespace",violating_message="you must provide labels: [\"gatekeeper\"]",violating_name="kube-system"} 1
kube_customresource_gatekeeper_violation_info{customresource_group="constraints.gatekeeper.sh",customresource_kind="k8srequiredlabels",customresource_version="v1beta1",enforcement_action="warn",violating_kind="Deployment",violating_message="you must provide labels: [\"team\"]",violating_name="coredns",violating_namespace="kube-system"} 1
...

Caveats with this approach

When using custom resource state metrics, there are a couple of things to keep in mind, as they might cause issues on your cluster or confusion for consumers.

Unexposed Violations

Gatekeeper has a flag called --constraint-violations-limit, which limits the number of violations added to the constraint custom resource. You might need to increase it to get more data on violations. Be aware that Kubernetes limits how large a custom resource object can grow. This does not affect the total count of violations, which always shows the correct count.

High Cardinality Data

With this approach, there is a chance of feeding high-cardinality data into Prometheus. If you expose values like Pod IDs or other frequently changing values, Prometheus' database will grow in size and queries may take longer.
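One way to bound cardinality is to drop the free-form message label at scrape time. Here is a sketch of a Prometheus scrape configuration doing that; the job name is a placeholder for your kube-state-metrics scrape job:

```yaml
scrape_configs:
  - job_name: kube-state-metrics
    metric_relabel_configs:
      # Drop the free-form violation message, which can take many values
      - action: labeldrop
        regex: violating_message
```

Alternatively, you could leave the label out of the labelsFromPath map in the Custom Resource State configuration in the first place.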

Kubernetes Resource Version Upgrades

Using a wildcard for the version of the resource you want to monitor provides little benefit, as the Kubernetes API server automatically converts objects to the requested version before exposing them on its API.

Summary

To sum it all up: in this example we have seen that Custom Resource State is a powerful tool for collecting metrics from custom resources, and its configuration offers great flexibility. We can define individual metric series to be extracted from the Kubernetes API, customize metric labels to meet specific needs, and apply one configuration across multiple kinds and different versions of the same CRD group.

If you are interested in more Custom Resource State configurations, I have started a repository to collect more here.

Finally, I want to thank Garrybest, iamnoah, chrischdi, rexagod and CatherineF-dev for their work on the implementation in kube-state-metrics.
