kube-prometheus-stack

Prometheus Deployment & Operations Guide

1. Overview

kube-prometheus-stack delivers production-ready Kubernetes monitoring built on the Prometheus Operator.

It provides:

Cluster metrics collection
Alerting and rule management
Pre-built Grafana dashboards
Node and workload monitoring

This Helm chart was formerly named prometheus-operator. It was renamed to reflect that it installs the full monitoring stack, not only the operator.

2. What Is Included

By default, the stack deploys:

Prometheus Operator
Prometheus
Alertmanager
Grafana
kube-state-metrics
node-exporter

Not included:

Prometheus Adapter
Blackbox Exporter

3. Architecture Summary

The monitoring flow works like this:

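Exporters → Prometheus → Alertmanager
Prometheus → Grafana

Exporters (node-exporter, kube-state-metrics) expose metrics.
Prometheus scrapes and stores the metrics and evaluates alerting rules.
Alertmanager receives alerts from Prometheus, then groups and routes them to receivers.
Grafana queries Prometheus and visualizes the data.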

Think of it as a health monitoring system for your cluster.

4. Prerequisites

Requirement	Version
Kubernetes	1.19+
Helm	3+

5. Installation

Install via OCI Registry (Recommended)

CODE

helm install <release-name> \
  oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

View Configurable Values

CODE

helm show values oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack

6. High Availability (HA)

For production environments, enable multiple Prometheus replicas.

Example configuration:

CODE

prometheus:
  prometheusSpec:
    replicas: 2
    podAntiAffinity: "hard"
    externalLabels:
      cluster: prod-cluster

Important:

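Always use anti-affinity so that replicas are scheduled on different nodes.
Do not remove the replica external labels (such as prometheus_replica, which the operator adds to each replica); they distinguish one replica's series from another's.
For global query deduplication, use Thanos.

HA improves uptime but does not deduplicate samples by itself: each replica scrapes the same targets independently and stores its own copy of the data.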

7. Grafana Dashboards

Pre-configured dashboards are deployed automatically.
They are loaded into Grafana from Kubernetes ConfigMaps.
Their definitions are sourced from the upstream Prometheus mixins.
Custom dashboards can be added through Helm values.
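
As an illustration of the ConfigMap mechanism, the sketch below shows what a custom dashboard ConfigMap might look like. It assumes the Grafana sidecar is enabled and still uses its default label grafana_dashboard with the value "1" (check grafana.sidecar.dashboards in your values). The name example-dashboard and the dashboard JSON are placeholders, not part of the chart.

```yaml
# Sketch only: label key and value are assumed chart defaults.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboard        # hypothetical name
  namespace: monitoring          # namespace used in the install command
  labels:
    grafana_dashboard: "1"       # assumed default sidecar label
data:
  example-dashboard.json: |
    {
      "title": "Example Dashboard",
      "panels": []
    }
```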

8. Upgrading

CODE

helm upgrade <release-name> <chart> --namespace monitoring

Note:

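Helm 3 installs CRDs only on first install; helm upgrade does not update them.
Major version upgrades may therefore require manual steps, such as applying the updated CRDs before upgrading.
Review the release notes before upgrading.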

9. Uninstalling

CODE

helm uninstall <release-name> --namespace monitoring

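helm uninstall does not remove the CRDs created by this chart.

Delete them manually if they are no longer needed. Deleting a CRD also deletes every custom resource of that type. The commands below cover the monitor and rule CRDs; the chart installs further CRDs in the monitoring.coreos.com API group (for example prometheuses and alertmanagers), which kubectl get crd lists: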

CODE

kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com

10. Running Multiple Instances

You may deploy multiple Prometheus instances in one cluster.

Only one Prometheus Operator should run.

Disable shared components for secondary releases:

CODE

kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
grafana:
  enabled: false
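
Because only one Prometheus Operator should run, a secondary release would presumably also switch off its own operator. A minimal sketch, assuming the operator from the primary release is configured to watch the namespace of the secondary release:

```yaml
# Values for a secondary release (sketch): rely on the primary release's operator.
prometheusOperator:
  enabled: false
```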

11. Private Cluster Considerations

In private clusters (e.g., private GKE):

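The control plane may not be able to reach the Prometheus Operator admission webhooks running on the nodes.
Add firewall rules that allow the control plane to reach the webhook port.

Alternatively, disable the admission webhooks, at the cost of losing validation of rule resources at admission time: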

CODE

prometheusOperator:
  admissionWebhooks:
    enabled: false

12. Persistent Volume Migration

To migrate without losing metrics:

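Patch the PV reclaim policy to Retain so that the volume survives deletion of its claim.
Uninstall the release and delete the PVC.
Remove the claimRef from the PV so that it becomes Available again.
Reinstall the stack with a volume claim that matches the PV's: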
- Storage size
- Access mode
- Storage class
- Availability zone

All values must match exactly for successful re-binding.
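
The matching storage settings in the last step map onto the chart's volumeClaimTemplate. The sketch below uses gp2, ReadWriteOnce, and 50Gi as placeholder values; replace them with whatever the retained PV reports (kubectl get pv <pv-name> -o yaml). The availability zone is not set here and is assumed to follow from the PV's own node affinity or from your scheduling constraints.

```yaml
# Sketch only: every value below is a placeholder and must equal the retained PV's.
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2            # placeholder storage class
          accessModes: ["ReadWriteOnce"]   # placeholder access mode
          resources:
            requests:
              storage: 50Gi                # placeholder size
```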

13. ServiceMonitor & PodMonitor Discovery

By default, Prometheus discovers monitors:

In its namespace
Matching its release label

To discover all monitors in the namespace, regardless of their labels:

CODE

prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
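
For reference, a ServiceMonitor that the default selectors would pick up might look like the sketch below. The names example-app and my-release, the app label, and the port name metrics are hypothetical; the release label is assumed to equal the Helm release name of the stack.

```yaml
# Sketch only: names, labels, and port are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring          # same namespace as Prometheus
  labels:
    release: my-release          # assumed to match the stack's release name
spec:
  selector:
    matchLabels:
      app: example-app           # labels on the Service to scrape
  endpoints:
    - port: metrics              # named port on that Service
      interval: 30s
```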

14. Known Operational Considerations

kube-proxy Metrics

By default, kube-proxy binds its metrics endpoint to the loopback address, which Prometheus cannot reach:

CODE

127.0.0.1:10249

To enable scraping, set metricsBindAddress to:

CODE

0.0.0.0:10249

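Update the value in the kube-proxy ConfigMap, then restart the kube-proxy pods so that they pick up the change: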

CODE

kubectl -n kube-system edit cm kube-proxy
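
On kubeadm-based clusters the setting typically lives under the config.conf key of that ConfigMap. The excerpt below sketches the relevant part after the edit, assuming that layout; other distributions may store the kube-proxy configuration differently.

```yaml
# Excerpt, not a complete manifest: all other fields are omitted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0:10249
```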

15. Migration from Older Charts

A zero-downtime migration from stable/prometheus-operator is possible, provided the old resource name prefix is kept with nameOverride:

CODE

helm upgrade prometheus-operator \
  prometheus-community/kube-prometheus-stack \
  --reuse-values \
  --set nameOverride=prometheus-operator

To adopt the new resource names instead, follow the persistent volume migration process in section 12, which involves downtime.