kube-prometheus-stack
Prometheus Deployment & Operations Guide
1. Overview
kube-prometheus-stack delivers production-ready Kubernetes monitoring built on the Prometheus Operator.
It provides:
- Cluster metrics collection
- Alerting and rule management
- Pre-built Grafana dashboards
- Node and workload monitoring
This Helm chart replaces the former prometheus-operator chart. The new name reflects that it installs the full monitoring stack, not only the operator.
2. What Is Included
By default, the stack deploys:
- Prometheus Operator
- Prometheus
- Alertmanager
- Grafana
- kube-state-metrics
- node-exporter
Not included:
- Prometheus Adapter
- Blackbox Exporter
3. Architecture Summary
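The monitoring flow works like this:
    Exporters → Prometheus → Alertmanager (alerts)
    Prometheus → Grafana (dashboards)
- Exporters such as node-exporter and kube-state-metrics expose metrics.
- Prometheus scrapes and stores those metrics and evaluates alerting rules.
- Alertmanager receives firing alerts from Prometheus, then groups, deduplicates, and routes them.
- Grafana visualizes data by querying Prometheus.
Think of it as a health monitoring system for your cluster.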
4. Prerequisites
| Requirement | Version |
|---|---|
| Kubernetes | 1.19+ |
| Helm | 3+ |
5. Installation
Install via OCI Registry (Recommended)
    helm install <release-name> \
      oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack \
      --namespace monitoring \
      --create-namespace
View Configurable Values
    helm show values oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack
6. High Availability (HA)
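For production environments, run multiple Prometheus replicas.
Example configuration:
    prometheus:
      prometheusSpec:
        replicas: 2
        podAntiAffinity: "hard"
        externalLabels:
          cluster: prod-cluster
Important:
- Always use anti-affinity so that replicas are scheduled on different nodes.
- Do not remove the replica external label that the operator adds to each replica (prometheus_replica by default); deduplication relies on it.
- For global query deduplication, use Thanos.
Each replica scrapes the same targets independently, so HA improves uptime but does not automatically deduplicate samples.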
7. Grafana Dashboards
Pre-configured dashboards are deployed automatically.
They are loaded from Kubernetes ConfigMaps and sourced from the upstream Prometheus mixins.
Custom dashboards can be added through Helm values.
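Because the bundled dashboards are loaded from ConfigMaps, one way to add a custom dashboard is to ship another labeled ConfigMap. The sketch below is a minimal illustration under assumptions, not the chart's documented procedure: the label grafana_dashboard: "1" is assumed to match the Grafana sidecar's default setting in your values, and the ConfigMap name, file name, and dashboard JSON are hypothetical placeholders.

```shell
# Write a hypothetical dashboard ConfigMap manifest to the current directory.
cat > custom-dashboard.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-dashboard        # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"         # assumed sidecar label; check your Helm values
data:
  my-dashboard.json: |
    {"title": "My Dashboard", "panels": []}
EOF

# Apply it once the stack is running (requires cluster access):
# kubectl apply -f custom-dashboard.yaml
```

Check the sidecar settings in the output of helm show values before relying on the label name.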
8. Upgrading
    helm upgrade <release-name> <chart>
Note:
- Helm v3 does not upgrade CRDs during helm upgrade; apply updated CRDs yourself when a new chart version requires them.
- Major version upgrades may require manual steps.
- Review the release notes before upgrading.
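Since Helm leaves CRDs untouched on upgrade, they are usually applied separately beforehand. The helper below is a sketch that only prints candidate manifest URLs. The URL layout is an assumption based on the prometheus-operator repository, the function name crd_urls is hypothetical, the CRD list covers only the three CRDs named in this guide, and the version tag must come from the release notes of the chart version you are moving to.

```shell
# Print CRD manifest URLs for a given prometheus-operator version tag.
# The URL layout is an assumption; confirm it against the release notes.
crd_urls() {
  version="$1"   # operator tag taken from the chart release notes
  base="https://raw.githubusercontent.com/prometheus-operator/prometheus-operator"
  for crd in servicemonitors podmonitors prometheusrules; do
    echo "${base}/${version}/example/prometheus-operator-crd/monitoring.coreos.com_${crd}.yaml"
  done
}

# Usage (requires cluster access); OPERATOR_VERSION is a placeholder you set:
# crd_urls "$OPERATOR_VERSION" | xargs -n1 kubectl apply --server-side -f
```

Printing the URLs first makes it easy to review exactly which manifests will be applied before touching the cluster.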
9. Uninstalling
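    helm uninstall <release-name>
CRDs are not removed automatically.
Delete them manually if required, keeping in mind that deleting a CRD also deletes every custom resource of that kind in the cluster:
    kubectl delete crd servicemonitors.monitoring.coreos.com
    kubectl delete crd podmonitors.monitoring.coreos.com
    kubectl delete crd prometheusrules.monitoring.coreos.com
The operator's other CRDs in the monitoring.coreos.com group (for example prometheuses and alertmanagers) can be removed the same way.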
10. Running Multiple Instances
You may deploy multiple Prometheus instances in one cluster, but only one Prometheus Operator should run.
Disable the shared components in the values of each secondary release:
    kubeStateMetrics:
      enabled: false
    nodeExporter:
      enabled: false
    grafana:
      enabled: false
11. Private Cluster Considerations
In private clusters (e.g., private GKE), the control plane may not be able to reach the operator's admission webhooks.
Either add appropriate firewall rules, or disable the admission webhooks:
    prometheusOperator:
      admissionWebhooks:
        enabled: false
12. Persistent Volume Migration
To migrate without losing metrics:
  1. Patch the PV reclaim policy to Retain.
  2. Delete the PVC.
  3. Remove claimRef from the PV.
  4. Reinstall the stack with matching:
     - Storage size
     - Access mode
     - Storage class
     - Availability zone
All values must match exactly for successful re-binding.
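The first three steps map onto three kubectl calls. The function below is a sketch: its name release_pv_for_rebind is hypothetical, and the PV name, PVC name, and namespace are placeholders you supply.

```shell
# Hypothetical helper: release a PV so that a reinstalled stack can re-bind it.
# Arguments: PV name, PVC name, namespace.
release_pv_for_rebind() {
  pv="$1"; pvc="$2"; ns="$3"

  # 1. Keep the volume and its data when the claim goes away.
  kubectl patch pv "$pv" \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' || return 1

  # 2. Delete the claim; the PV then moves to the Released state.
  kubectl -n "$ns" delete pvc "$pvc" || return 1

  # 3. Drop the stale claim reference so the PV becomes Available again.
  kubectl patch pv "$pv" --type json \
    -p '[{"op":"remove","path":"/spec/claimRef"}]'
}

# Usage (requires cluster access):
# release_pv_for_rebind <pv-name> <pvc-name> monitoring
```

If a pod still mounts the claim, the delete in step 2 waits until that pod is gone, so stop the workload using the volume first.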
13. ServiceMonitor & PodMonitor Discovery
By default, Prometheus discovers only the ServiceMonitors and PodMonitors that are:
- In its own namespace
- Labeled with the release label of its Helm release
To discover all monitors in the namespace regardless of labels:
    prometheus:
      prometheusSpec:
        podMonitorSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
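For the default discovery rules, a ServiceMonitor needs to live in the Prometheus namespace and carry the release label. The manifest below is a sketch: the names example-app and metrics, the 30s interval, and the release value my-release are hypothetical and should be replaced with your own Service labels, port name, and Helm release name.

```shell
# Write a hypothetical ServiceMonitor manifest to the current directory.
cat > example-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                # hypothetical name
  namespace: monitoring            # same namespace as Prometheus
  labels:
    release: my-release            # assumed to equal your Helm release name
spec:
  selector:
    matchLabels:
      app: example-app             # hypothetical label on the target Service
  endpoints:
    - port: metrics                # hypothetical named port on the Service
      interval: 30s
EOF

# Apply it once the stack is running (requires cluster access):
# kubectl apply -f example-servicemonitor.yaml
```

With the two selector settings set to false, the release label is no longer required for discovery.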
14. Known Operational Considerations
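kube-proxy Metrics
By default, kube-proxy binds its metrics endpoint to the loopback address, which Prometheus cannot reach:
    127.0.0.1:10249
To enable scraping, set metricsBindAddress in the kube-proxy configuration to:
    0.0.0.0:10249
Update it via the kube-proxy ConfigMap, then restart the kube-proxy pods so the change takes effect:
    kubectl -n kube-system edit cm kube-proxy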
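The same change can be scripted instead of edited by hand. The filter below is a sketch that rewrites the metricsBindAddress line of a config read from standard input. The function name set_metrics_bind_address is hypothetical, and the usage comments assume kube-proxy runs as a DaemonSet named kube-proxy, which may differ on your distribution.

```shell
# Rewrite the metricsBindAddress line of a kube-proxy config read from stdin,
# preserving the line's indentation and leaving all other lines untouched.
set_metrics_bind_address() {
  sed -E 's/^([[:space:]]*metricsBindAddress:).*/\1 0.0.0.0:10249/'
}

# Usage against a live cluster (review the rewritten output before applying):
# kubectl -n kube-system get cm kube-proxy -o yaml \
#   | set_metrics_bind_address \
#   | kubectl apply -f -
# kubectl -n kube-system rollout restart daemonset kube-proxy
```

Binding to 0.0.0.0 exposes the metrics port on every node interface, so confirm that this is acceptable for your network before applying it.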
15. Migration from Older Charts
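Zero-downtime migration from stable/prometheus-operator is possible by upgrading the existing release in place while keeping its resource names. The command assumes the prometheus-community Helm repository has already been added under that name:
    helm upgrade prometheus-operator \
      prometheus-community/kube-prometheus-stack \
      --reuse-values \
      --set nameOverride=prometheus-operator
To rename the release and its resources fully, follow the persistent volume migration process instead.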