limx 2020-06-18
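For Kubernetes, monitoring mainly covers the cluster itself and its Pods. Cluster monitoring focuses on node resources: you need to know each node's resource utilization and workload so you can decide whether nodes should be added to or removed from the cluster. For the node count, you need to know how many nodes are available and how many are unavailable, which helps when estimating the cost of the cluster. The number of running Pods shows whether the available nodes are sufficient, and whether the remaining nodes can still carry the whole cluster's load after some nodes go down.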
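The available/unavailable node count mentioned above can be read from `kubectl get nodes`. The following is a minimal sketch; `count_nodes` is a hypothetical helper name, and it assumes the second column of the output is the node STATUS.

```shell
# count_nodes: summarize node availability.
# Reads the output of `kubectl get nodes --no-headers` on stdin,
# assuming column 2 is STATUS (e.g. Ready, NotReady,
# Ready,SchedulingDisabled).
count_nodes() {
  awk '
    $2 ~ /^Ready/ { ready++; next }   # Ready, including cordoned nodes
    NF            { notready++ }      # any other non-empty line
    END           { printf "ready=%d notready=%d\n", ready, notready }
  '
}
```

On a live cluster it would be used as `kubectl get nodes --no-headers | count_nodes`.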
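Pod monitoring covers three aspects. First, Kubernetes metrics: mainly the actual number of Pod instances compared with the expected number. Second, container metrics: the CPU, memory, and network usage of each Pod. Third, application metrics, which are mostly business-specific.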
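As an illustration of the first aspect (actual versus expected instance count), the sketch below flags Deployments whose ready replicas differ from the desired number. `under_replicated` is a hypothetical helper, and it assumes a kubectl version whose `get deployments` output has a READY column of the form `ready/desired` in column 2.

```shell
# under_replicated: list Deployments whose ready count differs
# from the desired count.
# Reads `kubectl get deployments --no-headers` on stdin, assuming
# column 1 is NAME and column 2 is READY as "ready/desired".
under_replicated() {
  awk '
    NF {
      split($2, r, "/")
      if (r[1] != r[2]) printf "%s %s/%s\n", $1, r[1], r[2]
    }
  '
}
```

For example, `kubectl get deployments -n kube-system --no-headers | under_replicated` prints nothing when every Deployment is at its desired size.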
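Monitoring solution | Alerting | Characteristics | Use case |
---|---|---|---|
Zabbix | Y | Requires a lot of customization | Most Internet companies |
open-falcon | Y | Fine-grained functional modules, which makes it appear more complex | System and application monitoring |
cAdvisor+Heapster+InfluxDB+Grafana | Y | Simple and easy to use | Container monitoring |
cAdvisor/exporter+Prometheus+Grafana | Y | Good extensibility | Comprehensive monitoring of containers, applications, and hosts |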
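cAdvisor is an open-source container monitoring tool from Google that collects monitoring metrics for both containers and the host. Heapster, also open-sourced by Google, aggregates the data collected by cAdvisor. cAdvisor has no storage of its own and only collects data in real time, so it needs external persistent storage; Heapster writes the cAdvisor data from every node into InfluxDB. cAdvisor is built into the kubelet, so as long as the kubelet's monitoring port is enabled, the data collected by cAdvisor can be accessed.
The kubelet exposes a port that serves the metrics collected by cAdvisor. Heapster runs in Kubernetes as a Pod; it gathers the cAdvisor data from each node and stores it in InfluxDB. InfluxDB is a time-series database, well suited to data that is queried by time. Grafana displays the data on dashboards.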
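Because InfluxDB is queried by time range, a query against the data Heapster stores might be built as follows. This is only a sketch: `build_query` is a hypothetical helper, and the database name `k8s`, the measurement `cpu/usage_rate`, and the field `value` are assumptions about Heapster's InfluxDB schema that should be checked with `SHOW MEASUREMENTS` on the actual database.

```shell
# build_query: print an InfluxQL query that averages a measurement
# per minute over a recent time window.
#   $1 = measurement name (assumed, e.g. cpu/usage_rate)
#   $2 = window length in InfluxQL duration syntax (e.g. 15m)
build_query() {
  printf 'SELECT mean("value") FROM "%s" WHERE time > now() - %s GROUP BY time(1m)\n' "$1" "$2"
}

# Usage from inside the cluster (kube-system namespace), against the
# InfluxDB 1.x HTTP API; the database name k8s is an assumption:
#   curl -G 'http://monitoring-influxdb:8086/query' \
#     --data-urlencode 'db=k8s' \
#     --data-urlencode "q=$(build_query cpu/usage_rate 15m)"
```

The time filter is what makes a time-series database a natural fit here: every dashboard panel in Grafana issues queries of this general shape.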
The components are deployed as Deployments in the kube-system namespace.

    [ scripts]# cat influxdb.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: monitoring-influxdb
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: influxdb
      replicas: 1
      template:
        metadata:
          labels:
            task: monitoring
            k8s-app: influxdb
        spec:
          containers:
          - name: influxdb
            image: registry.cn-shenzhen.aliyuncs.com/cn-k8s/heapster-influxdb-amd64:v1.5.2
            volumeMounts:
            - mountPath: /data
              name: influxdb-storage
          volumes:
          - name: influxdb-storage
            emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        task: monitoring
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: monitoring-influxdb
      name: monitoring-influxdb
      namespace: kube-system
    spec:
      ports:
      - port: 8086
        targetPort: 8086
      selector:
        k8s-app: influxdb
    [ scripts]# kubectl create -f influxdb.yaml
    deployment.apps/monitoring-influxdb created
    service/monitoring-influxdb created
    [ scripts]# kubectl get pods -n kube-system monitoring-influxdb-64f46fdcf-5jk8k
    NAME                                  READY   STATUS    RESTARTS   AGE
    monitoring-influxdb-64f46fdcf-5jk8k   1/1     Running   0          18s

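Heapster first obtains information about all Nodes in the cluster from the apiserver, then pulls the relevant data from the kubelet on each of those Nodes; the kubelet in turn gets its data from cAdvisor. All collected data is pushed to the backend storage configured for Heapster, and visualization of the data is also supported.
Because Heapster needs to read data from the apiserver, it must be authorized. Here its ServiceAccount is bound to cluster-admin, the cluster administrator role.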

    [ scripts]# cat heapster.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: heapster
      namespace: kube-system
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: heapster
    roleRef:
      kind: ClusterRole
      name: cluster-admin
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: ServiceAccount
      name: heapster
      namespace: kube-system
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: heapster
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: heapster
      replicas: 1
      template:
        metadata:
          labels:
            task: monitoring
            k8s-app: heapster
        spec:
          serviceAccountName: heapster
          containers:
          - name: heapster
            image: registry.cn-shenzhen.aliyuncs.com/cn-k8s/heapster-amd64:v1.5.4
            imagePullPolicy: IfNotPresent
            command:
            - /heapster
            - --source=kubernetes:https://kubernetes.default
            - --sink=influxdb:http://monitoring-influxdb:8086
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        task: monitoring
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: Heapster
      name: heapster
      namespace: kube-system
    spec:
      ports:
      - port: 80
        targetPort: 8082
      selector:
        k8s-app: heapster
    [ scripts]# kubectl create -f heapster.yaml
    serviceaccount/heapster created
    clusterrolebinding.rbac.authorization.k8s.io/heapster created
    deployment.apps/heapster created
    service/heapster created
    [ scripts]# kubectl get pods -n kube-system heapster-76d7cbbb56-lk27t
    NAME                        READY   STATUS    RESTARTS   AGE
    heapster-76d7cbbb56-lk27t   1/1     Running   0          54s

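To confirm that the authorization took effect, the ServiceAccount's permissions can be checked with `kubectl auth can-i`. The sketch below only builds the user name that the API server assigns to a ServiceAccount; `sa_subject` is a hypothetical helper, and the check itself needs a live cluster and an account that is allowed to impersonate.

```shell
# sa_subject: print the API-server user name of a ServiceAccount.
#   $1 = namespace, $2 = ServiceAccount name
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Usage on a live cluster (prints "yes" or "no"):
#   kubectl auth can-i list nodes --as="$(sa_subject kube-system heapster)"
```

Binding cluster-admin grants far more than Heapster needs to read node and Pod data, so a check like this is also a convenient way to test a narrower role if one is substituted later.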

    [ scripts]# cat grafana.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: monitoring-grafana
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: grafana
      replicas: 1
      template:
        metadata:
          labels:
            task: monitoring
            k8s-app: grafana
        spec:
          containers:
          - name: grafana
            image: registry.cn-shenzhen.aliyuncs.com/cn-k8s/heapster-grafana-amd64:v5.0.4
            ports:
            - containerPort: 3000
              protocol: TCP
            volumeMounts:
            - mountPath: /var
              name: grafana-storage
            env:
            - name: INFLUXDB_HOST
              value: monitoring-influxdb
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
          volumes:
          - name: grafana-storage
            emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: monitoring-grafana
      name: monitoring-grafana
      namespace: kube-system
    spec:
      type: NodePort
      ports:
      - port: 80
        targetPort: 3000
      selector:
        k8s-app: grafana
    [ scripts]# kubectl create -f grafana.yaml
    deployment.apps/monitoring-grafana created
    service/monitoring-grafana created
    [ scripts]# kubectl get pods -n kube-system monitoring-grafana-8546b578df-fbckb
    NAME                                  READY   STATUS    RESTARTS   AGE
    monitoring-grafana-8546b578df-fbckb   1/1     Running   0          52s


    [ scripts]# kubectl get pods,svc -n kube-system
    NAME                                      READY   STATUS    RESTARTS   AGE
    pod/coredns-6d8cfdd59d-8flfs              1/1     Running   2          47h
    pod/heapster-76d7cbbb56-lk27t             1/1     Running   0          12m
    pod/kube-flannel-ds-amd64-2pl7k           1/1     Running   10         7d23h
    pod/kube-flannel-ds-amd64-8b2rz           1/1     Running   1          30h
    pod/kube-flannel-ds-amd64-jtwwr           1/1     Running   5          8d
    pod/monitoring-grafana-8546b578df-fbckb   1/1     Running   0          110s
    pod/monitoring-influxdb-64f46fdcf-5jk8k   1/1     Running   0          18m

    NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
    service/heapster              ClusterIP   10.0.0.190   <none>        80/TCP          12m
    service/kube-dns              ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP   8d
    service/monitoring-grafana    NodePort    10.0.0.81    <none>        80:31920/TCP    110s
    service/monitoring-influxdb   ClusterIP   10.0.0.148   <none>        8086/TCP        18m

Grafana is then reachable through NodePort 31920.
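The NodePort is assigned by the cluster, so it can differ between installations. As a small sketch, the helpers below extract it from the PORT(S) column shown by `kubectl get svc` and build the Grafana URL. `node_port` and `grafana_url` are hypothetical names, and the node IP `192.0.2.10` used in the example is a placeholder, not a value from this cluster.

```shell
# node_port: extract the NodePort from a PORT(S) value such as
# "80:31920/TCP". Prints nothing if the value has no NodePort
# (e.g. "80/TCP" for a ClusterIP Service).
node_port() {
  printf '%s\n' "$1" | sed -n 's|^[0-9]*:\([0-9]*\)/.*$|\1|p'
}

# grafana_url: build the URL from a node IP and a PORT(S) value.
#   $1 = IP of any cluster node (placeholder in the example below)
#   $2 = PORT(S) value of the monitoring-grafana Service
grafana_url() {
  printf 'http://%s:%s/\n' "$1" "$(node_port "$2")"
}

# Example: grafana_url 192.0.2.10 80:31920/TCP
```

On a live cluster the port can also be read directly with `kubectl get svc monitoring-grafana -n kube-system -o jsonpath='{.spec.ports[0].nodePort}'`.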