Ref
https://prometheus.io/docs/guides/cadvisor/
https://prometheus.io/docs/guides/node-exporter/
cAdvisor: metrics agent for the containers on the Docker Swarm cluster
node_exporter: metrics agent for the Linux hosts
Prometheus
Server that collects metrics from each agent
config
Configure the Prometheus scrape jobs
prometheus.yml
...
  - job_name: 'dsg-container'
    scrape_interval: 60s
    static_configs:
      - targets: ['192.168.0.2:8080', '192.168.0.3:8080']
  - job_name: 'dsg-node_exporter'
    scrape_interval: 60s
    static_configs:
      - targets: ['192.168.0.2:9100', '192.168.0.3:9100']
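Before restarting Prometheus, the edited file can be checked with promtool, which ships with Prometheus. A minimal sketch, assuming the config lives at ./prometheus.yml (the path and the helper name check_prometheus_config are made up for this example):

```shell
# Assumed location of the config file; adjust to your installation.
PROM_CONFIG="${PROM_CONFIG:-./prometheus.yml}"

# Hypothetical helper: validate a Prometheus config file with promtool.
check_prometheus_config() {
  # promtool is distributed together with the Prometheus binaries.
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  # Exits non-zero when the file has syntax or semantic errors.
  promtool check config "$1"
}
```

Usage: check_prometheus_config "$PROM_CONFIG"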
Docker swarm
Deploy the metrics agents on the cluster
cAdvisor
cAdvisor port: 8080
# Run the docker commands on a manager node
# Deploy one cAdvisor container on every node (global mode)
# published=8080 pins the host port so it matches the Prometheus targets
docker service create --name cadvisor --mode=global \
  --publish published=8080,target=8080,mode=host \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro \
  --mount type=bind,src=/,dst=/rootfs,ro \
  --mount type=bind,src=/var/run,dst=/var/run \
  --mount type=bind,src=/sys,dst=/sys,ro \
  --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro \
  google/cadvisor -docker_only
node_exporter
node_exporter port: 9100
# Deploy one node_exporter container on every node (global mode)
docker service create --name node_exporter --mode=global \
  --publish 9100:9100 \
  --mount type=bind,src=/,dst=/host,ro,bind-propagation=rslave \
  quay.io/prometheus/node-exporter --path.rootfs=/host
Check that each service is running on every node
# docker service ls | grep -E 'node_exporter|cadvisor'
[CONTAINERID] cadvisor global 2/2 google/cadvisor:latest *:8080->8080/tcp
[CONTAINERID] node_exporter global 2/2 quay.io/prometheus/node-exporter:latest *:9100->9100/tcp
The metrics URLs can also be opened in a web browser:
- http://192.168.0.2:8080/metrics
- http://192.168.0.3:9100/metrics
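The same check can be done from a terminal with curl instead of a browser. A rough sketch, assuming curl is installed; check_metrics is a made-up helper name and the 5 second timeout is an arbitrary choice:

```shell
# Hypothetical helper: probe /metrics on every host:port given as an argument.
# Prints one OK/FAIL line per target and returns non-zero if any target fails.
check_metrics() {
  failed=0
  for target in "$@"; do
    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # --max-time 5: give up after 5 seconds instead of hanging.
    if curl -fsS --max-time 5 -o /dev/null "http://${target}/metrics"; then
      echo "OK   ${target}"
    else
      echo "FAIL ${target}"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: check_metrics 192.168.0.2:8080 192.168.0.3:8080 192.168.0.2:9100 192.168.0.3:9100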
Check on Prometheus
In the web UI, Status > Targets lists the scrape jobs configured earlier
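The target list is also available from the Prometheus HTTP API at /api/v1/targets. The sketch below counts targets by health without needing a JSON parser. It assumes Prometheus listens on localhost:9090 (the default port; the host is an assumption) and that the API returns compact JSON; target_health_summary is a made-up helper name.

```shell
# Assumed Prometheus address; adjust to your server.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: count scrape targets per health state (up, down, unknown).
target_health_summary() {
  curl -fsS "${1}/api/v1/targets" |
    # Pull out every "health":"..." field; assumes no spaces around the colon.
    grep -o '"health":"[a-z]*"' |
    sort | uniq -c
}
```

Usage: target_health_summary "$PROM_URL"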
Grafana
Create or import a new dashboard
Import an existing dashboard from the community (Grafana Labs: https://grafana.com/grafana/dashboards?pg=dashboards&plcmt=featured-sub1)
Sample query
Sample queries for monitoring a Docker Swarm cluster
Docker node count
count(cadvisor_version_info)
System load on each node
$instance: Grafana dashboard variable, defined in the dashboard settings with the query label_values(instance)
node_load5{instance=~"$instance"}
Available memory on node
node_memory_MemAvailable_bytes{instance=~"$instance"}
Memory usage per container
label_replace(
  topk($topk,
    sum(container_memory_usage_bytes{
      container_label_com_docker_stack_namespace=~".+",
      container_label_com_docker_swarm_service_name=~"$service_name",
      container_label_com_docker_swarm_node_id=~"$node"
    }) by (name, container_label_com_docker_swarm_task_name)
  ),
  "task_name", "$1", "container_label_com_docker_swarm_task_name", "(.*\\.[0-9]*).*\\..*"
)
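The regex in label_replace shortens the Swarm task name by dropping the trailing task ID. To see what it captures, the same pattern can be tried with sed. This is only an illustration: short_task_name is a made-up helper, and mystack_web.1.x2ks9dk3abc is an invented task name in the service.slot.taskid shape.

```shell
# Hypothetical helper: apply the capture pattern from the label_replace call.
# PromQL regexes are fully anchored, so ^ and $ are written out for sed.
short_task_name() {
  printf '%s\n' "$1" | sed -E 's/^(.*\.[0-9]*).*\..*$/\1/'
}

# Invented task name; prints "mystack_web.1"
short_task_name 'mystack_web.1.x2ks9dk3abc'
```

The first group keeps everything up to the slot number, which becomes the new task_name label.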