Applying Grafana and Prometheus in a Docker Environment
8시20분
2021. 9. 18. 08:20
In a Docker-based project, I ended up adding Grafana and Prometheus for monitoring.
This post documents how I set them up.
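Prerequisites
The project runs in a Docker environment managed with docker-compose: the backend is Django, the frontend is React, and the database is MySQL running in a container. All configuration is kept in .yml files, and each directory has its own Dockerfile.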
File structure
📦 your-repository
├─ .gitignore
├─ docker-compose.yml
├─ django
│ └─ src
├─ prometheus
│ ├─ data
│ ├─ alert.rules
│ └─ prometheus.yml
├─ grafana
│ ├─ data
│ └─ config.monitoring
├─ ai
├─ react
│ └─ src
├─ mysql
│ └─ db
├─ webserver(nginx)
│ └─ nginx.proxy.conf
├─ alertmanager
│ └─ config.yaml
└─ README.md
Main docker-compose.yml
I placed the docker-compose.yml that manages all the containers in the top-level directory. The configuration is as follows.
version: '3'
networks:
  app-tier:
    driver: bridge
  grafana-tier:
    driver: bridge
services:
  mysql_db:
    image: mysql:latest
    hostname: mysql_db
    command:
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_unicode_ci
    restart: always
    volumes:
      - ./mysql/db:/var/lib/mysql
    ports:
      - "33061:3306"
    cap_add:
      - SYS_NICE
    environment:
      MYSQL_DATABASE: maskon
      MYSQL_USER: mask
      MYSQL_PASSWORD: root
      MYSQL_ROOT_PASSWORD: rootpassword
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus/:/etc/prometheus/
      - ./prometheus/data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    links:
      - cadvisor:cadvisor
      - alertmanager:alertmanager
    depends_on:
      - cadvisor
    networks:
      - app-tier
    restart: always
  node-exporter:
    image: prom/node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - --collector.filesystem.ignored-mount-points
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    ports:
      - 9100:9100
    restart: always
    networks:
      - app-tier
    deploy:
      mode: global
  alertmanager:
    image: prom/alertmanager
    ports:
      - 9093:9093
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    networks:
      - app-tier
    restart: always
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
  cadvisor:
    image: google/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - 8080:8080
    networks:
      - app-tier
    restart: always
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
  grafana:
    image: grafana/grafana
    user: "472"
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - ./grafana/data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
      - ./grafana/config.monitoring
    networks:
      - app-tier
      - grafana-tier
    restart: always
  backend:
    container_name: maskon_web
    build: ./back/django
    command: gunicorn app.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - ./back/django:/code
    networks:
      - app-tier
    expose:
      - "8000"
    ports:
      - "8000:8000"
    links:
      - mysql_db:mysql_db
    depends_on:
      - mysql_db
  migration:
    build: ./back/django
    image: app
    command: sh -c "python manage.py migrate"
    volumes:
      - ./back/django:/code
    links:
      - mysql_db
    depends_on:
      - make_migrations
  make_migrations:
    build: ./back/django
    image: app
    command: sh -c "python manage.py makemigrations"
    volumes:
      - ./back/django:/code
    links:
      - mysql_db
    depends_on:
      - mysql_db
  nginx:
    image: nginx:latest
    ports:
      - 80:80
    volumes:
      - ./webserver/nginx-proxy.conf:/etc/nginx/conf.d/default.conf:ro
      - ./front/kakao/build:/var/www/frontend
    networks:
      - app-tier
    depends_on:
      - backend
volumes:
  build_folder:
In this setup, Prometheus collects server resource data, and Grafana visualizes the collected data as dashboards.
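The compose file mounts ./grafana/provisioning/ into the Grafana container, but the post does not show what goes inside it. Grafana reads data source definitions from /etc/grafana/provisioning/datasources/ at startup, so a minimal sketch of such a file might look like the following. The file name datasource.yml and the data source name are my own assumptions; the URL relies on the prometheus service name and port from the compose file above, which Grafana can resolve because both containers share app-tier.

```yaml
# ./grafana/provisioning/datasources/datasource.yml  (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus              # display name in Grafana (assumption)
    type: prometheus
    access: proxy                 # Grafana's server queries Prometheus, not the browser
    url: http://prometheus:9090   # compose service name and port
    isDefault: true
```

Without a provisioning file like this, the same data source can be added by hand in the Grafana UI after the first login.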
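Prometheus configuration
Prometheus is configured through prometheus.yml.
Prometheus runs alongside node-exporter and cadvisor, which expose the metrics it collects, and alertmanager, which sends notifications when a server issue occurs. The prometheus.yml configuration is as follows.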
# my global config
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'maskon_web'
# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - 'alert.rules'
  # - "first.rules"
  # - "second.rules"
# alert
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"
# Scrape configurations: Prometheus itself, cadvisor, node-exporter,
# and the Django backend.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 10 seconds.
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'node-exporter'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'backend'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    metrics_path: /metrics
    # static_configs:
    #   - targets: ['nginx:80']
    dns_sd_configs:
      - names: ['maskon_web']
        port: 8000
        type: A
        refresh_interval: 5s
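The backend job finds the Django container through a DNS A-record lookup on its container name, maskon_web. As an alternative sketch, the same target could be listed statically. This assumes the compose service name backend resolves on the shared app-tier network, and that the Django app actually serves Prometheus metrics at /metrics (for example through a library such as django-prometheus, which is not shown in this post).

```yaml
scrape_configs:
  - job_name: 'backend'
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      # Assumption: 'backend' is the compose service name from docker-compose.yml.
      - targets: ['backend:8000']
```

DNS-based discovery picks up address changes on its own refresh interval, while the static form is easier to read when there is only one backend container.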
To send alerts, the alerting rules must additionally be defined in alert.rules.
groups:
  - name: example
    rules:
      # Alert for any instance that is unreachable for >2 minutes.
      - alert: service_down
        expr: up == 0
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."
      - alert: high_load
        expr: node_load1 > 0.5
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} under high load"
          description: "{{ $labels.instance }} of job {{ $labels.job }} is under high load."
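Prometheus only decides when an alert fires; Alertmanager decides where it goes, and its own config file is not shown in this post. A minimal sketch might look like the following. The receiver name and the webhook URL are placeholders I made up for illustration. Note also that the file tree lists config.yaml while the compose command passes --config.file=/etc/alertmanager/config.yml, so the actual file name has to match that flag.

```yaml
# ./alertmanager/config.yml  (name must match --config.file in docker-compose.yml)
route:
  receiver: 'default'             # hypothetical receiver name
  group_by: ['alertname', 'job']  # batch alerts that share these labels
  group_wait: 30s                 # wait before sending the first notification for a group
  group_interval: 5m              # wait before notifying about new alerts in a group
  repeat_interval: 3h             # resend interval while an alert keeps firing

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://example.com/alert-hook'   # placeholder endpoint (assumption)
        send_resolved: true
```

A webhook receiver is used here only because it needs no credentials; email or Slack receivers follow the same route/receivers layout.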
With everything set up as above, running docker-compose up in a terminal brings the whole stack up normally.
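One thing worth checking if the backend cannot reach the database: in the compose file above, mysql_db declares no networks and therefore joins only the default network, while backend joins only app-tier. According to the Compose documentation, services connected with links still need at least one network in common to communicate. A hedged sketch of the fix, showing only the key that would be added:

```yaml
services:
  mysql_db:
    networks:
      - app-tier   # share a network with the backend service
```

The migration and make_migrations services declare no networks either, so they would need the same entry to keep reaching mysql_db after this change.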