Applying Grafana and Prometheus in a Docker Environment

In a Docker-based project, I ended up adding Grafana and Prometheus for monitoring.

This post documents the setup process.

Prerequisites

The project runs under Docker Compose: the backend is Django, the frontend is React, and the database is MySQL running in a container. All configuration is kept in .yml files, and each service directory has its own Dockerfile.

File structure

📦 your-repository
├─ .gitignore
├─ docker-compose.yml
├─ django
│  └─ src
├─ prometheus
│  ├─ data
│  ├─ alert.rules
│  └─ prometheus.yml
├─ grafana
│  ├─ data
│  └─ config.monitoring
├─ ai
├─ react
│  └─ src
├─ mysql
│  └─ db
├─ webserver(nginx)
│  └─ nginx-proxy.conf
├─ alertmanager
│  └─ config.yml
└─ README.md

Main docker-compose.yml

A docker-compose.yml in the repository root manages all the containers. Its contents are as follows.

version: '3'

networks:
  app-tier:
    driver: bridge
  grafana-tier:
    driver: bridge

services:
  mysql_db:
    image: mysql:latest
    hostname: mysql_db
    command:
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_unicode_ci
    restart: always
    volumes:
      - ./mysql/db:/var/lib/mysql
    ports:
      - "33061:3306"
    cap_add:
      - SYS_NICE
    networks:
      # backend is attached only to app-tier, while the migration services
      # use the default network, so the database joins both.
      - app-tier
      - default
    environment:
      MYSQL_DATABASE: maskon
      MYSQL_USER: mask
      MYSQL_PASSWORD: root
      MYSQL_ROOT_PASSWORD: rootpassword

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus/:/etc/prometheus/
      - ./prometheus/data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    links:
      - cadvisor:cadvisor
      - alertmanager:alertmanager
    depends_on:
      - cadvisor
    networks:
      - app-tier
    restart: always

  node-exporter:
    image: prom/node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - --collector.filesystem.ignored-mount-points
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    ports:
      - 9100:9100
    restart: always
    networks:
      - app-tier
    deploy:
      mode: global

  alertmanager:
    image: prom/alertmanager
    ports:
      - 9093:9093
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    networks:
      - app-tier
    restart: always
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'

  cadvisor:
    image: google/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - 8080:8080
    networks:
      - app-tier
    restart: always
    deploy:
      mode: global
      restart_policy:
        condition: on-failure

  grafana:
    image: grafana/grafana
    user: "472"
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - ./grafana/data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
      - ./grafana/config.monitoring
    networks:
      - app-tier
      - grafana-tier
    restart: always

  backend:
    container_name: maskon_web
    build: ./back/django
    command: gunicorn app.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - ./back/django:/code
    networks:
      - app-tier
    expose:
      - "8000"
    ports:
      - "8000:8000"
    links:
      - mysql_db:mysql_db
    depends_on:
      - mysql_db

  migration:
    build: ./back/django
    image: app
    command: sh -c "python manage.py migrate"
    volumes:
      - ./back/django:/code
    links:
      - mysql_db
    depends_on:
      - make_migrations

  make_migrations:
    build: ./back/django
    image: app
    command: sh -c "python manage.py makemigrations"
    volumes:
      - ./back/django:/code
    links:
      - mysql_db
    depends_on:
      - mysql_db

  nginx:
    image: nginx:latest
    ports:
      - 80:80
    volumes:
      - ./webserver/nginx-proxy.conf:/etc/nginx/conf.d/default.conf:ro
      - ./front/kakao/build:/var/www/frontend
    networks:
      - app-tier
    depends_on:
      - backend
volumes:
  build_folder:

In short, Prometheus collects the server resource data, and Grafana visualizes the collected data as dashboards.
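
For Grafana to read from Prometheus, it needs a Prometheus data source. It can be added by hand in the Grafana UI, but since the compose file mounts ./grafana/provisioning/ into the container, it could also be provisioned from a file. The original setup does not show such a file, so the following is only a minimal sketch: the file path and the data source name are assumptions, while the URL relies on the prometheus service name and port from the compose file above.

```yaml
# Assumed path: ./grafana/provisioning/datasources/datasource.yml
apiVersion: 1

datasources:
  - name: Prometheus              # display name in Grafana (arbitrary choice)
    type: prometheus
    access: proxy                 # Grafana's backend issues the queries
    url: http://prometheus:9090   # compose service name, reachable over app-tier
    isDefault: true
```

With a file like this in place, Grafana would load the data source at startup, so dashboards can query Prometheus without any manual setup.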

Prometheus configuration

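Prometheus is configured through prometheus.yml.

Alongside Prometheus, the stack runs node-exporter and cAdvisor, which expose the metrics that Prometheus scrapes, and Alertmanager, which sends notifications when a server issue occurs. Each of them runs as its own container in the compose file above. The prometheus.yml configuration is as follows.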

# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'maskon_web'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - 'alert.rules'
  # - "first.rules"
  # - "second.rules"

# alert
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "alertmanager:9093"

# Scrape configurations: Prometheus itself, cAdvisor, node-exporter,
# and the Django backend.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 10 seconds.
    scrape_interval: 10s

    static_configs:
      - targets: ['localhost:9090']


  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'node-exporter'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'backend'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    metrics_path: /metrics
#    static_configs:
#      - targets: ['nginx:80']

    dns_sd_configs:
      - names: ['maskon_web']
        port: 8000
        type: A
        refresh_interval: 5s
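
The backend job above discovers its target through DNS. Because the backend is a single container here, a static target would be a simpler alternative. The sketch below assumes that Prometheus and the backend share the app-tier network, as they do in the compose file above, so the compose service name resolves. Django does not serve /metrics on its own, so either variant also assumes the app exposes that endpoint, for example through a package such as django-prometheus. The original setup does not say which approach the project uses.

```yaml
scrape_configs:
  - job_name: 'backend'
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      # 'backend' is the compose service name; the container_name
      # 'maskon_web' should resolve on the same network as well.
      - targets: ['backend:8000']
```

DNS service discovery becomes more useful once the backend is scaled to several containers, since each resolved A record then becomes its own scrape target.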

To send alerts, the alerting rules also need to be defined in alert.rules, the file referenced by rule_files above.

groups:
- name: example
  rules:

  # Alert for any instance that is unreachable for >2 minutes.
  - alert: service_down
    expr: up == 0
    for: 2m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."

  - alert: high_load
    expr: node_load1 > 0.5
    for: 2m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} under high load"
      description: "{{ $labels.instance }} of job {{ $labels.job }} is under high load."
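
These rules only decide when an alert fires. Where the notification goes is decided by Alertmanager's own configuration, which the compose file expects at /etc/alertmanager/config.yml. The original setup does not show that file, so the following is a minimal sketch only: the receiver name and the webhook URL are placeholders, and the timing values are illustrative assumptions.

```yaml
# Assumed contents of ./alertmanager/config.yml
route:
  receiver: default-webhook          # placeholder receiver name
  group_by: ['alertname', 'job']     # batch alerts that share these labels
  group_wait: 30s                    # wait before sending the first notification
  group_interval: 5m                 # wait before notifying about new alerts in a group
  repeat_interval: 3h                # resend interval while an alert keeps firing

receivers:
  - name: default-webhook
    webhook_configs:
      - url: http://example.invalid/alert-hook   # placeholder endpoint
        send_resolved: true
```

A webhook receiver is used here only because it needs no credentials; email or Slack receivers follow the same route and receivers structure.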

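With everything set up as above, running docker-compose up in a terminal brings the whole stack up. Per the port mappings in docker-compose.yml, Prometheus is then reachable on port 9090 and Grafana on port 3000.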