2024-05-01
prometheus-webhook is an extension for Alertmanager alerting. It supports DingTalk, WeChat, and email alerts, as well as custom alert templates.
# Download
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
# Extract
tar -zxvf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
[tidb@vm172-16-201-64 prometheus-webhook-dingtalk-2.1.0.linux-amd64]$ ll
total 18744
-rw-r--r-- 1 tidb tidb 1299 Apr 21 16:20 config.example.yml
drwxr-xr-x 4 tidb tidb 4096 Apr 21 16:20 contrib
-rw-r--r-- 1 tidb tidb 11358 Apr 21 16:20 LICENSE
-rwxr-xr-x 1 tidb tidb 19172733 Apr 21 16:19 prometheus-webhook-dingtalk
[tidb@vm172-16-201-64 prometheus-webhook-dingtalk-2.1.0.linux-amd64]$
more /data/webhook-dingtalk/webhook-dingtalk.sh
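#!/bin/bash
# Wrapper script used to start prometheus-webhook-dingtalk.
set -e
WEBHOOK_BIN=/data/webhook-dingtalk/prometheus-webhook-dingtalk
# Listen on port 8060 and enable the lifecycle (reload) endpoint and the web UI.
exec "$WEBHOOK_BIN" \
  --web.listen-address=":8060" \
  --config.file="/data/webhook-dingtalk/jms_config.yml" \
  --log.level="info" \
  --log.format="logfmt" \
  --web.enable-lifecycle \
  --web.enable-ui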
more /data/webhook-dingtalk_config.yml
## Request timeout
# timeout: 5s
## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true
## Customizable templates path
#templates:
#  - contrib/templates/legacy/template.tmpl
## You can also override default template using `default_message`
## The following example to use the legacy template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'
## Targets, previously was known as "profiles"
targets:
  webhook1:
    # url: the DingTalk robot webhook URL for this target (required, omitted here)
    # secret for signature
    secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  #webhook2:
  webhook_legacy:
    secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  #webhook_mention_all:
  #  mention:
  #    all: true
  webhook_mention_users:
    secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    mention:
      mobiles: [XXXXXXXXXXXX]
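The `secret` configured for each target is what DingTalk uses to verify signed robot requests. prometheus-webhook-dingtalk computes the signature itself, so nothing below is needed for the setup to work. As a rough illustration only, the following Python sketch assumes DingTalk's documented scheme (HMAC-SHA256 over `timestamp + "\n" + secret`, Base64-encoded, then URL-encoded). The function names `dingtalk_sign` and `signed_url`, and the example URL, are hypothetical and not part of the tool.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Return the URL-encoded signature for a DingTalk robot request.

    Assumed scheme: HMAC-SHA256 keyed with the secret over the string
    "<timestamp_ms>\n<secret>", Base64-encoded, then URL-encoded.
    """
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))


def signed_url(webhook_url: str, secret: str,
               timestamp_ms: Optional[int] = None) -> str:
    """Append timestamp and sign parameters to a robot webhook URL.

    webhook_url is expected to already carry ?access_token=... (hypothetical input).
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # milliseconds since the epoch
    sign = dingtalk_sign(secret, timestamp_ms)
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"
```

If signed requests are rejected, comparing a value computed this way with what the robot expects can help tell a wrong secret apart from clock skew on the webhook host.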
more /data/dm-deploy/alertmanager-9093/conf/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: "localhost:25"
  smtp_from: "alertmanager@example.org"
  smtp_auth_username: "alertmanager"
  smtp_auth_password: "password"
  # smtp_require_tls: true
  # The Slack webhook URL.
  # slack_api_url:
route:
  # A default receiver
  receiver: "webhook"
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ["env", "instance", "alertname", "type", "group", "job"]
  # When a new group of alerts is created by an incoming alert, wait at
  # least group_wait to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s
  # When the first notification was sent, wait group_interval to send a batch
  # of new alerts that started firing for that group.
  group_interval: 3m
  # If an alert has successfully been sent, wait repeat_interval to
  # resend them.
  repeat_interval: 3m
  routes:
  # - match:
  #   receiver: webhook-kafka-adapter
  #   continue: true
  # - match:
  #     env: test-cluster
  #   receiver: db-alert-slack
  # - match:
  #     env: test-cluster
  #   receiver: db-alert-email
# The IP address in the url below is that of the machine where the webhook is deployed
receivers:
  - name: webhook
    webhook_configs:
      - send_resolved: true
        url: http://XX.XX.XX.:8060/dingtalk/webhook1/send
  #- name: db-alert-slack
  #  slack_configs:
  #  - channel: #alerts
  #    username: db-alert
  #    icon_emoji: :bell:
  #    title: {{ .CommonLabels.alertname }}
  #    text: {{ .CommonAnnotations.summary }} {{ .CommonAnnotations.description }} expr: {{ .CommonLabels.expr }} http://172.0.0.1:9093/#/alerts
  # - name: "db-alert-email"
  #   email_configs:
  #   - send_resolved: true
  #     to: "example@example.com"
  # This doesn't alert anything, please configure your own receiver
  #- name: "blackhole"
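Before waiting for a real alert, it can be useful to push a hand-built notification at the webhook endpoint and see whether a message reaches DingTalk. The sketch below builds a payload shaped like Alertmanager's webhook notification (format version 4, as far as I know) and posts it with the Python standard library. The helper names `build_test_alert` and `send_test_alert` and the label values are made up for illustration; the target URL is whatever is configured under `webhook_configs`.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname: str, instance: str, summary: str) -> dict:
    """Build a minimal payload resembling an Alertmanager webhook notification."""
    labels = {"alertname": alertname, "instance": instance, "level": "warning"}
    annotations = {"summary": summary}
    return {
        "version": "4",
        "groupKey": f'{{}}:{{alertname="{alertname}"}}',
        "status": "firing",
        "receiver": "webhook",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": datetime.now(timezone.utc).isoformat(),
                "endsAt": "0001-01-01T00:00:00Z",  # zero time: alert not resolved
                "generatorURL": "",
            }
        ],
    }


def send_test_alert(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A call such as `send_test_alert("http://<webhook-host>:8060/dingtalk/webhook1/send", build_test_alert("TestAlert", "host:10080", "manual test"))` exercises only the webhook-to-DingTalk half of the chain. Routing and grouping in Alertmanager are not involved.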
more /etc/systemd/system/prometheus-webhook.service
[Unit]
Description=prometheus-webhook service
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
LimitNOFILE=1000000
LimitSTACK=10485760
User=tidb
ExecStart=/data/webhook-dingtalk/webhook-dingtalk.sh
Restart=always
RestartSec=15s
[Install]
WantedBy=multi-user.target
# Start the webhook
sudo systemctl start prometheus-webhook.service
# Stop the webhook
sudo systemctl stop prometheus-webhook.service
# Check the service status
sudo systemctl status -l prometheus-webhook.service
# Restart the Alertmanager node
tiup cluster stop tidb-test -N x:9093
tiup cluster start tidb-test -N x:9093
# Check the status after startup
tiup cluster display tidb-jms -N x:9093
8. Alert display
[FIRING:1] tidb_tikvclient_backoff_seconds_count
Alerts Firing
TiDB tikvclient_backoff_count error
Description: cluster: tidb-test, instance: xxxx:10081, values:253.33333333333331
Graph:
Details:
alertname: tidb_tikvclient_backoff_seconds_count
cluster: tidb-test
env: tidb-test
expr: increase( tidb_tikvclient_backoff_seconds_count[10m] ) > 10
instance: xxxx:10081
job: tidb
level: warning
monitor: prometheus
type: regionMiss
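Note that TiUP overwrites the configuration of the monitoring components with its own parameters. If you edit a monitoring component's configuration file directly, the change may be overwritten by TiUP during operations such as deploy, scale-out, scale-in, or reload, and will then not take effect.
config_file: this field specifies a local file that is transferred to the target machine during the cluster configuration initialization phase and used as the Alertmanager configuration.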
alertmanager_servers:
  - host: 172.16.201.64
    ssh_port: 22
    web_port: 9093
    cluster_port: 9094
    deploy_dir: /data1/tidb-deploy/alertmanager-9093
    data_dir: /data1/tidb-data/alertmanager-9093
    log_dir: /data1/tidb-deploy/alertmanager-9093/log
    arch: amd64
    os: linux
    config_file: /data1/tidb-deploy/alertmanager-9093/conf/alertmanager_test.yml
[tidb@vm172-16-201-64 ~]$ cd /data/tidb-deploy/prometheus-9090/conf/
[tidb@vm172-16-201-64 conf]$ ll
total 96
-rw-r--r-- 1 tidb tidb 3500 Jun 28 15:34 binlog.rules.yml
-rw-r--r-- 1 tidb tidb 4492 Jun 28 15:34 blacker.rules.yml
-rw-r--r-- 1 tidb tidb 37 Jun 28 15:34 bypass.rules.yml
-rw-r--r-- 1 tidb tidb 1964 Jun 28 15:34 kafka.rules.yml
-rw-r--r-- 1 tidb tidb 459 Jun 28 15:34 lightning.rules.yml
-rw-r--r-- 1 tidb tidb 507 Jun 28 15:34 ngmonitoring.toml
-rw-r--r-- 1 tidb tidb 5214 Jun 28 15:34 node.rules.yml
-rw-r--r-- 1 tidb tidb 7920 Jun 28 15:34 pd.rules.yml
-rw-r--r-- 1 tidb tidb 6199 Jun 28 15:34 prometheus.yml
-rw-r--r-- 1 tidb tidb 6507 Jun 28 15:34 ticdc.rules.yml
-rw-r--r-- 1 tidb tidb 6271 Jun 28 15:34 tidb.rules.yml
-rw-r--r-- 1 tidb tidb 3112 Jun 28 15:34 tiflash.rules.yml
-rw-r--r-- 1 tidb tidb 4685 Jun 28 15:34 tikv.accelerate.rules.yml
-rw-r--r-- 1 tidb tidb 13977 Jun 28 15:34 tikv.rules.yml
[tidb@vm172-16-201-64 conf]$
cp tidb.rules.yml tidb.rules.yml_20220628
vi tidb.rules.yml
tiup cluster stop tidb-jms -N x:9090
tiup cluster start tidb-jms -N x:9090
# Check the status after startup
tiup cluster display tidb-jms -N x:9090
https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/alert-manager-inhibit
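Users or administrators can temporarily silence specific alert notifications directly from the Alertmanager UI. A silence is defined by label matchers (exact strings or regular expressions); when a new alert notification matches a silence, Alertmanager stops sending it to the receiver.
Open the Alertmanager UI and click "New Silence" to open the form for creating a silence.
In this form you can set the start time and duration of the new silence, and add multiple matchers (string or regex) in the Matchers section. After filling in the creator of the silence and the reason for it, click "Create".
Use "Preview Alerts" to preview the alerts that the current matchers match. Once the silence is created, Alertmanager loads it and sets its state to Pending; when it takes effect, its state changes to Active.
Active silences
Once a silence is in effect, the alerts it matches no longer appear on the Alerts page of Alertmanager.
Alert information
For a silence that is already in effect, you can click "Expire" to expire it manually.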
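The same kind of silence can also be created without the UI. The Python sketch below assumes Alertmanager's v2 HTTP API (`POST /api/v2/silences`, which as far as I know returns a JSON object containing `silenceID`). The helper names `build_silence` and `create_silence`, and the matcher values in the usage note, are illustrative only.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def build_silence(matchers: Dict[str, str], hours: float, created_by: str,
                  comment: str, start: Optional[datetime] = None) -> dict:
    """Build a silence body with exact-match (non-regex) label matchers."""
    start = start or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in matchers.items()
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def create_silence(alertmanager_url: str, silence: dict,
                   timeout: float = 5.0) -> str:
    """POST the silence to Alertmanager and return the ID it reports."""
    request = urllib.request.Request(
        f"{alertmanager_url}/api/v2/silences",
        data=json.dumps(silence).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["silenceID"]
```

For example, `create_silence("http://<alertmanager-host>:9093", build_silence({"alertname": "tidb_tikvclient_backoff_seconds_count"}, 2, "ops", "planned maintenance"))` would silence that alert for two hours, assuming the host placeholder is replaced with the real Alertmanager address.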