wenwst 2020-04-21
2019/09/23 Chenxin Wuweiwei
References
https://grafana.com/grafana
https://blog.52itstyle.vip/archives/1984/
https://blog.52itstyle.vip/archives/2014/
https://blog.52itstyle.vip/archives/2029/
https://blog.52itstyle.vip/archives/2049/
https://blog.52itstyle.vip/archives/2059/
https://blog.csdn.net/liuxiao723846/article/details/79627092
https://www.cnblogs.com/txwsqk/p/3974915.html
https://www.cnblogs.com/smallSevens/p/7837361.html
For other options (Grafana, Zabbix, Nagios, Ganglia, Open-Falcon), see the notes in the <Zabbix Monitoring Solution> document.
Official introduction
The analytics platform for all your metrics
Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.
Grafana is written in Go and JavaScript. Main features:
Visualization: fast and flexible client-side graphs and a range of panel plugins.
Alerting: visually define alert rules for your most important metrics; Grafana continuously evaluates them and sends notifications.
Notifications: when an alert changes state it sends out notifications, e.g. by email.
Dynamic dashboards: create dynamic, reusable dashboards with template variables that appear as drop-down menus at the top of the dashboard.
Mixed data sources: mix different data sources in the same graph; the data source can be chosen per query, and this even works for custom data sources.
Annotations: annotate graphs with events from different data sources; hover over an event to see the full event metadata and tags.
Filters: ad-hoc filters let you dynamically create key/value filters that are automatically applied to all queries using that data source.
Grafana can be connected to Alibaba Cloud Log Service (Grafana only renders dashboards; the log data itself must come from another platform acting as the data source). Once a data source is added, Grafana can display the log data with the panel plugins it supports (bar charts, pie charts, world map, etc.); one such display is called a Panel.
Register an account on grafana.com so that plugins such as worldPing can be used:
http://grafana.com/profile/api-keys
Email
Username: Chanix
pw: xxx
AK: eyJrI...MwNzl9
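Note that this page issues keys for the grafana.com account (used e.g. by the worldPing app); a local Grafana instance issues its own API keys under Configuration -> API Keys, and those can be used as Bearer tokens against the HTTP API. A minimal sketch, assuming a key issued by the local instance at http://localhost:3000 (URL and key value are placeholders):
#!/usr/bin/python3
# Minimal sketch: use a local Grafana API key (Bearer token) to list dashboards.
# Assumes Grafana is reachable at http://localhost:3000; adjust URL and key as needed.
import requests

GRAFANA_URL = "http://localhost:3000"
API_KEY = "eyJrI...placeholder"   # key created under Configuration -> API Keys

resp = requests.get(
    GRAFANA_URL + "/api/search",  # returns dashboards and folders
    headers={"Authorization": "Bearer " + API_KEY},
    timeout=10,
)
resp.raise_for_status()
for item in resp.json():
    print(item.get("type"), item.get("title"))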
Hardware requirements
Grafana does not use a lot of resources and is very lightweight in use of memory and CPU. Minimum recommendation is 255mb of memory and 1 CPU.
Download the Grafana package
wget https://dl.grafana.com/oss/release/grafana-6.3.6-1.x86_64.rpm
yum localinstall grafana-6.3.6-1.x86_64.rpm
After installation, per the official docs (edit the configuration file):
Choose your Configuration Options
The Grafana backend has a number of configuration options defined in its config file (usually located at /etc/grafana/grafana.ini on linux systems).
In this config file you can change things like the default admin password, http port, grafana database (sqlite3, mysql, postgres), authentication options (google, github, ldap, auth proxy) along with many other options.
Start your grafana server. Login with your admin user (default admin/admin). Open side menu (click the Grafana icon in top menu) head to Data Sources and add your data source.
Official configuration reference: https://grafana.com/docs/installation/configuration/
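Instead of the UI, a data source can also be added over Grafana's HTTP API (POST /api/datasources). A minimal sketch, assuming the default admin/admin credentials and a Prometheus server at http://localhost:9090 (name and URLs are placeholders):
#!/usr/bin/python3
# Minimal sketch: add a data source via the Grafana HTTP API.
# Assumes default admin/admin credentials and Prometheus at localhost:9090.
import requests

payload = {
    "name": "prometheus-local",    # placeholder name
    "type": "prometheus",
    "url": "http://localhost:9090",
    "access": "proxy",             # Grafana backend proxies the queries
    "isDefault": False,
}
resp = requests.post(
    "http://localhost:3000/api/datasources",
    json=payload,
    auth=("admin", "admin"),
    timeout=10,
)
print(resp.status_code, resp.json())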
Start/stop
/etc/rc.d/init.d/grafana-server    the init.d script, copied to /etc/init.d/grafana-server; it invokes the binaries plus several config and environment files.
/usr/sbin/grafana-cli, /usr/sbin/grafana-server    binaries, called by the init script.
/usr/lib/systemd/system/grafana-server.service    systemd unit.
Configuration
/etc/sysconfig/grafana-server    Grafana environment variables such as LOG_DIR, DATA_DIR, CONF_FILE, PLUGINS_DIR, PID_FILE_DIR; sourced by the init script.
/etc/grafana/grafana.ini    main config file, highest priority; it overrides the built-in defaults and is passed to the server by the init script.
/etc/grafana/ldap.toml    referenced from grafana.ini (disabled by default).
Logs
/var/log/grafana/    log directory.
Data
/var/lib/grafana/    the default config puts the sqlite3 database at /var/lib/grafana/grafana.db; back this file up before upgrading.
Other
/usr/share/grafana    contains the bin, conf, public, scripts and tools folders, e.g.:
/usr/share/grafana/public/sass/grafana.dark.scss
/usr/share/grafana/public/img/grafana_net_logo.svg
/usr/share/grafana/public/fonts/grafana-icons.svg
/usr/share/grafana/public/app/plugins/datasource/    plugin data sources
/usr/share/grafana/public/build/grafana.dark.4cb59bbda465c391ae8e.css ...
Configuration file changes:
Port, authentication method, etc.; details omitted.
Manage via systemd:
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server
Enable start on boot:
systemctl enable grafana-server.service
Process and listening port after startup:
/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning
tcp6 0 0 :::3000 :::* LISTEN 16549/grafana-serve
On first login you are prompted to change the default password:
admin / admin
admin / Grxxxx.....123456
LDAP can be integrated later; see the corresponding section.
Grafana has three permission levels: Viewer, Editor and Admin. A Viewer can only view existing panels, an Editor can also edit panels, and an Admin has full rights such as adding data sources, installing plugins and creating API keys.
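Roles can also be changed over the HTTP API. A minimal sketch, assuming admin basic auth and that a user with id 2 exists (list users with GET /api/org/users first; verify the endpoint against your Grafana version):
#!/usr/bin/python3
# Minimal sketch: change a user's role in the current organization.
# Assumes admin/admin basic auth and an existing user with id 2 (placeholder).
import requests

user_id = 2   # placeholder; find real ids via GET /api/org/users
resp = requests.patch(
    "http://localhost:3000/api/org/users/{}".format(user_id),
    json={"role": "Editor"},   # Viewer / Editor / Admin
    auth=("admin", "admin"),
    timeout=10,
)
print(resp.status_code, resp.text)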
Reference
https://grafana.com/docs/administration/cli/
Reference: https://grafana.com/docs/plugins/installation/
Official plugins can be installed in three ways: grafana-cli plugins install xxx, git clone, or unzip.
Third-party plugins support two install methods: git clone or unzip.
Plugins are installed into /var/lib/grafana/plugins/.
Install with the grafana-cli tool:
List available plugins: grafana-cli plugins list-remote
grafana-cli plugins install raintank-worldping-app    network reachability/latency to your site from probes around the world
grafana-cli plugins install grafana-piechart-panel    pie chart panel (official plugin)
grafana-cli plugins install grafana-clock-panel    clock panel
grafana-cli plugins install briangann-gauge-panel
grafana-cli plugins install natel-discrete-panel
grafana-cli plugins install vonage-status-panel
Uninstall a plugin:
Example: grafana-cli plugins uninstall vonage-status-panel
Grafana must be restarted after installing or uninstalling a plugin for the change to take effect.
Data source definitions
Grafana can only render graphs after a data source supplies data; in most cases the data is JSON made up of key/value pairs.
Alibaba Cloud offers two data source types:
1. The official Alibaba Cloud CMS (CloudMonitor) data source.
2. The Alibaba Cloud Log Service data source.
Compared with Log Service, CMS is simpler to use, but because its templates are fixed, the dashboards can look messy when the incoming data does not fit the template well.
Log Service is harder to use because the query statements have to be written by hand, but the resulting dashboards are better; in practice the two can be combined.
coinw uses both. With the CMS data source you only pick from existing metrics when adding monitoring items; with the Log Service data source you write the SQL-style query yourself, following Alibaba Cloud's documentation.
Alibaba Cloud Log Service data source
Reference
Alibaba Cloud docs:
Log Service > Query and Analysis > Visual Analysis > Other visualization options > Connect to Grafana: https://help.aliyun.com/document_detail/60952.html?spm=a2c4g.11186623.6.887.8e217c69jBQoV6
Parts of this document do not work in practice (some sample queries hang once they are put into Grafana).
Log Service > Query and Analysis > Query syntax and features > Query syntax:
https://help.aliyun.com/document_detail/29060.html?spm=a2c4g.11186623.6.793.28ec735djvrmG1
With the Log Service data source the monitoring queries have to be written by hand, and they are fairly complex.
1. Install the Alibaba Cloud Log Service plugin
cd /var/lib/grafana/plugins/    plugin directory
git clone https://github.com/aliyun/aliyun-log-grafana-datasource-plugin
systemctl restart grafana-server    restart Grafana
2. Add the Log Service data source
In the left toolbar choose
Configuration -> Data Sources -> Add data source
Select the Log Service type provided by the plugin and enter the configuration page. Configure as follows:
HTTP URL: http://waf-project-1783799610063532-ap-southeast-1.ap-southeast-1.log.aliyuncs.com
Auth: keep the defaults.
Log service details - Project: waf-project-1783799610063532-ap-southeast-1
Logstore: waf-logstore
AK: LTAIseUbaRxeHLeu ayoI3FUJg891bMRfimV4NVYudOy1th
Save. A dashboard then needs to be created (see the corresponding section).
Alibaba Cloud CMS data source
Reference
Alibaba Cloud docs:
CloudMonitor > User Guide > Visual reports > Connect to Grafana:
https://help.aliyun.com/document_detail/109434.html?spm=5176.10695662.1996646101.searchclickresult.356e2509mNXR10
1. Install the CMS plugin
cd /var/lib/grafana/plugins/
git clone https://github.com/aliyun/aliyun-cms-grafana.git
systemctl restart grafana-server    restart Grafana
2. Add the data source
Name: anything
HTTP URL: metrics.cn-hongkong.aliyuncs.com
AK: omitted
3. Add a panel (see the corresponding section for details)
In the dashboard, add a new panel -> select the CMS data source -> add a Query (for Project, enter acs_ecs_dashboard from the plugin's help) -> set Metric, Period, Y-axis and X-axis, etc.
Use one Query per ECS instance; multiple ECS instances means multiple queries.
Once done, the data is displayed; use Visualization to fine-tune how the graph looks.
Create a dashboard
New -> Dashboard
Configure variables (templates; "dimensions" may be a better word)
Create template variables, e.g. to switch the display by time interval or domain name.
Add a time-interval dimension: select the dashboard > Dashboard Settings > Variables > Edit.
General:
Name: myinterval
Type: Interval (drop-down)
Label: time interval
Interval Options:
Values: 1m,10m,30m,1h,6h,12h,1d,7d,14d,30d
Add a domain-name dimension:
General:
Name: hostname
Type: Custom (drop-down)
Label: Domain-Name
Custom Options:
Values separated by comma: *,coinw.ai,www.coinw.ai,api.coinw.ai,byw.ai,www.byw.ai
Notes on template variables (partly excerpted from the web)
Configuration path: dashboard-01 > Settings > Variables > New
Once a panel shows data you usually need to filter it; Grafana's template variables provide user-defined filter fields.
Type: the variable type.
Query: lets you write a data source query that typically returns a list of metric names, tag values or keys, e.g. a query returning a list of server names, sensor ids or data centers.
Interval: represents a time span. Instead of hard-coding a group-by time or date histogram interval, use this variable type.
Datasource: lets you quickly switch the data source for an entire dashboard; useful when you run several instances of a data source in different environments.
Custom: define the variable options manually as a comma-separated list.
Constant: defines a hidden constant; useful for metric path prefixes in dashboards you want to share. During dashboard export, constant variables are turned into import options.
Ad hoc filters: a very special variable type that currently only works with some data sources (InfluxDB and Elasticsearch). It lets you add key/value filters that are automatically applied to all metric queries that use the specified data source.
Add and configure panels
Configure PV (page views) and UV (unique visitors)
Add Panel: adding a panel involves four steps:
Queries (query statements) -> Visualization (graph type) -> General (title) -> Alert (alerting).
Queries configuration:
In the Metrics configuration, select the Log Service data source and enter the Query, the Y column and the X column.
Query: $hostname| select approx_distinct(remote_addr) as uv ,count(1) as pv , __time__ - __time__ % $$myinterval as time group by time order by time limit 1000
X column: time
Y column: uv,pv
The "dimensions" created earlier appear above this panel, and the display changes as they are switched.
Panel Queries field reference (CMS data source):
Project: the project name; since the data comes from Alibaba Cloud CloudMonitor it is fixed as acs_ecs_dashboard.
Metric: the metric to monitor, e.g. cpu, memory or network.
Period: the refresh interval of the graph, normally once a minute.
Dimensions: the monitored instances, i.e. the ECS instance IDs.
Group: the monitoring group; after selecting a group you still need to set the display template under Dimensions.
Y-column: the Y-axis series to display; maximum, minimum and average are available.
Grafana builds the query: it interpolates the template variables and sends the resulting statement to the data source, which then parses and executes it. The query language differs per data source; for Prometheus it is PromQL.
For example, with Alibaba Cloud Log Service, the UV/PV query is:
$hostname| select approx_distinct(remote_addr) as uv ,count(1) as pv , __time__ - __time__ % $$myinterval as time group by time order by time limit 1000
Here $hostname is interpolated by Grafana. It can also be hard-coded as in Alibaba Cloud's documentation, e.g. "__topic__: waf_access_log" (the topic under which WAF access logs are shipped into Log Service).
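To debug such a query outside Grafana, it can be run directly with the aliyun-log Python SDK (the same SDK used by the WAF script at the end of this document). A minimal sketch; the endpoint, project, logstore and AK mirror the data source section, the topic is hard-coded instead of $hostname, and the response handling mirrors the script below:
#!/usr/bin/python3
# Minimal sketch: run the PV/UV query against Log Service outside Grafana to
# check that it returns data. $hostname / $$myinterval are replaced by literals.
import time
from aliyun.log import LogClient, GetLogsRequest

endpoint = "http://ap-southeast-1.log.aliyuncs.com"
project = "waf-project-1783799610063532-ap-southeast-1"
logstore = "waf-logstore"
access_key = "LTAIseUbaRxeHLeu"
secret_key = "ayoI3FUJg891bMRfimV4NVYudOy1th"

query = ("__topic__: waf_access_log | select approx_distinct(remote_addr) as uv, "
         "count(1) as pv, __time__ - __time__ % 60 as time "
         "group by time order by time limit 1000")

client = LogClient(endpoint, access_key, secret_key)
to_ts = int(time.time())
req = GetLogsRequest(project, logstore, to_ts - 3600, to_ts, "", query, 100, 0, False)
for row in client.get_logs(req).get_body():
    print(row)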
CPU, memory, disk space, IOPS, inbound/outbound network traffic, load.
Here the Prometheus data source is used (see the Prometheus notes); Prometheus can expose almost any relevant metric.
Disk usage, with Prometheus as the data source. The PromQL to enter in the metric field is (auto-completion is offered while typing in Queries):
100-node_filesystem_free_bytes{mountpoint="/"}/node_filesystem_size_bytes{mountpoint="/"}*100
Reference: https://songjiayang.gitbooks.io/prometheus/content/promql/summary.html (PromQL basics)
PromQL (Prometheus Query Language) is Prometheus's own query DSL. It is very expressive and has many built-in functions, and it is used both for day-to-day visualization and for alerting rules.
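The same PromQL can be tested against Prometheus's HTTP API before putting it into a panel. A minimal sketch, assuming Prometheus listens on localhost:9090:
#!/usr/bin/python3
# Minimal sketch: run the disk-usage PromQL against /api/v1/query.
# Assumes Prometheus is reachable at http://localhost:9090.
import requests

promql = ('100 - node_filesystem_free_bytes{mountpoint="/"} '
          '/ node_filesystem_size_bytes{mountpoint="/"} * 100')
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": promql},
    timeout=10,
)
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("instance"), result["value"])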
Grafana alerting is not real time: it queries the data source first and only then decides whether to fire, so there is some latency. For critical alerts, prefer another channel such as the cloud platform's own alerting service.
Enable alerting
Currently only the Graph panel supports alerting, so use Graph-based panels.
Grafana supports many notification channels, including Email, Teams and DingTalk.
Enable alerting in grafana.ini:
#################################### Alerting ############################
[alerting]
# Disable alerting engine & UI features
enabled = true              # enabled
# Makes it possible to turn off alert rule execution but alerting UI is visible
execute_alerts = true       # enabled
# Default setting for new alert rules. Defaults to categorize error and timeouts as alerting. (alerting, keep_state)
;error_or_timeout = alerting
# Default setting for how Grafana handles nodata or null values in alerting. (alerting, no_data, keep_state, ok)
;nodata_or_nullvalues = no_data
# Alert notifications can include images, but rendering many images at the same time can overload the server
# This limit will protect the server from render overloading and make sure notifications are sent out quickly
;concurrent_render_limit = 5
SMTP server configuration
To be able to send email notifications, first configure the mail server in grafana.ini:
#################################### SMTP / Emailing ##########################
[smtp]
enabled = true                  # enable SMTP
host =                          # SMTP server address; see your mail provider's setup guide
user = your email address
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = the app password generated when SMTP was enabled on the mailbox
;cert_file =
;key_file =
skip_verify = true
from_address = your email address
from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
[emails]
;welcome_email_on_sign_up = false
DingTalk
Create a DingTalk robot and obtain its token:
In DingTalk's robot management, add a custom robot (custom service connected via webhook).
In the Grafana console -> Alerting -> add a new notification channel.
Name: DingTalk alert
Type: DingDing
URL: the DingTalk webhook URL (containing the token).
Then point the alert of each relevant panel at this notification channel.
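The webhook can be verified with a plain HTTP POST before wiring it into Grafana. A minimal sketch with a placeholder access_token (if the robot has keyword filtering enabled, the message must contain that keyword):
#!/usr/bin/python3
# Minimal sketch: send a test text message to a DingTalk custom robot webhook.
# The access_token below is a placeholder; use your own robot's token.
import requests

webhook = "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"
payload = {
    "msgtype": "text",
    "text": {"content": "Grafana alert channel test"},
}
resp = requests.post(webhook, json=payload, timeout=10)
print(resp.status_code, resp.json())   # errcode 0 means the message was accepted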
Reference: https://blog.52itstyle.vip/archives/2049/
Install the corresponding exporter plugin and configure it (set the target IP:PORT and password).
Update the Prometheus configuration file and restart Prometheus.
Reference: https://blog.52itstyle.vip/archives/2059/
Install the mysqld_exporter plugin. Details omitted.
Reference: https://grafana.com/docs/auth/ldap/
Create several users belonging to different permission groups.
FreeIPA
FreeIPA LDAP users and groups
1. Users and their DNs
grafana-test DN: uid=grafana-test,cn=users,cn=accounts,dc=chanix,dc=top    to go into the admin group
chenxin DN: uid=chenxin,cn=users,cn=accounts,dc=chanix,dc=top
wuweiwei DN: uid=wuweiwei,cn=users,cn=accounts,dc=chanix,dc=top    to go into the editor group
jumpserver DN: uid=jumpserver,cn=users,cn=accounts,dc=chanix,dc=top    not in the admin or editor group; used to test an "other" user
2. Groups and their DNs
grafana DN: cn=grafana,cn=groups,cn=accounts,dc=chanix,dc=top    contains grafana-test and chenxin (Grafana Admin rights)
test DN: cn=test,cn=groups,cn=accounts,dc=chanix,dc=top    contains wuweiwei (Editor rights; cannot modify data sources etc.)
Other groups (Viewer rights only)
/etc/grafana/ldap.toml
1. Edit the main Grafana config file /etc/grafana/grafana.ini
#################################### Auth LDAP ##########################
[auth.ldap]
enabled = true                            # enable LDAP auth
config_file = /etc/grafana/ldap.toml      # path to the LDAP config file
;allow_sign_up = true                     # allow sign-up of LDAP users (left disabled)
# LDAP background sync (Enterprise only)
# At 1 am every day
;sync_cron = "0 0 1 * * *"
;active_sync_enabled = true
2. Example LDAP config file /etc/grafana/ldap.toml
# To troubleshoot and get more log info enable ldap debug logging in grafana.ini
# [log]
# filters = ldap:debug

[[servers]]
# Ldap server host (specify multiple hosts space separated)
#host = "127.0.0.1"
host = "47.91.215.12"        # LDAP server address
# Default port is 389 or 636 if use_ssl = true
port = 389                   # LDAP port
# Set to true if ldap server supports TLS
use_ssl = false
# Set to true if connect ldap server with STARTTLS pattern (create connection in insecure, then upgrade to secure connection with TLS)
start_tls = false
# set to true if you want to skip ssl cert validation
ssl_skip_verify = false
# set to the path to your root CA certificate or leave unset to use system defaults
# root_ca_cert = "/path/to/certificate.crt"
# Authentication against LDAP servers requiring client certificates
# client_cert = "/path/to/client.crt"
# client_key = "/path/to/client.key"

# Search user bind dn
bind_dn = "uid=grafana-test,cn=users,cn=accounts,dc=chanix,dc=top"   # DN used to bind for searches
# Search user bind password
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
bind_password = "xxx"        # password

# User search filter, for example "(cn=%s)" or "(sAMAccountName=%s)" or "(uid=%s)"
#search_filter = "(cn=%s)"
search_filter = "(uid=%s)"   # match users by uid

# An array of base dns to search through
#search_base_dns = ["dc=grafana,dc=org"]
search_base_dns = ["cn=users,cn=accounts,dc=chanix,dc=top"]   # base DN for user searches; do not comment this out, or authentication fails with "LDAP config file is missing option: search_base_dns"

## For Posix or LDAP setups that does not support member_of attribute you can define the below settings
## Please check grafana LDAP docs for examples
# group_search_filter = "(&(objectClass=posixGroup)(memberUid=%s))"
# group_search_base_dns = ["ou=groups,dc=grafana,dc=org"]
# group_search_base_dns = ["cn=groups,cn=accounts,dc=chanix,dc=top"]   # base DN for group searches (needed if groups must be matched); can stay commented out
# group_search_filter_user_attribute = "uid"

# Specify names of the ldap attributes your ldap uses
[servers.attributes]         # these seem to have little effect here
name = "givenName"
surname = "sn"
#username = "cn"
username = "uid"             # use uid
member_of = "memberOf"
email = "email"

# Map ldap groups to grafana org roles (map LDAP groups to Grafana roles).
[[servers.group_mappings]]
group_dn = "cn=grafana,cn=groups,cn=accounts,dc=chanix,dc=top"
org_role = "Admin"           # Grafana admin group
# To make user an instance admin (Grafana Admin) uncomment line below
# grafana_admin = true
# The Grafana organization database id, optional, if left out the default org (id 1) will be used
# org_id = 1

[[servers.group_mappings]]
group_dn = "cn=test,cn=groups,cn=accounts,dc=chanix,dc=top"
#group_dn = "cn=users,dc=grafana,dc=org"
org_role = "Editor"          # can edit dashboards (but not data sources etc.)

[[servers.group_mappings]]
# If you want to match all (or no ldap groups) then you can use wildcard
group_dn = "*"
org_role = "Viewer"          # view only; default for all other LDAP users
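The bind DN, search filter and group membership can be checked outside Grafana with the ldap3 library before reloading Grafana. A minimal sketch using the DNs above (the bind password is a placeholder):
#!/usr/bin/python3
# Minimal sketch: verify the bind DN, search filter and memberOf attribute with ldap3,
# mirroring the settings in /etc/grafana/ldap.toml. The bind password is a placeholder.
from ldap3 import Server, Connection, ALL

server = Server("47.91.215.12", port=389, use_ssl=False, get_info=ALL)
conn = Connection(
    server,
    user="uid=grafana-test,cn=users,cn=accounts,dc=chanix,dc=top",
    password="xxx",            # placeholder, same as bind_password
    auto_bind=True,
)
# Same filter Grafana uses: search_filter = "(uid=%s)"
conn.search(
    "cn=users,cn=accounts,dc=chanix,dc=top",
    "(uid=chenxin)",
    attributes=["uid", "memberOf", "mail"],
)
for entry in conn.entries:
    print(entry.entry_dn)
    print(entry.entry_attributes_as_dict)
conn.unbind()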
Mainly the configuration files and the database file:
/etc/grafana/
/var/lib/grafana/grafana.db
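A minimal backup sketch, assuming the default sqlite3 storage described in the file layout above (the /backup path is a placeholder): copy /etc/grafana and snapshot grafana.db via sqlite's online backup API.
#!/usr/bin/python3
# Minimal backup sketch: copy /etc/grafana and snapshot /var/lib/grafana/grafana.db.
# Assumes the default sqlite3 storage; /backup is a placeholder location.
import shutil
import sqlite3
import time

stamp = time.strftime("%Y%m%d-%H%M%S")
backup_dir = "/backup/grafana-" + stamp

# 1. configuration files
shutil.copytree("/etc/grafana", backup_dir + "/etc-grafana")

# 2. sqlite database, via sqlite's online backup API (Python 3.7+)
src = sqlite3.connect("/var/lib/grafana/grafana.db")
dst = sqlite3.connect(backup_dir + "/grafana.db")
with dst:
    src.backup(dst)
dst.close()
src.close()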
The world map panel currently uses InfluxDB, a time series database, as its backing store. Install InfluxDB first, then add it as a data source in Grafana. For InfluxDB installation and basic CRUD, see:
InfluxDB installation: https://blog.51cto.com/oybw88/2107228
(Later, a Python job will clean the Alibaba Cloud Log Service data and write it into InfluxDB so that Grafana can monitor the Alibaba Cloud business traffic in near real time; a sketch of that write step follows.)
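A minimal sketch of the planned write step, assuming a local InfluxDB and the influxdb Python client; the database name, measurement and values are placeholders:
#!/usr/bin/python3
# Minimal sketch: write one cleaned data point into InfluxDB so the worldmap
# panel can read it. Database name, measurement, tags and values are placeholders.
from influxdb import InfluxDBClient
import pygeohash

client = InfluxDBClient(host="localhost", port=8086, database="waf_geo")
client.create_database("waf_geo")   # idempotent on InfluxDB 1.x

point = {
    "measurement": "waf_access",
    "tags": {
        "country": "Singapore",
        "geohash": pygeohash.encode(1.35, 103.82),   # worldmap panel can key on geohash
    },
    "fields": {"accesstimes": 123},
}
client.write_points([point])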
Python script to fetch WAF logs:
#! /usr/bin/python3
# Fetch WAF access statistics from Alibaba Cloud Log Service, convert the geo
# information into geohashes and store the result in MySQL for Grafana's worldmap panel.
import time
from aliyun.log import *
import pygeohash
import pymysql
import requests
import json


def getLocation(Longitude, Latitude):
    # Unused helper: reverse-geocode a longitude/latitude pair into country/province.
    ak = "nwBNILPav66Baa0OHZNtVUwVcxmNKaBG"
    # url = "http://api.map.baidu.com/reverse_geocoding/v3/?ak={}&output=json&coordtype=wgs84ll&location={},{}".format(ak, Latitude, Longitude)
    url = "http://ditu.aliyun.com/regeocoding?l={},{}&type=111".format(Latitude, Longitude)
    # Leftover test call against the Baidu geocoder; the result is not used.
    requests.get(url="http://api.map.baidu.com/geocoder/v2/",
                 params={"location": "39.934,116.329", "ak": ak, "output": "json"})
    r = requests.get(url)
    print(r.json())
    country = r.json()["result"]["addressComponent"]["country"]
    province = r.json()["result"]["addressComponent"]["province"]
    return (country, province)


def main():
    # Log Service connection details (same project/logstore as the Grafana data source).
    access_key = "LTAIseUbaRxeHLeu"
    secret_key = "ayoI3FUJg891bMRfimV4NVYudOy1th"
    endpoint = "http://ap-southeast-1.log.aliyuncs.com"
    project = "waf-project-1783799610063532-ap-southeast-1"
    logstore = "waf-logstore"
    # query1/query2 are kept for reference; query3 (country + geo) is the one actually used.
    query1 = "__topic__: waf_access_log | SELECT ip_to_country(if(real_client_ip='-', remote_addr, real_client_ip)) as country, count(1) as accesstimes group by country"
    query2 = "__topic__: waf_access_log | SELECT ip_to_geo(if(real_client_ip='-', remote_addr, real_client_ip)) as country, count(1) as accesstimes group by country"
    query3 = "* | select count(1) as accesstimes, ip_to_country(if(real_client_ip='-', remote_addr, real_client_ip)) as country, ip_to_geo(real_client_ip) as geo group by country,geo"
    topic = ""
    From = int(time.time()) - 3000   # last 50 minutes
    To = int(time.time())

    client = LogClient(endpoint, access_key, secret_key)
    sql = []
    req1 = GetLogsRequest(project, logstore, From, To, topic, query1, 10, 0, False)  # unused
    req2 = GetLogsRequest(project, logstore, From, To, topic, query3, 10, 0, False)

    # Retry until the query returns enough rows or five attempts have been made.
    response2 = []
    times = 0
    while True:
        try:
            response2 = client.get_logs(req2).get_body()
            times += 1
            print("************************")
            print(times)
        except Exception as e:
            print(e)
        if len(response2) >= 50:
            break
        if times > 4:
            print("failed to fetch logs after five retries")
            break

    # Build one INSERT per row: time, geohash, province, access count, country.
    for i in range(len(response2)):
        # print(response2)
        if response2[i]["geo"] == "":
            response2[i]["geo"] = "22,118"   # fallback coordinates when the geo lookup is empty
        location_geolist = response2[i]["geo"].split(",")
        location_geohash = pygeohash.encode(float(location_geolist[0]), float(location_geolist[1]))
        location_country = location_province = response2[i]["country"]
        location_accesstimes = response2[i]["accesstimes"]
        sql.append("insert into coinw_location values(now(),'{}','{}',{},'{}');".format(
            location_geohash, location_province, location_accesstimes, location_country))
    return sql


def mysqlConnection(sql):
    # Database connection settings.
    host = "localhost"
    user = "root"
    passwd = "coinw_grafana"
    database = "legend_grafana"
    port = 10306
    # Drop rows older than three days.
    delete_sql = "delete from coinw_location where datediff(curdate(), time)>=3;"
    conn = pymysql.connect(host=host, user=user, password=passwd, database=database, port=port)
    cursor = conn.cursor()
    # Execute the INSERT statements passed in, one by one.
    for i in range(len(sql)):
        cursor.execute(sql[i])
    cursor.execute(delete_sql)
    conn.commit()
    cursor.close()
    conn.close()


def mysqltest(sql):
    # Helper to run a single statement by hand.
    host = "localhost"
    user = "root"
    passwd = "coinw_grafana"
    database = "legend_grafana"
    port = 10306
    conn = pymysql.connect(host=host, user=user, password=passwd, database=database, port=port)
    cursor = conn.cursor()
    print(cursor.execute(sql))
    cursor.close()
    conn.close()


if __name__ == "__main__":
    sql = main()
    print("----------------")
    mysqlConnection(sql)