annan 2020-08-02
? 生产上新入网的服务器都需要安装prometheus的监控客户端软件,主要步骤有:新建监控用户、拷贝客户端软件、拉起客户端进程、开机自启动。本文记录了使用ansible的role方式批量快速的安装该客户端软件。
? 本文使用到的主要模块:user、stat、copy、shell、script、lineinfile等。
环境说明:
主机名 | 操作系统版本 | ip | ansible version | 备注 |
---|---|---|---|---|
ansible | Centos 7.6.1810 | 172.27.34.51 | 2.9.9 | ansible管理服务器 |
ansible-awx | Centos 7.6.1810 | 172.27.34.50 | / | 被管服务器 |
[ ~]# cd /etc/ansible/roles [ roles]# ansible-galaxy init prometheus - Role prometheus was created successfully [ roles]# tree prometheus prometheus ├── defaults │?? └── main.yml ├── files ├── handlers │?? └── main.yml ├── meta │?? └── main.yml ├── README.md ├── tasks │?? └── main.yml ├── templates ├── tests │?? ├── inventory │?? └── test.yml └── vars └── main.yml 8 directories, 8 files
使用ansible-galaxy命令初始化role的目录
[ ~]# yum -y install python3-pip
[ ~]# cd /tmp [ tmp]# pip3 download passlib==1.7.2 -d /tmp/pkg [ tmp]# more requirements.txt passlib==1.7.2 [ tmp]# pip3 install --no-index --find-links=./pkg -r requirements.txt WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead. Collecting passlib==1.7.2 (from -r requirements.txt (line 1)) Installing collected packages: passlib Successfully installed passlib-1.7.2
生产密码会使用到Python的passlib模块
[ ~]# python3 -c "from passlib.hash import sha512_crypt; import getpass; print(sha512_crypt.using(rounds=5000).hash(getpass.getpass()))" Password: $6$irgqm/Fea6/O07B7$LJpYtZoKqUkF.pN4D71LX2Cac3TNrF2.1GKGLfaSWxvKupknNLbWNcYym3LuojT3BqUeUCgsrmD/M6FqTx4lK/
输入明文密码会生成密码密文,复制该密码,后面创建用户时会用到。
[ ansible]# pwd /etc/ansible [ ansible]# more prometheus.yaml --- - hosts: "{{ hostlist }}" gather_facts: no roles: - role: prometheus
[ ~]# cd /etc/ansible/roles [ roles]# more prometheus/tasks/main.yml --- # tasks file for prometheus # author: loong576 - name: user search shell: id {{ user_name }} register: user_search ignore_errors: true - name: user add user: name: "{{ user_name }}" shell: "{{ user_bash }}" password: "{{ user_password }}" when: user_search.failed == true - name: file search stat: path: "{{ file_dest }}/{{ file_src }}" register: file_search - name: copy files copy: src: "{{ file_src }}" dest: "{{ file_dest }}" mode: 0755 when: file_search.stat.exists == false - name: process search shell: "ps -ef|grep node_exporter |grep -v grep" register: process ignore_errors: true - name: install node_exporter environment: dest: "{{ file_dest }}" src: "{{ file_src }}" port: "{{ node_port }}" script: startup.sh register: start tags: start when: process.failed == true - name: exec when startup lineinfile: dest: /etc/rc.local line: nohup {{ file_dest }}/{{ file_src }} --web.listen-address=:{{ node_port }} >/dev/null &
执行逻辑为:判断被执行主机上有无监控用户,若无则新增;判断被执行主机有无客户端文件,若无则拷贝;判断被执行主机有无客户端进程,若无则拉起;最后设置客户端进程开机自启动。
[ roles]# more prometheus/defaults/main.yml --- # defaults file for prometheus user_name: sysmonitor user_bash: /bin/bash user_password: $6$bB7R8JF3U7L7s/3E$fKOQwpoZ7RESfMmX6uqts1gw4yeXniRNctI2JRBRS2/120EgrHCWS3DboiRhO5sN0CjoVxvtAKgeDVQRaPlc0/ file_src: node_exporter file_dest: /home/sysmonitor node_port: 9100
定义监控用户的用户名、shell、密码,客户端执行文件的文件名、文件路径和端口。
[ roles]# ll prometheus/files/ 总用量 16512 -rw-r--r-- 1 root root 16900416 7月 30 16:04 node_exporter -rwxr--r-- 1 root root 102 7月 31 11:32 startup.sh [ roles]# more prometheus/files/startup.sh #/bin/bash echo $dest echo $src echo $port nohup $dest/$src --web.listen-address=:$port >/dev/null &
file文件有两个,node_exporter为客户端执行文件,startup.sh为客户端进程拉起脚本。
[ ansible]# pwd /etc/ansible [ ansible]# ansible-playbook prometheus.yaml -e hostlist=test50
‘ -e hostlist=test50’指定被执行的主机为test50,即172.27.34.50
登陆被管主机test50,发现监控用户和监控进程都在且加入到了开机自启动文件中,符合预期。
本文所有脚本和配置文件已上传github:ansible-production-practice-2
更多请点击:ansible系列文章