Ansible Best Practices
Author: Haohao Zhang
Summary
To manage thousands of servers, we need a deployment tool that can handle all kinds of tasks.
The most widely used tools are Puppet, SaltStack, and Ansible.
Puppet and SaltStack both rely on agents, while Ansible is agentless. That is an advantage, because you do not have to manage those agents with yet another tool.
Ansible is also written in Python and ships with a large number of modules.
You can develop your own modules and contribute them back to the community.
Ansible uses the SSH protocol to transfer data.
Here are some best practices that we want to share with you.
Practice1
Problem:
The result is printed to the terminal after you execute the ansible or ansible-playbook command. Sometimes you want to run it in the background and write the result to a log file. You might use “nohup” to do that, but you will find it is a disaster.
nohup ansible-playbook -i inventory main.yml -k -K -U root -u test -s -f 10 > ansible_log
Output:
File "/usr/lib/python2.6/site-packages/ansible/runner/connection_plugins/ssh.py", line 162, in _communicate
    rfd, wfd, efd = select.select(rpipes, [], rpipes, 1)
ValueError: filedescriptor out of range in select()
Reason:
The Python client uses select() to wait for socket activity. select() is used because it is available on most platforms. However, select() has a hard limit on the value of a file descriptor. If a socket is created with a file descriptor whose value is at or above FD_SETSIZE (commonly 1024), the exception shown above is thrown.
Note well: this is caused by the value of the fd, not the number of open fds.
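The point about the fd value can be checked with a small sketch. It assumes Python 3 on a Unix-like system and an FD_SETSIZE of 1024 (Python does not expose the constant, so the number is hard-coded here as an assumption). The helper name select_on_high_fd is hypothetical. It opens a single pipe, duplicates one end onto a descriptor numbered 1024 or higher, and then calls select() on it, so only a handful of descriptors are open when the call fails.

```python
import fcntl
import os
import resource
import select

FD_SETSIZE = 1024  # assumption: the usual value on Linux; Python does not expose it


def select_on_high_fd(min_fd=FD_SETSIZE):
    """Duplicate a pipe end onto an fd >= min_fd and try select() on it.

    Returns (fd, select_worked), or None if the process may not use such a high fd.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = min_fd + 16
    if soft != resource.RLIM_INFINITY and soft < needed:
        if hard != resource.RLIM_INFINITY and hard < needed:
            return None
        try:
            # Raise the soft limit so that fd numbers above min_fd are allowed.
            resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
        except (ValueError, OSError):
            return None
    r, w = os.pipe()
    try:
        # F_DUPFD returns the lowest free descriptor that is >= min_fd.
        high = fcntl.fcntl(r, fcntl.F_DUPFD, min_fd)
    except OSError:
        os.close(r)
        os.close(w)
        return None
    try:
        try:
            select.select([high], [], [], 0)
            worked = True
        except ValueError:
            # Same error as in the traceback: the fd value is out of range.
            worked = False
    finally:
        for fd in (high, r, w):
            os.close(fd)
    return high, worked
```

Calling select_on_high_fd() is expected to report that select() did not work, even though only three descriptors were opened by the helper.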
Related issues in community:
https://github.com/ansible/ansible/issues/10157
https://issues.apache.org/jira/browse/QPID-5588
https://github.com/ansible/ansible/issues/14143
Reproduce:
cat test.py
#!/usr/bin/env python
import subprocess
import os
import select
import getpass

host = 'example.com'
timeout = 5
password = getpass.getpass(prompt="Enter your password:")
for i in range(10):
    (r, w) = os.pipe()
    ssh_cmd = ['sshpass', '-d%d ' % r]
    ssh_cmd += ['ssh', '%s' % host, 'uptime']
    print ssh_cmd
    os.write(w, password + '\n')
    os.close(w)
    os.close(r)
    p = subprocess.Popen(ssh_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    rpipes = [p.stdout, p.stderr]
    print "file descriptor: %r" % [x.fileno() for x in rpipes]
    rfd, wfd, efd = select.select([p.stdout, p.stderr], [], [p.stderr], timeout)
    if rfd:
        print p.stdout.read()
        print p.stderr.read()
    p.stdout.close()
    p.stderr.close()
Solution:
Running the ansible-playbook command under nohup leads to a file descriptor leak.
The right way to do it:
ansible-playbook -i inventory main.yml -k -K -U root -u test -s -f 10 >ansible_log 2>&1 </dev/null
Or you could use “screen” to keep the session alive.
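If the playbook is started from a Python script instead of a shell, the same three redirections can be sketched with the standard subprocess module. This is a minimal sketch assuming Python 3; the helper name run_in_background is hypothetical, and the command list is whatever you would otherwise type on the command line.

```python
import subprocess


def run_in_background(cmd, log_path):
    """Start cmd without waiting for it, mirroring `>log 2>&1 </dev/null`."""
    log = open(log_path, "wb")  # like `>`, this truncates an existing log
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # </dev/null
            stdout=log,                 # >log_path
            stderr=subprocess.STDOUT,   # 2>&1
        )
    finally:
        # The child process keeps its own copy of the log descriptor.
        log.close()
    return proc
```

A call such as run_in_background(["ansible-playbook", "-i", "inventory", "main.yml"], "ansible_log") returns immediately, and proc.wait() can be used later to collect the exit status.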
Practice2
Problem:
Imagine that we change the Hadoop configuration files and then restart the Hadoop service.
We do not want to restart the whole cluster at once, so we need a rolling restart.
Let's say 10 servers per batch.
How do we do this?
Solution:
Add "serial: $NUM" to the main playbook. Sample:
cat main.yml
---
- hosts: upgrade_rack
  gather_facts: yes
  vars_files:
    - vars.yml
  pre_tasks:
    - include: turn_off_monitor.yml
  tasks:
    - include: pre_upgrade.yml
    - include: upgrade.yml
    - include: post_upgrade.yml
  post_tasks:
    - include: turn_on_monitor.yml
  serial: 10
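To make the batching concrete, here is a small Python sketch of how a host list is split when serial is a fixed number. It only models the grouping, not Ansible's implementation; the function name serial_batches and the host names are hypothetical.

```python
def serial_batches(hosts, serial):
    """Split hosts into consecutive batches of at most `serial` hosts."""
    if serial < 1:
        raise ValueError("serial must be at least 1")
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]


# Hypothetical inventory group with 25 hosts.
upgrade_rack = ["node%02d" % i for i in range(25)]
batches = serial_batches(upgrade_rack, 10)
```

With 25 hosts and serial set to 10, the play runs to completion on the first 10 hosts, then on the next 10, then on the remaining 5.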
Practice3
Problem:
We know that Ansible is used to deploy things to remote hosts, but sometimes we want to log in to other servers and do something there while the playbook tasks are running.
How do we do this?
Solution:
Ansible provides the "delegate_to" feature for this. Ansible hands the task over to the delegated host to execute the command, and then returns to the original host. Sample:
- name: "Refresh nodes on resourcemanager"
  shell: "yarn rmadmin -refreshNodes"
  delegate_to: "example.com"
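One detail worth spelling out is that a delegated task still runs once for every host in the play; only the place where the command executes changes. The toy model below is not Ansible code, and the function name plan_delegated_task and the host names are hypothetical. It shows that inventory_hostname keeps referring to the original host while the command runs on the delegate.

```python
def plan_delegated_task(play_hosts, delegate_to):
    """Model one delegated task: one run per play host, all executed on the delegate."""
    return [
        {"inventory_hostname": host, "runs_on": delegate_to}
        for host in play_hosts
    ]


plan = plan_delegated_task(["datanode1", "datanode2"], "example.com")
for entry in plan:
    print("%(inventory_hostname)s -> command runs on %(runs_on)s" % entry)
```

With forks greater than 1, several of these runs can reach the delegated host at the same time.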
Practice4
Problem:
Sometimes we need to change files when using Ansible. Ansible provides several modules for this, such as “lineinfile”, “replace”, and “blockinfile”.
Let's consider a more complex case: assume we use the “replace” module to modify one configuration file on a single server, with forks set to 10.
What will happen?
As you might expect, the configuration file gets messed up, because multiple processes write to it at the same time.
- name: "Add hosts into mapred-exclude"
  replace: dest=mapred-exclude regexp='\Z' replace='{{inventory_hostname}}\n' owner=hadoop group=hadoop mode=644 backup=yes
  delegate_to: "example.com"
Solution:
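We could add a file lock to the source code of the “replace” module, and release the lock after writing to the file.
import fcntl

f = open(dest, 'rb+')
fcntl.flock(f, fcntl.LOCK_EX)      # block until we hold an exclusive lock
contents = f.read()
result = do_something_to_contents  # placeholder for the module's substitution step
f.seek(0)
f.write(result[0])
f.truncate()
fcntl.flock(f, fcntl.LOCK_UN)      # release the lock once the write is done
f.close()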
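The locking pattern above can be tried outside of Ansible with a self-contained sketch. It assumes Python 3 on a Unix-like system (fcntl is not available on Windows) and that the target file already exists. The helper name append_line_locked is hypothetical, and it appends a line instead of running the module's regular expression.

```python
import fcntl


def append_line_locked(path, line):
    """Read, modify, and rewrite a file while holding an exclusive flock."""
    with open(path, "rb+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # other writers block here
        try:
            contents = f.read()
            f.seek(0)
            f.write(contents + line.encode("utf-8") + b"\n")
            f.truncate()
            f.flush()  # make sure the data is written before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Because every writer opens the file itself and takes the lock before reading, concurrent callers are serialized and no update overwrites another. The lock is advisory, so it only helps if every writer uses it.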
Practice5
Problem:
When running ansible-playbook against a batch of hosts, we often hit the following error in the “gather facts” step:
failed: [example.com] => {"cmd": "/bin/lsblk -ln --output UUID /dev/sdn1", "failed": true, "rc": 257}
msg: Traceback (most recent call last):
…………………
TimeoutError: Timer expired
Solution:
The cause is that the timeout for the get_mount_facts function in /usr/lib/python2.6/site-packages/ansible/module_utils/facts.py is hardcoded to 10 seconds.
Hadoop nodes often have a high I/O load, so disks may be slow to respond, and 10 seconds is not enough.
This problem has been fixed in Ansible 2.2, which introduced the gather_timeout parameter.
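To illustrate why a fixed limit hurts on slow disks, here is a generic sketch of a timeout wrapper with a configurable limit. It is not Ansible's actual code. It assumes Python 3 on a Unix-like system and must run in the main thread, because it relies on SIGALRM. The names with_timeout, FactTimeoutError, and slow_probe are hypothetical.

```python
import functools
import signal
import time


class FactTimeoutError(Exception):
    """Hypothetical exception used only in this sketch."""


def with_timeout(seconds, message="Timer expired"):
    """Abort the wrapped call if it runs longer than `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise FactTimeoutError(message)

            previous = signal.signal(signal.SIGALRM, on_alarm)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                # Cancel the timer and restore the earlier handler.
                signal.setitimer(signal.ITIMER_REAL, 0)
                if previous is None:
                    previous = signal.SIG_DFL
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator


def slow_probe(delay):
    """Stand-in for a slow disk query such as the lsblk call above."""
    time.sleep(delay)
    return "done"
```

Wrapping slow_probe with a short limit makes a slow call fail, while a longer limit lets the same call succeed, which is the effect of making the timeout configurable instead of hardcoded.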