How to check health of Linux OS
Source: Internet  Published by: 阿里云克隆  Editor: 程序博客网  Time: 2024/05/16 08:06
I learned a few things while investigating the 0x03 error found at Nanjing. At the beginning, we did not know why our GSRM (a Linux process) sometimes hung for a short time, about 5 seconds. It did not handle any messages during that time, and the interruptions did not occur at regular intervals. So we assumed we had a Linux OS problem and did the following checks:
1. Turn off the iptables service. The rules in place were:
[root@Motorola-SRM-1A ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT tcp -- Motorola-SRM-1A anywhere tcp dpt:glrpc flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- 10.0.0.2 anywhere tcp dpt:glrpc flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- Motorola-SRM-1B anywhere tcp dpt:glrpc flags:FIN,SYN,RST,ACK/SYN
DROP tcp -- anywhere anywhere tcp dpt:glrpc flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- Motorola-SRM-1A anywhere tcp dpt:sqlexec flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- 10.0.0.2 anywhere tcp dpt:sqlexec flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- Motorola-SRM-1B anywhere tcp dpt:sqlexec flags:FIN,SYN,RST,ACK/SYN
DROP tcp -- anywhere anywhere tcp dpt:sqlexec flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- 10.0.0.2 anywhere tcp dpt:9070 flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- Motorola-SRM-1B anywhere tcp dpt:9070 flags:FIN,SYN,RST,ACK/SYN
DROP tcp -- anywhere anywhere tcp dpt:9070 flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- Motorola-SRM-1A anywhere tcp dpt:9085 flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- 10.0.0.2 anywhere tcp dpt:9085 flags:FIN,SYN,RST,ACK/SYN
ACCEPT tcp -- Motorola-SRM-1B anywhere tcp dpt:9085 flags:FIN,SYN,RST,ACK/SYN
DROP tcp -- anywhere anywhere tcp dpt:9085 flags:FIN,SYN,RST,ACK/SYN
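The listing above follows a whitelist pattern: for each port, a few ACCEPT rules for known hosts, then a DROP for everyone else. iptables walks a chain in order and the first matching ACCEPT or DROP decides the packet. The following is a minimal sketch of that first-match logic, assuming Python 3. It is a simplified model, not how iptables itself is implemented: the `RULES` table and the `verdict` function are hypothetical names, only the port 9085 rules are copied, and the TCP flag match (the rules above apply to SYN packets only) is ignored.

```python
# Simplified model of one chain: (target, source, destination port).
# "anywhere" matches any source, as in the iptables -L output.
RULES = [
    ("ACCEPT", "Motorola-SRM-1A", "9085"),
    ("ACCEPT", "10.0.0.2", "9085"),
    ("ACCEPT", "Motorola-SRM-1B", "9085"),
    ("DROP", "anywhere", "9085"),
]

def verdict(source, dport, rules=RULES, policy="ACCEPT"):
    """Return the target of the first matching rule, else the chain policy."""
    for target, rule_source, rule_dport in rules:
        if rule_dport == dport and rule_source in (source, "anywhere"):
            return target
    return policy

if __name__ == "__main__":
    print(verdict("10.0.0.2", "9085"))   # whitelisted host
    print(verdict("10.0.0.9", "9085"))   # any other host on the same port
    print(verdict("10.0.0.9", "22"))     # port with no rule: chain policy applies
```

Because every rule in the listing matches TCP only, this model says nothing about UDP traffic.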
2. Change Linux kernel parameters:
(1) Add the following lines to /etc/sysctl.conf
net.core.rmem_default=4096000
net.core.wmem_default=4096000
net.core.rmem_max=8192000
net.core.wmem_max=8192000
(2) Make the change effective: run sysctl -p
(3) Check change result: run sysctl -a|grep 'net.core'
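The check in step (3) can also be scripted. The following is a minimal sketch, assuming Python 3 on the target host. The helper names `parse_sysctl_conf`, `proc_path` and `check` are hypothetical. The sketch relies on the usual mapping from a sysctl key to a file under /proc/sys, which holds for the net.core keys used here but not for keys whose components contain dots.

```python
from pathlib import Path

# The settings from step (1), as they would appear in /etc/sysctl.conf.
WANTED = """\
net.core.rmem_default=4096000
net.core.wmem_default=4096000
net.core.rmem_max=8192000
net.core.wmem_max=8192000
"""

def parse_sysctl_conf(text):
    """Return {key: value} for 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def proc_path(key):
    """Map a sysctl key such as net.core.rmem_max to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")

def check(settings):
    """Return (key, wanted, actual) for every setting the running kernel does not match."""
    mismatches = []
    for key, wanted in settings.items():
        try:
            actual = proc_path(key).read_text().strip()
        except OSError:
            actual = None  # key not available on this system
        if actual != wanted:
            mismatches.append((key, wanted, actual))
    return mismatches

if __name__ == "__main__":
    for key, wanted, actual in check(parse_sysctl_conf(WANTED)):
        print(f"{key}: wanted {wanted}, kernel has {actual}")
```

If the script prints nothing, the running kernel already uses the four values from step (1).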
3. Check network card errors (UDP statistics).
[root@Motorola-SRM-1A ~]# netstat -us
Udp:
1924142763 packets received
2047410 packets to unknown port received.
347842 packet receive errors
1582986591 packets sent
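The raw counters are easier to judge as a ratio. The following is a minimal sketch, assuming Python 3, that parses the Udp section shown above and computes receive errors as a share of the datagrams that reached the UDP layer. The function names are hypothetical, and the ratio is only a rough indicator. Receive errors often, though not always, come from full socket receive buffers, which is what the rmem settings in step 2 address.

```python
import re

# The Udp section from the netstat -us output above.
SAMPLE = """\
Udp:
1924142763 packets received
2047410 packets to unknown port received.
347842 packet receive errors
1582986591 packets sent
"""

PATTERNS = {
    "received": r"(\d+) packets received",
    "unknown_port": r"(\d+) packets to unknown port received",
    "receive_errors": r"(\d+) packet receive errors",
    "sent": r"(\d+) packets sent",
}

def parse_udp_stats(text):
    """Extract the UDP counters from `netstat -us` style output."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[name] = int(match.group(1))
    return stats

def receive_error_ratio(stats):
    """Receive errors divided by (received + receive errors)."""
    total = stats["received"] + stats["receive_errors"]
    return stats["receive_errors"] / total if total else 0.0

if __name__ == "__main__":
    stats = parse_udp_stats(SAMPLE)
    print(f"receive error ratio: {receive_error_ratio(stats):.4%}")
```

Because the counters are cumulative since boot, running the check twice and comparing the difference shows whether errors are still occurring.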
4. Check services on Linux
[root@Moto-SRM-C ~]# chkconfig --list
NetworkManager 0:off 1:off 2:off 3:off 4:off 5:off 6:off
NetworkManagerDispatcher 0:off 1:off 2:off 3:off 4:off 5:off 6:off
acpid 0:off 1:off 2:off 3:on 4:on 5:on 6:off
anacron 0:off 1:off 2:on 3:on 4:on 5:on 6:off
5. Check the Linux version
[root@Moto-SRM-C ~]# lsb_release -a
[root@Moto-SRM-C ~]# uname -a
6. Check MySQL
mysql> show variables like '%timeout';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| connect_timeout | 10 |
| delayed_insert_timeout | 300 |
| innodb_lock_wait_timeout | 120 |
| innodb_rollback_on_timeout | OFF |
| interactive_timeout | 28800 |
| net_read_timeout | 30 |
| net_write_timeout | 60 |
| slave_net_timeout | 3600 |
| table_lock_wait_timeout | 50 |
| wait_timeout | 28800 |
+----------------------------+-------+
10 rows in set (0.00 sec)
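7. Tcpdump
tcpdump -i eth0 -nn -X 'port 13819 and udp' -s 0 -w tcpdump.log -W 40 -C 10
-W 40: keep at most 40 capture files, overwriting the oldest once the limit is reached
-C 10: start a new file after about 10 MB (the unit is 1,000,000 bytes)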
8. Check ulimits
[root@Motorola-SRM-1A ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 65536
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
max rt priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
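The open files limit (1024 above) is the value most likely to matter for a long-running server process. The following is a minimal sketch, assuming Python 3 on Linux, that reads the limit with the standard `resource` module and counts the descriptors a process holds via /proc. The function names are hypothetical. Note that `getrlimit` reports the limits of the Python process itself, which are inherited from the shell that started it. The limits of an already running process such as GSRM are listed in /proc/<pid>/limits.

```python
import os
import resource

def open_file_limit():
    """Return the (soft, hard) limit on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def open_fd_count(pid="self"):
    """Count open descriptors via /proc/<pid>/fd; None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None

def describe(limit):
    """Format a limit the way ulimit does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

if __name__ == "__main__":
    soft, hard = open_file_limit()
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
    print(f"descriptors in use by this process: {open_fd_count()}")
```

Passing a process ID, for example `open_fd_count(1234)`, counts the descriptors of that process instead. This needs permission to read its /proc entry.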
9. vmstat 3,3
[root@Moto-SRM-C B_IPRM_SANDBOX]# vmstat 3,3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 1456912 226008 1130316 0 0 68 22 338 148 1 0 98 1 0
1 0 0 1456440 226008 1130820 0 0 225 5 1334 2468 15 9 73 3 0
1 0 0 1455968 226016 1131544 0 0 243 233 1345 2515 16 9 73 3 0
1 0 0 1454816 226016 1132300 0 0 245 0 1338 2533 16 9 73 2 0
When the system is busy, the cs (context switches per second) and us (user CPU time, in percent) columns are somewhat high.
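When reading vmstat output, keep in mind that the first data row reports averages since the last reboot, so only the later rows reflect the current load. The following is a minimal sketch, assuming Python 3, that parses the output above and averages the cs and us columns over the live samples. The function names are hypothetical.

```python
# The vmstat output shown above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 1456912 226008 1130316 0 0 68 22 338 148 1 0 98 1 0
1 0 0 1456440 226008 1130820 0 0 225 5 1334 2468 15 9 73 3 0
1 0 0 1455968 226016 1131544 0 0 243 233 1345 2515 16 9 73 3 0
1 0 0 1454816 226016 1132300 0 0 245 0 1338 2533 16 9 73 2 0
"""

def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header_index = next(i for i, fields in enumerate(lines) if fields[0] == "r")
    names = lines[header_index]
    rows = []
    for fields in lines[header_index + 1:]:
        if len(fields) == len(names) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(names, map(int, fields))))
    return rows

def live_averages(rows, columns=("cs", "us")):
    """Average the given columns, skipping the first row (averages since boot)."""
    live = rows[1:]
    if not live:
        raise ValueError("need at least two samples")
    return {c: sum(r[c] for r in live) / len(live) for c in columns}

if __name__ == "__main__":
    averages = live_averages(parse_vmstat(SAMPLE))
    print(f"cs: {averages['cs']:.0f}/s  us: {averages['us']:.1f}%")
```

For the sample above, the since-boot row shows 148 context switches per second, while the three live samples are all around 2500.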
I turned off a lot of services and changed the kernel parameters, but the problem was not resolved. Then KunZhong checked the messages from the STB and found that the message length was wrong: it was much larger than the actual message length. As a result, many messages were treated as one message.
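A receiver can guard against this kind of problem by comparing the length a message declares with the number of bytes that actually arrived. The following is a minimal sketch, assuming Python 3. The STB message format is not described here, so the layout used (a 2-byte big-endian length followed by the payload) and the names `HEADER` and `check_datagram` are purely hypothetical.

```python
import struct

# Hypothetical layout: 2-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!H")

def check_datagram(data):
    """Return the payload if the declared length matches the bytes received."""
    if len(data) < HEADER.size:
        raise ValueError("datagram shorter than the length header")
    (declared,) = HEADER.unpack_from(data)
    actual = len(data) - HEADER.size
    if declared != actual:
        raise ValueError(f"declared length {declared} != actual length {actual}")
    return data[HEADER.size:]

if __name__ == "__main__":
    good = HEADER.pack(5) + b"hello"
    bad = HEADER.pack(500) + b"hello"   # length field far larger than the payload
    print(check_datagram(good))
    try:
        check_datagram(bad)
    except ValueError as err:
        print("rejected:", err)
```

Rejecting such a message immediately keeps the receiver from consuming later messages as part of an oversized one.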