Oracle Coherence Operations and Monitoring
Source: Internet  Editor: 程序博客网  Date: 2024/06/17 12:10
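1. Checking and Setting Environment Parameters
For details, see Chapter 6, "Performance Tuning", of the Oracle® Coherence Administrator's Guide. For the AIX environment used in this project, we recommend adjusting the following parameters:
1.1. AIX Operating System Parameters
1.1.1. Socket Buffer Sizes
The default socket buffer sizes are usually small, and Coherence then logs the following warning: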
UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304
bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation
regarding increasing the maximum socket buffer size. Proceeding with the actual
value may cause sub-optimal performance.
Run the following commands as root to adjust the settings:
no -o rfc1323=1
no -o sb_max=4194304
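To confirm that the adjustment took effect, the requested and actual sizes can be pulled out of the warning text. The following is a minimal sketch, assuming the warning keeps the wording of the sample quoted earlier; the function name parse_buffer_warning is hypothetical and other Coherence versions may phrase the message differently.

```python
import re

# Pattern built from the sample warning quoted in this section (an assumption:
# the exact wording may vary between Coherence versions).
BUFFER_WARNING = re.compile(
    r"failed to set (?P<kind>\w+) buffer size to\s*"
    r"(?P<want_packets>\d+) packets \((?P<want_bytes>\d+)\s*bytes\); "
    r"actual size is (?P<got_packets>\d+) packets \((?P<got_bytes>\d+) bytes\)"
)

def parse_buffer_warning(text):
    """Return requested and actual buffer sizes in bytes, or None if no match."""
    # Log output may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = BUFFER_WARNING.search(flat)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "requested_bytes": int(match.group("want_bytes")),
        "actual_bytes": int(match.group("got_bytes")),
    }
```

For the sample warning, the requested size is 2096304 bytes, which fits within the sb_max value of 4194304 set above.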
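1.1.2. Multicast and IPv6 Options
AIX 5.2 and later use IPv6 for multicast by default. When starting Coherence services and applications, set the following JVM system property to force IPv4:
-Djava.net.preferIPv4Stack=true
Also set hosts=local,bind4 in /etc/netsvc.conf.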
1.2. IBM JVM-Specific Configuration
1.2.1. OutOfMemoryError
A node in an OutOfMemoryError state harms the whole cluster, so such a node should exit rather than attempt to recover. Configure the following IBM JVM startup option:
UNIX:
-Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,exec="kill -9 %pid"
1.2.2. Heap Sizing
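IBM does not recommend a fixed-size heap for its JVM, so in many cases it is best to leave -Xms at its default and set only -Xmx. For details, see: http://www.ibm.com/developerworks/java/jdk/diagnosis/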
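2. Start and Stop Scripts
2.1. Startup Script
2.2. Data Loading Script
2.3. Shutdown Script
3. Coherence Log Management
3.1. About Logging
Coherence has its own logging framework and also supports log4j, SLF4J, and Java logging, giving applications a common logging environment. Coherence logging runs on a dedicated, low-priority thread to reduce the impact of logging on critical parts of the system. Logging is preconfigured, and the default settings can be changed as needed.
The Coherence log level determines which log messages are emitted. The default level emits error, warning, and informational messages plus some debug messages. During development, raise the log level to its maximum so that all debug messages are recorded. In production, level 3 is a reasonable choice. The higher the level, the more detailed the output; the default is 5. The log levels are as follows: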
· 0
– This level includes messages that are not associated with a logging level.
· 1
– This level includes the previous level's messages plus error messages.
· 2
– This level includes the previous levels' messages plus warning messages.
· 3
– This level includes the previous levels' messages plus informational messages.
· 4-9
– These levels include the previous levels' messages plus internal debugging messages. More log messages are emitted as the log level is increased. The default log level is 5.
· -1
– No log messages are emitted.
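The cumulative rule in the list above (each level includes every lower level, and -1 turns logging off) can be written as a small check. This is an illustrative sketch only; the names SEVERITY and is_emitted are hypothetical and are not part of any Coherence API.

```python
# Message severities as described in the level list above.
# Levels 4 through 9 are all internal debugging output.
SEVERITY = {0: "always", 1: "error", 2: "warning", 3: "info"}

def severity_name(level):
    """Map a numeric message level to a short label."""
    if 4 <= level <= 9:
        return "debug"
    return SEVERITY[level]

def is_emitted(message_level, configured_level):
    """True if a message at message_level is logged at configured_level."""
    if configured_level < 0:
        # -1 means no log messages are emitted at all.
        return False
    return message_level <= configured_level
```

With the production setting of 3, warnings (level 2) are logged while debug messages at level 5 are dropped.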
3.2. Setting the Log Level
The Coherence log level can be configured in the tangosol-coherence-override.xml file, as shown below:
<logging-config>
<destination system-property="tangosol.coherence.log">log4j</destination>
<severity-level system-property="tangosol.coherence.log.level">3</severity-level>
</logging-config>
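When auditing many nodes it can help to read the configured values back out of the override file. The following is a minimal sketch using the Python standard library; the function name read_logging_config is hypothetical, and the sketch reads only the values written in the file, not values overridden at startup through the system properties named in the system-property attributes.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the fragment shown in this section.
SAMPLE = """
<logging-config>
  <destination system-property="tangosol.coherence.log">log4j</destination>
  <severity-level system-property="tangosol.coherence.log.level">3</severity-level>
</logging-config>
"""

def _local(tag):
    # Drop an XML namespace prefix such as "{http://...}logging-config".
    return tag.rsplit("}", 1)[-1]

def read_logging_config(xml_text):
    """Return the child elements of <logging-config> as a dict of strings."""
    root = ET.fromstring(xml_text.strip())
    # root.iter() also visits the root itself, so a bare fragment works too.
    for elem in root.iter():
        if _local(elem.tag) == "logging-config":
            return {_local(child.tag): (child.text or "").strip() for child in elem}
    return {}
```

Calling read_logging_config(SAMPLE) yields the destination and severity level as strings, which a script can then compare against the expected production values.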
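3.3. Log Monitoring
If the Coherence or application log files are numerous or large, clean them up promptly so that they do not exhaust disk space. Check the Coherence logs regularly and pay attention to messages at warning level and above, especially the following: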
1. Un-indexed data access. Log content to watch for:
1) at com.tangosol...readSerializable(ExternalizableHelper.java:2180
2) YYYY-MM-DD HH:MM:SS.mmm/55.838 Oracle Coherence GE 12.1.2.0.0 <…> . . . Timeout while delivering a packet; requesting the departure confirmation for Member(. . . ) by MemberSet(. . . )
2. Heap exhaustion. Log content to watch for:
java.lang.OutOfMemoryError: GC overhead limit exceeded Dumping heap to java_pid6199.hprof . . .
Heap dump file created [16864871 bytes in 1.921 secs]
3. Unresponsive service
(thread=Cluster, member=2): Detected soft timeout) of {WrapperGuardable Guard{Daemon=DistributedCache}
4. Swap-related messages
2013/09/17 10:20:26 | [GC 938176K->865107K(1021376K), 19.7179554 secs]
5. Potential bandwidth messages
a) Experienced a XXX ms communication delay (probable remote GC) with Member YYY
b) A potential communication problem has been detected.
c) This node appears to have become disconnected
6. Potential disconnect messages
a) (thread=Cluster, member=5): Failed to reach address /192.168.1.103 within the IpMonitor timeout. Members [Member(Id=3. . . )] are suspect.
b) (thread=Cluster, member=5): Timed-out members MemberSet(Size=4, BitSetCount=2 Member(Id=1, Timestamp=2011-02-05
7. Detecting split brain
a) 2013-01-25 08:16:59.555/638.831 Oracle Coherence GE 12.1.2.0.0/465p4 <D5> An existence of a cluster island
b) 2010-01-25 09:38:43.213/460.877 Oracle Coherence GE 12.1.2.0.0/465p4 Received panic from senior Member, . . .
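The regular check described above can be partly automated by matching each log line against the sample messages. The sketch below is one way to do that under stated assumptions: the category names are illustrative rather than Coherence terminology, the patterns are taken only from the samples in this list, and the 1-second GC pause threshold is an arbitrary value chosen for the example.

```python
import re

# Category -> pattern, derived from the sample messages listed above.
PATTERNS = {
    "packet_timeout": re.compile(r"Timeout while delivering a packet"),
    "heap_exhaustion": re.compile(r"java\.lang\.OutOfMemoryError"),
    "unresponsive_service": re.compile(r"Detected soft timeout"),
    "communication_delay": re.compile(
        r"communication delay"
        r"|communication problem has been detected"
        r"|appears to have become disconnected"
    ),
    "disconnect": re.compile(r"within the IpMonitor timeout|Timed-out members"),
    "split_brain": re.compile(
        r"existence of a cluster island|Received panic from senior"
    ),
}

# Captures the pause duration at the end of a "[GC ... N secs]" entry.
GC_PAUSE = re.compile(r"\[GC .*?([\d.]+)\s*secs\]")

def classify(line, gc_threshold_secs=1.0):
    """Return the list of categories that a single log line falls into."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(line)]
    gc = GC_PAUSE.search(line)
    if gc and float(gc.group(1)) >= gc_threshold_secs:
        # A very long pause, like the swap example above, is worth flagging.
        hits.append("long_gc_pause")
    return hits
```

Running classify over each line of a log file and counting the categories gives a quick summary of which of the problems above have occurred.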
4. Coherence Cluster Monitoring
4.1. About Coherence Cluster Monitoring
Several tools can monitor a Coherence cluster. The main ones are:
1. Using JMX to Manage Oracle Coherence
JMX tools, mainly JConsole or Java VisualVM.
2. Using Oracle Coherence Reporting
A feature built into Coherence that produces statistical reports in text format.
3. Using Oracle WebLogic Server
The WebLogic Console can monitor the health of Coherence nodes and start and stop them.
4. Using Oracle Enterprise Manager
That is, the OEM Management Pack for Oracle Coherence. For details, see: https://docs.oracle.com/cd/E24628_01/install.121/e24215/coherence_getstarted.htm
To monitor with JMX tools, modify the Coherence startup script and add the following options:
-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true
For remote monitoring, also add:
-Dcom.sun.management.jmxremote.host=10.46.158.140 -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
If the connection still fails, also add:
-Dcom.sun.management.jmxremote.local.only=false
To limit the impact on cluster performance, only one node in the cluster needs the JMX options above; not every node has to be configured.
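Because the options must be separated by spaces and only one node needs them, a startup script may prefer to assemble them in one place. The following sketch shows one possible helper; the function jmx_flags and its parameters are hypothetical, and the flag values are copied from the options listed above.

```python
def jmx_flags(host=None, port=None, local_only_false=False):
    """Build the JMX-related JVM options as a list of separate arguments."""
    flags = [
        "-Dcom.sun.management.jmxremote",
        "-Dtangosol.coherence.management=all",
        "-Dtangosol.coherence.management.remote=true",
    ]
    if host is not None and port is not None:
        # Remote monitoring options, with ssl and authenticate as given above.
        flags += [
            f"-Dcom.sun.management.jmxremote.host={host}",
            f"-Dcom.sun.management.jmxremote.port={port}",
            "-Dcom.sun.management.jmxremote.ssl=false",
            "-Dcom.sun.management.jmxremote.authenticate=false",
        ]
    if local_only_false:
        # Only needed when the remote connection cannot be established.
        flags.append("-Dcom.sun.management.jmxremote.local.only=false")
    return flags
```

Joining the result with spaces, for example " ".join(jmx_flags("10.46.158.140", 7091)), produces a string that can be appended to the JVM command line of the one node chosen for management.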
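JMX tools only show the state of the Coherence cluster for the period between starting and stopping the tool, whereas OEM stores the collected monitoring data in a database so that history can be reviewed.
Coherence monitoring focuses on memory. If memory is not being reclaimed in time and is about to run out, trigger a manual GC; both JConsole and Java VisualVM can do this, as described below.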
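4.2. Monitoring with Java VisualVM
4.2.1. Installing the Coherence Plug-in
4.2.2. Monitoring Machine Status in the Coherence Cluster
4.2.3. Monitoring Coherence Cluster Members
Watch metrics such as publisher success rate, receiver success rate, and send Q size, and check metrics such as free memory to confirm that each node has enough memory.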
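The member checks above can be expressed as simple threshold tests once the metric values have been collected. The sketch below assumes the values arrive as a plain dictionary keyed by the ClusterNodeMBean attribute names PublisherSuccessRate, ReceiverSuccessRate, SendQueueSize, and MemoryAvailableMB; how they are collected is out of scope here, and the function name and all default thresholds are hypothetical values chosen for illustration.

```python
def node_warnings(attrs, min_success_rate=0.95, max_send_queue=100, min_free_mb=256):
    """Return human-readable warnings for one cluster member."""
    warnings = []
    # Success rates are fractions between 0 and 1; low values suggest packet loss.
    for name in ("PublisherSuccessRate", "ReceiverSuccessRate"):
        if attrs[name] < min_success_rate:
            warnings.append(f"{name}={attrs[name]} is below {min_success_rate}")
    if attrs["SendQueueSize"] > max_send_queue:
        warnings.append(f"SendQueueSize={attrs['SendQueueSize']} exceeds {max_send_queue}")
    if attrs["MemoryAvailableMB"] < min_free_mb:
        warnings.append(f"MemoryAvailableMB={attrs['MemoryAvailableMB']} is below {min_free_mb}")
    return warnings
```

An empty list means the member passed every check; the thresholds should be tuned to the cluster's normal behavior.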
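4.2.4. Monitoring Coherence Cluster Services
Check that all services are in a normal state, that task average duration and request average duration are normal, and that task backlog is 0.
For example, a service in the ENDANGERED state with a very high request average duration is not healthy.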
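A service check along the same lines might look like the sketch below. It assumes the values come from the ServiceMBean attributes StatusHA, TaskBacklog, TaskAverageDuration, and RequestAverageDuration, passed in as a dictionary; the function name and the 100 ms duration limits are hypothetical and only illustrate the idea of "abnormally high".

```python
def service_problems(attrs, max_request_ms=100.0, max_task_ms=100.0):
    """Return the reasons, if any, why a service looks unhealthy."""
    problems = []
    if attrs["StatusHA"] == "ENDANGERED":
        # ENDANGERED is the abnormal state called out in the text above.
        problems.append("StatusHA is ENDANGERED")
    if attrs["TaskBacklog"] > 0:
        problems.append(f"TaskBacklog is {attrs['TaskBacklog']}, expected 0")
    if attrs["RequestAverageDuration"] > max_request_ms:
        problems.append("RequestAverageDuration is high")
    if attrs["TaskAverageDuration"] > max_task_ms:
        problems.append("TaskAverageDuration is high")
    return problems
```

Applied to every service on every member, this gives a compact list of the services that need a closer look in VisualVM.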
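4.2.5. Monitoring Coherence Cluster Caches
4.2.6. Monitoring CPU and Memory of Coherence Nodes
VisualVM can monitor the CPU and memory usage of a specific node and can trigger a manual GC.
4.3. Monitoring with JConsole
JConsole can monitor the CPU, memory, and process status of a specific Coherence node, and a manual GC can be triggered from JConsole.
In addition, the MBean view in JConsole exposes more detail, which is where JConsole is stronger than VisualVM.
4.4. Monitoring Programmatically with JMX
When Coherence is managed through JMX, MBean data gives a concise view of the operational state of the cluster and enables real-time monitoring and analysis. The Coherence-JVisualVM plug-in provides a lot of Coherence information, such as the cluster's Machines, Members, Services, and Caches.
The Coherence MBeans are listed below: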
CacheMBean
Represents a cache. A cluster member includes zero or more instances of this managed bean.
ClusterMBean
Represents a cluster. Each cluster member includes a single instance of this managed bean.
ClusterNodeMBean
Represents a cluster member. Each cluster member includes a single instance of this managed bean.
ConnectionManagerMBean
Represents an Oracle Coherence*Extend proxy. A cluster member includes zero or more instances of this managed bean.
ConnectionMBean
Represents a remote client connection through Oracle Coherence*Extend. A cluster member includes zero or more instances of this managed bean.
FlashJournalRM
Represents a flash journal resource manager. The managed bean is an instance of the JournalMBean interface. Each cluster member includes a single instance of this managed bean.
ManagementMBean
Represents the grid JMX infrastructure. Each cluster member includes a single instance of this managed bean.
PointToPointMBean
Represents the network status between two cluster members. Each cluster member includes a single instance of this managed bean.
RamJournalRM
Represents a RAM journal resource manager. The managed bean is an instance of the JournalMBean interface. Each cluster member includes a single instance of this managed bean.
ReporterMBean
Represents the Oracle Coherence reporter. Each cluster member includes a single instance of this managed bean.
ServiceMBean
Represents a clustered service. A cluster member includes zero or more instances of this managed bean.
StorageManagerMBean
Represents a storage instance for a storage-enabled distributed cache service. A cluster member includes zero or more instances of this managed bean.
TransactionManagerMBean
Represents a transaction manager. A cluster member includes zero or more instances of this managed bean.
Each MBean has its own attributes, some read-only and some writable, that help with managing and monitoring Coherence. For more information, see Oracle® Fusion Middleware Managing Oracle Coherence.