Diagnosing an Application Server Hang
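This is actually a document from the BEA website, published in the WebLogic 8.1 era. After Oracle acquired BEA, all of the support articles were redirected to Oracle's home page, and Google's cached copies are gone as well. I came across this copy by chance on a foreign forum through Google. Although it was written for 8.1, its methods and reasoning are still valid today. I had intended to digest it and write a post that combined it with a real case, but I have not run into a related problem recently, and doing so might have compromised the integrity of the article. So I am posting the original text, both to keep a copy on the web and to let readers take from it what they need and draw their own conclusions.
From the content you will see that, besides this article, there were also EJB_RMI Server Hang, Application Dead Lock, and JDBC Causes Server Hang, but the only one still available on that forum is JDBC Causes Server Hang. If you started working with WebLogic early and saved the other two articles, or have seen them online, please leave a comment. Many thanks.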
Generic Hang
Problem Description
A server hang is suspected when:
- The server does not respond to new requests.
- Requests time out.
- Requests take longer and longer to process (may be on the way to a hang).
- A server crash is not usually a symptom of a hung server but may follow.
Please note that not all of the following items would need to be done. Some issues can be solved by only following a few of the items.
Quick Links:
- Why does the problem occur?
- Potential Causes of Server Hang
- Basic Steps
- Known WebLogic Server Issues
- Collecting Thread Dumps
- Analysis of a Thread Dump
Why does the problem occur?
A server can hang for a variety of reasons (refer to Potential Causes of Server Hang). Generally, a server hangs because of a lack of some resource, which prevents the server from servicing requests. For example, because of a problem (such as a deadlock) or the volume of requests, there may be no execute threads available to do any work; all of them are busy with previous requests.
Basic Steps
When a server is hanging, first ping the server using java weblogic.Admin t3://server:port PING. If the server can respond to the ping, it may be that the application is hanging and not the server itself.
Ensure that the server is actually hanging and not doing garbage collection. To verify, restart the server with -verbosegc turned on, and redirect stdout and stderr to one file. When the server stops responding, the log will show whether it is doing garbage collection or is really hanging. If garbage collection takes too long (more than 10 seconds), the server may miss the heartbeats that servers use to keep each other informed of the topology of the cluster.
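The check described above can be sketched in Python. This is a minimal sketch, assuming the redirected log uses the classic Sun -verbosegc line shape ("[Full GC ...K->...K(...K), N secs]"); other JVMs and versions print different formats, so the regular expression is an assumption. The function name long_gc_pauses and the sample lines are made up for illustration.

```python
import re

# Assumed line shape: "[Full GC 65536K->60210K(65536K), 12.4831220 secs]".
GC_LINE = re.compile(r"\[(Full GC|GC) .*?, (\d+\.\d+) secs\]")


def long_gc_pauses(log_text, threshold_secs=10.0):
    """Return (kind, seconds) for every GC pause longer than the threshold."""
    pauses = []
    for match in GC_LINE.finditer(log_text):
        kind, secs = match.group(1), float(match.group(2))
        if secs > threshold_secs:
            pauses.append((kind, secs))
    return pauses


# Hypothetical log excerpt, not taken from a real server.
SAMPLE_LOG = """\
[GC 32768K->12040K(65536K), 0.0412310 secs]
[Full GC 65536K->60210K(65536K), 12.4831220 secs]
[Full GC 65530K->61877K(65536K), 3.1100000 secs]
"""

if __name__ == "__main__":
    for kind, secs in long_gc_pauses(SAMPLE_LOG):
        print(f"{kind} paused the JVM for {secs:.1f} seconds")
```

The 10-second default mirrors the threshold mentioned in the text; pauses above it are the ones that could cause missed cluster heartbeats.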
WebLogic Server uses the ‘default’ thread queue or a configured application-specific thread queue to service client requests. Client requests are handled in the default queue only if no application-specific thread queue is defined. Please see Tuning WebLogic Server Applications, Tuning the Default Execute Queue Threads, and Tuning WebLogic Server Performance Parameters for more information on defining application-specific thread queues.
In release 8.1, a change was made to the thread architecture in WebLogic Server. A specific kernel thread group for internal WebLogic tasks was created. This was found to be necessary to avoid deadlocks that occurred in earlier releases when all threads in the ‘default’ thread queue were used and none were thus available for WebLogic internal tasks.
The threads in the ‘default’ queue or the application-specific thread queue (if one has been configured) are the threads that should be examined in the event of a server hang. Here is an example of what one of these threads, Execute Thread ‘14’ from the ‘default’ queue, looks like in a thread dump when it is waiting for work. The latest method called by this thread is Object.wait(). This thread is in the state “waiting on monitor”.
"ExecuteThread: '14' for queue: 'default'" daemon prio=5 tid=0x8b0ab30 nid=0x1f4 waiting on monitor [0x96af000..0x96afdc4]
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:420)
    at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:118)
Threads can be in one of several states. Please see the table below for a description of the thread states.
The format of the thread dump varies with the vendor. Check on the vendor’s website for information regarding the format.
Below is an example of threads that may be hanging. ExecuteThread ‘9’ is waiting to lock object <dde51520>; notice the “waiting to lock <dde51520>” line in the stack trace for this thread. ExecuteThread ‘6’ is also waiting to lock the same object <dde51520>. The third thread, ExecuteThread ‘5’, has locked this object <dde51520> and is doing work. This example demonstrates why one thread dump is not enough. If the server is hanging and the locked object <dde51520> is the suspected cause, subsequent thread dumps will show whether that object was released and a different thread has locked it. If, after several thread dumps, the threads have not progressed and object <dde51520> has not been released, suspect a problem with the routine(s) in the ExecuteThread ‘5’ call stack, because the lock is not being released.
"ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0xf684c8 nid=0x13 waiting for monitor entry [cc2ff000..cc2ffc24]
    at weblogic.cluster.MemberManager.done(MemberManager.java:306)
    - waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
    at weblogic.cluster.MulticastManager.execute(MulticastManager.java:399)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)
"ExecuteThread: '6' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x9df020 nid=0x10 waiting for monitor entry [cc5ff000..cc5ffc24]
    at weblogic.cluster.MemberManager.getRemoteMembers(MemberManager.java:396)
    - waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
    at weblogic.cluster.ClusterService.getRemoteMembers(ClusterService.java:238)
    at weblogic.servlet.internal.HttpServer.setServerList(HttpServer.java:388)
    at weblogic.servlet.internal.HttpServer.clusterMembersChanged(HttpServer.java:418)
    - locked <ddf32360> (a weblogic.servlet.internal.HttpServer)
    at weblogic.cluster.MemberManager$2.execute(MemberManager.java:421)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)
"ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x9df020 nid=0x12 waiting for monitor entry [cc5ff000..cc5ffc24]
    . . .
    at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
    - locked <dde51520> (a weblogic.cluster.MemberManager)
    at weblogic.cluster.MulticastManager.trigger(MulticastManager.java:291)
    at weblogic.time.common.internal.ScheduledTrigger.run(ScheduledTrigger.java:243)
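Finding which thread holds a contended lock and which threads wait for it can be automated. The following is a minimal Python sketch, assuming the Sun-style dump layout shown above ("- locked <addr>" and "- waiting to lock <addr>" lines under a quoted thread name). The function name lock_report is hypothetical, and SAMPLE_DUMP is a shortened copy of the example dump.

```python
import re
from collections import defaultdict

THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
WAITING = re.compile(r"- waiting to lock <(\w+)>")
LOCKED = re.compile(r"- locked <(\w+)>")


def lock_report(dump_text):
    """Map each lock address to the threads holding it and waiting for it."""
    report = defaultdict(lambda: {"holders": [], "waiters": []})
    current = None
    for line in dump_text.splitlines():
        line = line.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        if current is None:
            continue  # text before the first thread header
        waiting = WAITING.search(line)
        if waiting:
            report[waiting.group(1)]["waiters"].append(current)
        locked = LOCKED.search(line)
        if locked:
            report[locked.group(1)]["holders"].append(current)
    return dict(report)


# Shortened from the example dump; most frames and header fields are omitted.
SAMPLE_DUMP = '''
"ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon prio=5
    at weblogic.cluster.MemberManager.done(MemberManager.java:306)
    - waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
"ExecuteThread: '6' for queue: 'weblogic.kernel.Default'" daemon prio=5
    at weblogic.cluster.MemberManager.getRemoteMembers(MemberManager.java:396)
    - waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
    - locked <ddf32360> (a weblogic.servlet.internal.HttpServer)
"ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon prio=5
    at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
    - locked <dde51520> (a weblogic.cluster.MemberManager)
'''

if __name__ == "__main__":
    for address, info in lock_report(SAMPLE_DUMP).items():
        print(address, "held by", info["holders"], "wanted by", info["waiters"])
```

Running the same report on each dump in a series shows whether the holder of a given lock ever changes.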
Determine if the “default” ExecuteThread queue is overloaded. Use the console to determine if any of the ExecuteThreads in the ‘default’ queue are idle. If none are idle, the application probably needs to be configured with a larger number of ExecuteThreads. This value can be changed through the console and is stored in the config.xml file.
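When the console is not reachable, a thread dump can give a rough idle count instead. The sketch below is an illustration under stated assumptions, not the console's own logic: it treats an ExecuteThread as idle when its stack contains the waitForRequest frame seen in the earlier waiting-thread example. The function name idle_counts and the com.example frame in the sample are hypothetical.

```python
import re
from collections import Counter

HEADER = re.compile(r"^\"ExecuteThread: '\d+' for queue: '(?P<queue>[^']+)'\"")
IDLE_FRAME = "weblogic.kernel.ExecuteThread.waitForRequest"


def idle_counts(dump_text):
    """Return {queue name: (idle threads, total threads)} for one dump."""
    total, idle = Counter(), Counter()
    queue = None
    for line in dump_text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            queue = header.group("queue")
            total[queue] += 1
        elif line.startswith('"'):
            queue = None  # some other thread that is not an ExecuteThread
        elif queue and IDLE_FRAME in line:
            idle[queue] += 1
    return {q: (idle[q], total[q]) for q in total}


# Hypothetical two-thread dump: one idle thread, one busy in application code.
SAMPLE_DUMP = '''
"ExecuteThread: '14' for queue: 'default'" daemon prio=5 waiting on monitor
    at java.lang.Object.wait(Native Method)
    at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:118)
"ExecuteThread: '15' for queue: 'default'" daemon prio=5 runnable
    at com.example.OrderServlet.doGet(OrderServlet.java:42)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:118)
'''

if __name__ == "__main__":
    for queue, (idle, total) in idle_counts(SAMPLE_DUMP).items():
        print(f"{queue}: {idle} of {total} execute threads idle")
```

A queue that reports zero idle threads across several dumps is a candidate for the overload case described above.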
If the Execute Queue has idle threads, it is possible that not enough socket reader threads are allocated. By default, a WebLogic Server instance creates three socket reader threads upon booting. If a cluster system utilizes more than three sockets during peak periods, increase the number of socket reader threads.
The number of socket reader threads should usually be small. However, configure one thread for each WebLogic Server instance that acts as a client of the server instance that is hanging.
If using a JDBC connection pool, ensure that the number of JDBC connections configured for the pool is equal to the number of simultaneous requests, i.e., execute threads, that use the pool.
Known WebLogic Server Issues
The possibility exists that a problem with JDBC could produce a deadlock. Check the version and service pack level of the server, found at the beginning of the weblogic.log. Then check above the version and service pack lines for any temporary patches that have already been applied to the server classpath. The patches will tell what problems have already been addressed.
Collecting Thread Dumps
The way to take a thread dump depends on the operating system where the hung server instance is installed. Information about taking a thread dump on various operating systems can be found at http://e-docs.bea.com/wls/docs81/cluster/trouble.html#gc. Redirecting both standard error and standard out places the thread dump information in the proper context with server information and other messages and provides more useful logs.
Unix Systems (Solaris, HP, AIX)
Use kill -3 <weblogic process id> to create the necessary thread dumps to diagnose a problem. Ensure this is done several times on each server, spaced about 5 to 10 seconds apart, to help diagnose deadlocks. For this to work, nohup the process when starting the server (refer to Solutions S-12292 and S-15924).
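Repeating the signal by hand is easy to get wrong under pressure, so the loop can be scripted. This is a minimal Python sketch for Unix systems: kill -3 sends SIGQUIT, which Python exposes as signal.SIGQUIT. The function name request_thread_dumps and its send and sleep parameters are hypothetical; they exist only so the loop can be exercised without signalling a real process.

```python
import os
import signal
import time


def request_thread_dumps(pid, count=3, interval_secs=5.0,
                         send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (what `kill -3` sends) to `pid`, `count` times."""
    for i in range(count):
        send(pid, signal.SIGQUIT)
        if i < count - 1:
            sleep(interval_secs)  # space the dumps 5 to 10 seconds apart


if __name__ == "__main__":
    # Dry run: record what would be sent instead of signalling a process.
    log = []
    request_thread_dumps(4242, count=3, interval_secs=5,
                         send=lambda pid, sig: log.append((pid, sig)),
                         sleep=lambda secs: log.append(("sleep", secs)))
    print(log)
```

The dumps themselves still appear wherever the server's standard out was redirected; the script only triggers them.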
Windows, XP, NT
Each server requires <Ctrl>-<Break> to create the necessary thread dumps to diagnose a problem. Ensure this is done several times on each server, spaced about 5 to 10 seconds apart, to help diagnose deadlocks. On NT, in the command shell type CTRL-Break.
If you have installed WebLogic as a Windows service, you will not be able to see the messages from the JVM or WebLogic Server that are printed to standard out or standard error. To view these messages, you must direct standard out and standard error to a file. To do this, take the following steps:
- Create a backup copy of the WL_HOME/server/bin/installSvc.cmd master script.
- In a text editor, open the WL_HOME/server/bin/installSvc.cmd master script.
- In installSvc.cmd, the last command in the script invokes the beasvc utility.
- At the end of the beasvc command, append the command -log:"pathname" where pathname is a fully qualified path and filename of the file that you want to store the server's standard out and standard error messages.
- The modified beasvc command will resemble the following command:
"%WL_HOME%/server/bin/beasvc" -install
-svcname:"%DOMAIN_NAME%_%SERVER_NAME%"
-javahome:"%JAVA_HOME%" -execdir:"%USERDOMAIN_HOME%"
-extrapath:"%WL_HOME%/server/bin" -password:"%WLS_PW%"
-cmdline:%CMDLINE%
-log:"d:/bea/user_projects/domains/myWLSdomain/myWLSserver-stdout.txt"
- If you started WebLogic with nohup, the log messages will show up in nohup.out.
Linux
The Linux operating system views threads differently than other operating systems. Each thread is seen by the operating system as a process. To take a thread dump on Linux, find the process id from which all the other processes were started. Use the commands:
- To obtain the root PID, use:
ps -efHl | grep 'java'
Use a grep argument that is a string that will be found in the process stack that matches the server startup command. The first PID reported will be the root process, assuming that the ps command has not been piped to another routine.
- Use the weblogic.Admin command THREAD_DUMP
Another method of getting a thread dump is to use the THREAD_DUMP admin command. This method is independent of the OS on which the server instance is running.
java weblogic.Admin -url ManagedHost:8001 -username weblogic -password weblogic THREAD_DUMP
NOTE: This command cannot be used if the server instance does not respond to a ping.
If the JVM in use is Sun’s, the thread dump goes to stdout. Sun has enhanced the thread dump format between JVM 1.3.1 and 1.4. To obtain Sun’s 1.4 style of thread dump add the following option to the java command line for starting the 1.3.1 JVM:
-XX:+JavaMonitorsInStackTrace
Analysis of a Thread Dump
The most useful tool in analyzing a server hang is a set of thread dumps. A thread dump provides information on what each of the threads is doing at a particular moment in time. A set of thread dumps (usually 3 or more taken 5 to 10 seconds apart) can help analyze the change or lack of change in each thread’s state from one thread dump to another. A hung server thread dump would typically show little change in thread states from the first to the last dump.
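Comparing a set of dumps can be sketched as follows. This is a minimal Python illustration, assuming each thread starts with a quoted name line followed by its stack lines, as in the Sun-format examples in this article. The names parse_stacks and stuck_threads, the com.example frames, and the choice to skip idle threads parked in waitForRequest are all assumptions made for the example.

```python
IDLE_FRAME = "weblogic.kernel.ExecuteThread.waitForRequest"


def parse_stacks(dump_text):
    """Return {thread name: tuple of stack lines} for one thread dump."""
    stacks, name, frames = {}, None, []
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith('"'):
            if name is not None:
                stacks[name] = tuple(frames)
            name, frames = line.split('"')[1], []
        elif name is not None and line:
            frames.append(line)
    if name is not None:
        stacks[name] = tuple(frames)
    return stacks


def stuck_threads(dumps):
    """Names of non-idle threads whose stack is identical in every dump."""
    parsed = [parse_stacks(d) for d in dumps]
    first, later = parsed[0], parsed[1:]
    stuck = []
    for name, frames in first.items():
        idle = any(IDLE_FRAME in frame for frame in frames)
        if frames and not idle and all(p.get(name) == frames for p in later):
            stuck.append(name)
    return sorted(stuck)


# Hypothetical dumps: thread 5 never moves, thread 6 progresses, 14 is idle.
DUMP_A = '''
"ExecuteThread: '5' for queue: 'default'" daemon prio=5
    at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
    - locked <dde51520> (a weblogic.cluster.MemberManager)
"ExecuteThread: '6' for queue: 'default'" daemon prio=5
    at com.example.ReportJob.loadRows(ReportJob.java:10)
"ExecuteThread: '14' for queue: 'default'" daemon prio=5
    at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)
'''
DUMP_B = DUMP_A.replace("loadRows(ReportJob.java:10)", "render(ReportJob.java:57)")

if __name__ == "__main__":
    print(stuck_threads([DUMP_A, DUMP_B, DUMP_B]))
```

Pass at least two dumps; with a single dump every non-idle thread is reported, because there is nothing to compare against.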
Threads can be in one of the following states:
- Running or runnable thread: A runnable state means that the threads could be running or are running at that instant in time.
- Suspended thread: Thread has been suspended by the JVM.
- Thread waiting on a condition variable: Threads in a condition wait state can be thought of as waiting for an event to occur.
- Thread waiting on a monitor lock: Monitors are used to manage access to code that should only be run by a single thread at a time.
More information on thread states can be found at http://java.sun.com/developer/onlineTraining/Programming/JDCBook/stack.html#states.
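A quick tally of thread states gives a first impression of a dump. The sketch below is a hedged Python example that recognizes only the three state phrases appearing in this article's sample dumps ("waiting for monitor entry", "waiting on monitor", "runnable") and counts anything else as "other"; other JVM vendors word their states differently. The function name state_histogram is hypothetical, and the sample headers are shortened.

```python
from collections import Counter

# Only the state phrases that appear in this article's sample dumps.
KNOWN_STATES = ("waiting for monitor entry", "waiting on monitor", "runnable")


def state_histogram(dump_text):
    """Count thread header lines by the state phrase after the thread name."""
    counts = Counter()
    for line in dump_text.splitlines():
        line = line.strip()
        if not line.startswith('"'):
            continue  # stack frame or blank line, not a thread header
        trailer = line.split('"')[-1]  # text after the closing name quote
        state = next((s for s in KNOWN_STATES if s in trailer), "other")
        counts[state] += 1
    return counts


# Shortened headers modelled on the socket reader sample in this article.
SAMPLE_HEADERS = '''
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 waiting for monitor entry [0x1b12f000..0x1b12f530]
"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 runnable [0x1b1b0000..0x1b1b0530]
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 waiting for monitor entry [0x1b231000..0x1b231530]
'''

if __name__ == "__main__":
    for state, count in state_histogram(SAMPLE_HEADERS).items():
        print(f"{count:3d}  {state}")
```

A histogram dominated by threads waiting for monitor entry suggests looking for the thread that holds the contended lock.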
There is also a thread analysis tool at http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp.
Download the tool and read the instructions at the link.
What to Look at in the Thread Dump
All requests enter the WebLogic Server through the ListenThread. If the ListenThread is gone, no work can be received and therefore no work can be done. Verify that a ListenThread exists in the thread dump. The ListenThread should be in the socketAccept method. The following example shows what the Listen Thread looks like:
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:353)
    - locked <0x26d9d490> (a java.net.PlainSocketImpl)
    at java.net.ServerSocket.implAccept(ServerSocket.java:439)
    at java.net.ServerSocket.accept(ServerSocket.java:410)
    at weblogic.socket.WeblogicServerSocket.accept(WeblogicServerSocket.java:24)
    at weblogic.t3.srvr.ListenThread.accept(ListenThread.java:713)
    at weblogic.t3.srvr.ListenThread.run(ListenThread.java:290)
Socket Reader Threads accept the incoming request from the Listen Thread Queue and put it on the Execute Thread Queue. If there are no socket reader threads in the thread dump, then there is a bug somewhere that is causing the socket reader thread to vanish. There should always be at least 3 socket reader threads. One socket reader thread is usually in the poll function, while the other two are available to process requests. Below are Socket Reader threads from a sample thread dump.
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00036128 nid=75 lwp_id=6888070 waiting for monitor entry [0x1b12f000..0x1b12f530]
    at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)
    - waiting to lock <0x25c01198> (a java.lang.String)
    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)
"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035fc8 nid=74 lwp_id=6888067 runnable [0x1b1b0000..0x1b1b0530]
    at weblogic.socket.PosixSocketMuxer.poll(Native Method)
    at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:99)
    - locked <0x25c01198> (a java.lang.String)
    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035e68 nid=73 lwp_id=6888066 waiting for monitor entry [0x1b231000..0x1b231530]
    at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)
    - waiting to lock <0x25c01198> (a java.lang.String)
    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)
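The two checks described in this section (a ListenThread exists, and at least three socket reader threads exist) can be sketched in Python. This is a hedged example that matches on the frame names appearing in the sample stacks above and assumes each socket reader stack contains exactly one SocketReaderRequest.execute frame. The function name check_listener_and_readers is hypothetical, and frame names may differ in other releases.

```python
LISTEN_FRAME = "weblogic.t3.srvr.ListenThread.accept("
READER_FRAME = "weblogic.socket.SocketReaderRequest.execute("


def check_listener_and_readers(dump_text, min_readers=3):
    """Return a list of problems found in one dump (empty if none)."""
    problems = []
    if LISTEN_FRAME not in dump_text:
        problems.append("no ListenThread accept frame found")
    # Assumption: one READER_FRAME occurrence per socket reader thread.
    readers = dump_text.count(READER_FRAME)
    if readers < min_readers:
        problems.append(
            f"{readers} socket reader thread(s) found, "
            f"expected at least {min_readers}"
        )
    return problems


# Hypothetical dump fragments reduced to the frames the check looks for.
HEALTHY = (
    "    at weblogic.t3.srvr.ListenThread.accept(ListenThread.java:713)\n"
    + "    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)\n" * 3
)
BROKEN = "    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)\n"

if __name__ == "__main__":
    print(check_listener_and_readers(HEALTHY))
    print(check_listener_and_readers(BROKEN))
```

An empty result means only that both thread types are present; it says nothing about whether they are making progress.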
Allocating execute threads to act as socket reader threads increases the speed and the ability of the server to accept client requests. It is essential to balance the number of execute threads that are devoted to reading messages from a socket and those threads that perform the actual execution of tasks in the server.
In release 8.1, the socket reader threads no longer use “ExecuteThreads” in the default queue. Instead they have their own thread group; in the sample thread dump above they appear in the ‘weblogic.socket.Muxer’ queue.
Next Steps
The next steps require further analysis of the thread dump. Look in the thread dump to see what each of the threads is doing at the time of the hang. This will help determine the next stage of the investigation. For example, if there are many threads involved in JSP compilation, refer to Potential Causes of Server Hang for further diagnosis and actions to test.
Note:
This article is reposted from: http://www.hashei.me/2009/08/java_generic_server_hang.html
http://blog.csdn.net/davidhsing/article/details/5854610