Memory Space Manipulation in Java (Section Four: Problem Resolving - Part Two)


2. Hang

2-1. Phenomenon

  You cannot get any response from your application even though the process is still alive.

  (Check with the ps command.)

  I have experienced such a case:

  1) A customer reported that they had submitted a form in a certain function but got no response from the WebSphere Application Server, even after waiting for a long time.

  2) Soon afterwards, similar reports came from other places saying that even the login process could not be completed for a long time.

  3) The customer may see warnings like the following in WebSphere's SystemOut.log:

 

    [06/07/05 13:31:52:601 JST] 14875e8 ThreadMonitor W WSVR0605W: Thread "Servlet.Engine.Transports : 350" (2333b4c60) has been active for 721,162 milliseconds and may be hung. There are 1 threads in total in the server that may be hung.

    (omission)

    [06/07/05 13:40:52:995 JST] 14875e8 ThreadMonitor W WSVR0605W: Thread "Servlet.Engine.Transports : 560" (386d34c60) has been active for 612,670 milliseconds and may be hung. There are 50 threads in total in the server that may be hung.

    

From the last message we learn that 50 threads may already be hung, and 50 is the number of threads we have configured for the running WebSphere Application Server, which means no thread is available to respond to users' requests.

We could not get any other information about the problem beyond what I have outlined above; in most cases of a hang you will not find error information such as a thrown exception.


2-2. Handling

  So with such limited information, how can we resolve the problem? First we should understand how a hang can happen.

  The following causes should be considered when you run into such a situation:

  1) A bug in your application or in a third-party module

- Deadlock

- Infinite loop: the thread keeps running and never returns to its caller (see the sketch after this list of causes)

- Recursive call: unbounded recursion can eventually bring the JVM down with a StackOverflowError

  2) Waiting on a backend resource

- Your thread is waiting on an I/O operation, either against a backend resource such as a database or MQ, or because of a network problem.

  3) Performance degradation

- If, after waiting for a long time, you finally do get a response from your application, then this cause can be considered.

- Note, however, that cause 2) can also produce the same symptom.

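To make the infinite-loop case under cause 1) more concrete, here is a minimal sketch of the kind of loop that keeps a worker thread busy forever; all class, method and field names are invented for illustration and do not come from the case described above.

// Hypothetical example of a loop that never terminates: the status is
// read once before the loop, so even if another component later marks
// the order as done, this thread never notices and spins forever,
// occupying one of the server's worker threads.
public class OrderStatusPoller {

    private String status = "PROCESSING";

    public void waitForCompletion() {
        String snapshot = status;            // bug: read once, never refreshed
        while (!"DONE".equals(snapshot)) {
            // busy-wait: burns CPU and never returns to the caller
        }
    }
}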

 Now let us walk through, step by step, what should be done when facing a system hang:

    

1. The key point is to collect the Javadump before stopping the JVM: a Javadump is so important that if you fail to get one, you have no way to identify what caused the problem. A hung Java application will not create a Javadump for us by itself, so here is an example of how to obtain the file manually.

Supposing your Java application is a process running on WebSphere Application Server, you can get a Javadump in one of the following ways:

1) ThreadAnalyzer: you can download the tool from here:

   http://www7b.software.ibm.com/wsdd/downloads/thread_analyzer.html

2) wsadmin: given that the hostname is hostA and the server name is serverA, you can run the following commands to get the dump.

# /usr/WebSphere/AppServer/bin/wsadmin.sh -conntype SOAP -host hostA -port 8880

wsadmin>set jvm [$AdminControl completeObjectName type=JVM,process=serverA,*]

wsadmin>$AdminControl invoke $jvm dumpThreads
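
If you can run a small piece of Java code inside the affected JVM (for example from an administrative servlet), IBM JDKs also provide a com.ibm.jvm.Dump class whose static JavaDump() method writes a javacore file. The sketch below calls it through reflection so that it compiles on any JDK; whether the class is actually available depends on your IBM SDK level, so treat this as an assumption to verify in your environment rather than a documented WebSphere procedure.

// Sketch: trigger a javacore from inside the JVM on IBM JDKs.
// com.ibm.jvm.Dump is IBM-specific; reflection is used so this code
// compiles everywhere and simply reports when the class is missing.
public class JavadumpTrigger {

    public static void requestJavadump() {
        try {
            Class<?> dumpClass = Class.forName("com.ibm.jvm.Dump");
            dumpClass.getMethod("JavaDump").invoke(null);
            System.out.println("Javadump requested; look for a javacore*.txt file.");
        } catch (ClassNotFoundException e) {
            System.err.println("Not an IBM JDK (com.ibm.jvm.Dump not found).");
        } catch (Exception e) {
            System.err.println("Failed to request a Javadump: " + e);
        }
    }
}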


2. After getting the Javadump:

1) Try to find the hung thread by checking information such as the thread status (R: Runnable, CW: Condition Wait) and the stack trace.

In a Javadump this information appears in the [XM subcomponent dump routine] section. Below is a sample of the section (the important lines are the 3XMTHREADINFO thread state and the 4XESTACKTRACE stack trace lines):

0SECTION XM subcomponent dump routine

------------------ (omission)

1XMCURTHDINFO Current Thread Details

3XMTHREADINFO "Signal dispatcher" (TID:0x101EB960, sys_thread_t:0x9AE860, state:R, native ID:0x284) prio=5

1XMTHDINFO All Thread Details

------------------ (omission)

2XMFULLTHDDUMP Full thread dump Classic VM (J2RE 1.4.1 IBM Windows 32 build cn1411-20031011, native threads):

------------------ (omission)

3XMTHREADINFO "MQBindingsQMThread4" (TID:0x10551CD8, sys_thread_t:0x53ED8F8, state:CW, native ID:0xA50) prio=5

4XESTACKTRACE at java.lang.Object.wait(Native Method)

4XESTACKTRACE at java.lang.Object.wait(Object.java:438)

4XESTACKTRACE at com.ibm.mq.server.MQThread.run(MQThread.java:1348)

4XESTACKTRACE at java.lang.Thread.run(Thread.java:568)

------------------ (omission)

By checking the stack trace you can see which method the thread is executing. This is particularly useful for problems such as an infinite loop, because it tells you exactly where the thread is spinning.
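
If you prefer to check the same kind of information from inside the running JVM, the standard library on Java 5 and later can list every thread's state and stack trace (the dump above comes from a 1.4 VM, so this is an alternative rather than what produced it). A minimal sketch:

import java.util.Map;

// Sketch: print each thread's name, state and top stack frames,
// roughly the information shown on the 3XMTHREADINFO and 4XESTACKTRACE
// lines of a Javadump. Requires Java 5 or later.
public class ThreadInspector {

    public static void dumpThreads() {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : all.entrySet()) {
            Thread t = entry.getKey();
            System.out.println("\"" + t.getName() + "\" state=" + t.getState());
            StackTraceElement[] frames = entry.getValue();
            int limit = Math.min(frames.length, 5);   // top frames are usually enough
            for (int i = 0; i < limit; i++) {
                System.out.println("    at " + frames[i]);
            }
        }
    }

    public static void main(String[] args) {
        dumpThreads();
    }
}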

2) Try to identify whether threads are in a deadlock by referring to the [LK subcomponent dump routine] section. Below is a sample of the section:

0SECTION LK subcomponent dump routine

NULL ============================

------------------ (omission)

1LKMONPOOLDUMP Monitor Pool Dump (flat & inflated object-monitors):

//*Note1

2LKMONINUSE sys_mon_t:0x3020DAE8 infl_mon_t: 0x00000000:

3LKMONOBJECT java.lang.Object@3030DF38/3030DF40: Flat locked by thread ident 0x07, entry count 1

3LKNOTIFYQ Waiting to be notified:

3LKWAITNOTIFY "Thread-0" (0x353366F0)

//*Note2

2LKMONINUSE sys_mon_t:0x3020DB68 infl_mon_t: 0x00000000:

3LKMONOBJECT java.lang.Object@3030DF48/3030DF50: Flat locked by thread ident 0x06, entry count 1

3LKNOTIFYQ Waiting to be notified:

3LKWAITNOTIFY "Thread-1" (0x35336E90)

------------------ (omission)

1LKFLATMONDUMP Thread identifiers (as used in flat monitors):

2LKFLATMON ident 0x02 "Thread-2" (0x30210B00) ee 0x302108D4

//*Note3

2LKFLATMON ident 0x07 "Thread-1" (0x35336E90) ee 0x35336C64

2LKFLATMON ident 0x06 "Thread-0" (0x353366F0) ee 0x353364C4

 

In this sample, first look at part *Note3, from which we learn that thread ident 0x07 is named Thread-1 and thread ident 0x06 is named Thread-0.

Then look at part *Note1: it shows that Thread-0 is waiting on the object java.lang.Object@3030DF38/3030DF40, which is currently locked by Thread-1.

Finally, part *Note2 shows that Thread-1 is waiting on the object java.lang.Object@3030DF48/3030DF50, which in turn is locked by Thread-0.

From this we can conclude that the two threads have run into a deadlock.
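
For reference, the classic code pattern behind such a dump is two threads taking the same two locks in opposite order. The sketch below is only an illustration (the class name is invented, not taken from the dump); on Java 5 and later it also confirms the diagnosis in-process with ThreadMXBean.findMonitorDeadlockedThreads().

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: Thread-0 and Thread-1 acquire lockA and lockB in opposite
// order, which is the pattern behind the LK-section sample above.
public class DeadlockDemo {

    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> {
            synchronized (lockA) {
                pause(100);                  // give Thread-1 time to take lockB
                synchronized (lockB) { }     // blocks forever: lockB is held by Thread-1
            }
        }, "Thread-0");
        Thread t1 = new Thread(() -> {
            synchronized (lockB) {
                pause(100);                  // give Thread-0 time to take lockA
                synchronized (lockA) { }     // blocks forever: lockA is held by Thread-0
            }
        }, "Thread-1");
        t0.start();
        t1.start();

        Thread.sleep(500);                   // let the deadlock form
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads();
        System.out.println("Deadlocked threads: " + (ids == null ? 0 : ids.length));
        // The JVM will not exit afterwards, because both threads stay blocked.
    }

    private static void pause(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException ignored) { }
    }
}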

3) Try to identify whether a thread has run into an unbounded recursive call; this can also be done from the [XM subcomponent dump routine] section. Take a look at the following sample:

0SECTION XM subcomponent dump routine

NULL ============================

3XMTHREADINFO "Servlet.Engine.Transports : 0" (TID:0x11267038, sys_thread_t:0x23D3FC70, state:R,

native ID:0x764) prio=5

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

4XESTACKTRACE at handson.jvm.RecursiveCall.doRecursiveCall(RecursiveCall.java:14)

------------------ (omission)

      It is easy to judge that this thread has run into a recursive call.
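
The stack above would be produced by code along the following lines; only the package, class and method names are taken from the dump, the body is a hypothetical reconstruction.

package handson.jvm;

// Hypothetical reconstruction of the class in the stack trace above:
// the method calls itself with no terminating condition, so the stack
// grows until the thread dies with a StackOverflowError (or, until
// then, simply appears hung to its callers).
public class RecursiveCall {

    public void doRecursiveCall(int depth) {
        // missing base case: nothing ever stops the recursion
        doRecursiveCall(depth + 1);
    }

    public static void main(String[] args) {
        new RecursiveCall().doRecursiveCall(0);
    }
}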

 

4) There may be other reasons that cause your threads to hang; analyzing the Javadump will always help you resolve such problems.

 

3. If you cannot identify the root cause just by analyzing the Javadump, then more detailed information such as a trace or debug log needs to be prepared.

How to do this differs between software products. Normally you have to follow your vendor's instructions to set the log to trace or debug level, reproduce the problem, and send the resulting log to the support center for analysis.

Example 1: On WebSphere Application Server, by configuring the server's Diagnostic Trace, you can get a trace-level log of a specified component for analysis. For JMS, do the following:

1) Enter the following string into the Trace Specification field:

JMSApi=all=enabled:JMSServer=all=enabled:JMSQueueManager=all=enabled:Messaging=all=enabled:com.ibm.ejs.jts.*=all=enabled

2) Enter -DMQJMS_TRACE_LEVEL=base into the Generic JVM arguments field.

Example 2: For WebSphere MQ, get a level-99 trace with the following commands:

1) trace -a -j30D,30E -T 2000000 -L5000000 -o traceFileTemp

2) Reproduce the problem

3) trcstop

4) cat /etc/trcfmt /usr/mqm/lib/amqtrc.fmt > $WORK/mytrcfmt

5) trcrpt -t $WORK/mytrcfmt traceFileTemp > traceFileFinal

4. Before taking the actions above, it is better to check whether the problem is caused by the network or a backend system.

You can do this with OS-dependent commands (ping, netstat, topas, etc.) or product-dependent commands and utilities (the DB2 CLP, MQ's runmqsc, etc.).

Normally a backend system such as a DBMS or middleware will not cause a hang, because it has a timeout setting by default. When the specified time has passed and no timeout error has been raised, a bug in the product, such as an endless retry loop, should be considered.
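
On the application side you can often protect yourself from a slow or locked backend by setting an explicit timeout, so that the call fails with an exception instead of occupying a worker thread forever. Below is a minimal JDBC sketch; the URL, credentials and SQL are placeholders, and how strictly the timeout is enforced depends on the JDBC driver.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: fail fast instead of hanging when the backend is slow or locked.
// The URL, credentials and SQL below are placeholders for illustration only.
public class TimeoutQuery {

    public static void main(String[] args) {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:db2://dbhost:50000/SAMPLE", "user", "password");
             Statement stmt = con.createStatement()) {
            stmt.setQueryTimeout(30);        // seconds; the driver cancels the query after this
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM ORDERS")) {
                while (rs.next()) {
                    // process the row
                }
            }
        } catch (SQLException e) {
            // a timeout typically surfaces here instead of an endless wait
            System.err.println("Query failed or timed out: " + e.getMessage());
        }
    }
}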

  1. Example 1: We were using the MQ Workflow Java API to get some information from the backend runtime database; the timeout setting of the Java API was 30 seconds. We first had a record in the target table locked, then sent a query through the Java API. A timeout exception was expected, but instead we got an endless wait. After analyzing a DB2 snapshot, we learned that in this situation the MQ Workflow API retries the query for the locked record endlessly.

  2. Example 2: Also involving MQ Workflow: this time all of the system's basic MQ Workflow functions hung. Using DB2 and MQ commands we could see no problem with the running processes (DB, MQ, Workflow), but an error was found in MQ Workflow's system error log showing that a deadlock had occurred in the MQ Workflow DB module and that a re-bind was required. Knowing this saves you the time of checking your Java source in such a situation. Restarting the MQ Workflow service and contacting the support center would be the better way to find the root cause.
