Memory Space Manipulation in Java (Section Four: Problem Resolving, Part One)


In this section, we will discuss several typical problems associated with the JVM. I will try to explain each problem with an example I have met before. Several commands were used during the investigation phase; to learn more about them, I suggest you take a look at the manual for AIX or any other UNIX OS.

You may meet, or may already have met, the following problems in a running Java process:

• Crash

• Hang

• OutOfMemory

• Fragmentation

• Performance degradation

Before getting to the main topic, please note that understanding what I write in this section does not mean you can handle all of the above problems yourself. It is still better to know about them while designing your Java application, and if you do meet one of them, this knowledge will help you track down or identify the cause before sending the problem to your JVM vendor for analysis.

1. Crash

1-1. Phenomenon

Your JVM goes down because a SIGSEGV (incorrect access to memory) or SIGILL (illegal instruction) signal has been raised.

    SIGILL: Illegal instruction execution, caused by bad code pointers

    SIGSEGV: Segmentation fault, caused by bad data pointers

 

The problem can come from one of two places:

1)      Native code running in your Java application. It can be your own code or a third-party library called through JNI. A good example is a Type 2 JDBC driver, which passes your JDBC query to the native library of the database's client module and accesses the database from there.

2)      JIT compiler related matters.

In most cases you do not have to suspect your Java code, because Java code cannot directly access or request any space in the Virtual Memory Space. Native code called via JNI is very different: as you can imagine, a C/C++ program can do almost anything to the Virtual Memory Space, and even to the Real Memory Space.

Let us look a little closer: why would native code called via JNI crash your JVM? In my experience, one cause is a situation known as “Segment Collision”, where your JVM and other “In-Process” libraries attempt to use the same address space simultaneously. As for the term “In-Process”: when the JVM loads a library, it is loaded "In-Process", which simply means that the library is mapped inside the address space of the JVM. This is an essential first step before any of the code or data inside the library can be used, and even the JVM itself is composed of a small "launcher" and multiple libraries.

It is important to understand what "In-Process" means, because a Segment Collision can occur only between components that are "In-Process". I have drawn a picture to help you understand it more clearly:

        

Suppose the Native Module here stands for your JDBC native module, and that it refers to SEG C for some purpose (based on the specification of your DB vendor). In most cases your Java application runs very well, but when your JVM needs to extend its memory space as far as SEG C, the result is a collision, as the picture above shows.

Generally the Native Heap does not require much memory space, unless you are using third-party native modules through JNI that fetch data in large quantities and keep it for a long time.

A more common case is that your Java application requires a large Java Heap. Even if the Native Heap needs only a little space, a collision can occur because of the excessive expansion of the Java Heap.
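To get a feel for how quickly a growing Java Heap consumes segments, here is a minimal sketch. It assumes the 32-bit AIX process model, in which the 4 GB address space is divided into 16 segments of 256 MB each; the class and method names are made up for illustration and are not part of any JVM or AIX API.

```java
// Sketch only: assumes 16 segments of 256 MB each (32-bit AIX process model).
// SegmentMath and its methods are hypothetical helpers for illustration.
final class SegmentMath {
    static final long SEGMENT_SIZE = 256L * 1024 * 1024; // 0x10000000 bytes

    // Segment number (0-15) that a 32-bit address falls into.
    static int segmentOf(long address) {
        return (int) ((address & 0xFFFFFFFFL) >>> 28);
    }

    // Number of whole segments needed to hold a heap of the given size.
    static int segmentsNeeded(long heapBytes) {
        return (int) ((heapBytes + SEGMENT_SIZE - 1) / SEGMENT_SIZE);
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.println("Segments for a 1 GB heap: " + segmentsNeeded(oneGb));
        System.out.println("Segment of 0x80000000: " + segmentOf(0x80000000L));
    }
}
```

Under that assumption, every additional 256 MB of heap takes one more segment away from whatever else lives in the same process, which is why a native module that expects a particular segment can suddenly find it occupied.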

1-2. Handling

1-2-1. Prevention

First, such unintentional problems should be prevented beforehand. If you are planning to use a large memory space for the Java Heap, pay particular attention to the database and middleware products you are going to use, because most of them provide an in-process variant of their client, in which a library is loaded into the address space of the JVM. DB2 and MQ are good examples.

• As for DB2, when the Type 2 JDBC driver is used, the number of connections is limited by the number of shared memory segments to which a single process can attach. This can be solved by changing to the Type 4 JDBC driver or by setting the environment variable EXTSHM=ON.

• As for MQ, when an application establishes a bindings (non-client) connection to a queue manager, MQSeries v5.2 (and earlier) uses segment 8 to attach shared memory and complete the connection. If segment 8 is unavailable, the connection fails with reason code MQRC_Q_MGR_NOT_AVAILABLE (2059), and the application generates an FDC file in the /var/mqm/errors directory showing a Probe Id of XY341019 and a Component of RetryConnectToSharedSubpool. This can be resolved by changing your queue manager's IPCCBaseAddress attribute in /var/mqm/mqs.ini to another segment number, or by using the EXTSHM variable.
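Since both workarounds can depend on EXTSHM, it may be worth logging at startup whether the variable is actually visible to the JVM process. The following is a minimal sketch; the class name and the warning text are hypothetical. As far as I know the variable has to be exported before the process starts, so this code can only report the setting, not change it.

```java
import java.util.Map;

// Sketch only: ExtShmCheck is a hypothetical helper, not a DB2, MQ or JVM API.
final class ExtShmCheck {
    // True when the given environment has EXTSHM set to exactly "ON".
    static boolean isExtShmOn(Map<String, String> env) {
        String value = env.get("EXTSHM");
        return value != null && value.trim().equals("ON");
    }

    public static void main(String[] args) {
        if (!isExtShmOn(System.getenv())) {
            System.err.println("Warning: EXTSHM is not ON for this process");
        }
    }
}
```

Passing the environment in as a Map keeps the check easy to exercise without touching the real process environment.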

Beyond these two examples, there is still a lot to take into account when planning a Java application with a large memory space. The best approach is to keep in contact with the vendors of your third-party products about these issues, because the internal specifications are under their control, not yours.

Besides the collisions described above, which no one intended, there are others caused by defects in a library, such as reusing a freed pointer or overwriting part of memory that does not belong to it. Collisions of this type can be easy to fix but are sometimes very difficult to debug.

In Java, such faults normally appear as runtime exceptions (NullPointerException, ArrayIndexOutOfBoundsException, etc.).

In a native module called through JNI, a SIGILL, SIGSEGV or similar fault condition is raised instead.
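The Java side of this contrast is easy to demonstrate. In the small sketch below (the class and method names are only illustrative), both bad accesses are caught by the JVM and surface as ordinary exceptions, so the process keeps running. The equivalent mistakes in C/C++ code reached through JNI get no such check.

```java
// Sketch only: shows that bad accesses in pure Java raise exceptions
// instead of bringing the process down with a signal.
final class BoundsCheckDemo {
    // Writes one element past the end of an array and reports what happened.
    static String badArrayAccess() {
        int[] data = new int[4];
        try {
            data[4] = 1; // index 4 is out of range for length 4
            return "no exception";
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Dereferences a null reference and reports what happened.
    static String nullAccess() {
        String s = null;
        try {
            return String.valueOf(s.length());
        } catch (NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(badArrayAccess());
        System.out.println(nullAccess());
    }
}
```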

 

1-2-2. Preparing

Suppose your Java application is suffering from a JVM crash. You can follow the steps below to try to find the root cause.

• Keep your javadump (sometimes known as javacore or thread dump).

The output file (javacorexxxx.txt) can be found in one of the following directories:

1.        The directory given by IBM_JAVACOREDIR=<dir> (in the case of IBM's JVM)

2.        The directory where your JVM process was started (/usr/WebSphere/AppServer in the case of WebSphere Application Server)

• Fetch the core file, as it will be required when you ask your JVM vendor for support. Note that the core file is produced by the OS and is different from the javacore; it can also be created for programs other than the JVM.

By default it can be found here: /work/coredump
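The javacore search order described above can be sketched in a few lines of Java. This is only an illustration under the assumptions stated in the text (IBM_JAVACOREDIR first, then the directory the process was started from, files named javacore*.txt); the class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: JavacoreLocator is a hypothetical helper, not an IBM tool.
final class JavacoreLocator {
    // Candidate directories: IBM_JAVACOREDIR (if set), then the working directory.
    static List<File> candidateDirs(Map<String, String> env, File workingDir) {
        List<File> dirs = new ArrayList<>();
        String configured = env.get("IBM_JAVACOREDIR");
        if (configured != null && !configured.isEmpty()) {
            dirs.add(new File(configured));
        }
        dirs.add(workingDir);
        return dirs;
    }

    // Collects files named javacore*.txt from the candidate directories.
    static List<File> findJavacores(List<File> dirs) {
        List<File> found = new ArrayList<>();
        for (File dir : dirs) {
            File[] matches = dir.listFiles(
                (d, name) -> name.startsWith("javacore") && name.endsWith(".txt"));
            if (matches != null) { // null when dir is missing or unreadable
                for (File f : matches) {
                    found.add(f);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        for (File f : findJavacores(candidateDirs(System.getenv(), cwd))) {
            System.out.println(f);
        }
    }
}
```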

 

1-2-3. Analysis

I have drawn a picture to show the flow of the work, which goes as follows:

1.      If your javacore file contains lines like the following, it usually means that something is wrong in a native module you are using through JNI.

   1XHSIGRECV     SIGSEGV received at 0xd43600a0 in /usr/lpp/fmc/lib/libfmcjdint.a. Processing terminated.

2XHSIGHANDLER  SIGSEGV        : unknown handler (libmqmcs_r.a)

2XHSIGHANDLER  SIGSEGV        : unknown handler (libmqmcs_r.a)

This indicates an error in a native module of WebSphere MQ Workflow. In more detail, the error will be one of the following types:

• Bug in the native module: the solution for this type is simple, because your vendor will send detailed instructions (such as applying a fix pack or upgrading to a new version). However, you may have to wait a long time before they acknowledge the fault.

• Limitation of the native module: although this type of error can sometimes be seen as a bug, it differs in that it can be resolved just by changing DBMS/middleware/application parameter settings, or by switching to a pure Java implementation of the module.

In this example, IBM support told us either to switch to a pure Java implementation of the module, or to decrease the size of the Java Heap to make more space available for the Native Heap, because the native module has a limitation there.

* In either case, contact the vendor's support department as soon as you find a SIGSEGV/SIGILL in their native module. Do not spend too much time on such cases because, as I said above, they are not under your control.
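The check in step 1 can be automated when there are many javacore files to go through. The sketch below pulls the signal, the fault address and the module out of a 1XHSIGRECV line. The pattern is an assumption based only on the single excerpt shown above, so other JVM levels may format the line differently; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the line layout is inferred from one sample, not from a spec.
final class JavacoreSignalLine {
    private static final Pattern SIG_RECV = Pattern.compile(
        "1XHSIGRECV\\s+(SIG[A-Z]+) received at (0x[0-9a-fA-F]+) in (\\S+?)\\.(?:\\s|$)");

    // Returns {signal, address, module}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = SIG_RECV.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String sample = "1XHSIGRECV     SIGSEGV received at 0xd43600a0 in "
            + "/usr/lpp/fmc/lib/libfmcjdint.a. Processing terminated.";
        String[] parts = parse(sample);
        System.out.println(parts[0] + " in " + parts[2]);
    }
}
```

If the module path points into a vendor's library directory rather than the JVM's own, that is the hint to contact that vendor's support.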

2.      Set the environment variable JAVA_COMPILER to NONE and restart your Java process.

3.      Set the environment variable IBM_MIXED_MODE_THRESHOLD to 0 (which also means OFF), 8, 20, 50 or 200 and restart the Java process.

4.      This step is a little complex, so you had better refer to "Appendix E. Environment variables" - "JVM environment settings" in the Diagnostics Guide 1.4 for IBM's JVM.

In summary, first set the environment variable JITC_COMPILEOPT to NALL to stop the function, and try to reproduce the problem. Then reset the variable with the value NINLINING,NMMI2JIT,NQOPTIMIZE, starting just part of the function, to identify which part is the root cause of the problem.
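Steps 2 to 4 amount to restarting the same Java process under a series of environment settings. A small driver can keep that sequence in one place. The sketch below is only one way to organise it: the class and method names are hypothetical, and it assumes the JITC_COMPILEOPT values are tried one at a time, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: JitIsolationPlan is a hypothetical helper, not an IBM tool.
final class JitIsolationPlan {
    // Environment overrides to try, one restart per entry, in step order.
    static List<Map<String, String>> trials() {
        List<Map<String, String>> plan = new ArrayList<>();
        plan.add(single("JAVA_COMPILER", "NONE"));                      // step 2
        for (String t : new String[] {"0", "8", "20", "50", "200"}) {   // step 3
            plan.add(single("IBM_MIXED_MODE_THRESHOLD", t));
        }
        plan.add(single("JITC_COMPILEOPT", "NALL"));                    // step 4
        for (String opt : new String[] {"NINLINING", "NMMI2JIT", "NQOPTIMIZE"}) {
            plan.add(single("JITC_COMPILEOPT", opt)); // assumed: one at a time
        }
        return plan;
    }

    private static Map<String, String> single(String key, String value) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    // Prepares, but does not start, a process with the overrides applied.
    static ProcessBuilder prepare(List<String> command, Map<String, String> overrides) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().putAll(overrides);
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        for (Map<String, String> trial : trials()) {
            System.out.println(trial);
        }
    }
}
```

To run a trial, pass your usual java command line to prepare(...) and call start() on the result; the first setting under which the crash disappears points at the part of the JIT to report to your vendor.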

 
