Server Memory Leak

Source: Internet | Published by: 阿里云大厦 | Editor: 程序博客网 | Time: 2024/06/05 00:25

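The memory leak described in the previous post had only been mitigated, not actually fixed. After nearly a week of wrestling with it, the server is now completely stable.

Here are some notes on what I learned while solving it.

Where problems usually come from, in order of likelihood:

Your own code > configuration > open-source frameworks

1 Code problems

Code problems can be tracked down with a memory profiling tool combined with code review.

The article below is a useful reference. I used JProfiler myself, but the goal is the same.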

Finding Memory Leaks in Java Apps

Here is a small HOWTO on how to find memory leaks with Java SE.

I’ve written it while trying to find memory leaks in our testing tools: JTHarness and ME Framework, and then wanted to share the HOWTO with the world, but I didn’t have my blog at that time, so I posted this info as a comment to a relevant entry in the excellent Adam Bien’s blog.

Note: Use the latest JDK 6, because it has the latest tools, with lots of bug fixes and improvements. All the later examples assume that JDK 6's bin directory is in the PATH.

Step 1. Start the application.

Start the application as you usually do:

java -jar java_app.jar

Alternatively, you could start java with hprof agent. Java will run slower, but the huge benefit of this approach is that the stack traces for created objects will be available which improves memory leak analysis greatly:

java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -agentlib:hprof=heap=dump,file=/tmp/hprof.bin,format=b,depth=10 -jar java_app.jar

When the application is up, perform various actions that you think might lead to memory leaks.

For example, if you open some documents in your app, the memory graph could rapidly go up. If closing the docs and invoking a full garbage collection does not bring the memory back to its normal level, there is probably a leak somewhere. You can use jconsole from JDK 6 to watch the memory consumption graph and get a clue as to whether a memory leak is present:

jconsole

It will pop up a dialog with a list of Java applications to connect to. Find the one with java_app.jar and connect. jconsole also lets you invoke a full GC, with a button provided just for that.
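The same check (memory after a full GC, before and after an action) can also be made from inside the JVM. The following is only a minimal sketch, not part of the original HOWTO: the class name HeapProbe and the simulated "documents" are made up for illustration, and the JVM is free to ignore an explicit GC request (for example with -XX:+DisableExplicitGC), so the numbers are indicative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares used heap after a GC request at three points.
class HeapProbe {

    /** Requests a full GC, then returns the used heap size in bytes. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // a request only; the JVM may ignore it
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long baseline = usedHeapAfterGc();

        // Simulate "opening documents": hold about 50 MB of data.
        List<byte[]> documents = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            documents.add(new byte[1 << 20]);
        }
        long whileOpen = usedHeapAfterGc();

        // Simulate "closing documents": drop the references.
        documents.clear();
        long afterClose = usedHeapAfterGc();

        System.out.printf("baseline=%d KB, open=%d KB, closed=%d KB%n",
                baseline / 1024, whileOpen / 1024, afterClose / 1024);
        // If "closed" stays well above "baseline" across repeated runs,
        // something is probably still holding references.
    }
}
```

In a real application the allocation loop would be replaced by the action under suspicion.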

Step 2. Find the application pid.

Find out the application’s process id via:

jps

It will print something like:

15976 java_app.jar
7586 startup.jar
22476 Jps
12248 Main
5437 Bootstrap

In our case the pid is 15976.

Step 3. Dump the heap into file.

Dump heap into the file:

jmap -dump:format=b,file=/tmp/java_app-heap.bin 15976

We just told jmap to dump the heap into the /tmp/java_app-heap.bin file, in binary form (which is optimized to work with large heaps). The last parameter is the pid we found in Step 2.

Alternatively, if you started java with the hprof agent, you could just use Ctrl-\ on Solaris/Linux or Ctrl-Break on Windows to dump the heap into the file specified in the hprof agent arguments.
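A heap dump can also be triggered from inside the running application, which is handy when jmap cannot attach. This is a sketch under assumptions rather than part of the original HOWTO: it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean (so it only works on HotSpot-based JVMs, Java 7 or later for getPlatformMXBean), and the class name HeapDumper and the output path are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: writes a binary heap dump of the current JVM.
class HeapDumper {

    /**
     * Dumps the heap of this JVM to the given path.
     * The target file must not exist yet; recent JDKs also
     * require the name to end in ".hprof".
     */
    static File dumpHeap(String path, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(path, liveObjectsOnly);
        return new File(path);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/tmp/java_app-heap.hprof";
        File dump = dumpHeap(path, true);
        System.out.println("Wrote " + dump.length() + " bytes to " + dump);
    }
}
```

The resulting file is in the same binary format as the jmap output and can be opened with the same heap analysis tools.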

Step 4. Visualize the heap.

Use jhat tool to visualize the heap:

jhat -J-Xmx326m /tmp/java_app-heap.bin

Jhat will parse the heap dump and start a web server at port 7000. Connect to Jhat server by pointing your browser to:

http://localhost:7000

And start investigating.

Jhat allows you to see what objects are present in the heap, who has references to those objects, etc.

Here are some tips:

Investigate _instances_, not _classes_.
Use the following URL to see the instances: http://localhost:7000/showInstanceCounts/
Use “Reference Chains from Rootset” (Exclude weak refs!!!) to see who’s holding the instance.

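Sometimes, though, this approach does not make the leak easy to find. In our case the biggest consumer was HashMap$Entry[], which grew to several GB. A dump that large is painful to copy from the production servers in the IDC to a local machine, and even then we found no obvious leak.

When the approach above (white box) does not solve the problem, consider black-box testing: work out which endpoint and which layer triggers it, analysing and testing both horizontally and vertically.

We ran a lot of tests on the stage server.

Load-testing each endpoint on its own did not cause any memory problem,

so we suspected that a combination of calls was responsible, and issued mixed GET, POST, DELETE and PUT requests.

The problem showed up. A moment of joy.

Creating and looking up the same kind of entity at the same time causes a very severe memory leak.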
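A mixed workload of this kind can be sketched as a small harness that runs creators and finders side by side. Everything here is an assumption made for illustration: the EntityService interface and the in-memory stub are hypothetical stand-ins, not the real service or the test we actually ran. In a real test the stub would be replaced by calls to the endpoints, and heap usage would be watched over a much longer run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class MixedLoadTest {

    /** Hypothetical stand-in for the service under test. */
    interface EntityService {
        long create(String name);
        String find(long id);
    }

    /** In-memory stub so the sketch runs on its own; it keeps every entity by design. */
    static class InMemoryService implements EntityService {
        private final Map<Long, String> store = new ConcurrentHashMap<>();
        private final AtomicLong nextId = new AtomicLong();

        public long create(String name) {
            long id = nextId.incrementAndGet();
            store.put(id, name);
            return id;
        }

        public String find(long id) {
            return store.get(id);
        }

        int size() {
            return store.size();
        }
    }

    /** Runs creator and finder threads concurrently; returns the number of completed operations. */
    static long run(EntityService service, int threads, int opsPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong lastId = new AtomicLong();
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            boolean creator = (t % 2 == 0); // even threads create, odd threads find
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    if (creator) {
                        lastId.set(service.create("entity-" + i));
                    } else {
                        service.find(lastId.get());
                    }
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
            throw new IllegalStateException("load test did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();
        long ops = run(new InMemoryService(), 4, 100_000);
        System.gc(); // a request only; the JVM may ignore it
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("ops=%d, heap before=%d KB, after=%d KB%n",
                ops, before / 1024, after / 1024);
    }
}
```

Running the creators alone, the finders alone, and then both together is what separates a per-endpoint problem from one that only appears under combined calls.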

This part of the system mainly uses JPA + Hibernate Search.

Reviewing the code turned up nothing wrong, so we suspected the configuration.

<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
  <property name="persistenceXmlLocation" value="classpath*:/META-INF/configuration/initializers/jpa/persistence.xml" />
  <property name="persistenceUnitName" value="ApplicationEntityManager" />
  <property name="dataSource" ref="dataSource" />
  <property name="jpaVendorAdapter">
    <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
      <property name="showSql" value="false" />
      <property name="generateDdl" value="true" />
      <property name="database" value="MYSQL" />
      <property name="databasePlatform" value="org.hibernate.dialect.MySQL5Dialect" />
    </bean>
  </property>
  <property name="jpaPropertyMap">
    <map>
      <entry key="hibernate.generate_statistics" value="false" />
      <entry key="hibernate.session_factory_name" value="SessionFactory" />
      <entry key="hhibernate.bytecode.use_reflection_optimizer" value="true" />
      <entry key="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.SingletonEhCacheProvider" />
      <entry key="hibernate.cache.use_query_cache" value="false" />
      <entry key="hibernate.cache.use_second_level_cache" value="false" />
      <entry key="hibernate.cache.use_structured_entries" value="false" />
      <entry key="hibernate.cache.generate_statistics" value="false" />
      <entry key="net.sf.ehcache.configurationResourceName" value="/META-INF/configuration/initializers/cache/ehcache.xml" />
      <entry key="hibernate.search.default.directory_provider" value="ram" />
      <entry key="hibernate.search.default.indexBase" value="${application.lucene.directory}" />
      <entry key="hibernate.search.default.locking_strategy" value="single" />
      <entry key="hibernate.search.default_null_token" value="_null_" />
      <entry key="hibernate.search.default.indexwriter.transaction.max_merge_docs" value="10" />
      <entry key="hibernate.search.default.indexwriter.transaction.merge_factor" value="20" />
      <entry key="hibernate.search.default.indexwriter.batch.max_merge_docs" value="100" />
      <entry key="hibernate.search.worker.execution" value="async" />
      <entry key="hibernate.search.worker.thread_pool.size" value="10" />
      <entry key="hibernate.search.worker.buffer_queue.max" value="100" />
    </map>
  </property>
  <property name="loadTimeWeaver">
    <bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver" />
  </property>
</bean>

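After some trial and error, removing three entries from this configuration stopped the memory leak.

In the Hibernate Search documentation we then found that the property names in our configuration were not written correctly: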

hibernate.search.Animals.2.indexwriter.transaction.max_merge_docs 10
hibernate.search.Animals.2.indexwriter.transaction.merge_factor 20
hibernate.search.default.indexwriter.batch.max_merge_docs 100

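Why would an incorrect property name cause such a serious memory problem?

It looks like there are some issues inside the framework itself.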