在线服务的内存泄露问题

来源:互联网 发布:win7 smtp服务器软件 编辑:程序博客网 时间:2024/04/30 10:35

最近发现一台机器堆内存使用率大约100%,Web server相应极慢,导致客户端多次报告Read Timeout.
跟踪Resin的jvm-default log,发现已经持续告警:

[15-11-13 09:48:57.864] {CoordinatorThread[]-3} AlarmClock slow alarm Alarm[alarm[WebAppExpandDeployGenerator[null]]] 27901ms coordinator-delta 31071ms[15-11-13 09:48:57.864] {CoordinatorThread[]-3} AlarmClock slow alarm Alarm[alarm[WebAppExpandDeployGenerator[null]]] 26071ms coordinator-delta 31071ms[15-11-13 09:49:02.865] {CoordinatorThread[]-3} AlarmClock slow alarm Alarm[alarm[HostExpandDeployGenerator[null]]] 17393ms coordinator-delta 22394ms

1.通过jmap -heap查看一下堆内存使用情况,相当震惊:新生代和老年代使用率都接近100%.

[work@lg-b2c-kfs-webchat-app02 ~]$ jmap -heap 27000Attaching to process ID 27000, please wait...Debugger attached successfully.Server compiler detected.JVM version is 20.12-b01using thread-local object allocation.Parallel GC with 8 thread(s)Heap Configuration:   MinHeapFreeRatio = 40   MaxHeapFreeRatio = 70   MaxHeapSize      = 4294967296 (4096.0MB)   NewSize          = 1310720 (1.25MB)   MaxNewSize       = 17592186044415 MB   OldSize          = 5439488 (5.1875MB)   NewRatio         = 2   SurvivorRatio    = 8   PermSize         = 21757952 (20.75MB)   MaxPermSize      = 268435456 (256.0MB)Heap Usage:PS Young GenerationEden Space:   capacity = 1040121856 (991.9375MB)   used     = 978730744 (933.3903732299805MB)   free     = 61391112 (58.54712677001953MB)   94.09770002948578% usedFrom Space:   capacity = 26935296 (25.6875MB)   used     = 0 (0.0MB)   free     = 26935296 (25.6875MB)   0.0% usedTo Space:   capacity = 22675456 (21.625MB)   used     = 0 (0.0MB)   free     = 22675456 (21.625MB)   0.0% usedPS Old Generation   capacity = 2863333376 (2730.6875MB)   used     = 2863333288 (2730.68741607666MB)   free     = 88 (8.392333984375E-5MB)   99.99999692665895% usedPS Perm Generation   capacity = 78249984 (74.625MB)   used     = 78185600 (74.5635986328125MB)   free     = 64384 (0.0614013671875MB)   99.91772011097153% used

2.通过jmap -histo查看一下堆中的对象信息,发现业务对象Terminal和HttpChatContext实例数已经达到两百多万,正常情况下应该维持在几百到几千。更奇怪的是居然有大量的java.lang.ThreadLocal$ThreadLocalMap实例对象。

[work@lg-b2c-kfs-webchat-app02 log]$ jmap -histo 27000|less num     #instances         #bytes  class name----------------------------------------------   1:      11926272      920079152  [C   2:      22979344      735339008  java.lang.ThreadLocal$ThreadLocalMap$Entry   3:      22978102      551474448  java.util.concurrent.locks.ReentrantReadWriteLock$Sync$HoldCounter   4:      11795560      377457920  java.lang.String   5:           259      167801200  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;   6:       4237670      135605440  java.util.concurrent.ConcurrentHashMap$HashEntry   7:       1611294      128911920  [Ljava.util.HashMap$Entry;   8:       2092579      100443792  com.xiaomi.kfs.chat.core.domain.Terminal   9:       2350789       94031560  java.util.LinkedHashMap$Entry  10:       2092583       83703320  java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync  11:         28486       59237952  [I  12:       1584956       50718592  java.util.HashMap$Entry  13:       2092580       50221920  java.util.concurrent.locks.ReentrantReadWriteLock  14:       2077571       49861704  com.xiaomi.kfs.chat.service.support.HttpChatContext  15:        783207       43859592  java.util.LinkedHashMap  16:        827778       39733344  java.util.HashMap  17:         13066       34390136  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;  18:       2092583       33481328  java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter  19:       2092583       33481328  java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock  20:       2092580       33481280  java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock  21:       1564801       25036816  java.util.HashMap$EntrySet  22:        163768       22473320  [B

3.可以确认出现了内存泄露,Terminal和HttpChatContext存在未被回收的现象。使用工具来观察一下这两种类对象是通过哪些路径和GC Roots关联上的。我们通过dump快照和JProfiler来进行进一步分析:

jmap -dump:format=b,live,file=/home/work/tmp/terminalMemoryLeak.hprof 27000

4.通过JProfiler,我们可以非常清晰地看到各类对象的数量和内存占用情况。
AllObjects.png

5.选择com.xiaomi.kfs.chat.core.domain.Terminal类,双击创建一个新的Object Set,选择Cumulated incoming references,可以看到Terminal对象的引用链。
TerminalReference.png

6.经过分析确认,ChatDomainManager中维护了Terminal的哈希表,而某些Terminal未被正确地从ConcurrentHashMap中删除,因此出现数量过大的情况。而Terminal对象中使用了ReentrantReadWriteLock,ReentrantReadWriteLock是通过ThreadLocal来记录当前线程所持有的读锁数量的。因此才有了上面诸多有关ReentrantReadWriteLock的对象实例。

更多内容请参考我的博客 清欢的个人博客

0 0