Analyzing a SurfaceView Leak on Android N
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 05:20
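I recently ran into a bug where a SurfaceView's Layer was never destroyed, so the Layer stayed on screen permanently. The case is interesting enough that I am recording the analysis and the fix here, as a reference for ROM developers who already have some framework background.
Symptom
The UI of 开心消消乐 (com.happyelements.AndroidAnimal) stays on screen and cannot be dismissed by any means.
Analysis
The most directly related module is SurfaceFlinger. Since the image is visible, the Layer should still exist and be taking part in composition; if it did not, SurfaceFlinger itself would be at fault. Dump the state with the following command: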
adb shell dumpsys SurfaceFlinger
Only the part related to this Layer is excerpted here:
+ Layer 0x71b57b0400 (SurfaceView - com.happyelements.AndroidAnimal/com.happyelements.hellolua.MainActivity)
  Region transparentRegion (this=0x71b57b0708, count=1)
    [ 0, 0, 0, 0]
  Region visibleRegion (this=0x71b57b0410, count=1)
    [ 0, 0, 1080, 1920]
  Region surfaceDamageRegion (this=0x71b57b0488, count=1)
    [ 0, 0, 0, 0]
  layerStack= 0, z= 21015, pos=(0,0), size=(1080,1920), crop=( 0, 0,1080,1920), finalCrop=( 0, 0, -1, -1), isOpaque=1, invalidate=0, alpha=0xff, flags=0x00000002, tr=[1.00, 0.00][0.00, 1.00]
  FilterRender Layer= 0, FilterMode= 0 availableRect =( 0, 0, 0, 0)
  client=0x71b86f0f40
  format= 4, activeBuffer=[1080x1920:1088, 1], queued-frames=0, mRefreshPending=0
  mSecure=0, mProtectedByApp=0, mFiltering=0, mNeedsFiltering=0
  mTexName=54 mCurrentTexture=-1
  mCurrentCrop=[0,0,0,0] mCurrentTransform=0
  mAbandoned=0
  -BufferQueue mMaxAcquiredBufferCount=1, mMaxDequeuedBufferCount=3, mDequeueBufferCannotBlock=0 mAsyncMode=0, default-size=[1080x1920], default-format=4, transform-hint=00, FIFO(0)={}
   this=0x71b55e3000 (mConsumerName=SurfaceView - com.happyelements.AndroidAnimal/com.happyelements.hellolua.MainActivity, mConnectedApi=0, mConsumerUsageBits=0x900, mId=39, mPid=15358, producer=[-1:com.happyelements.AndroidAnimal], consumer=[15358:/system/bin/surfaceflinger])
   [00:0x0] state=FREE
   [01:0x0] state=FREE
   [02:0x0] state=FREE
   [03:0x0] state=FREE
  *BufferQueueDump mIsBackupBufInited=0, mAcquiredBufs(size=0), mMode=TRACK_CONSUMER
   [-1] mLastAcquiredBuf->mGraphicBuffer->handle=0x71b7636900
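This tells us:
- flags=0x00000002: the Layer is in the shown and opaque state
- alpha=0xff: the alpha value is fully opaque
- visibleRegion is [ 0, 0, 1080, 1920]: there is a visible region, and it covers the full screen
Taken together with the composition information in the same dump, SurfaceFlinger's state is self-consistent and matches what we see on screen.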
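As a rough illustration of how the flags field is read, here is a minimal Java sketch. The class name LayerFlags is hypothetical, and the two mask values are an assumption intended to mirror the native eLayerHidden and eLayerOpaque constants; check them against your own source tree before relying on them.

```java
// Illustrative decoder for the "flags=" field of a SurfaceFlinger layer dump.
// LayerFlags is a hypothetical helper, not a framework class.
final class LayerFlags {
    static final int LAYER_HIDDEN = 0x01; // assumed value of the native eLayerHidden bit
    static final int LAYER_OPAQUE = 0x02; // assumed value of the native eLayerOpaque bit

    // A layer is shown when the hidden bit is clear.
    static boolean isShown(int flags) {
        return (flags & LAYER_HIDDEN) == 0;
    }

    // A layer is opaque when the opaque bit is set.
    static boolean isOpaque(int flags) {
        return (flags & LAYER_OPAQUE) != 0;
    }

    static String describe(int flags) {
        return (isShown(flags) ? "shown" : "hidden") + ", "
                + (isOpaque(flags) ? "opaque" : "translucent");
    }
}
```

Under these assumptions, LayerFlags.describe(0x00000002) yields "shown, opaque", which is the reading given in the list above.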
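There are also some odd details, odd in the sense that they differ from a Layer that is taking part in composition normally:
- Every GraphicBuffer slot is in the FREE state; normally at least one is ACQUIRED
- mCurrentTexture=-1; normally it is >= 0
- mConnectedApi=0; normally it is > 0
Of course, a system that has reached this buggy state cannot be judged entirely by normal standards, so we leave SurfaceFlinger here for now.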
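The three checks above can be automated against a saved dump. The following is a sketch only: LayerDumpCheck and findOddities are hypothetical names, and the regular expressions assume the field layout seen in the excerpt above, which may differ between vendors and releases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that scans the text of one layer's dump for the three oddities.
final class LayerDumpCheck {
    // Matches buffer slot lines such as "[00:0x0] state=FREE".
    private static final Pattern SLOT_STATE =
            Pattern.compile("\\[\\d+:[^\\]]*\\] state=(\\w+)");
    private static final Pattern CURRENT_TEXTURE = Pattern.compile("mCurrentTexture=(-?\\d+)");
    private static final Pattern CONNECTED_API = Pattern.compile("mConnectedApi=(\\d+)");

    static List<String> findOddities(String layerDump) {
        List<String> oddities = new ArrayList<>();

        int slots = 0;
        int acquired = 0;
        Matcher slot = SLOT_STATE.matcher(layerDump);
        while (slot.find()) {
            slots++;
            if ("ACQUIRED".equals(slot.group(1))) {
                acquired++;
            }
        }
        if (slots > 0 && acquired == 0) {
            oddities.add("no buffer slot is ACQUIRED");
        }

        Matcher texture = CURRENT_TEXTURE.matcher(layerDump);
        if (texture.find() && Integer.parseInt(texture.group(1)) < 0) {
            oddities.add("mCurrentTexture is negative");
        }

        Matcher api = CONNECTED_API.matcher(layerDump);
        if (api.find() && Integer.parseInt(api.group(1)) == 0) {
            oddities.add("mConnectedApi is 0");
        }
        return oddities;
    }
}
```

Fed the excerpt above, this would report all three findings; a layer with an acquired buffer, a non-negative texture slot and a connected producer would report none.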
Turning to WMS, dump its state with the following command:
adb shell dumpsys window
The only information related to this SurfaceView is:
WINDOW MANAGER SURFACES (dumpsys window surfaces)
  Surface #0: #75499c8 SurfaceView - com.happyelements.AndroidAnimal/com.happyelements.hellolua.MainActivity
    mLayerStack=0 mLayer=21015
    mShown=true mAlpha=1.0 mIsOpaque=false
    mPosition=0.0,0.0 mSize=1080x1920
    mCrop=[0,0][1080,1920] mFinalCrop=[0,0][0,0]
    Transform: (1.0, 0.0, 0.0, 1.0)
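This is not output from the window stack. To keep this write-up from getting too long, here are the conclusions directly:
- What is printed here is the content of a static collection of SurfaceTrace objects
- SurfaceTrace is a subclass of SurfaceControl, and each SurfaceControl corresponds to one Layer on the SurfaceFlinger side
- Constructing a new SurfaceTrace instance adds an element to that static collection; destroying the instance removes it
A SurfaceTrace is still in that static collection, which means it was created but never destroyed. That is the most direct cause of this bug and our initial entry point. WMS has only this one record, with no matching state in the window stack or the tokens, which is rather disheartening: such state might have offered a clue, and digging through the code blindly is like looking for a needle in a haystack.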
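The registration pattern described above can be modeled in a few lines. This is a simplified stand-in, not the framework class: TracedSurface, sSurfaces and liveNames are hypothetical names used only to show why a missing destroy call leaves an entry behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a surface wrapper that registers itself in a static collection.
class TracedSurface {
    // Every live instance is tracked here; a dump simply walks this list.
    private static final List<TracedSurface> sSurfaces = new ArrayList<>();

    private final String name;

    TracedSurface(String name) {
        this.name = name;
        synchronized (sSurfaces) {
            sSurfaces.add(this); // registered on construction
        }
    }

    void destroy() {
        synchronized (sSurfaces) {
            sSurfaces.remove(this); // unregistered only when destroy() is called
        }
    }

    // Names of all surfaces that were created but not yet destroyed.
    static List<String> liveNames() {
        List<String> names = new ArrayList<>();
        synchronized (sSurfaces) {
            for (TracedSurface s : sSurfaces) {
                names.add(s.name);
            }
        }
        return names;
    }
}
```

If destroy() is skipped for one instance, that instance keeps appearing in liveNames() for the lifetime of the process, which is the situation the dump shows.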
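There are no logs, only the live device, but we can still learn the following:
- ps shows that the target process is already dead (strange: the process is dead, so why is the Layer still there?)
- Recall the odd details of the Layer noted above. Reading the code shows they result from a call to SurfaceControl.disconnect(), an API new in Android N that is called only from the saved-Surface logic. Saved Surfaces are an optimization added in Android N to speed up UI response. This tells us the code reached a particular point at some time, which helps the analysis a little.
Without further clues the analysis would end here, and what remained would be to dive "happily" into the ocean of code looking for the bug while praying for luck. Fortunately I was able to capture an hprof of system_server, and things suddenly looked hopeful.
Next we look at the hprof file. To keep the analysis short, I will not paste large amounts of data.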
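Start with the SurfaceControl that WMS dumped. According to the code, the instance can only be a SurfaceTrace or its subclass SurfaceControlWithBackground, and it turned out to be a SurfaceControlWithBackground. In N, a child window whose mSubLayer is less than 0 (that is, one placed below its parent window) gets a SurfaceControlWithBackground by default when its SurfaceControl is created, and a SurfaceView is exactly such a window. Its GC root confirms that it is held in a static collection.
Following the trail leads to the corresponding WindowState, whose GC root is in WMS.mWindowMap; its parent WindowState is still present as well. At this point we can draw an important conclusion:
It is not only the SurfaceView window that leaked, but also its parent window.
And a question we will answer later:
Why is the parent window's Layer not visible on the SurfaceFlinger side?
First, a question that needs an immediate answer: didn't we say above that WMS could no longer dump these windows?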
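Answering it requires explaining how WindowState objects are organized. They are stored in several places in the system, including:
- WMS.mWindowMap: looks up a WindowState with an IBinder as the key
- DisplayContent.mWindows: a list of the WindowStates on a single display
- WindowToken.windows or AppWindowToken.allAppWindows: a list of the WindowStates that belong to the token
- WindowState.mChildWindows: a list of child windows
Note: the lists above are all ArrayLists, and a larger index means a higher z-order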
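The answer: the dump output is read from DisplayContent.mWindows. Since nothing shows up there, the leaked WindowState has already been removed from that list. Checking the other places listed above:
- WMS.mWindowMap: present
- DisplayContent.mWindows: absent
- AppWindowToken.allAppWindows: absent
- WindowState.mChildWindows: present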
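A toy model makes this half-removed state easier to see. Everything below is hypothetical (WindowBookkeeping, holders, isHalfRemoved); windows are plain strings, and the four fields only stand in for the containers named above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the four places that can hold a child window.
final class WindowBookkeeping {
    final Map<String, String> windowMap = new HashMap<>();       // stands in for WMS.mWindowMap
    final List<String> displayWindows = new ArrayList<>();       // DisplayContent.mWindows
    final List<String> tokenWindows = new ArrayList<>();         // AppWindowToken.allAppWindows
    final List<String> parentChildWindows = new ArrayList<>();   // parent's WindowState.mChildWindows

    // Registers a child window everywhere, as happens when a window is added.
    void addChildWindow(String clientKey, String window) {
        windowMap.put(clientKey, window);
        displayWindows.add(window);
        tokenWindows.add(window);
        parentChildWindows.add(window);
    }

    // Lists which containers still reference the window.
    List<String> holders(String window) {
        List<String> out = new ArrayList<>();
        if (windowMap.containsValue(window)) out.add("WMS.mWindowMap");
        if (displayWindows.contains(window)) out.add("DisplayContent.mWindows");
        if (tokenWindows.contains(window)) out.add("AppWindowToken.allAppWindows");
        if (parentChildWindows.contains(window)) out.add("WindowState.mChildWindows");
        return out;
    }

    // True when some, but not all, containers still hold the window.
    boolean isHalfRemoved(String window) {
        int n = holders(window).size();
        return n != 0 && n != 4;
    }
}
```

Removing the window from displayWindows and tokenWindows only reproduces the pattern in the list above: holders() returns WMS.mWindowMap and WindowState.mChildWindows, and isHalfRemoved() is true.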
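In the normal flow, removing a WindowState should remove the reference from every place that holds it. Given the current state, we need to pick the easiest of these containers to investigate. From the code, WMS.mWindowMap is the simplest, because only one place removes a WindowState from it, namely WMS.removeWindowInnerLocked():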
void removeWindowInnerLocked(WindowState win) {
    if (win.mRemoved) {
        // Nothing to do.
        if (DEBUG_ADD_REMOVE) Slog.v(TAG_WM,
                "removeWindowInnerLocked: " + win + " Already removed...");
        return;
    }

    for (int i = win.mChildWindows.size() - 1; i >= 0; i--) {
        WindowState cwin = win.mChildWindows.get(i);
        Slog.w(TAG_WM, "Force-removing child win " + cwin + " from container " + win);
        removeWindowInnerLocked(cwin);
    }

    win.mRemoved = true;
    ...
    mPolicy.removeWindowLw(win);
    win.removeLocked(); // removes it from WindowState.mChildWindows

    if (DEBUG_ADD_REMOVE) Slog.v(TAG_WM, "removeWindowInnerLocked: " + win);
    mWindowMap.remove(win.mClient.asBinder()); // removes it from WMS.mWindowMap
    ...
    final WindowToken token = win.mToken;
    final AppWindowToken atoken = win.mAppToken;
    if (DEBUG_ADD_REMOVE) Slog.v(TAG_WM, "Removing " + win + " from " + token);
    token.windows.remove(win); // removes it from WindowToken.windows
    if (atoken != null) {
        atoken.allAppWindows.remove(win); // removes it from AppWindowToken.allAppWindows
    }
    ...
    final WindowList windows = win.getWindowList();
    if (windows != null) {
        windows.remove(win); // removes it from DisplayContent.mWindows
    }
}
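In other words, this code was certainly never reached for the leaked WindowState, which is confirmed by WindowState.mRemoved being false. The only place that removes a window from WindowState.mChildWindows is WindowState.removeLocked():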
void removeLocked() {
    disposeInputChannel();

    if (isChildWindow()) {
        if (DEBUG_ADD_REMOVE) Slog.v(TAG, "Removing " + this + " from " + mAttachedWindow);
        mAttachedWindow.mChildWindows.remove(this); // removes it from WindowState.mChildWindows
    }
    mWinAnimator.destroyDeferredSurfaceLocked();
    mWinAnimator.destroySurfaceLocked();
    mSession.windowRemovedLocked();
    try {
        mClient.asBinder().unlinkToDeath(mDeathRecipient, 0);
    } catch (RuntimeException e) {
        // Ignore if it has already been removed (usually because
        // we are doing this as part of processing a death note.)
    }
}
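So WMS.removeWindowInnerLocked() looks like the place where the final removal happens, because every container that holds the WindowState is cleaned up there. The inconsistency we see means that some other code also removes references from some of these containers. The question narrows down to DisplayContent.mWindows and AppWindowToken.allAppWindows.
Look at AppWindowToken.allAppWindows first. Searching the code turns up AppWindowToken.removeAllWindows():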
void removeAllWindows() {
    ...
    allAppWindows.clear(); // clears AppWindowToken.allAppWindows
    windows.clear(); // clears WindowToken.windows
}
Part of the call path is:
WMS.removeAppToken()->AppWindowToken.removeAppFromTaskLocked()->AppWindowToken.removeAllWindows()
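In short, we know the target process has died, and WMS.removeAppToken is called at least while the death notification is handled. Reasoning backward from the result, this part is explained.
What about DisplayContent.mWindows? The problem lies in WMS.rebuildAppWindowListLocked():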
private void rebuildAppWindowListLocked(final DisplayContent displayContent) {
    final WindowList windows = displayContent.getWindowList();
    int NW = windows.size();
    int i;
    int lastBelow = -1;
    int numRemoved = 0;

    if (mRebuildTmp.length < NW) {
        mRebuildTmp = new WindowState[NW+10];
    }

    // First remove all existing app windows.
    i=0;
    while (i < NW) {
        WindowState w = windows.get(i);
        if (w.mAppToken != null) {
            WindowState win = windows.remove(i); // removed from DisplayContent.mWindows first; may be re-added below
            win.mRebuilding = true;
            mRebuildTmp[numRemoved] = win;
            mWindowsChanged = true;
            if (DEBUG_WINDOW_MOVEMENT) Slog.v(TAG_WM, "Rebuild removing window: " + win);
            NW--;
            numRemoved++;
            continue;
        } else if (lastBelow == i-1) {
            if (w.mAttrs.type == TYPE_WALLPAPER) {
                lastBelow = i;
            }
        }
        i++;
    }

    // Keep whatever windows were below the app windows still below,
    // by skipping them.
    lastBelow++;
    i = lastBelow;

    // First add all of the exiting app tokens... these are no longer
    // in the main app list, but still have windows shown. We put them
    // in the back because now that the animation is over we no longer
    // will care about them.
    final ArrayList<TaskStack> stacks = displayContent.getStacks();
    final int numStacks = stacks.size();
    for (int stackNdx = 0; stackNdx < numStacks; ++stackNdx) {
        AppTokenList exitingAppTokens = stacks.get(stackNdx).mExitingAppTokens;
        int NT = exitingAppTokens.size();
        for (int j = 0; j < NT; j++) {
            i = reAddAppWindowsLocked(displayContent, i, exitingAppTokens.get(j));
        }
    }

    // And add in the still active app tokens in Z order.
    for (int stackNdx = 0; stackNdx < numStacks; ++stackNdx) {
        final ArrayList<Task> tasks = stacks.get(stackNdx).getTasks();
        final int numTasks = tasks.size();
        for (int taskNdx = 0; taskNdx < numTasks; ++taskNdx) {
            final AppTokenList tokens = tasks.get(taskNdx).mAppTokens;
            final int numTokens = tokens.size();
            for (int tokenNdx = 0; tokenNdx < numTokens; ++tokenNdx) {
                final AppWindowToken wtoken = tokens.get(tokenNdx);
                if (wtoken.mIsExiting && !wtoken.waitingForReplacement()) {
                    continue;
                }
                i = reAddAppWindowsLocked(displayContent, i, wtoken);
            }
        }
    }

    i -= lastBelow;
    if (i != numRemoved) {
        displayContent.layoutNeeded = true;
        Slog.w(TAG_WM, "On display=" + displayContent.getDisplayId() + " Rebuild removed "
                + numRemoved + " windows but added " + i + " rebuildAppWindowListLocked() "
                + " callers=" + Debug.getCallers(10));
        for (i = 0; i < numRemoved; i++) {
            WindowState ws = mRebuildTmp[i];
            if (ws.mRebuilding) {
                StringWriter sw = new StringWriter();
                PrintWriter pw = new FastPrintWriter(sw, false, 1024);
                ws.dump(pw, "", true);
                pw.flush();
                Slog.w(TAG_WM, "This window was lost: " + ws);
                Slog.w(TAG_WM, sw.toString());
                ws.mWinAnimator.destroySurfaceLocked();
            }
        }
        Slog.w(TAG_WM, "Current app token list:");
        dumpAppTokensLocked();
        Slog.w(TAG_WM, "Final window list:");
        dumpWindowsLocked();
    }
    Arrays.fill(mRebuildTmp, null);
}
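Briefly, the logic first removes all app windows and then re-adds them following the latest ordering of the AppWindowTokens. For a window to be re-added, WindowToken.windows must not be empty, and according to the analysis above it is already empty, so once removed the window cannot be added back. WindowState.mRebuilding being true proves this. So this part is explained too, and it traces back to WindowToken.windows having been cleared.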
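The remove-then-re-add behavior can be reduced to a small model. This is a sketch under simplifying assumptions: RebuildModel and its Token class are hypothetical, windows are strings, and re-insertion starts at the index of the first removed app window instead of following the wallpaper-based lastBelow logic of the real method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of rebuilding the per-display window list.
final class RebuildModel {
    // Stands in for a token and its list of windows (WindowToken.windows).
    static final class Token {
        final List<String> windows = new ArrayList<>();
    }

    // Removes every app window from displayWindows, re-adds those still listed
    // by a token, and returns the windows that could not be re-added.
    static List<String> rebuild(List<String> displayWindows, Set<String> appWindows,
            List<Token> tokens) {
        List<String> lost = new ArrayList<>();
        int insertAt = -1;
        for (int i = 0; i < displayWindows.size(); ) {
            String w = displayWindows.get(i);
            if (appWindows.contains(w)) {
                displayWindows.remove(i); // removed first, maybe re-added below
                lost.add(w);
                if (insertAt < 0) insertAt = i;
            } else {
                i++;
            }
        }
        if (insertAt < 0) insertAt = displayWindows.size();

        // Re-add only what the tokens still reference.
        for (Token token : tokens) {
            for (String w : token.windows) {
                displayWindows.add(insertAt++, w);
                lost.remove(w);
            }
        }
        return lost;
    }
}
```

With a token whose windows list has been cleared, both the SurfaceView and its parent end up in the returned list and vanish from the display list, which matches the "This window was lost" path in the code above.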
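So why was the cleanup in WMS.removeWindowInnerLocked() never reached? Back to the death notification: every WindowState registers a death recipient, and WMS.removeWindowLocked() is called after the window's process dies. There is no doubt about that, and it goes on to call WMS.removeWindowInnerLocked(), but it may return early before getting there. The code is long, so only the parts that can return early are listed; the comments rule them out one by one: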
if (win.mHasSurface && okToDisplay()) {
    final AppWindowToken appToken = win.mAppToken;
    if (win.mWillReplaceWindow) { // mWillReplaceWindow is false
        // This window is going to be replaced. We need to keep it around until the new one
        // gets added, then we will get rid of this one.
        if (DEBUG_ADD_REMOVE) Slog.v(TAG_WM,
                "Preserving " + win + " until the new one is " + "added");
        // TODO: We are overloading mAnimatingExit flag to prevent the window state from
        // been removed. We probably need another flag to indicate that window removal
        // should be deffered vs. overloading the flag that says we are playing an exit
        // animation.
        win.mAnimatingExit = true;
        win.mReplacingRemoveRequested = true;
        Binder.restoreCallingIdentity(origId);
        return;
    }

    // The only remaining possibility is that this condition was met and we returned here
    if (win.isAnimatingWithSavedSurface() && !appToken.allDrawnExcludingSaved) {
        // We started enter animation early with a saved surface, now the app asks to remove
        // this window. If we remove it now and the app is not yet drawn, we'll show a
        // flicker. Delay the removal now until it's really drawn.
        if (DEBUG_ADD_REMOVE) {
            Slog.d(TAG_WM, "removeWindowLocked: delay removal of " + win
                    + " due to early animation");
        }
        // Do not set mAnimatingExit to true here, it will cause the surface to be hidden
        // immediately after the enter animation is done. If the app is not yet drawn then
        // it will show up as a flicker.
        setupWindowForRemoveOnExit(win);
        Binder.restoreCallingIdentity(origId);
        return;
    }

    // If we are not currently running the exit animation, we need to see about starting one
    wasVisible = win.isWinVisibleLw();

    if (keepVisibleDeadWindow) { // this branch is definitely not entered
        if (DEBUG_ADD_REMOVE) Slog.v(TAG_WM,
                "Not removing " + win + " because app died while it's visible");
        win.mAppDied = true;
        win.setDisplayLayoutNeeded();
        mWindowPlacerLocked.performSurfacePlacement();

        // Set up a replacement input channel since the app is now dead.
        // We need to catch tapping on the dead window to restart the app.
        win.openInputChannel(null);
        mInputMonitor.updateInputWindowsLw(true /*force*/);
        Binder.restoreCallingIdentity(origId);
        return;
    }

    final WindowStateAnimator winAnimator = win.mWinAnimator;
    if (wasVisible) {
        final int transit = (!startingWindow) ? TRANSIT_EXIT : TRANSIT_PREVIEW_DONE;
        // Try starting an animation.
        if (winAnimator.applyAnimationLocked(transit, false)) {
            win.mAnimatingExit = true;
        }
        //TODO (multidisplay): Magnification is supported only for the default display.
        if (mAccessibilityController != null
                && win.getDisplayId() == Display.DEFAULT_DISPLAY) {
            mAccessibilityController.onWindowTransitionLocked(win, transit);
        }
    }
    final boolean isAnimating =
            winAnimator.isAnimationSet() && !winAnimator.isDummyAnimation();
    final boolean lastWindowIsStartingWindow = startingWindow && appToken != null
            && appToken.allAppWindows.size() == 1;
    // We delay the removal of a window if it has a showing surface that can be used to run
    // exit animation and it is marked as exiting.
    // Also, If isn't the an animating starting window that is the last window in the app.
    // We allow the removal of the non-animating starting window now as there is no
    // additional window or animation that will trigger its removal.
    if (winAnimator.getShown() && win.mAnimatingExit
            && (!lastWindowIsStartingWindow || isAnimating)) { // mAnimatingExit is false
        // The exit animation is running or should run... wait for it!
        if (DEBUG_ADD_REMOVE) Slog.v(TAG_WM,
                "Not removing " + win + " due to exit animation ");
        setupWindowForRemoveOnExit(win);
        if (appToken != null) {
            appToken.updateReportedVisibilityLocked();
        }
        Binder.restoreCallingIdentity(origId);
        return;
    }
}
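In the end only one possibility remains, and the code shows it is related to saved Surfaces. Does that ring a bell? Yes: as noted above, the leaked Layer had gone through that same logic. This branch sets WindowState.mRemoveOnExit to true. If WindowState.mAnimatingExit is also true, the final removal is performed in WindowStateAnimator.finishExit(). What the hprof shows, however, is that the former is true and the latter is false, so the window is never removed. More specifically, because the target process is dead by then, no further calls arrive and the state stays stuck there. The conclusion:
The target app had been started before and was sent to the background. While it was being relaunched, its process died suddenly, at a moment when neither the parent window nor the child window had finished redrawing (that is, WMS.finishDrawingWindow had not been called). That is when the bug occurs.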
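The stuck state can be expressed as a tiny state machine. RemovalModel is a hypothetical name, and the model follows the description in this article only: the delay branch sets removeOnExit without touching animatingExit, and finishExit() removes the window only when both flags are true.

```java
// Hypothetical model of the delayed-removal flags discussed above.
final class RemovalModel {
    boolean removeOnExit;   // models WindowState.mRemoveOnExit
    boolean animatingExit;  // models WindowState.mAnimatingExit
    boolean removed;        // true once the final cleanup has run

    // delayForSavedSurface models the condition
    // win.isAnimatingWithSavedSurface() && !appToken.allDrawnExcludingSaved.
    void onClientDied(boolean delayForSavedSurface) {
        if (delayForSavedSurface) {
            removeOnExit = true; // animatingExit is deliberately left untouched
            return;              // early return: final cleanup is skipped
        }
        removed = true;          // stands in for removeWindowInnerLocked()
    }

    // Models the end of an exit animation, per the article's description.
    void finishExit() {
        if (removeOnExit && animatingExit) {
            removed = true;
        }
    }

    // True when the window waits for a removal that can no longer happen by itself.
    boolean isStuck() {
        return removeOnExit && !animatingExit && !removed;
    }
}
```

Calling onClientDied(true) followed by any number of finishExit() calls leaves removed false and isStuck() true, which is the combination found in the hprof. Calling onClientDied(false) removes the window immediately.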
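As you can imagine, this is very hard to hit in everyday use, so the probability is extremely low. After adjusting the code to force the conditions from the conclusion, the bug reproduced every time.
Now back to the question we left open earlier:
Why is the parent window's Layer not visible on the SurfaceFlinger side?
The answer is that a Layer corresponds to a SurfaceControl, and a leaked WindowState does not imply a leaked SurfaceControl. In other words, the child window's SurfaceControl was not destroyed while the parent window's was. The SurfaceControl-related references of the two windows in the hprof are:
- The parent window's mSurfaceController and mPendingDestroySurface are both null, so its surface has been destroyed
- The child window's mSurfaceController is null but its mPendingDestroySurface is not, so its destruction was deferred
In fact WindowStateAnimator.destroySurfaceLocked() was called for both: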
void destroySurfaceLocked() {
    ...
    if (mSurfaceDestroyDeferred) { // mSurfaceDestroyDeferred is true for the child window
        if (mSurfaceController != null && mPendingDestroySurface != mSurfaceController) {
            if (mPendingDestroySurface != null) {
                if (SHOW_TRANSACTIONS || SHOW_SURFACE_ALLOC) {
                    WindowManagerService.logSurface(mWin, "DESTROY PENDING", true);
                }
                mPendingDestroySurface.destroyInTransaction();
            }
            mPendingDestroySurface = mSurfaceController;
        }
    } else {
        if (SHOW_TRANSACTIONS || SHOW_SURFACE_ALLOC) {
            WindowManagerService.logSurface(mWin, "DESTROY", true);
        }
        destroySurface();
    }
    ...
}
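Now it is clear. The child window's SurfaceControl had its destruction deferred because WindowStateAnimator.mSurfaceDestroyDeferred was true. It was true because SurfaceView passes the RELAYOUT_DEFER_SURFACE_DESTROY flag when it relayouts. Normally SurfaceView calls WMS.performDeferredDestroyWindow() a little later to destroy mPendingDestroySurface, but the process died before that, so the call never came.
If you are interested, look at SurfaceView.updateWindow(). Normally it produces the following call sequence: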
WMS.relayoutWindow()->WMS.finishDrawingWindow()->WMS.performDeferredDestroyWindow()
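If the process dies before WMS.finishDrawingWindow(), this matches our conclusion exactly, and mPendingDestroySurface is never destroyed. The parent window did not leak its SurfaceControl because its surface was destroyed immediately.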
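The difference between the two windows can be sketched as follows. FakeSurface and DeferredDestroyModel are hypothetical names; the model assumes that the surface controller reference is cleared at the end of destroySurfaceLocked() (consistent with the null values seen in the hprof) and that the deferred-destroy call is the only thing that releases a pending surface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a surface that backs one SurfaceFlinger layer.
final class FakeSurface {
    final String name;
    FakeSurface(String name) { this.name = name; }
}

// Hypothetical model of immediate versus deferred surface destruction.
final class DeferredDestroyModel {
    FakeSurface surfaceController;       // models mSurfaceController
    FakeSurface pendingDestroySurface;   // models mPendingDestroySurface
    boolean surfaceDestroyDeferred;      // models mSurfaceDestroyDeferred
    final List<String> destroyedLayers = new ArrayList<>();

    void destroySurfaceLocked() {
        if (surfaceDestroyDeferred) {
            if (surfaceController != null && pendingDestroySurface != surfaceController) {
                if (pendingDestroySurface != null) {
                    destroyedLayers.add(pendingDestroySurface.name);
                }
                pendingDestroySurface = surfaceController; // parked, not destroyed
            }
        } else if (surfaceController != null) {
            destroyedLayers.add(surfaceController.name);   // destroyed right away
        }
        surfaceController = null; // assumption: reference is cleared afterwards
    }

    // Models the client's later request to destroy the parked surface.
    void performDeferredDestroy() {
        if (pendingDestroySurface != null) {
            destroyedLayers.add(pendingDestroySurface.name);
            pendingDestroySurface = null;
        }
        surfaceDestroyDeferred = false;
    }

    // A parked surface that nobody comes back for keeps its layer alive.
    boolean leaksLayer() {
        return pendingDestroySurface != null;
    }
}
```

With surfaceDestroyDeferred set, destroySurfaceLocked() alone leaves leaksLayer() true until performDeferredDestroy() runs. If the client dies before making that call, the layer stays, which is the child window's situation; without the flag the surface is destroyed at once, as for the parent window.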
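The cause is known, so how do we fix it? In fact, if WMS.removeWindowInnerLocked() had been called, nothing would have leaked. A framework must keep its state sane at all times, whether or not the app dies in some unusual scenario.
So the question becomes what to do when we land in the scenario above. The code is: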
if (win.isAnimatingWithSavedSurface() && !appToken.allDrawnExcludingSaved) {
    // We started enter animation early with a saved surface, now the app asks to remove
    // this window. If we remove it now and the app is not yet drawn, we'll show a
    // flicker. Delay the removal now until it's really drawn.
    if (DEBUG_ADD_REMOVE) {
        Slog.d(TAG_WM, "removeWindowLocked: delay removal of " + win
                + " due to early animation");
    }
    // Do not set mAnimatingExit to true here, it will cause the surface to be hidden
    // immediately after the enter animation is done. If the app is not yet drawn then
    // it will show up as a flicker.
    setupWindowForRemoveOnExit(win);
    Binder.restoreCallingIdentity(origId);
    return;
}
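It says that if an animation is running on a saved Surface and the app has not finished drawing, removal of the window is delayed and mRemoveOnExit is set to true. It also takes care to say that mAnimatingExit must not be set to true, because that would hide the Surface as soon as the animation ends, all in the name of avoiding a flicker. But then mAnimatingExit can stay false forever.
Solution
One option is to set mAnimatingExit to true as well, but WindowStateAnimator.finishExit() may well never get a chance to run, so that would not help.
The change I finally made was to comment out this block. If the window is to be destroyed now, why wait for drawing to complete and the animation to end? Is there any guarantee of a later chance to remove it? Is this so-called optimization really worthwhile? After the change, the bug no longer reproduced.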