android log丢失研究

来源:互联网 发布:mac webstorm使用 编辑:程序博客网 时间:2024/05/22 10:45

之前我们分析过关于android log机制,在这里我们再详细说下,log丢失的原理。

一、统计log

logd监听了logdw的socket来保存从log打印函数通过logdw socket传过来的log,最后会调用LogBuffer::log函数,在log函数最后会调用如下两个函数。

    stats.add(elem);    maybePrune(log_id);

这里的log_id就是radio,main,event等。

我们先来看LogStatistics::add函数

void LogStatistics::add(LogBufferElement *e) {    log_id_t log_id = e->getLogId();    unsigned short size = e->getMsgLen();//获取消息长度    mSizes[log_id] += size;//对每个log_id做长度累计    ++mElements[log_id];//对每个log_id做消息个数累计    mSizesTotal[log_id] += size;    ++mElementsTotal[log_id];    if (log_id == LOG_ID_KERNEL) {        return;    }    uidTable[log_id].add(e->getUid(), e);    if (!enable) {        return;    }    pidTable.add(e->getPid(), e);    tidTable.add(e->getTid(), e);    uint32_t tag = e->getTag();    if (tag) {        tagTable.add(tag, e);    }}

这个函数,对每个log_id的消息长度做统计,消息数量也做了统计。


二、删除log判定

我们再来看下这个maybePrune函数

// Prune at most 10% of the log entries or 256, whichever is less.//// mLogElementsLock must be held when this function is called.void LogBuffer::maybePrune(log_id_t id) {    size_t sizes = stats.sizes(id);//每个log_id统计的所有消息长度    unsigned long maxSize = log_buffer_size(id);//每个log_id缓存的最大值,之前在init函数里面初始化的    if (sizes > maxSize) {        size_t sizeOver = sizes - ((maxSize * 9) / 10);//超过90%size的那部分        size_t elements = stats.elements(id);        size_t minElements = elements / 10;//10%的elements        unsigned long pruneRows = elements * sizeOver / sizes;//超过90%size的elements        if (pruneRows <= minElements) {            pruneRows = minElements;        }        if (pruneRows > 256) {            pruneRows = 256;        }        prune(id, pruneRows);    }}

在之前的博客中我们分析过了,每个log_id的size是如何而来的。可以通过属性获取。

这里保存elements的是mLogElements,只是保存的LogBufferElement 的指针而已,实际不会占用多大的内存。

typedef std::list<LogBufferElement *> LogBufferElementCollection;class LogBuffer {    LogBufferElementCollection mLogElements;

而且只有每个element被调用erase,才会被真正销毁内存。

LogBufferElementCollection::iterator LogBuffer::erase(        LogBufferElementCollection::iterator it, bool engageStats) {    LogBufferElement *e = *it;    log_id_t id = e->getLogId();    LogBufferIteratorMap::iterator f = mLastWorstUid[id].find(e->getUid());    if ((f != mLastWorstUid[id].end()) && (it == f->second)) {        mLastWorstUid[id].erase(f);    }    it = mLogElements.erase(it);    if (engageStats) {        stats.subtract(e);    } else {        stats.erase(e);    }    delete e;//销毁内存    return it;}

所以每个log_id设定的值,不是一个缓存,而是一个警戒值。超过这个值,就要对特定log删除。


三、prune函数

prune函数主要就是删除log,在删除log的时候也做了白名单和黑名单的机制。

3.1 白名单 & 黑名单

这里我们先来看看LogBuffer的initPrune函数,这是用来设定白名单和黑名单的。

int initPrune(char *cp) { return mPrune.init(cp); }

至于init这个函数我们就不看了,主要是解析字符串,把uid,pid保存下来。


那么又在哪里设定白名单和黑名单呢?我们可以通过如下命令,最后就调用了initPrune函数来设定白黑名单了。

int CommandListener::SetPruneListCmd::runCommand(SocketClient *cli,                                         int argc, char **argv) {    setname();    if (!clientHasLogCredentials(cli)) {        cli->sendMsg("Permission Denied");        return 0;    }    char *cp = NULL;    for (int i = 1; i < argc; ++i) {        char *p = cp;        if (p) {            cp = NULL;            asprintf(&cp, "%s %s", p, argv[i]);            free(p);        } else {            asprintf(&cp, "%s", argv[i]);        }    }    int ret = mBuf.initPrune(cp);    free(cp);    if (ret) {        cli->sendMsg("Invalid");        return 0;    }    cli->sendMsg("success");    return 0;}

而每个白名单和黑名单的匹配就是看uid和pid的。这块就不细看了。


3.2 黑名单处理 & log最多的uid处理

下面我们就来看下prune这个函数的黑名单部分处理:

void LogBuffer::prune(log_id_t id, unsigned long pruneRows, uid_t caller_uid) {  ......    // prune by worst offender by uid    bool hasBlacklist = mPrune.naughty();//这块是黑名单部分    while (pruneRows > 0) {        // recalculate the worst offender on every batched pass        uid_t worst = (uid_t) -1;        size_t worst_sizes = 0;        size_t second_worst_sizes = 0;        if (worstUidEnabledForLogid(id) && mPrune.worstUidEnabled()) {            std::unique_ptr<const UidEntry *[]> sorted = stats.sort(2, id);//得到log最多的2个uid            if (sorted.get()) {                if (sorted[0] && sorted[1]) {                    worst_sizes = sorted[0]->getSizes();                    // Calculate threshold as 12.5% of available storage                    size_t threshold = log_buffer_size(id) / 8;                    if ((worst_sizes > threshold)//大于阈值                        // Allow time horizon to extend roughly tenfold, assume                        // average entry length is 100 characters.                            && (worst_sizes > (10 * sorted[0]->getDropped()))) {                        worst = sorted[0]->getKey();//最多lod uid的size                        second_worst_sizes = sorted[1]->getSizes();                        if (second_worst_sizes < threshold) {                            second_worst_sizes = threshold;                        }                    }                }            }        }        // skip if we have neither worst nor naughty filters        if ((worst == (uid_t) -1) && !hasBlacklist) {//如果没有这样的uid,也没有黑名单,直接跳过            break;        }        bool kick = false;        bool leading = true;        it = mLogElements.begin();        // Perform at least one mandatory garbage collection cycle in following        // - clear leading chatty tags        // - merge chatty tags        // - check age-out of preserved logs        bool gc = pruneRows <= 1;        if (!gc && (worst != (uid_t) -1)) {            LogBufferIteratorMap::iterator f = mLastWorstUid[id].find(worst);//查找这uid            if ((f != mLastWorstUid[id].end())                    && (f->second != mLogElements.end())) {                leading = false;//找到了,leading为false                it = f->second;            }        }        static const timespec too_old = {            EXPIRE_HOUR_THRESHOLD * 60 * 60, 0        };        LogBufferElementCollection::iterator lastt;        lastt = mLogElements.end();        --lastt;        LogBufferElementLast last;        while (it != mLogElements.end()) {            LogBufferElement *e = *it;            if (oldest && (oldest->mStart <= e->getSequence())) {                break;            }            if (e->getLogId() != id) {//log_id不对 continue                ++it;                continue;            }            unsigned short dropped = e->getDropped();//是否被free过mMsg            // remove any leading drops            if (leading && dropped) {                it = erase(it);                continue;            }            // merge any drops            if (dropped && last.merge(e, dropped)) {// 合并之前删除的element                it = erase(it, false);                continue;            }            if (hasBlacklist && mPrune.naughty(e)) {//如果满足黑名单,删除这条element                last.clear(e);                it = erase(it);                if (dropped) {//如果当前是dropoed,直接continue                    continue;                }                pruneRows--;//删除log的计数                if (pruneRows == 0) {                    break;                }                if (e->getUid() == worst) {                    kick = true;                    if (worst_sizes < second_worst_sizes) {//最差值,小于第二个直接跳过                        break;                    }                    worst_sizes -= e->getMsgLen();//最差的值累减                }                continue;            }            if ((e->getRealTime() < ((*lastt)->getRealTime() - too_old))                    || (e->getRealTime() > (*lastt)->getRealTime())) {                break;            }            // unmerged drop message            if (dropped) {                last.add(e);                if ((!gc && (e->getUid() == worst))                        || (mLastWorstUid[id].find(e->getUid())                            == mLastWorstUid[id].end())) {                    mLastWorstUid[id][e->getUid()] = it;                }                ++it;                continue;            }            if (e->getUid() != worst) {                leading = false;                last.clear(e);                ++it;                continue;            }            pruneRows--;            if (pruneRows == 0) {                break;            }            kick = true;            unsigned short len = e->getMsgLen();            // do not create any leading drops            if (leading) {                it = erase(it);            } else {                stats.drop(e);                e->setDropped(1);//setDropped函数,这里就是普通的是worst这个uid的log,但不是黑名单中。就把它的消息清空                if (last.merge(e, 1)) {//合并                    it = erase(it, false);//合并之后,删除现有log                } else {                    last.add(e);                    if (!gc || (mLastWorstUid[id].find(worst)                                == mLastWorstUid[id].end())) {                        mLastWorstUid[id][worst] = it;                    }                    ++it;                }            }            if (worst_sizes < second_worst_sizes) {//最差值小于第二差就跳过                break;            }            worst_sizes -= len;        }        last.clear();        if (!kick || !mPrune.worstUidEnabled()) {            break; // the following loop will ask bad clients to skip/drop        }    }.....

上面就是对黑名单以及log最多的那个uid的处理,我们先来看看每个LogBufferElement的setDropped函数

    unsigned short setDropped(unsigned short value) {        if (mMsg) {            free(mMsg);//消息清空            mMsg = NULL;        }        return mDropped = value;//第一次为1    }

这个函数直接把消息清空了,然后把mDropped设为1,我们再来看看last.merge(e, 1)函数

class LogBufferElementLast {    typedef std::unordered_map<uint64_t, LogBufferElement *> LogBufferElementMap;    LogBufferElementMap map;public:    bool merge(LogBufferElement *e, unsigned short dropped) {        LogBufferElementKey key(e->getUid(), e->getPid(), e->getTid());        LogBufferElementMap::iterator it = map.find(key.getKey());        if (it != map.end()) {            LogBufferElement *l = it->second;            unsigned short d = l->getDropped();            if ((dropped + d) > USHRT_MAX) {                map.erase(it);            } else {                l->setDropped(dropped + d);//将两个element合并                return true;            }        }        return false;    }
通过merge,element的mDropped可以不为1了。


3.3 白名单处理

下面我们再看下白名单处理:

    bool whitelist = false;    bool hasWhitelist = mPrune.nice();    it = mLogElements.begin();    while((pruneRows > 0) && (it != mLogElements.end())) {        LogBufferElement *e = *it;        if (e->getLogId() != id) {            it++;            continue;        }        if (oldest && (oldest->mStart <= e->getSequence())) {            if (whitelist) {                break;            }            if (stats.sizes(id) > (2 * log_buffer_size(id))) {                // kick a misbehaving log reader client off the island                oldest->release_Locked();            } else {                oldest->triggerSkip_Locked(id, pruneRows);            }            break;        }        if (hasWhitelist && !e->getDropped() && mPrune.nice(e)) { // WhiteListed            whitelist = true;            it++;            continue;        }        it = erase(it);        pruneRows--;    }

白名单的处理比较简单,只要是白名单的不删除,其他都删除,直到满足条件。


四、logcat取log

之前的博客分析过当logcat进程到logd中取log时,会最终调用LogBufferElement::flushTo函数

uint64_t LogBufferElement::flushTo(SocketClient *reader, LogBuffer *parent) {    struct logger_entry_v3 entry;    memset(&entry, 0, sizeof(struct logger_entry_v3));    entry.hdr_size = sizeof(struct logger_entry_v3);    entry.lid = mLogId;    entry.pid = mPid;    entry.tid = mTid;    entry.sec = mRealTime.tv_sec;    entry.nsec = mRealTime.tv_nsec;    struct iovec iovec[2];    iovec[0].iov_base = &entry;    iovec[0].iov_len = sizeof(struct logger_entry_v3);    char *buffer = NULL;    if (!mMsg) {//如果mMsg为null了,就是之前prune里面setPropped函数把mMsg设为null        entry.len = populateDroppedMessage(buffer, parent);        if (!entry.len) {            return mSequence;        }        iovec[1].iov_base = buffer;    } else {        entry.len = mMsgLen;        iovec[1].iov_base = mMsg;    }    iovec[1].iov_len = entry.len;    uint64_t retval = reader->sendDatav(iovec, 2) ? FLUSH_ERROR : mSequence;//发送给调用者    if (buffer) {        free(buffer);    }    return retval;}

调用populateDroppedMessage函数最终会把消息设为类似:

chatty  : uid=1000(system) RenderThread expire 3 lines 



五、总结

最后总结,在logd中如果有丢失log,可以设置log_id的缓冲设置再大写。如果是调试的话可以增加调试的白名单。而且在logd中丢失log肯定会有类似chatty这样的log,那就是删除了log最多的那个uid的log。而且会合并。


我们可以通过设置系统属性persist.logd.size来设置每个log id的最大缓存值(在开发者选项中也有这个设置,开发者选项中设置就不用重启设备了),或者persist.logd.size.radio设置每个id的最大缓存值。

步骤:


  1. 将手机连上电脑并且进入root

  2. setproppersist.logd.size.radio 1024k

  3. reboot 重启

另外可以用getprop | grep logd查看设置的属性是否生效

logcat -g 可以查看每个id 的缓存大小


当然这是通过属性的方法设置,我们还可以通过logcat的命令,logcat -G 10m是设置所有的id的大小,logcat -b radio -G 10m是设置radio的log的缓存大小

在logcat中有如下代码,处理设置缓存大小

如果logd中没有chatty这样的log,但是又有log丢失,那么就要怀疑在写log时,logdw的socket就有丢失。因为我们看下logdw是dgram类型的,这种socket是一种不可靠的报文传递保证效率但会有丢失。所有这样情况我们可以看把socket改成stream试试,看看是否有效果?

service logd /system/bin/logd    class core    socket logd stream 0666 logd logd    socket logdr seqpacket 0666 logd logd    socket logdw dgram 0222 logd logd    group root system     writepid /dev/cpuset/system-background/tasks

但是试了以后好像socket在连接的时候就有问题。


当然还有一种方式,在android4.4的时候是不使用logd的机制的。使用的是往内存文件系统里面写log,写入dev/log/main  dev/log/radio等log。我们可以改回之前的模式。需要打几个补丁,这样就不会有丢失log的情况(不会有类似chatty的log)。

kernel的话我们可以看看dev/log目录下是否有main raidio等节点。如果有说明kernel是支持的,不需要打补丁了。只需改动system/core/liblog目录  以及修改板卡目录boardconfig.mk 还有selinux的权限。具体如下:

1. system/core/liblog补丁

地址:https://source.codeaurora.org/quic/la/platform/system/core/commit/?id=1b078a205c4ba34e2eaf5c5b248967ea4c08fb2c

From 1b078a205c4ba34e2eaf5c5b248967ea4c08fb2c Mon Sep 17 00:00:00 2001From: Maria Yu <aiquny@codeaurora.org>Date: Wed, 23 Mar 2016 15:35:34 +0800Subject: liblog: Fix compile error for using legacy kernel logger driverAdd log_event_write.c into liblog_sources file when using legacykernel logger driver. Add socket_local_client API in log_read_kern.Fix the BIONIC define missing in logd_write_kern.CRs-Fixed: 949708Change-Id: If7a85e1d19fdd22581b6eb87096b39c2a700297e--- liblog/Android.mk        |   2 +- liblog/log_read_kern.c   | 169 +++++++++++++++++++++++++++++++++++++++++++++-- liblog/logd_write_kern.c |   6 +- 3 files changed, 170 insertions(+), 7 deletions(-)diff --git a/liblog/Android.mk b/liblog/Android.mkindex 115dd79..1f113c5 100644--- a/liblog/Android.mk+++ b/liblog/Android.mk@@ -27,7 +27,7 @@ ifneq ($(TARGET_USES_LOGD),false) liblog_sources := logd_write.c log_event_write.c else-liblog_sources := logd_write_kern.c+liblog_sources := logd_write_kern.c log_event_write.c endif  # some files must not be compiled when building against Mingwdiff --git a/liblog/log_read_kern.c b/liblog/log_read_kern.cindex 69b405c..b1b9a26 100644--- a/liblog/log_read_kern.c+++ b/liblog/log_read_kern.c@@ -1,5 +1,5 @@ /*-** Copyright 2013-2014, The Android Open Source Project+** Copyright 2013-2014,2016 The Android Open Source Project ** ** Licensed under the Apache License, Version 2.0 (the "License"); ** you may not use this file except in compliance with the License.@@ -27,6 +27,7 @@ #include <cutils/list.h> #include <log/log.h> #include <log/logger.h>+#include <cutils/sockets.h>  #define __LOGGERIO     0xAE @@ -37,16 +38,174 @@ #define LOGGER_GET_VERSION         _IO(__LOGGERIO, 5) /* abi version */ #define LOGGER_SET_VERSION         _IO(__LOGGERIO, 6) /* abi version */ -typedef char bool;-#define false (const bool)0-#define true (const bool)1- #define LOG_FILE_DIR "/dev/log/"  /* timeout in milliseconds */ #define LOG_TIMEOUT_FLUSH 5 #define LOG_TIMEOUT_NEVER -1 +/* branchless on many architectures. */+#define min(x,y) ((y) ^ (((x) ^ (y)) & -((x) < (y))))++#if (defined(USE_MINGW) || defined(HAVE_WINSOCK))+#define WEAK static+#else+#define WEAK __attribute__((weak))+#endif+#ifndef __unused+#define __unused __attribute__((unused))+#endif++/* Private copy of ../libcutils/socket_local_client.c prevent library loops */++#ifdef HAVE_WINSOCK++int WEAK socket_local_client(const char *name, int namespaceId, int type)+{+    errno = ENOSYS;+    return -ENOSYS;+}++#else /* !HAVE_WINSOCK */++#include <sys/socket.h>+#include <sys/un.h>+#include <sys/select.h>+#include <sys/types.h>++/* Private copy of ../libcutils/socket_local.h prevent library loops */+#define FILESYSTEM_SOCKET_PREFIX "/tmp/"+#define ANDROID_RESERVED_SOCKET_PREFIX "/dev/socket/"+/* End of ../libcutils/socket_local.h */++#define LISTEN_BACKLOG 4++/* Documented in header file. */+int WEAK socket_make_sockaddr_un(const char *name, int namespaceId,+                                 struct sockaddr_un *p_addr, socklen_t *alen)+{+    memset (p_addr, 0, sizeof (*p_addr));+    size_t namelen;++    switch (namespaceId) {+    case ANDROID_SOCKET_NAMESPACE_ABSTRACT:+#if defined(__linux__)+        namelen  = strlen(name);++        /* Test with length +1 for the *initial* '\0'. */+        if ((namelen + 1) > sizeof(p_addr->sun_path)) {+            goto error;+        }++        /*+         * Note: The path in this case is *not* supposed to be+         * '\0'-terminated. ("man 7 unix" for the gory details.)+         */++        p_addr->sun_path[0] = 0;+        memcpy(p_addr->sun_path + 1, name, namelen);+#else+        /* this OS doesn't have the Linux abstract namespace */++        namelen = strlen(name) + strlen(FILESYSTEM_SOCKET_PREFIX);+        /* unix_path_max appears to be missing on linux */+        if (namelen > sizeof(*p_addr)+                - offsetof(struct sockaddr_un, sun_path) - 1) {+            goto error;+        }++        strcpy(p_addr->sun_path, FILESYSTEM_SOCKET_PREFIX);+        strcat(p_addr->sun_path, name);+#endif+        break;++    case ANDROID_SOCKET_NAMESPACE_RESERVED:+        namelen = strlen(name) + strlen(ANDROID_RESERVED_SOCKET_PREFIX);+        /* unix_path_max appears to be missing on linux */+        if (namelen > sizeof(*p_addr)+                - offsetof(struct sockaddr_un, sun_path) - 1) {+            goto error;+        }++        strcpy(p_addr->sun_path, ANDROID_RESERVED_SOCKET_PREFIX);+        strcat(p_addr->sun_path, name);+        break;++    case ANDROID_SOCKET_NAMESPACE_FILESYSTEM:+        namelen = strlen(name);+        /* unix_path_max appears to be missing on linux */+        if (namelen > sizeof(*p_addr)+                - offsetof(struct sockaddr_un, sun_path) - 1) {+            goto error;+        }++        strcpy(p_addr->sun_path, name);+        break;++    default:+        /* invalid namespace id */+        return -1;+    }++    p_addr->sun_family = AF_LOCAL;+    *alen = namelen + offsetof(struct sockaddr_un, sun_path) + 1;+    return 0;+error:+    return -1;+}++/**+ * connect to peer named "name" on fd+ * returns same fd or -1 on error.+ * fd is not closed on error. that's your job.+ *+ * Used by AndroidSocketImpl+ */+int WEAK socket_local_client_connect(int fd, const char *name, int namespaceId,+                                     int type __unused)+{+    struct sockaddr_un addr;+    socklen_t alen;+    int err;++    err = socket_make_sockaddr_un(name, namespaceId, &addr, &alen);++    if (err < 0) {+        goto error;+    }++    if(connect(fd, (struct sockaddr *) &addr, alen) < 0) {+        goto error;+    }++    return fd;++error:+    return -1;+}++/**+ * connect to peer named "name"+ * returns fd or -1 on error+ */+int WEAK socket_local_client(const char *name, int namespaceId, int type)+{+    int s;++    s = socket(AF_LOCAL, type, 0);+    if(s < 0) return -1;++    if ( 0 > socket_local_client_connect(s, name, namespaceId, type)) {+        close(s);+        return -1;+    }++    return s;+}++#endif /* !HAVE_WINSOCK */+/* End of ../libcutils/socket_local_client.c */+ #define logger_for_each(logger, logger_list) \     for (logger = node_to_item((logger_list)->node.next, struct logger, node); \          logger != node_to_item(&(logger_list)->node, struct logger, node); \diff --git a/liblog/logd_write_kern.c b/liblog/logd_write_kern.cindex 8742b34..bddfbdd 100644--- a/liblog/logd_write_kern.c+++ b/liblog/logd_write_kern.c@@ -1,5 +1,5 @@ /*- * Copyright (C) 2007-2014 The Android Open Source Project+ * Copyright (C) 2007-2014,2016 The Android Open Source Project  *  * Licensed under the Apache License, Version 2.0 (the "License");  * you may not use this file except in compliance with the License.@@ -26,7 +26,9 @@ #include <time.h> #include <unistd.h> +#ifdef __BIONIC__ #include <android/set_abort_message.h>+#endif  #include <log/log.h> #include <log/logd.h>@@ -167,9 +169,11 @@ int __android_log_buf_write(int bufID, int prio, const char *tag, const char *ms             tag = tmp_tag;     } +#ifdef __BIONIC__     if (prio == ANDROID_LOG_FATAL) {         android_set_abort_message(msg);     }+#endif      vec[0].iov_base   = (unsigned char *) &prio;     vec[0].iov_len    = 1;-- cgit v1.1


2. selinux权限

加上domain对log_device的读写权限

diff --git a/domain.te b/domain.teindex 0f6c6da..4e0f541 100644--- a/domain.te+++ b/domain.te@@ -88,6 +88,8 @@ allow domain null_device:chr_file rw_file_perms;allow domain zero_device:chr_file rw_file_perms;allow domain ashmem_device:chr_file rw_file_perms;allow domain binder_device:chr_file rw_file_perms;+allow domain log_device:dir search;+allow domain log_device:chr_file rw_file_perms;allow domain ptmx_device:chr_file rw_file_perms;allow domain alarm_device:chr_file r_file_perms;allow domain urandom_device:chr_file rw_file_perms;


3. 去除logd的编译

只要在所编译版本的BoardConfig.mk中加入TARGET_USES_LOGD := false

kernel部分如果没有合入补丁的话:
kernel补丁:
https://us.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/?id=
5ec14761ae4f9e0c65519a6d954da0a08f59fa80
https://us.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/?id=
1813ebe46a4d61423be3c121484f298abdbaf86d

kernel的kconfig:
在kernel的kConfig文件中加上CONFIG_ANDROID_LOGGER=y


1 0
原创粉丝点击