Big Data and Android


About Weka

WooYun, static analysis
Introduction to Weka
A simple Weka example in Chinese

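The first question

I don't want to survey every method that big data and machine learning have to offer. The quickest path is to ask:

What applications do big data and machine learning have in this area?

Keyword: malware detection

Two fairly important papers,
which describe two Android malware detection tools, DroidMat and ANDRUBIS: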

1. DroidMat: Android Malware Detection through Manifest and API Calls Tracing

2. ANDRUBIS - 1,000,000 Apps Later: A View on Current Android Malware Behaviors

=========================

DroidMat: Android Malware Detection through Manifest and API Calls Tracing

Link:
http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6298136&tag=1#citedby-section

It mainly deals with static analysis.

=========================

Android anomaly detection system using machine learning classification (a relatively accessible paper)

Link: http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=7352512

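It tests apps from three angles: network traffic, battery level, and battery temperature.
The measurements are then classified with machine learning.

It uses meka.

Finally, the authors also set up a server that sends the results back to the client.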

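The paper also introduces the concepts of true positive rate (TPR) and false positive rate (FPR).
The following article explains them well in terms of misdiagnosis rates: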
https://zh.wikipedia.org/wiki/ROC%E6%9B%B2%E7%BA%BF
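
To make the two rates concrete, here is a minimal sketch that computes them from binary labels: TPR is the share of malicious apps that get flagged, and FPR is the share of benign apps that get flagged by mistake. The labels are made up for illustration and do not come from the paper.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 = malicious and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up example: 4 malicious apps and 6 benign apps.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = rates(y_true, y_pred)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Each point on a ROC curve is one such (FPR, TPR) pair, obtained at one setting of the classifier's decision threshold.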

=========================

Crowdroid: Behavior-Based Malware Detection System for Android

https://www.ida.liu.se/labs/rtslab/publications/2011/spsm11-burguera.pdf

A tool that is somewhat less well known than ANDRUBIS.

It uses strace to capture a record of system calls.
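
As a rough sketch of how a strace log can be turned into numbers, the snippet below counts how often each system call appears. The sample lines imitate typical strace output and are made up; the exact strace options and features used by Crowdroid are not given here, so treat the regex and the `count_syscalls` helper as assumptions.

```python
import re
from collections import Counter

# Optional "[pid 123] " or "123 " prefix, then the syscall name before "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Count system calls by name, skipping signal and exit lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up lines in the usual strace layout.
sample = [
    'open("/data/data/com.example/files/a.txt", O_RDONLY) = 3',
    'read(3, "hello", 4096) = 5',
    'read(3, "", 4096) = 0',
    "[pid  1234] close(3) = 0",
    "--- SIGCHLD (Child exited) ---",
]
counts = count_syscalls(sample)
print(counts)
```

The resulting per-app counts form a feature vector that a clustering or classification algorithm can consume.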

It then applies the K-means algorithm to the problem.

The K-means algorithm is actually quite intuitive and easy to understand.

The K-means algorithm
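
A minimal pure-Python sketch of K-means follows, to show the two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The data are made-up system call counts, and initializing from the first k points is a simplification chosen to keep the example deterministic; it is not how Crowdroid or Weka necessarily initialize.

```python
def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (assignment, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up (open, read) call counts for five runs of an app.
points = [(10, 12), (120, 95), (11, 14), (9, 10), (115, 100)]
assignment, centroids = kmeans(points, k=2)
print(assignment, centroids)
```

With k=2 the runs split into a low-activity group and a high-activity group; deciding which group is suspicious still needs a separate judgment.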

============

Also, Weka can run the K-means algorithm.

============

Detection of Android Malicious Apps Based on the Sensitive Behaviors (SBFV, sensitive behavior feature vector)

Link: http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=7011341

============

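A Detection Method and System Implementation for Malicious Code on the Android Platform (一种针对Android平台恶意代码的检测方法及系统实现)

Weaknesses of static analysis:

Static analysis can only detect malicious code samples that are already in the signature database and cannot detect unknown malware. It also struggles with code obfuscation, reflection, and encryption. To address these shortcomings, the paper designs and implements dynamic analysis. The dynamic analysis uses the Android emulator as its runtime environment: the APK file is installed and run in the emulator, its runtime behavior is monitored and matched against malicious behavior patterns, and the result decides whether the code is malicious.

Typical malicious behaviors:

(1) Access to critical paths and data: Android is built on the Linux kernel and likewise has sensitive paths, such as the system executable directory /system/xbin. Malicious code can invoke the system programs in that directory to run commands. Take GingerMaster, a piece of malware that exploits a root vulnerability: while carrying out its malicious behavior it calls programs such as chmod and mount to change file permissions and mount file systems. Private information such as SMS messages and contacts is stored in specific databases (the SMS database, for example, is mmssms.db), and malware that harvests personal information accesses these databases.
The critical paths and data are listed in Table 3.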

[Image: Table 3]

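(2) Access to malicious domains: Privacy-stealing malware collects the user's personal information and uploads it to a server, while botnet-style trojans contact a C&C server to receive control commands. For example, Geinimi fetches control commands from www.widifu.com:8080. Such malicious domains can therefore be collected into a blacklist and used as one factor in judging malicious behavior.

(3) Malicious charging: Lookout's report on the state of mobile security in 2012 notes that, among Android malware categories, the malicious-charging class reached 62% in the second quarter of 2012. While running, this kind of malware sends premium-rate SMS messages and dials premium-rate numbers, costing the user money. During dynamic analysis we record the program's SMS-sending and phone-dialing behavior; if a number is not one of the mobile carriers' own numbers, such as 10086 or 10000, the program is considered to exhibit malicious charging behavior.

(4) Permission bypass: If a program does not declare certain permissions in AndroidManifest.xml but performs behavior at runtime that requires them, this is called permission bypass. It generally occurs in malware that has obtained root privileges: once it has root, the malware can perform sensitive operations without needing any other permission.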
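
The four behavior patterns above can be sketched as a simple rule matcher over a behavior log. This is only an illustration under stated assumptions: the log layout (the keys `paths`, `hosts`, `sms_numbers`, `used_permissions`) and the `match_behaviors` helper are hypothetical, the rule lists contain just the examples named in the text, and the sample log is made up.

```python
SENSITIVE_PATHS = ("/system/xbin", "mmssms.db")  # examples from the text
DOMAIN_BLACKLIST = {"www.widifu.com"}            # Geinimi C&C host from the text
CARRIER_NUMBERS = {"10086", "10000"}             # carrier numbers from the text


def match_behaviors(log, declared_permissions):
    """Return a list of (rule, evidence) pairs found in a behavior log."""
    findings = []
    for path in log.get("paths", []):            # (1) critical paths and data
        if any(s in path for s in SENSITIVE_PATHS):
            findings.append(("sensitive_path", path))
    for host in log.get("hosts", []):            # (2) malicious domains
        if host.lower() in DOMAIN_BLACKLIST:
            findings.append(("blacklisted_domain", host))
    for number in log.get("sms_numbers", []):    # (3) malicious charging
        if number not in CARRIER_NUMBERS:
            findings.append(("malicious_charging", number))
    for perm in log.get("used_permissions", []): # (4) permission bypass
        if perm not in declared_permissions:
            findings.append(("permission_bypass", perm))
    return findings


# Made-up log for one emulator run; the SMS number 106601234 is invented.
log = {
    "paths": ["/system/xbin/mount", "/sdcard/photo.jpg"],
    "hosts": ["www.widifu.com", "example.org"],
    "sms_numbers": ["10086", "106601234"],
    "used_permissions": ["android.permission.SEND_SMS", "android.permission.INTERNET"],
}
declared = {"android.permission.INTERNET"}
findings = match_behaviors(log, declared)
for rule, evidence in findings:
    print(rule, evidence)
```

A real system would weigh these findings rather than treat any single hit as proof, since the text describes the blacklist as only one factor in the judgment.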

Techniques most frequently used by malicious code:

[Image]

Permissions most frequently used by malicious code:

Definition of sensitive behaviors

Sensitive behaviors are related to specific device resources or to critical operations that could be exploited to harm the user's privacy, the user's finances, or control of the device itself. For example, malware authors invoke many methods within the TelephonyManager/SmsManager classes. sendTextMessage() is very frequently used by malware authors to send SMS messages to premium-rate numbers without the user's consent and thus incur financial losses. So invoking the API call android.telephony.SmsManager.sendTextMessage() is a sensitive behavior.

Malware authors use dynamic library loading techniques such as Java reflection and native code execution to evade security analysis. As we know, dynamic loading is a mechanism in which a program can load a library (or dynamic payloads) into memory at runtime, retrieve the addresses of functions contained in the dynamic library, execute these functions, and unload the library from memory upon completion. It is a popular and effective way to protect Android applications (or malware). Dynamic loading is therefore also a sensitive behavior.

Android is a derivative based on a modified Linux 2.6 with a Java programming interface. If an Android application needs to request services (e.g., accessing the flash drive), it has to rely on system calls provided by the kernel. For example, a mobile app may use the open() system call to open the contact list file (/data/data/com.android.providers.contacts/databases/contacts2.db) and then use the read() system call to get the contact list. The system calls sys_open() and sys_read() are sensitive behaviors, too.

Abstract:

The number of malicious applications (apps) targeting the Android system has exploded in recent years. The evolution of malware makes it difficult for static analysis tools to detect. Various behavior-based malware detection techniques have been proposed to mitigate this problem. The drawbacks of the existing approaches are that behavior features extracted from a single source lead to low detection accuracy and that the detection process is too complex, which makes it especially unsuitable for smartphones with limited computing power. In this paper, we extract sensitive behavior features from three sources: API calls, native code dynamic execution, and system calls. We propose a sensitive behavior feature vector for representing multi-source behavior features uniformly. Our sensitive behavior representation is able to automatically describe the low-level OS-specific behaviors and high-level application-specific behaviors of Android malware. Based on the unified behavior feature representation, we provide a lightweight decision function to classify a given application as benign or malicious. We tested the effectiveness of our approach against real malware, and the results of our experiments show that its detection accuracy is up to 96% with acceptable performance overhead. For a given threshold t (t=9), we can detect the advanced malware family effectively.
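
As a rough illustration of the idea, the sketch below builds one feature vector from the three sources and compares a score against the threshold t=9 mentioned in the abstract. The paper's actual decision function is not reproduced here: summing the counts is an assumption, the behavior catalogue is a hypothetical four-entry list built from the examples quoted earlier, and the traces are made up.

```python
# Hypothetical catalogue: one entry per sensitive behavior, tagged by source.
BEHAVIORS = [
    "api:android.telephony.SmsManager.sendTextMessage",
    "native:dynamic_library_load",
    "syscall:sys_open",
    "syscall:sys_read",
]


def feature_vector(observed):
    """Count how often each catalogued behavior occurs, in catalogue order."""
    return [observed.count(b) for b in BEHAVIORS]


def is_malicious(vector, t=9):
    """Assumed rule (not the paper's): flag when the total count reaches t."""
    return sum(vector) >= t


# Made-up traces of observed behaviors.
suspicious = (
    ["api:android.telephony.SmsManager.sendTextMessage"] * 3
    + ["native:dynamic_library_load"]
    + ["syscall:sys_open"] * 2
    + ["syscall:sys_read"] * 4
)
quiet = ["syscall:sys_open", "syscall:sys_read"]

print(feature_vector(suspicious), is_malicious(feature_vector(suspicious)))
print(feature_vector(quiet), is_malicious(feature_vector(quiet)))
```

Putting all three sources in one vector is what lets a single cheap comparison stand in for a heavier classifier on the phone.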

===========

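We developed a lightweight client, Low, whose main functions are as follows.
1. After installation, Low gives the user the basic functions of a "phone assistant": the user can view all non-system apps installed on the device and can comment on or uninstall them.
2.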

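For every app test, the behavior-related information is saved in a separate file and then sent to the analysis server.
The analysis server combines it with the behavior information already collected for that app, analyzes the result of this test, and returns a detailed behavior analysis report.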

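The advantage of this architecture is that it is a very lightweight solution, and we have ample machine resources for collecting the data. The larger the collected data set, the more convincing and accurate the analysis results become.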
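
The server side of this flow might look roughly like the sketch below: the new record for an app is merged with that app's earlier records and a JSON report is returned. The record layout, the `build_report` helper, and the in-memory `history` dictionary are all hypothetical; the real report format and storage are not described here.

```python
import json


def build_report(app_id, new_record, history):
    """Merge a new behavior record with earlier ones and return a JSON report."""
    records = history.get(app_id, []) + [new_record]
    totals = {}
    for record in records:
        for behavior, count in record.items():
            totals[behavior] = totals.get(behavior, 0) + count
    history[app_id] = records  # keep the new record for future tests
    return json.dumps(
        {"app": app_id, "runs": len(records), "totals": totals},
        sort_keys=True,
    )


# Made-up history: one earlier test of a hypothetical app.
history = {"com.example.app": [{"sms_sent": 1}]}
report = build_report(
    "com.example.app", {"sms_sent": 2, "hosts_contacted": 3}, history
)
print(report)
```

Keeping the aggregation on the server is what keeps the client light: the phone only records and uploads one file per test.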

[Image]

0 0