Study of Symbolic Execution

Source: Internet. Editor: 程序博客网. Time: 2024/05/03 11:43

With the development of computer science, software security has recently attracted growing attention. Static analysis is an efficient method for finding vulnerabilities and malicious code, because the program is never actually executed. By analysing the program's possible states and how they evolve, static analysis can predict how the program may behave at run time. Static analysis techniques fall into several categories, including model checking, data flow analysis, abstract interpretation, and symbolic execution. Here I want to focus on symbolic execution, which is an effective approach.

In symbolic execution, we use symbolic values to represent program variables, then simulate the execution of the program over those symbols, and finally extract semantic information from the analysis results. Symbolic execution can be divided into two categories: intraprocedural analysis and interprocedural analysis. The former focuses on a single unit of code such as a function, and for it, it is sufficient to consider the invocation information and the environment at the entry point. Interprocedural analysis must additionally take the calls between different units (functions) into account. The two kinds of analysis are carried out separately, yet they depend heavily on each other.
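To make the idea of "symbols instead of values" concrete, here is a minimal sketch in Python. The `Sym` class and the `program` function are hypothetical names invented for this illustration. They only show how running code on symbolic inputs yields an expression that describes the result, not how a real symbolic execution engine is built.

```python
class Sym:
    """A symbolic value: stores an expression string instead of a number."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Sym(f"({self.expr} + {_text(other)})")

    def __mul__(self, other):
        return Sym(f"({self.expr} * {_text(other)})")

    def __repr__(self):
        return self.expr


def _text(value):
    """Render either a symbolic value or a plain constant as text."""
    return value.expr if isinstance(value, Sym) else str(value)


def program(x, y):
    # Hypothetical code under analysis: ordinary arithmetic on its inputs.
    z = x + y
    return z * 2


# Run the program on symbols a and b instead of concrete numbers.
result = program(Sym("a"), Sym("b"))
print(result)  # prints ((a + b) * 2)
```

The printed expression is the semantic information recovered for this function: its return value for any inputs `a` and `b`.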

To carry out intraprocedural analysis, we need to build a control flow graph (CFG) consisting of nodes and edges, where each node represents a basic block and each edge represents a jump between blocks. Note that a basic block is a straight-line sequence of code with no jump except possibly at its end. The simulation then starts from the entry point. When it reaches a branch node, a constraint solver is used to decide which branches can be taken. After all paths have been visited according to some traversal policy, the analysis yields the semantic information we are after. If you want to analyse a security property of a system, you add the corresponding security constraints. For example, when analysing buffer overflows, you add a constraint on the buffer that restricts the memory the program is allowed to access.
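The steps above can be sketched roughly as follows. Everything in this example is an assumption made for illustration: the tiny `CFG`, the block names, the buffer size, and above all the `solve` function, which is a brute-force search over a small integer range standing in for a real constraint solver.

```python
# CFG: each basic block maps to a list of (branch condition, successor block).
# Conditions are predicates over a single symbolic input i.
CFG = {
    "entry": [(lambda i: i >= 0, "check"), (lambda i: i < 0, "reject")],
    "check": [(lambda i: i <= 8, "write"), (lambda i: i > 8, "reject")],
    "write": [],   # hypothetical block that performs buf[i] = 0
    "reject": [],
}
BUF_SIZE = 8             # assumed size of buf
DOMAIN = range(-16, 17)  # the stand-in solver searches this finite range only


def solve(constraints):
    """Return one input satisfying all constraints, or None if none is found."""
    for i in DOMAIN:
        if all(c(i) for c in constraints):
            return i
    return None


def explore(block="entry", constraints=(), path=()):
    """Depth-first traversal yielding (path, constraints) for feasible paths."""
    path = path + (block,)
    if not CFG[block]:
        yield path, constraints
        return
    for cond, succ in CFG[block]:
        extended = constraints + (cond,)
        if solve(extended) is not None:  # skip branches the solver rules out
            yield from explore(succ, extended, path)


paths = list(explore())

# Security constraint: the write must stay inside the buffer, so ask the
# solver for an input that reaches "write" and violates 0 <= i < BUF_SIZE.
def out_of_bounds(i):
    return not (0 <= i < BUF_SIZE)

witnesses = [solve(cons + (out_of_bounds,))
             for p, cons in paths if p[-1] == "write"]
print(witnesses)  # prints [8]
```

The guard `i <= 8` is off by one, so the solver finds `i = 8` as an input that reaches the write and falls outside the buffer. This is the kind of result the added security constraint is meant to expose.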

When carrying out interprocedural analysis, you additionally build a call graph (CG), where the nodes represent functions and the edges represent calls between them. Interprocedural analysis includes the intraprocedural analysis of each function.
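As a small illustration of a call graph, the sketch below uses Python's standard `ast` module to extract one from a source string. The analysed source and the helper names `build_call_graph` and `analysis_order` are hypothetical. It only handles direct calls by name to top-level functions, which is far simpler than what a real tool has to do.

```python
import ast

# Hypothetical program to analyse.
SOURCE = """
def read_input():
    return 1

def validate(x):
    return x

def main():
    validate(read_input())
"""


def build_call_graph(source):
    """Map each top-level function name to the names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = sorted(callees)
    return graph


def analysis_order(graph, root):
    """Post-order walk: callees come before the functions that call them."""
    order, seen = [], set()

    def visit(fn):
        if fn in seen or fn not in graph:
            return
        seen.add(fn)
        for callee in graph[fn]:
            visit(callee)
        order.append(fn)

    visit(root)
    return order


call_graph = build_call_graph(SOURCE)
print(call_graph)
print(analysis_order(call_graph, "main"))  # prints ['read_input', 'validate', 'main']
```

Visiting callees first is one possible policy: each function's own analysis is then available by the time its callers are analysed.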

Despite its clear advantages, symbolic execution has some problems, such as path explosion.

During symbolic execution, each branch node can fork a new path from every current path, so the number of paths grows exponentially with the number of branches. To mitigate this problem, several strategies have been proposed, such as limiting the number of paths, the memory used, or the analysis time. The designer, of course, wants to cover as much of the source code as possible, and these policies do not fundamentally solve the problem.
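The growth is easy to see in a toy model. The sketch below assumes a chain of independent two-way branches, and the function name `enumerate_paths` and its `max_paths` budget are made up for this example. It counts the paths with and without a limit on their number.

```python
def enumerate_paths(num_branches, max_paths=None):
    """Enumerate branch decisions for a chain of independent branch nodes."""
    paths = []
    stack = [()]  # each entry is a tuple of decisions taken so far
    while stack:
        if max_paths is not None and len(paths) >= max_paths:
            break  # path budget exhausted: remaining paths are never explored
        prefix = stack.pop()
        if len(prefix) == num_branches:
            paths.append(prefix)
            continue
        # Each branch node forks the current path into two.
        stack.append(prefix + (False,))
        stack.append(prefix + (True,))
    return paths


print(len(enumerate_paths(10)))                 # prints 1024
print(len(enumerate_paths(10, max_paths=100)))  # prints 100
```

With ten branches there are already 2^10 = 1024 paths. The budget keeps the run bounded, but more than nine tenths of the paths go unexplored, which is why such limits trade coverage for cost instead of removing the explosion.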

Besides, consider the case where many functions, with many calls between them, are waiting to be analysed. Running an intraprocedural analysis once for every call of a function ensures accuracy but causes a huge space overhead. A good solution, called function summaries, builds a mapping for each function that records the result of analysing it. When a function is called, we first look up its mapping instead of actually re-running the intraprocedural analysis. To some extent, this method improves performance.
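A function summary can be pictured as a lookup table keyed by the function and the state at the call. The sketch below is only an illustration under that assumption. `analyze_body` is a placeholder for the costly per-function analysis, and the function names and argument states are invented.

```python
analysis_runs = 0  # counts how often the costly analysis actually runs


def analyze_body(name, arg_state):
    """Placeholder for the costly intraprocedural analysis of one function."""
    global analysis_runs
    analysis_runs += 1
    # Hypothetical result: a textual description of the call's effect.
    return f"{name}({arg_state})"


summaries = {}  # (function name, argument state) -> analysis result


def analyze_call(name, arg_state):
    """Reuse a stored summary when one exists; analyse the body otherwise."""
    key = (name, arg_state)
    if key not in summaries:
        summaries[key] = analyze_body(name, arg_state)
    return summaries[key]


# Four call sites, three of which call the same function in the same state.
call_sites = [
    ("check_len", "n > 0"),
    ("check_len", "n > 0"),
    ("copy", "n > 0"),
    ("check_len", "n > 0"),
]
results = [analyze_call(fn, state) for fn, state in call_sites]
print(analysis_runs)  # prints 2
```

Four calls need only two runs of the analysis here. The saving depends on how often a function is called in a state that already has a summary.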

0 0