Parallel Symbolic Execution for Automated Real-World Software Testing


Original source: http://www.cnblogs.com/hszhang/archive/2011/11/22/2259095.html

Reposted here for my own reading convenience.

1. Main Contributions of the Paper

      The first cluster-based parallel symbolic execution engine; a testing platform for writing symbolic tests; and a quasi-complete symbolic POSIX model that makes it possible to use symbolic execution on real-world systems.

2. Problem Overview

       First, path explosion; second, the interaction between the program and its running environment; and finally, using an automated test generator in the context of a development organization's quality assurance processes.

3. Scalable Parallel Symbolic Execution

3.1 Conceptual Overview

      Parallel Symbolic Execution: The key design goal is to enable individual cluster nodes to explore the execution tree independently of each other. One way of doing this is to statically split the execution tree and farm off subtrees to worker nodes.

     Since the methods used so far in parallel model checkers rely on static partitioning of a finite state space, they cannot be directly applied to the present problem. Instead, Cloud9 partitions the execution tree dynamically, as the tree is being explored.

      Dynamic Distributed Exploration: Cloud9 consists of worker nodes and a load balancer (LB). Workers run independent SEEs (symbolic execution engines), based on KLEE [Cadar 2008]. Cloud9 operates roughly as follows:

        When the first worker node W1 joins the Cloud9 cluster, it connects to the LB and receives a "seed" job to explore the entire execution tree. When the second worker W2 joins and contacts the LB, it is instructed to balance W1's load, which causes W1 to break off some of its unexplored subtrees and send them to W2 in the form of jobs. As new workers join, the LB has them balance the load of existing workers. The workers regularly send the LB status updates on their load in terms of exploration jobs, along with current progress in terms of code coverage, encoded as a bit vector. Based on workers' load, the LB can issue job transfer requests to pairs of workers in the form <source worker, destination worker, # of jobs>. The source node decides which particular jobs to transfer.
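The two message types exchanged in the protocol above can be sketched as plain records. This is only an illustration of the description; the class and field names are assumptions, not Cloud9's actual wire format.

```python
from dataclasses import dataclass

@dataclass
class StatusUpdate:
    """What each worker periodically reports to the LB: its load in
    exploration jobs and its code-coverage bit vector."""
    worker_id: int
    num_jobs: int   # current number of exploration jobs
    coverage: int   # coverage bit vector, packed into an int

@dataclass
class TransferRequest:
    """The LB's job transfer request: <source worker, destination
    worker, # of jobs>. The source decides which jobs to send."""
    source: int
    destination: int
    num_jobs: int
```

The LB never names specific jobs in a `TransferRequest`; keeping job selection on the source worker avoids shipping tree contents through the load balancer.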

3.2 Worker-level Operation

A worker's portion of the execution tree consists of three kinds of nodes: (1) internal nodes that have already been explored and are thus no longer of interest (we call them dead nodes); (2) fence nodes that demarcate the portion being explored, separating the domains of different workers; (3) candidate nodes, which are nodes ready to be explored. A worker exclusively explores candidate nodes; it never expands fence or dead nodes.
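The selection rule above can be made concrete with a small sketch. The enum and the list-of-pairs frontier representation are assumptions chosen for illustration, not Cloud9's internal layout.

```python
from enum import Enum

class NodeKind(Enum):
    DEAD = "dead"            # already explored; no longer of interest
    FENCE = "fence"          # demarcates this worker's region; never expanded
    CANDIDATE = "candidate"  # ready to be explored

def pick_next(frontier):
    """Return the first node a worker is allowed to expand.
    Only CANDIDATE nodes qualify; DEAD and FENCE nodes are skipped."""
    for kind, node_id in frontier:
        if kind is NodeKind.CANDIDATE:
            return node_id
    return None  # nothing left to explore locally
```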

Worker-to-Worker Job Transfer: When the global exploration frontier becomes poorly balanced across workers, the load balancer chooses a loaded worker Ws and a less loaded worker Wd and instructs them to balance load by sending n jobs from Ws to Wd.

     Ws chooses n of its candidate nodes and packages them up for transfer to Wd. For each node, Ws sends Wd the path from the tree root to that node, relying on Wd to "replay" the path and obtain the contents of the node.

      When the job tree arrives at Wd, it is imported into Wd's own subtree, and the leaves of the job tree become part of Wd's frontier (at the time of arrival, these nodes may lie "ahead" of Wd's frontier). Wd keeps the nodes in the incoming jobs as virtual nodes, as opposed to materialized nodes that reside in the local subtree, and replays paths only lazily. A materialized node is one that contains the corresponding program state, whereas a virtual node is an "empty shell" without corresponding program state. In the common case, the frontier of a worker's local subtree contains a mix of materialized and virtual nodes, as shown in the diagram.
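The virtual/materialized distinction can be sketched as follows. The `replay` callable standing in for the real SEE, and the path encoding, are assumptions made for illustration.

```python
class TreeNode:
    """A node in a worker's local subtree. A virtual node holds only
    its path from the root; materializing it replays that path to
    reconstruct the program state, and happens lazily, on demand."""

    def __init__(self, path):
        self.path = path    # branch choices from the root, e.g. ["L", "R"]
        self.state = None   # None => virtual; otherwise materialized

    def materialize(self, replay):
        # Replay only if the state is not already present.
        if self.state is None:
            self.state = replay(self.path)
        return self.state
```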

3.3 Cluster-level Operation

Load Balancing:

When jobs arrive at Wd, they are placed conceptually in a queue; the length of this queue is sent to the load balancer periodically. The LB ensures that the worker queue lengths stay within the same order of magnitude. The balancing algorithm takes as input the length li of each worker Wi's queue Qi. It computes the average l̄ and standard deviation s of the li values and then classifies each Wi as underloaded (li < max{l̄ - d·s, 0}), overloaded (li > l̄ + d·s), or OK otherwise; d is a constant factor. The workers Wi are then sorted according to their queue length li and placed in a list. The LB then matches underloaded workers from the beginning of the list with overloaded workers from the end of the list. For each pair <Wi, Wj> with li < lj, the load balancer sends a job transfer request to the workers to move (lj - li)/2 candidate nodes from Wj to Wi.
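The classification-and-pairing rule above can be sketched directly. This is a simplified reading of the algorithm; the pairing loop and integer division are illustrative assumptions.

```python
from statistics import mean, pstdev

def balance(lengths, d=1.0):
    """Given worker queue lengths, classify workers as under/overloaded
    using the mean and standard deviation, pair shortest queues with
    longest ones, and emit (source, destination, n_jobs) requests."""
    avg = mean(lengths)
    s = pstdev(lengths)
    # Sort worker indices by queue length, shortest first.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    requests = []
    lo, hi = 0, len(order) - 1
    while lo < hi:
        wi, wj = order[lo], order[hi]   # candidate underloaded / overloaded pair
        li, lj = lengths[wi], lengths[wj]
        underloaded = li < max(avg - d * s, 0)
        overloaded = lj > avg + d * s
        if not (underloaded and overloaded):
            break  # remaining workers are within bounds ("OK")
        # Move half of the difference from Wj to Wi.
        requests.append((wj, wi, (lj - li) // 2))
        lo += 1
        hi -= 1
    return requests
```

For example, with queues `[0, 10, 10, 40]` and `d = 0.5`, the mean is 15 and the standard deviation 15, so worker 0 is underloaded, worker 3 is overloaded, and the LB asks worker 3 to send 20 jobs to worker 0.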

 Coordinating Worker-level Explorations:

      In Cloud9, coverage is represented as a bit vector, with one bit for every line of code; a set bit indicates that a line is covered. Every time a worker explores a program state, it sets the corresponding bits locally. The current version of the bit vector is piggybacked on the status updates sent to the load balancer. The LB maintains the current global coverage vector and, when it receives an updated coverage bit vector, ORs it into the current global coverage. The result is then sent back to the worker, which in turn ORs this global bit vector into its own, in order to enable its local exploration strategy to make choices consistent with the global goal. The coverage bit vector is an example of a Cloud9 overlay data structure.

4. Cloud9 Prototype

We developed a Cloud9 prototype that runs on private clusters as well as cloud infrastructures like Amazon EC2 [Amazon] and Eucalyptus [Eucalyptus].

Broken Replays:

Replays of transferred jobs can break when re-execution is not deterministic; KLEE's default memory allocator is one such source of nondeterminism, since it can return different addresses across runs. We therefore replaced the KLEE allocator with a per-state deterministic memory allocator, which uses a per-state address counter that increases with every memory allocation. To preserve the correctness of external calls (which require real addresses), this allocator gives addresses in a range that is also mapped in the SEE address space using mmap(). Thus, before external calls are invoked, the memory content of the state is copied into the mmap-ed region.
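The counter-based scheme can be sketched as follows. The base address and alignment are illustrative assumptions; the real allocator backs this address range with mmap() so external calls see valid memory.

```python
class DeterministicAllocator:
    """Per-state deterministic allocator sketch: each execution state
    carries its own address counter, so replaying the same allocation
    sequence on any worker yields the same addresses."""

    def __init__(self, base=0x10000000, align=16):
        self.base = base
        self.counter = 0
        self.align = align

    def alloc(self, size):
        addr = self.base + self.counter
        # Bump the per-state counter, rounded up to keep alignment.
        self.counter += (size + self.align - 1) // self.align * self.align
        return addr
```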

Constraint Caches:

In Cloud9, states are transferred between workers without the source worker's cache of constraint-solving results, so the destination worker starts with a cold cache for the transferred paths.

Custom Data Structures:

We developed two custom data structures for handling symbolic execution trees: node pins and tree layers. Node pins are a kind of smart pointer customized for trees.

Cloud9 adopted a layer-based structure similar to that used in CAD tools, where the actual tree is a superposition of simpler layers. When exploring the tree, one chooses the layer of interest; switching between layers can be done dynamically at virtually zero cost.
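The layer idea can be sketched with one underlying tree plus per-layer membership sets; switching layers then only changes which set subsequent queries consult. The set-based representation is an assumption made for illustration, not Cloud9's actual implementation.

```python
class LayeredTree:
    """One underlying tree, with layers superposed as membership sets."""

    def __init__(self):
        self.children = {}   # node -> list of child nodes
        self.layers = {}     # layer name -> set of member nodes

    def add_node(self, node, parent=None, layers=()):
        self.children.setdefault(node, [])
        if parent is not None:
            self.children.setdefault(parent, []).append(node)
        for name in layers:
            self.layers.setdefault(name, set()).add(node)

    def in_layer(self, node, name):
        # "Switching layers" is just choosing a different set here,
        # which is why it costs virtually nothing.
        return node in self.layers.get(name, set())
```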