Extracting Page Relationships in Web Applications with Program Analysis

Source: Internet  Published: js debugging tool  Editor: 程序博客网  Time: 2024/06/05 15:45

Reading this title, you are probably wondering: what is a "page relationship" in a web application? Let me explain.

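In existing web applications, the number of displayable pages is larger than the actual number of JSP files. The reason is that, when the application handles an event or simply navigates between pages, the URL carries parameters, for example: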

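http://XXXXXXXX:9090/linghao/buy.jsp?action=add&iid=2

The page that handles such a URL is not necessarily a separate JSP file; the request may be handled inside the same file. That is why the number of displayable pages is larger than the number of JSP files. What we care about today is these parameters: a different number of parameters, or different parameter values, makes the current page jump to a different page. This relationship information is usually buried in the Java code of the JSPs, so unless you hunt for it by hand, the only way to recover the parameters is program analysis.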
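To make the "more pages than JSP files" point concrete, here is a minimal sketch that treats a page as the pair (path, parameters). The function name `page_key` and the second and third URLs are my own assumptions for illustration; only the first URL comes from the example above.

```python
from urllib.parse import urlsplit, parse_qs

def page_key(url):
    """Return (path, sorted parameters) as a rough identity of a displayable page."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    return parts.path, tuple(sorted((k, tuple(v)) for k, v in params.items()))

urls = [
    "http://XXXXXXXX:9090/linghao/buy.jsp?action=add&iid=2",     # from the text
    "http://XXXXXXXX:9090/linghao/buy.jsp?action=remove&iid=2",  # hypothetical
    "http://XXXXXXXX:9090/linghao/buy.jsp",                      # hypothetical
]

keys = {page_key(u) for u in urls}
jsp_files = {path for path, _ in keys}

# One JSP file, but several distinct (path, parameters) pages.
print(len(jsp_files), len(keys))
```

All three URLs hit the same `buy.jsp`, yet each parameter combination may render or redirect to something different, which is exactly the information the analysis below tries to recover.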

There is a piece of related work that performs a similar analysis: "Improving Test Case Generation for Web Applications Using Automated Interface Discovery".

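That work also analyzes servlets to find domain information, i.e. information about the parameters that follow the URL. It was applied to testing, with the goal of covering more execution paths, but the problem belongs to the same class as ours, so let's first look at their solution.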

They start from an example servlet and define what an interface is (IP stands for input parameter):

interface = IP*
IP = name, domain information
name = <string>
domain information = domain type, relevant value*
domain type = ANY | NUMERIC
relevant value = <string> | <number>
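The grammar above maps quite directly onto a few small data types. The following is only a sketch of how I would model it; the class names and the concrete values for `buy.jsp` are assumptions for illustration and do not come from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, Tuple, Union

class DomainType(Enum):
    ANY = "ANY"
    NUMERIC = "NUMERIC"

@dataclass(frozen=True)
class DomainInfo:
    # domain information = domain type, relevant value*
    domain_type: DomainType = DomainType.ANY
    relevant_values: FrozenSet[Union[str, int, float]] = frozenset()

@dataclass(frozen=True)
class IP:
    # IP = name, domain information
    name: str
    domain_info: DomainInfo = field(default_factory=DomainInfo)

# interface = IP*
Interface = Tuple[IP, ...]

def describe(interface: Interface) -> str:
    """Render an interface as 'name:TYPE' pairs."""
    return ", ".join(f"{ip.name}:{ip.domain_info.domain_type.value}" for ip in interface)

# Hypothetical interface for the buy.jsp URL shown earlier.
buy_interface: Interface = (
    IP("action", DomainInfo(DomainType.ANY, frozenset({"add"}))),
    IP("iid", DomainInfo(DomainType.NUMERIC)),
)

print(describe(buy_interface))
```

Making the types frozen keeps them hashable, which is convenient later when interfaces are stored in sets.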

The paper presents an algorithm that works in two main phases:

1. Find the type of each parameter (string or number) ...

Algorithm 1 – Phase 1

/* GetDomainInfo */
Input: servlets: set of servlets in the web application  // the Java code in the JSP files
Output: ICFG: ICFG for the servlets, annotated with domain information  // the interprocedural control-flow graph of the Java code, annotated with parameter information
begin
   ICFG ← ICFG for the web application  // first build the interprocedural control-flow graph
   compute data-dependence information for the web application
   for each PF call in the servlets do  // handle every occurrence of request.getParameter()
       PFnode ← ICFG's node representing PF
       PFvar ← lhs of the PF call statement  // the variable that receives the return value of request.getParameter()
       newannot ← new annotation  // create an annotation
       newannot.IPnode ← PFnode  // annotation content: the node
       newannot.type ← ANY  // the type starts out as ANY
       newannot.values ← {}  // the set of possible values starts out empty
       associate newannot with PFnode  // attach the annotation to the node being analyzed
       GDI(DUchain(PFvar, node), PFvar, PFnode, {})  // call GDI (defined below); DUchain is the definition-use chain
   end for
   return ICFG /* returns annotated ICFG */
end

/* GDI */
Input: node: current node to examine  // argument 1: the node to analyze
       IPvar: variable storing the IP value and used at node  // argument 2: the variable that holds the parameter value and is used at node
       root node: node to be annotated  // argument 3: the node that receives the annotation
       visited nodes: nodes visited along current path  // argument 4: nodes already analyzed on this path, so they are not analyzed again
begin
 if node ∉ visited nodes then  // node has not been analyzed yet
   if node is an exit node then  // node is the exit statement of a method
       returnsites ← possible return sites for node's method  // the nodes the method may return to, i.e. the nodes that called it
       for each retsite ∈ returnsites do  // for every node that calls node's method
           retvar ← variable defined at retsite  // the variable that receives the method's return value
           newannot ← root node's annotation  // take over the root node's annotation
           associate newannot with node retsite  // attach the annotation to retsite
           GDI(DUchain(retvar, retsite), retvar, retsite, visited nodes ∪ {node})  // recurse; DUchain(v, n) is the set of nodes that use the variable v defined at n
       end for
   else
      if node represents a comparison with a constant then  // the variable is used in a comparison
         compval ← value used in the comparison  // record the value it is compared against
         addValueToAnnotation(root node, compval)  // add that value to the root node's annotation
      else if node is a type cast/conversion of IPvar then  // the variable is used in a type conversion
         casttype ← target type of the cast operation  // record the target type
         setDomainTypeInAnnotation(root node, casttype)  // set the annotation's type field to the target type
      end if
      if node contains a definition of a variable then  // node defines a new variable, i.e. the value is passed on to another variable
         var ← variable defined at node
         for each n ∈ DUchain(var, node) do  // keep tracking this variable until we know which type the parameter value ends up with and which values it may take
            GDI(n, var, root node, visited nodes ∪ {node})
         end for
      end if
   end if
 end if
end
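To convince myself that I understood phase one, I find it useful to run the idea on a toy example. The sketch below is a simplified, intraprocedural version of GDI: it ignores exit nodes and return sites, and the dictionary encoding of nodes and the def-use chain (`NODES`, `DU_CHAIN`, the field names) is entirely my own assumption, not the paper's representation.

```python
# Hypothetical servlet fragment encoded as nodes:
#   n1: action = request.getParameter("action")
#   n2: if (action.equals("add")) ...
#   n3: iidStr = request.getParameter("iid")
#   n4: iid = Integer.parseInt(iidStr)
#   n5: if (iid == 2) ...
NODES = {
    "n1": {"pf_param": "action", "defines": "action"},
    "n2": {"compare_const": "add"},
    "n3": {"pf_param": "iid", "defines": "iidStr"},
    "n4": {"cast_of": "iidStr", "cast_to": "NUMERIC", "defines": "iid"},
    "n5": {"compare_const": 2},
}

# DU_CHAIN[(var, defining_node)] = nodes that use that definition of var.
DU_CHAIN = {
    ("action", "n1"): ["n2"],
    ("iidStr", "n3"): ["n4"],
    ("iid", "n4"): ["n5"],
}

def gdi(node, ip_var, root, visited, annotations):
    """Follow the uses of ip_var and refine the annotation stored at root."""
    if node in visited:
        return
    info = NODES[node]
    annot = annotations[root]
    if "compare_const" in info:
        # Comparison with a constant: record it as a relevant value.
        annot["values"].add(info["compare_const"])
    elif info.get("cast_of") == ip_var:
        # Type conversion of the tracked variable: refine the domain type.
        annot["type"] = info["cast_to"]
    defined = info.get("defines")
    if defined is not None:
        # The value flows into a new variable, so keep tracking it.
        for use in DU_CHAIN.get((defined, node), []):
            gdi(use, defined, root, visited | {node}, annotations)

def get_domain_info():
    annotations = {}
    for node, info in NODES.items():
        if "pf_param" not in info:
            continue
        annotations[node] = {"name": info["pf_param"], "type": "ANY", "values": set()}
        for use in DU_CHAIN.get((info["defines"], node), []):
            gdi(use, info["defines"], node, frozenset(), annotations)
    return annotations

annotations = get_domain_info()
for node, annot in annotations.items():
    print(node, annot)
```

In this toy run, `action` keeps the type ANY with the relevant value `"add"`, while `iid` becomes NUMERIC with the relevant value `2`, because the analysis follows the value through the conversion into the new variable.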

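What this algorithm really does is determine the type of each URL parameter and the values it may take. The original post illustrated the algorithm's output with a figure at this point.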

2. Phase two of the algorithm: compute the possible parameter combinations

Algorithm 2 – Phase 2

/* ExtractInterfaces */
Input: ICFG: annotated ICFG produced by GetDomainInfo  // the interprocedural control-flow graph from phase one
Output: interfaces[]: interfaces exposed by each of the servlets  // the set of pages that may be navigated to
begin
    CG ← call graph for the web application  // the application's call graph; its nodes are methods
    SCC ← set of strongly connected components in CG  // find the strongly connected components
    SINGLETONS ← set of singleton sets, one for each node in CG that is not part of a strongly connected component  // the nodes outside any strongly connected component
    CC ← SCC ∪ SINGLETONS  // all node sets
    for each mset ∈ CC, in reverse topological order do  // visit each method set in reverse topological order, i.e. the lowest-level methods first, so that every callee already has a result when its caller is analyzed
       SummarizeMethod(mset)  // defined below
    end for
    return interfaces of each servlet's root method
end

/* SummarizeMethod */
Input: methodset ⊂ CG nodes: singleton set or set of strongly connected methods in the call graph  // mset is a set of strongly connected methods or a single method
begin
   N ← ⋃(m ∈ methodset) nodes in m's CFG
   worklist ← {}
   for each n ∈ N do  // traverse every node of the control-flow graphs
       In[n] ← {}  // the set flowing into n starts out empty
       if n corresponds to a PF call then  // n is a call such as request.getParameter()
          newIP ← new IP
          newIP.node ← n
          newIP.name ← parameter of the PF call  // the argument of request.getParameter(), i.e. the name of the parameter passed in the URL
          if n's annotation has domain information dominfo then
              newIP.domaininfo ← dominfo  // if n carries domain information (the result of phase one), add it to the new IP
          else
              newIP.domaininfo ← null
          end if
          Gen[n] ← {{newIP}}  // a new IP was created here, so put it into Gen[n]
          add nodes in succ(n) to worklist  // queue the successors of n for analysis
       else if n is a callsite AND target(n) has summary s then  // n calls another method that already has a summary
          Gen[n] ← map(n, s)  // map the callee's summary onto callsite n
          for each interface ∈ Gen[n] do  // go through every interface in Gen[n]
              for each IP ∈ interface do
                  annot ← annotation associated with n's return site
                  if IP.node == annot.IPnode AND annot has domain information dominfo then
                     IP.domaininfo ← dominfo
                  end if
              end for
          end for
          add nodes in succ(n) to worklist
       else if n is a method entry point then  // a method entry point generates no IP
          Gen[n] ← {{}}
          add nodes in succ(n) to worklist
       else
          Gen[n] ← ∅
       end if
       Out[n] ← Gen[n]
   end for
   while |worklist| ≠ 0 do
       n ← first element in worklist
       In[n] ← ⋃(p ∈ pred(n)) Out[p]
       Out′[n] ← {}
       for each i ∈ In[n] do
          for each g ∈ Gen[n] do
              Out′[n] ← Out′[n] ∪ {i ∪ g}  // enumerate the IP combinations that can be produced
          end for
       end for
       if Out′[n] ≠ Out[n] then
          Out[n] ← Out′[n]
          if n is a callsite AND target(n) ∈ methodset then
              add target(n)'s entry node to worklist
          else
              add nodes in succ(n) to worklist
          end if
       end if
   end while
   for each m ∈ methodset do
       summary ← Out[m's exit node]  // the IP sets flowing out of m's exit node are the summary we want
       associate summary to method m
       for each interface ∈ summary do
           for each IP ∈ interface such that IP.name is not a concrete value do
               IP.name ← resolve(IP)  // replace formal parameters with the corresponding actual arguments
           end for
       end for
   end for
end
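The heart of SummarizeMethod is the worklist loop that combines every incoming IP set with every generated one. Below is a minimal sketch of that loop for a single method without call sites. Two things are my own assumptions: the tiny control-flow graph, and the decision to let ordinary nodes (empty Gen) pass their input through unchanged, which is how I read the intent of the algorithm.

```python
from collections import deque

def summarize_method(succ, gen, exit_node):
    """Propagate sets of IP-name combinations through one method's CFG."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    out = {n: set(gen[n]) for n in succ}
    worklist = deque()
    for n in succ:
        if gen[n]:  # PF calls and the entry node seed the worklist
            worklist.extend(succ[n])

    while worklist:
        n = worklist.popleft()
        in_n = set()
        for p in pred[n]:
            in_n |= out[p]
        # Assumption: a node that generates nothing forwards its input as is.
        gen_n = gen[n] or {frozenset()}
        new_out = {i | g for i in in_n for g in gen_n}
        if new_out != out[n]:
            out[n] = new_out
            worklist.extend(succ[n])
    return out[exit_node]

# Hypothetical method: "action" is always read, "iid" only on one branch.
succ = {"entry": ["n1"], "n1": ["n2", "exit"], "n2": ["exit"], "exit": []}
gen = {
    "entry": {frozenset()},           # Gen[entry] = {{}}
    "n1": {frozenset({"action"})},    # request.getParameter("action")
    "n2": {frozenset({"iid"})},       # request.getParameter("iid")
    "exit": set(),                    # ordinary node
}

summary = summarize_method(succ, gen, "exit")
print(sorted(sorted(s) for s in summary))
```

The summary at the exit node contains two interfaces, `{action}` and `{action, iid}`, one per path through the method, which matches the idea that different parameter combinations correspond to different pages.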

Phase two first traverses the methods in each set of the call graph and generates, for every method, a summary that contains the possible IP combinations.
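The traversal order matters: callees must be summarized before their callers. As a sketch of how that order could be obtained, the code below uses Tarjan's strongly connected components algorithm, which emits components in reverse topological order when edges point from caller to callee. The paper does not say which SCC algorithm it uses, so this choice and the method names in the call graph are my assumptions.

```python
def sccs_callees_first(call_graph):
    """Tarjan's algorithm; returns SCCs with callees before their callers."""
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in call_graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(frozenset(component))

    for v in call_graph:
        if v not in index:
            visit(v)
    return result

# Hypothetical call graph: handleAdd and validate call each other.
call_graph = {
    "doGet": ["handleAdd", "render"],
    "handleAdd": ["validate"],
    "validate": ["handleAdd"],
    "render": [],
}

order = sccs_callees_first(call_graph)
for component in order:
    print(sorted(component))
```

Here the mutually recursive pair comes out first as one set, then `render`, and `doGet` last, so by the time `doGet` is summarized every method it calls already has a summary.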

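These two algorithms are fairly long. They come from the paper, and it took me a long time to work through them, so I am writing them down here, both to reinforce my own memory and to help anyone else who is trying to solve this problem. That's it for now, off to bed~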