2010 英特尔® 线程挑战赛—Hosoya指数

来源:互联网 发布:windows系统恢复选项 编辑:程序博客网 时间:2024/05/22 05:08

2010 英特尔® 线程挑战赛—Hosoya指数

邓辉   denghui0815@hotmail.com

源码下载

问题描述

Hosoya 指数 Z 是分子中非临近化学键集的数量。如果我们把分子抽象成图,那么分子的原子就是整个图的顶点 (V),原子间的临近化学键就是边 (E)。也就是说,图 Z(G(V,E)) Hosoya 指数是图中的匹配总数,而其中匹配是指不共享顶点的边的子集。

计算给定图的 Hosoya 指数可以通过以下方法完成:对所有 k-匹配中的边子集数进行求和,其中 k-匹配是 k (0 <= k <= |E|) 条边子集的集合。例如,对于六顶点无向完整图 (K6) Hosoya 指数,Z(K6) = 76

K6 = ({A,B,C,D,E,F} {AB,AC,AD,AE,AF,BC,BD,BE,BF,CD,CE,CF,DE,DF,EF})

k 大小 k-匹配

--------------------------------------------

0 1 {} (0-匹配的大小总为 1)

1 15 { {AB}, {AC}, {AD}, {AE}, {AF}, {BC},{BD}, {BE}, {BF}, {CD}, {CE}, {CF},{DE}, {DF}, {EF} }

2 45 { {AB CD}, {AB CE}, {AB CF}, {AB DE}, {AB DF}, {AB EF},

{AC BD}, {AC BE}, {AC BF}, {AC DE}, {AC DF}, {AC EF},

{AD BC}, {AD BE}, {AD BF}, {AD CE}, {AD CF}, {AD EF},

{AE BC}, {AE BD}, {AE BF}, {AE CD}, {AE CF}, {AE DF},

{AF BC}, {AF BD}, {AF BE}, {AF CD}, {AF CE}, {AF DE},

{BC DE}, {BC DF}, {BC EF}, {BD CE}, {BD CF}, {BD EF},

{BE CD}, {BE CF}, {BE DF}, {BF CD}, {BF CE}, {BF DE},

{CD EF}, {CE DF}, {CF DE} }

3 15 { {AB CD EF}, {AB CE DF}, {AB CF DE},

{AC BD EF}, {AC BE DF}, {AC BF DE},

{AD BC EF}, {AD BE CF}, {AD BF CE},

{AE BC DF}, {AE BD CF}, {AE BF CD},

{AF BC DE}, {AF BD CE}, {AF BE CD} }

>=4 0 --

---------------------------------------------------

总计 76

问题:编写一个多线程应用程序,用来通过文件输入图并计算所输入图的 Hosoya 指数。

输入文件说明:输入文件的文件名作为命令行参数提供给应用程序。该文件有若干行,每行包含图中一条要分析的边。图顶点用“A”到“Z”范围中的三个大写字母表示。每个输入行包含六个大写字母,前三个字母代表边的一个顶点,后三个字母代表该边的另一个顶点。文件结束 (EOF) 表示图输入结束。此输入图是连通的无向图;并且每条边的两个顶点不能重复(例如,不允许使用“AABAAB”)。上述示例不符合此问题的输入文件说明。

输出:将结果输出到标准输出,并且必须给出输入图的 Hosoya 指数。

输入文件示例:(省略氢原子的 2-环丙基丁烷分子)

GHAQRC

KLBMND

QRCMND

ABEYZG

CDFABE

CDFYZG

MNDYZG

输出示例:

The Hosoya index of the input graph is 24

串行算法

观察K6完全图的Hosoya指数。

 

观察K5完全图的Hosoya指数。

K1 1 10 { {AB}, {AC}, {AD}, {AE}, {BC}, {BD}, {BE}, {CD}, {CE}, {DE}}

K1的每条边EV1V2),遍历K1中不含V1V2的边构成K2

例如  {AB} + {{CD}, {CE}, {DE}}

{AC} + {{BD}, {BE}, {DE}}

{AD} + {{BC}, {BE}, {DE}}

{AE} + {{BC}, {BD}, {CD}}

{BC} + {{AD}, {AE}, {DE}}

{BD} + {{AC}, {AE}, {CE}}

{BE} + {{AC}, {AD}, {CD}}

{CD} + {{AB}, {AE}, {BE}}

{CE} + {{AB}, {AD}, {BD}}

{DE} + {{AB}, {AC}, {BC}}

这样得到10 * 3 = 30对没有交点的边,实际上K2的数量为15{AB} + {DE} {DE} + {AB}是相同的,控制配对流程,先将边按较小的节点排序,每条边与排在其后的边配对即可。

      {AB} + {{CD}, {CE}, {DE}}

{AC} + {{BD}, {BE}, {DE}}

{AD} + {{BC}, {BE}, {DE}}

{AE} + {{BC}, {BD}, {CD}}

{BC} + {{AD}, {AE}, {DE}}

{BD} + {{AC}, {AE}, {CE}}

{BE} + {{AC}, {AD}, {CD}}

{CD} + {{AB}, {AE}, {BE}}

{CE} + {{AB}, {AD}, {BD}}

{DE} + {{AB}, {AC}, {BC}}

另外一种方法:通过记住当前边集内最大点,与含有大于该点的边配对即可。

{AB} + {{CD}, {CE}, {DE}}

{AC} + {{BD}, {BE}, {DE}}

{AD} + {{BC}, {BE}, {DE}}

{AE} + {{BC}, {BD}, {CD}}

{BC} + {{AD}, {AE}, {DE}}

{BD} + {{AC}, {AE}, {CE}}

{BE} + {{AC}, {AD}, {CD}}

{CD} + {{AB}, {AE}, {BE}}

{CE} + {{AB}, {AD}, {BD}}

{DE} + {{AB}, {AC}, {BC}}

递归求解:将边E加入边集{G},检测E的节点是否在边集{G}中,如果E的节点不在边集{G}中,将E加入到{G},递归加入其它边,否则回溯。

 

__int64 XGetHosya_Recursion(const XGraph* pGraph, uint8* pMask, int nBeg)

{

     __int64 nCount = 0;

     XEdge* pCur = pGraph->pStart[nBeg];

     XEdge* pEnd = pGraph->pEdge + pGraph->nEdge;

     for(; pCur < pEnd; ++pCur)

     {

         if(pMask[pCur->nEnd] == 0)

         {

              pMask[pCur->nBeg] = pMask[pCur->nEnd] = 1;

              nCount += XGetHosya_Recursion(pGraph, pMask, pCur->nBeg + 1) + 1;

              pMask[pCur->nBeg] = pMask[pCur->nEnd] = 0;

         }

     }

     return nCount;

}

为提高效率展开递归。

// 展开递归

__int64 XGetHosya_Mask_Stack(const XGraph* pGraph, uint8* pMask, int nBeg)

{

     __int64 nCount = 0;

     XEdge*   pCur = pGraph->pStart[nBeg];

     XEdge*   pEnd = pGraph->pStart[pGraph->nNode];

     XEdge**  pStart = pGraph->pStart;

     XEdge**  pCurStack = (XEdge**)scalable_aligned_malloc((pGraph->nNode / 2 + 2) *

sizeof(pCurStack[0]), 16);

     int      nIndex = 1;

     pEnd->nBeg = pEnd->nEnd = pGraph->nNode;

     pCurStack[0] = pEnd;

     while(nIndex > 0)

     {

         for( ;pCur < pEnd; ++pCur)

         {

              if(pMask[pCur->nEnd] == 0)

              {

                   pMask[pCur->nBeg] = pMask[pCur->nEnd] = 1;

                   ++nCount;

                   pCurStack[nIndex++] = pCur;

                   pCur = pStart[ pCur->nBeg + 1] - 1;

              }

         }

 

         if(pCur == pEnd)

         {

              pCur = pCurStack[--nIndex];

              pMask[pCur->nBeg] = pMask[pCur->nEnd] = 0;

              ++pCur;

         }

     }

     scalable_aligned_free(pCurStack);

     return nCount;

}

并行算法

    使用TBBTask,进行任务分解,当待处理的节点数较少时停止分解Task

// HosoyaTask

class CXHosoyaTask: public tbb::task

{

     const XGraph* m_pGraph;

     uint8*          m_pMask;

     int             m_nBeg;

     __int64*       0020m_pCount;

     static  uint32 ms_nTask;

public:

     CXHosoyaTask( const XGraph* pGraph, uint8* pMask, int nBeg, __int64* pCount) :

       m_pGraph(pGraph), m_nBeg(nBeg), m_pCount(pCount)

     {

         m_pMask = (uint8*)scalable_calloc(sizeof(m_pMask[0]), pGraph->nNode + 2);

         memcpy(m_pMask, pMask, sizeof(pMask[0]) * (pGraph->nNode + 2));

     }

 

     ~CXHosoyaTask() { scalable_free(m_pMask); }

 

     tbb::task* execute()

     {

         tbb::task_list list;

         if(m_nBeg + 12 > m_pGraph->nNode)

         {

              *m_pCount += XGetHosya_Mask_Stack(m_pGraph, m_pMask, m_nBeg);

         }

         else

         {

int nTask = 1;

              uint8* pMask = m_pMask;

              while(pMask[m_nBeg]) ++m_nBeg;

              XEdge* pCur = m_pGraph->pStart[m_nBeg];

              XEdge* pEnd = m_pGraph->pEdge + m_pGraph->nEdge;

 

              deque<__int64> deqCnt;

              deque<__int64>::iterator i;

              for(; pCur < pEnd; ++pCur)

              {

                   if(pMask[pCur->nEnd] == 0)

                   {

                       pMask[pCur->nBeg] = pMask[pCur->nEnd] = 1;

                       if(pCur->nBeg + 12 < m_pGraph->nNode)

                       {

                            deqCnt.push_back(1);

                            list.push_back( *new( allocate_child() )  CXHosoyaTask(m_pGraph,

pMask, pCur->nBeg + 1, &deqCnt.back()));

 

                            ++nTask;

                       }

                       else

                        {

                       *m_pCount += XGetHosya_Mask_Stack(m_pGraph, pMask, pCur->nBeg + 1) + 1;

                       }

                       pMask[pCur->nBeg] = pMask[pCur->nEnd] = 0;

                   }

              }

 

              set_ref_count(nTask);

              spawn_and_wait_for_all(list);

              for(i = deqCnt.begin(); i != deqCnt.end(); ++i) *m_pCount += *i;

         }

 

         return NULL;

     }

};

 

// 并行版本

__int64 XGetHosya_Mask_Parallel(const XGraph* pGraph)

{

     __int64 nCount = 0;

     uint8*   pMask = (uint8*)scalable_calloc(sizeof(pMask[0]), pGraph->nNode + 2);

     CXHosoyaTask& xtask = *new(tbb::task::allocate_root()) CXHosoyaTask(pGraph,

pMask, 0, &nCount);

     tbb::task::spawn_root_and_wait(xtask);

     scalable_free(pMask);

     return nCount;

}

 

优化工具

Hotspots检测

使用Intel AmplifierHotspots检测热点函数,结果如下:

检测结果显示热点函数在XGetHosya_Mask_Stack内部,代码优化的空间较小,必须从算法的角度进行优化。

在文献《The Hosoys Index, Lucas Numbers, and QSPR》中有这样的算法,

 

 

Concurrency检测

使用Intel AmplifierConcurrency检测并行优化的效果,结果如下:

检测结果显示,程序并行度较好,达到了Ideal的水平。