Plug-in 22: Get Links From URL

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 13:26
<?php // Plug-in 22: Get Links From URL

// This is an executable example with additional code supplied
// To obtain just the plug-ins please click on the Download link

$result = PIPHP_GetLinksFromURL("http://pluginphp.com");

// The plug-in returns NULL when the page cannot be fetched,
// so fall back to an empty list before counting
if (!$result) $result = array();

echo "<ul>";
for ($j = 0 ; $j < count($result) ; ++$j)
   echo "<li>$result[$j]</li>";
echo "</ul>";

function PIPHP_GetLinksFromURL($page)
{
   // Plug-in 22: Get Links From URL
   //
   // This plug-in accepts the URL of a web page and returns
   // an array of all the links found in it. The argument
   // required is:
   //
   //    $page: The web site's main URL

   $contents = @file_get_contents($page);
   if (!$contents) return NULL;

   $urls    = array();
   $dom     = new DOMDocument();
   @$dom    ->loadHTML($contents);
   $xpath   = new DOMXPath($dom);
   $hrefs   = $xpath->evaluate("/html/body//a");

   for ($j = 0 ; $j < $hrefs->length ; $j++)
      $urls[$j] = PIPHP_RelToAbsURL($page,
         $hrefs->item($j)->getAttribute('href'));

   return $urls;
}

// The below function is repeated here to ensure that it's
// available to the main function which relies on it

function PIPHP_RelToAbsURL($page, $url)
{
   // Plug-in 21: Relative To Absolute URL
   //
   // This plug-in accepts the absolute URL of a web page
   // and a link featured within that page. The link is then
   // turned into an absolute URL which can be independently
   // accessed. Only applies to http:// URLs. Arguments are:
   //
   //    $page: The web page containing the URL
   //    $url:  The URL to convert to absolute

   if (substr($page, 0, 7) != "http://") return $url;

   $parse = parse_url($page);
   $root  = $parse['scheme'] . "://" . $parse['host'];
   $p     = strrpos(substr($page, 7), '/');

   if ($p) $base = substr($page, 0, $p + 8);
   else $base = "$page/";

   if (substr($url, 0, 1) == '/')           $url = $root . $url;
   elseif (substr($url, 0, 7) != "http://") $url = $base . $url;

   return $url;
}

?>

Plug-in description:

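This plug-in accepts the URL of a web page, parses the page, looks only for "<a href " hyperlink tags, and returns every link address it finds as an array. If the page cannot be fetched, it returns NULL. It takes one argument: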

$page: The URL of a web page, including the leading "http://" and the domain name.
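
To try the link-extraction step without any network access, here is a minimal sketch that runs the same DOMDocument/DOMXPath approach over an inline HTML string. The function name extractHrefs and the sample markup are hypothetical and not part of the plug-in. Unlike the plug-in's "/html/body//a" query, this sketch uses "//a[@href]", so anchors without an href attribute (for which getAttribute returns an empty string) are skipped. It also returns the raw href values rather than converting them to absolute URLs.

```php
<?php
// Hypothetical helper (not part of the plug-in): collect raw href values
// from an HTML string, mirroring the plug-in's DOM/XPath step.
function extractHrefs(string $html): array
{
    $dom = new DOMDocument();
    // Suppress the warnings that malformed real-world HTML would raise.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    // "//a[@href]" selects only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }
    return $hrefs;
}

// Sample markup (assumed for illustration): two real links and one named anchor.
$sample = '<html><body><p><a href="/news">News</a> <a name="top">Top</a> '
        . '<a href="http://example.com/">Example</a></p></body></html>';

print_r(extractHrefs($sample));
```

Each returned value could then be passed through PIPHP_RelToAbsURL together with the page URL, as the plug-in does, to turn relative links such as /news into absolute ones.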