Generating sitemap.xml (ASP and PHP)

Source: Internet  Editor: 程序博客网  Date: 2024/04/16 12:22

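 sitemap.xml is a site map file defined by the Sitemaps protocol. It complements the older robots.txt convention rather than replacing it; robots.txt can even point crawlers to the sitemap with a "Sitemap:" line. A site that has submitted a sitemap.xml to the search engines is easier for their crawlers to find and index, which makes indexing of the site's content faster and more accurate.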

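  There are six tags in all:

  changefreq: how often the page content changes (always, hourly, daily, weekly, monthly, yearly or never);

  lastmod: when the page was last modified, in W3C Datetime format (e.g. 2009-04-27);

  loc: the page's permanent URL, the only tag that is required inside each url;

  priority: the page's priority relative to other pages on the same site, from 0.0 to 1.0 (this tag is optional, as are lastmod and changefreq);

  url: the parent tag of the four tags above;

  urlset: the root tag that wraps all of the url entries.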
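To show how these tags nest, here is a minimal Python sketch built on the standard-library xml.etree.ElementTree. The helper name build_urlset and the example.com URL are hypothetical. The sketch treats every tag except loc as optional, checks changefreq against the seven allowed values, and checks that priority falls between 0.0 and 1.0.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_urlset(entries):
    """Build a <urlset> element from dicts (hypothetical helper).

    Each dict must have "loc"; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")           # one <url> per page
        ET.SubElement(url, "loc").text = entry["loc"]
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "changefreq" in entry:
            if entry["changefreq"] not in CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {entry['changefreq']}")
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority}")
            ET.SubElement(url, "priority").text = str(priority)
    return urlset


xml_text = ET.tostring(
    build_urlset([{"loc": "http://www.example.com/", "lastmod": "2009-04-27",
                   "changefreq": "daily", "priority": 0.8}]),
    encoding="unicode",
)
print(xml_text)
```

Because the sketch rejects bad values up front, a typo like "dialy" is caught at generation time and never reaches a search engine.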

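  You can give a search engine more than one Sitemap file. However, each file may contain no more than 50,000 URLs and must be no larger than 10MB uncompressed (the protocol has since raised the size limit to 50MB).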
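For a large site, the URL list has to be split across several files. The Sitemaps protocol's sitemap index file (a sitemapindex root with one sitemap/loc entry per file) tells crawlers where those files are. The Python sketch below assumes this approach; the function names and example.com URLs are hypothetical. MAX_BYTES uses the 10MB figure quoted in the article and should be raised if you follow the newer 50MB limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000                 # per-file URL limit
MAX_BYTES = 10 * 1024 * 1024      # uncompressed size limit quoted in the article


def chunk(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_bytes(urls):
    """Serialize one chunk as a minimal <urlset> (loc only) and return UTF-8 bytes."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = u
    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    if len(data) > MAX_BYTES:
        raise ValueError("sitemap chunk exceeds the size limit; use smaller chunks")
    return data


def sitemap_index(sitemap_urls):
    """Build a <sitemapindex> listing the URL of each individual sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = loc
    return ET.tostring(root, encoding="unicode")
```

Usage: `parts = chunk(all_urls)` on 120,000 URLs gives chunks of 50,000, 50,000 and 20,000. Each chunk is written with `urlset_bytes`, and the index produced by `sitemap_index` is the one file you submit.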

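  Submitting a sitemap to Google: through Google Webmaster Tools (now Google Search Console) at http://www.google.com/webmasters;

  Submitting a sitemap to Yahoo!: through Yahoo! Site Explorer at http://siteexplorer.search.yahoo.com (this service has since been shut down);

  Submitting a sitemap to MSN: submit the URL directly via http://api.moreover.com/ping?u=http%3A//your.domainname/sitemap.xml. This is an unofficial "back door" URL for submitting a sitemap straight to MSN. Note that ":" is replaced by %3A.

  Submitting a sitemap to Ask: submit it directly via http://submissions.ask.com/ping?sitemap=http%3A//your.domainname/sitemap.xml. Again, ":" is replaced by %3A.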
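The ":" to %3A substitution above is ordinary percent-encoding. Python's urllib.parse.quote leaves "/" alone by default and encodes ":" as %3A, so it reproduces exactly the ping URLs shown. This is only a sketch: ping_url is a hypothetical helper, it builds the URL without sending anything, and the endpoints may no longer accept requests. Passing strict=True also encodes the slashes, which is the more conservative form.

```python
from urllib.parse import quote


def ping_url(endpoint, sitemap_url, strict=False):
    """Append a percent-encoded sitemap URL to a ping endpoint (hypothetical helper).

    By default "/" stays as-is and ":" becomes %3A; strict=True encodes "/" too.
    """
    return endpoint + quote(sitemap_url, safe="" if strict else "/")


print(ping_url("http://submissions.ask.com/ping?sitemap=",
               "http://your.domainname/sitemap.xml"))
# http://submissions.ask.com/ping?sitemap=http%3A//your.domainname/sitemap.xml
```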

A sitemap.xml file has the following format:


<?xml version="1.0" encoding="UTF-8" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.grzz.com.cn/</loc>
<lastmod>2009-04-27</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://www.grzz.com.cn/index.html</loc>
<lastmod>2009-04-27</lastmod>
<changefreq>weekly</changefreq>
</url>
</urlset>
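Parsing a sitemap is a quick way to check it before uploading. One detail trips people up: every tag lives in the sitemaps.org namespace, so a plain findall("url") finds nothing. The sketch below uses Python's standard-library ElementTree with a namespace prefix map. The helper name read_sitemap is hypothetical, and SAMPLE is the example above.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.grzz.com.cn/</loc><lastmod>2009-04-27</lastmod><changefreq>daily</changefreq></url>
<url><loc>http://www.grzz.com.cn/index.html</loc><lastmod>2009-04-27</lastmod><changefreq>weekly</changefreq></url>
</urlset>"""


def read_sitemap(xml_text):
    """Return (loc, lastmod, changefreq) tuples; missing optional tags become None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):       # tags must be namespace-qualified
        entries.append((
            url.findtext("sm:loc", namespaces=NS),
            url.findtext("sm:lastmod", namespaces=NS),
            url.findtext("sm:changefreq", namespaces=NS),
        ))
    return entries


for entry in read_sitemap(SAMPLE):
    print(entry)
```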

So how do you create a sitemap.xml? The most laborious way is to write it by hand, following the rules for these six tags.

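If the site has a lot of pages, this quickly turns into tedious manual labor, which is why a number of sitemap.xml generators have appeared. Most of them, however, work from the client side: you enter your site's address and the tool crawls the site for links. Rookie finds this approach rather inefficient, and it gives you no control over which links end up in the sitemap. Rookie finally found a better method online, suited to sites that publish their content as static pages. Someone wrote the sitemap.xml generator as an ASP page and a PHP page, and the page itself lets you control which links are generated. Adjust the page to your needs, upload it to your web space, and open it in a browser: its output is the sitemap.xml you need. Save that output as an .xml file, upload it to your space, and then submit its URL to the search engines that support sitemap.xml.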
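For sites whose pages are static files, the same idea (walk the site's folders instead of crawling its links) can also run offline and write sitemap.xml directly, which skips the visit-and-save step. The sketch below is a Python equivalent and is not the ASP/PHP code that follows. SITE_URL, EXCLUDED_DIRS and ALLOWED_EXTS are hypothetical settings, and lastmod is taken from each file's modification time as a UTC date.

```python
import os
from datetime import datetime, timezone
from urllib.parse import quote
from xml.sax.saxutils import escape

SITE_URL = "http://www.example.com"                      # hypothetical domain, no trailing slash
EXCLUDED_DIRS = {"admin", "data", "upload", "template"}  # hypothetical folder names to skip
ALLOWED_EXTS = {".html", ".htm"}                         # static pages only


def collect_pages(doc_root):
    """Yield (url, lastmod) for every allowed static page under doc_root."""
    for dirpath, dirnames, filenames in os.walk(doc_root):
        # Prune excluded folders in place so os.walk never descends into them
        dirnames[:] = sorted(d for d in dirnames if d.lower() not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in ALLOWED_EXTS:
                continue
            full = os.path.join(dirpath, name)
            # Relative path with "/" separators, percent-encoded (e.g. space -> %20)
            rel = quote(os.path.relpath(full, doc_root).replace(os.sep, "/"))
            mtime = datetime.fromtimestamp(os.path.getmtime(full), tz=timezone.utc)
            yield f"{SITE_URL}/{rel}", mtime.strftime("%Y-%m-%d")


def write_sitemap(doc_root, out_path, changefreq="weekly"):
    """Write a sitemap for doc_root to out_path; return the number of URLs written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url, lastmod in collect_pages(doc_root):
            f.write(f"<url><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod>"
                    f"<changefreq>{changefreq}</changefreq></url>\n")
            count += 1
        f.write("</urlset>\n")
    return count
```

Usage, with a hypothetical path: `write_sitemap("/var/www/html", "sitemap.xml")`. Upload the resulting file as described above.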

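ASP version: copy the code below into a text file and save it as sitemap.asp. Adjust the settings, upload the file to the server, and open it in a browser.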
<%
' Change http://www.grzz.com.cn to your own domain (no trailing slash)
Const SITE_URL = "http://www.grzz.com.cn"
' Directory (URL path) to build the sitemap for; must end with "/"
vDir = "/"

Set objFSO = CreateObject("Scripting.FileSystemObject")
root = Server.MapPath(vDir)

Response.ContentType = "text/xml"
Response.Charset = "UTF-8"
Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>"
Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">"

Set objFolder = objFSO.GetFolder(root)
For Each objFile In objFolder.Files
    Response.Write GetFileLink(objFile.Path, objFile.DateLastModified)
Next
ShowSubFolders objFolder

Response.Write "</urlset>"
Set objFSO = Nothing

' Recursively add the files of every subfolder that is not excluded
Sub ShowSubFolders(objParent)
    Dim objSubFolder, objSubFile
    For Each objSubFolder In objParent.SubFolders
        If FolderPermission(objSubFolder.Path) Then
            For Each objSubFile In objSubFolder.Files
                Response.Write GetFileLink(objSubFile.Path, objSubFile.DateLastModified)
            Next
            ShowSubFolders objSubFolder
        End If
    Next
End Sub

' Build one <url> entry; returns "" if the file's extension is not allowed.
' changefreq values: always, hourly, daily, weekly, monthly, yearly, never
Function GetFileLink(filePath, modified)
    Dim relPath, fileDate
    GetFileLink = ""
    ' Physical path -> URL path, e.g. "\news\a.html" -> "/news/a.html"
    relPath = Replace(Replace(filePath, root, "", 1, -1, vbTextCompare), "\", "/")
    If FileExtensionIsBad(relPath) Then Exit Function
    ' lastmod as YYYY-MM-DD with zero-padded month and day
    fileDate = Year(modified) & "-" & Right("0" & Month(modified), 2) & "-" & Right("0" & Day(modified), 2)
    ' vDir ends with "/" and relPath starts with "/", so drop one slash
    GetFileLink = "<url><loc>" & Server.HTMLEncode(SITE_URL & vDir & Mid(relPath, 2)) & "</loc>" & _
        "<lastmod>" & fileDate & "</lastmod><changefreq>weekly</changefreq></url>"
    Response.Flush
End Function

' Folders to exclude from the sitemap (substring match on the full path,
' so "\ad" also excludes "\admin" or "\address")
Function FolderPermission(pathName)
    Dim pathExclusion, pathExcluded
    pathExclusion = Array("\ad", "\admin", "\aspnet_client", "\Count", "\data", "\Inc", "\upload", "\template")
    FolderPermission = True
    For Each pathExcluded In pathExclusion
        If InStr(UCase(pathName), UCase(pathExcluded)) > 0 Then
            FolderPermission = False
            Exit For
        End If
    Next
End Function

' Only files whose extension is in this list are included in the sitemap
Function FileExtensionIsBad(sFileName)
    Dim sFileExtension, bFileExtensionIsValid, sFileExt, extensions
    extensions = Array("html")
    If Len(Trim(sFileName)) = 0 Then
        FileExtensionIsBad = True
        Exit Function
    End If
    sFileExtension = Right(sFileName, Len(sFileName) - InStrRev(sFileName, "."))
    bFileExtensionIsValid = False   ' assume the extension is bad
    For Each sFileExt In extensions
        If UCase(sFileExt) = UCase(sFileExtension) Then
            bFileExtensionIsValid = True
            Exit For
        End If
    Next
    FileExtensionIsBad = Not bFileExtensionIsValid
End Function
%>
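The two parts of the ASP script most likely to need adjusting are the physical-path-to-URL mapping and the folder exclusion list. As an illustration in Python (not ASP), this sketch maps an IIS physical path to a URL joined with exactly one slash. It excludes folders by whole name rather than by substring, so an entry like "\ad" cannot also hide "\address". The domain, paths and EXCLUDED set are hypothetical. PureWindowsPath is used so the Windows-style paths behave the same on any operating system.

```python
from datetime import date
from pathlib import PureWindowsPath

SITE_URL = "http://www.example.com"   # hypothetical domain, no trailing slash
# Folder names to skip, compared as whole names, case-insensitively (hypothetical list)
EXCLUDED = {"ad", "admin", "aspnet_client", "count", "data", "inc", "upload", "template"}


def to_url(root, path):
    """Map a physical path under `root` to a URL with exactly one "/" after the domain."""
    rel = PureWindowsPath(path).relative_to(PureWindowsPath(root))
    return f"{SITE_URL}/{rel.as_posix()}"


def is_excluded(root, path):
    """True if any folder between `root` and the file is in EXCLUDED."""
    rel = PureWindowsPath(path).relative_to(PureWindowsPath(root))
    return any(part.lower() in EXCLUDED for part in rel.parts[:-1])


def lastmod(d):
    """W3C date (YYYY-MM-DD); strftime zero-pads the month and day."""
    return d.strftime("%Y-%m-%d")


root = r"C:\inetpub\wwwroot"
print(to_url(root, r"C:\inetpub\wwwroot\news\a.html"))          # http://www.example.com/news/a.html
print(is_excluded(root, r"C:\inetpub\wwwroot\address\b.html"))  # False
print(is_excluded(root, r"C:\inetpub\wwwroot\Admin\c.html"))    # True
print(lastmod(date(2009, 4, 7)))                                # 2009-04-07
```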

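PHP version: copy the code below into a text file and save it as sitemap.php. Adjust the settings, upload the file to the server, and open it in a browser.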
<?php
header('Content-Type: application/xml; charset=UTF-8');

/* Change http://www.grzz.com.cn to your own domain (no trailing slash) */
$website = "http://www.grzz.com.cn";
/* Filesystem path of your site's root directory; set it by hand if DOCUMENT_ROOT is not available */
$page_root = rtrim($_SERVER['DOCUMENT_ROOT'], "/");
/* changefreq: "always", "hourly", "daily", "weekly", "monthly", "yearly" or "never" */
$changefreq = "weekly";

/* Subdirectories to include (files directly in $page_root are always included) */
$allow_dir = array("web");
/* Directories to exclude from the sitemap */
$disallow_dir = array("admin", "_notes");
/* Files whose names contain any of these strings are excluded from the sitemap */
$disallow_file = array(".inc", ".old", ".save", ".txt", ".js", "~", ".LCK", ".lck", ".zip", ".ZIP",
    ".CSV", ".csv", ".css", ".class", ".jar", ".mno", ".bak", ".BAK");

/* simple compare function: equals */
function ar_contains($key, $array) {
    return in_array($key, $array, true);
}

/* better compare function: contains */
function fl_contains($key, $array) {
    foreach ($array as $val) {
        if (strpos($key, $val) !== false) return true;
    }
    return false;
}

/* Walk recursively through all directories starting at $directory and return every file
   that fits the filter criteria, as a path relative to the site root (e.g. "/web/a.html").
   Based on code by Lasse Dalegaard. */
function getFiles($directory, $directory_orig = "") {
    global $disallow_dir, $disallow_file, $allow_dir;
    if ($directory_orig == "") $directory_orig = $directory;
    $tmp = array();
    if (!($dir = opendir($directory))) return $tmp;
    while (($file = readdir($dir)) !== false) {
        if ($file[0] == '.') continue;             // skip ".", ".." and hidden files
        $path = $directory . "/" . $file;
        if (is_dir($path)) {
            // excluded by directory name or by (partial) path
            if (ar_contains($file, $disallow_dir) || fl_contains($path, $disallow_dir)) continue;
            // descend only into allowed directories
            if (ar_contains($file, $allow_dir) || fl_contains($path, $allow_dir)) {
                $tmp = array_merge($tmp, getFiles($path, $directory_orig));
            }
        } elseif (!fl_contains($file, $disallow_file)) {
            $tmp[] = substr($path, strlen($directory_orig));
        }
    }
    closedir($dir);
    return $tmp;
}

$files = getFiles($page_root);
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php foreach ($files as $file) { ?>
<url><loc><?php echo htmlspecialchars($website . $file, ENT_QUOTES | ENT_XML1, 'UTF-8'); ?></loc><lastmod><?php echo date(DATE_W3C, filemtime($page_root . $file)); ?></lastmod><changefreq><?php echo $changefreq; ?></changefreq></url>
<?php } ?>
</urlset>
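The PHP script writes lastmod as a full W3C datetime with a "+08:00"-style offset. On Unix, PHP's filectime() reports the inode change time rather than the last content modification, so filemtime() is the usual source for lastmod. The Python sketch below shows the same formatting, assuming a file's modification time (st_mtime) as input. The helper names are hypothetical, and with tz=None it uses the machine's local time zone, as PHP's date() does.

```python
import os
from datetime import datetime, timedelta, timezone


def w3c_datetime(ts, tz=None):
    """Format a POSIX timestamp as a W3C datetime such as 2009-04-27T12:00:00+08:00.

    With tz=None the machine's local time zone is used (hypothetical helper).
    """
    if tz is not None:
        dt = datetime.fromtimestamp(ts, tz)
    else:
        dt = datetime.fromtimestamp(ts).astimezone()   # naive local -> aware local
    return dt.isoformat(timespec="seconds")


def file_lastmod(path, tz=None):
    """lastmod for a file, based on its modification time (st_mtime), not st_ctime."""
    return w3c_datetime(os.stat(path).st_mtime, tz)


beijing = timezone(timedelta(hours=8))
ts = datetime(2009, 4, 27, 12, 0, 0, tzinfo=beijing).timestamp()
print(w3c_datetime(ts, beijing))   # 2009-04-27T12:00:00+08:00
```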

