China-Code (中国代码) Hengyi Baidu Scraper V1.0: Web Search

Source: Internet | Published by: spss数据分析思路 | Editor: 程序博客网 | Time: 2024/05/17 01:34

Demo: http://baidu.china-code.net/

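Background: while I was working on the 浙搜 (www.zhesou.com) project, my boss asked me to build, within three days, a feature that fetches Baidu's web search results and MP3 search results.

It took one evening to get the task basically done.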

The two key points in building this search are character encoding and regular expressions.
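To make the encoding point concrete, here is a minimal sketch of percent-encoding a keyword in a chosen code page. The class and method names (`QueryEncoding`, `EncodeKeyword`) are hypothetical and not part of the original project. It assumes a modern .NET runtime, where legacy code pages such as GB2312 must be registered first; on the classic .NET Framework the registration line can be dropped because GB2312 is built in.

```csharp
using System;
using System.Text;
using System.Web;

public static class QueryEncoding
{
    // Percent-encodes a search keyword using the named code page, e.g. "GB2312".
    // Hypothetical helper for illustration only.
    public static string EncodeKeyword(string keyword, string charset)
    {
        // Needed on .NET Core / .NET 5+ so that Encoding.GetEncoding("GB2312") resolves.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding enc = Encoding.GetEncoding(charset);
        return HttpUtility.UrlEncode(keyword, enc);
    }
}
```

The same keyword yields different bytes per encoding: "中国" is D6 D0 B9 FA in GB2312 but E4 B8 AD E5 9B BD in UTF-8. This is why the keyword has to be encoded in the charset that the target URL declares.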

The rest of this post mainly covers web.aspx.cs:

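using System.Net;                      // WebClient, CredentialCache
using System.Text;                     // Encoding
using System.Text.RegularExpressions;  // Regex
using System.Web;                      // HttpUtility

// The statements below run inside the page's request handler in web.aspx.cs.
string str1, Result;
string[] strtemp;
Encoding gb2312 = Encoding.GetEncoding("GB2312");

// The keyword is percent-encoded as GB2312 to match ie=gb2312 in the URL below.
string wd = HttpUtility.UrlEncode(Request.QueryString["wd"], gb2312);

// pn is the result offset. QueryString returns null when the parameter is absent.
string pn = Request.QueryString["pn"];
if (pn == null || pn == "")
{
    pn = "0";
}

string PageUrl = "http://www.baidu.com/s?wd=" + wd + "&pn=" + pn + "&lm=0&si=&rn=10&ie=gb2312&ct=0&cl=3&f=1&";
//Response.Write(PageUrl);   // debug: show the URL being fetched

WebClient wc = new WebClient();
wc.Credentials = CredentialCache.DefaultCredentials;
byte[] pageData = wc.DownloadData(PageUrl);
// Decode explicitly as GB2312 (assumed page charset) instead of the server's default code page.
Result = gb2312.GetString(pageData);

// Result count: the text between "找到相关网页约" and "篇，用时".
strtemp = Regex.Split(Result, "篇，用时");
str1 = strtemp[0];
Result = strtemp[1];   // throws if the marker is missing from the page
strtemp = Regex.Split(str1, "找到相关网页约");
str1 = strtemp[1];     // throws if the marker is missing from the page
Response.Write("共有" + str1 + "条记录<br>");   // "N records in total"

// Keep everything up to the first "下一页" (next page) and re-close the open tags.
strtemp = Regex.Split(Result, "下一页");
Result = strtemp[0] + "下一页</FONT></A></DIV>";

// Drop everything before the results container.
strtemp = Regex.Split(Result, "DIV id=ScriptDiv>");
Result = strtemp[1];

// Remove the "百度快照" (Baidu snapshot) links.
Result = Regex.Replace(Result, @"- <a href=(\S*) target=""_blank"" class=m>百度快照</a>", "");
// Strip the leading "s" from hrefs, presumably so pager links (href=s?wd=...) point back to this page.
Result = Regex.Replace(Result, @"a href=s", "a href=");
Response.Write(Result);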
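The two clean-up replacements at the end of the listing can be tried in isolation. The sketch below is a hypothetical helper class (`ResultCleanup` is not from the original project), and the markup it is meant for is an assumption about what the fetched page looked like. The pager rewrite is deliberately narrower than the original: it only touches hrefs that start with `s?`.

```csharp
using System.Text.RegularExpressions;

public static class ResultCleanup
{
    // Removes fragments of the form: - <a href=URL target="_blank" class=m>百度快照</a>
    public static string StripSnapshotLinks(string html)
    {
        return Regex.Replace(html, @"- <a href=\S* target=""_blank"" class=m>百度快照</a>", "");
    }

    // Rewrites unquoted relative pager links "a href=s?..." to "a href=?...",
    // so that they resolve against the current page instead of Baidu's /s handler.
    public static string RewritePagerLinks(string html)
    {
        return html.Replace("a href=s?", "a href=?");
    }
}
```

Anchoring the rewrite on `s?` rather than on a bare `s` avoids mangling unrelated links whose target merely begins with the letter s.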

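Known bugs:

1) When a search returns fewer than 100 records, the system throws an error.

2) When the search results themselves contain the text "下一页" ("next page"), the output is truncated.

Both bugs are currently being fixed.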
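One possible direction for these two bugs, as a sketch rather than the author's actual fix: any `strtemp[1]` access fails when a marker string is absent from the page, and splitting on the first "下一页" cuts the page early when that text also appears inside a result. Whether a missing marker is really what triggers bug 1 is an assumption. The helper names below (`SafeSlicing`, `Between`, `BeforeLast`) are hypothetical.

```csharp
using System;

public static class SafeSlicing
{
    // Returns the text between the first `start` and the next `end`,
    // or null when either marker is missing (instead of throwing).
    public static string Between(string text, string start, string end)
    {
        int s = text.IndexOf(start, StringComparison.Ordinal);
        if (s < 0) return null;
        s += start.Length;
        int e = text.IndexOf(end, s, StringComparison.Ordinal);
        return e < 0 ? null : text.Substring(s, e - s);
    }

    // Returns everything before the LAST occurrence of `marker`,
    // or the whole text when the marker does not occur.
    public static string BeforeLast(string text, string marker)
    {
        int i = text.LastIndexOf(marker, StringComparison.Ordinal);
        return i < 0 ? text : text.Substring(0, i);
    }
}
```

With helpers like these, the caller can check for null before writing the record count, and cutting at the last "下一页" keeps results whose own text contains that phrase, assuming the pager link is the final occurrence on the page.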