Retrieving information from a web page (repost)

Source: Internet | Published by: 水电预算工作量软件 | Editor: 程序博客网 | Time: 2024/05/29 10:12

Fetch the page with WebRequest, then extract the information you need with a regular expression.
For example:
scratch.htm
<html>
<body>
<a href="http://www.csdn.net"> 程序员 </a>
<a href="http://www.sohu.com"> 搜狐 </a>
<a href="http://www.sina.com.cn"> 新浪 </a>
</body>
</html>

Back-end code:
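using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        // Fetch the sample page (assumes scratch.htm is served by a local web server).
        WebRequest wr = WebRequest.Create("http://127.0.0.1/scratch.htm");
        string htm;
        using (WebResponse rs = wr.GetResponse())
        using (StreamReader sr = new StreamReader(rs.GetResponseStream()))
        {
            htm = sr.ReadToEnd();
        }

        Console.WriteLine(htm);

        // Named groups capture the href value (quotes included) and the link text.
        string pattern = @"<a href=(?<url>[^>]*)>(?<description>[^>]*)</a>";

        RegexOptions m_options = RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline;

        Regex regex = new Regex(pattern, m_options);

        Match m = regex.Match(htm);

        // Walk through every match found in the page.
        while (m.Success)
        {
            Console.WriteLine("url=" + m.Groups["url"] + " description=" + m.Groups["description"]);
            m = m.NextMatch();
        }
        Console.ReadLine(); // wait for Enter so the console window stays open
    }
}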
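
The extraction step can also be tried without a web server by running the regular expression against an in-memory string. The sketch below is only an illustration under a few assumptions: the LinkExtractor class and its ExtractLinks helper are hypothetical names that do not appear in the original post, and the pattern is a slightly more tolerant variant that strips the quotes around the href value and trims the link text. Regular expressions are fine for small, well-formed samples like this one but are not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class LinkExtractor
{
    // Matches <a href="...">text</a>; the quotes around the URL are optional
    // and are excluded from the "url" group.
    private static readonly Regex AnchorRegex = new Regex(
        @"<a\s+href\s*=\s*[""']?(?<url>[^""'\s>]*)[""']?[^>]*>(?<description>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Returns one (url, description) pair per anchor found in the HTML.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorRegex.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value,
                m.Groups["description"].Value.Trim()));
        }
        return links;
    }

    public static void Main()
    {
        // Same content as scratch.htm, held in memory instead of fetched.
        string html =
            "<html>\n<body>\n" +
            "<a href=\"http://www.csdn.net\"> 程序员 </a>\n" +
            "<a href=\"http://www.sohu.com\"> 搜狐 </a>\n" +
            "<a href=\"http://www.sina.com.cn\"> 新浪 </a>\n" +
            "</body>\n</html>";

        foreach (var link in ExtractLinks(html))
        {
            Console.WriteLine("url=" + link.Key + " description=" + link.Value);
        }
    }
}
```

Keeping the matching logic in its own method means the same pattern can be applied to the string returned by sr.ReadToEnd() in the code above.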