Regex: Matching Hyperlink URLs and Their Content

Source: Internet  Editor: 程序博客网  Time: 2024/06/08 15:20

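While improving an article-scraping program today, I ran into the following problem:
Given "<a href=aaa.html>A页</a><a href=bbb.html>B页</a>", use a regular expression to extract aaa.html, A页, bbb.html, and B页 in turn.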

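// Requires: using System.Text.RegularExpressions;
// The lazy quantifier .+? stops at the first ">" and the first "</a>", so each link is matched separately.
MatchCollection mc = Regex.Matches(htmlstring, @"<a\s+href=(?<url>.+?)>(?<content>.+?)</a>");
foreach (Match m in mc)
{
    string url = m.Groups["url"].Value;         // aaa.html, then bbb.html
    string content = m.Groups["content"].Value; // A页, then B页
}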

Here htmlstring is the input HTML string.
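The pattern above assumes unquoted href values; with href="aaa.html" the url group would also capture the quotation marks. As a minimal sketch (the class LinkExtractor and the method Extract are hypothetical names, not from the original post), one way to accept both quoted and unquoted values is to make the quotes optional and keep them outside the named group:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical helper: returns (url, content) pairs for every <a href=...> link.
    // ["']? on both sides of the url group tolerates quoted and unquoted attribute values.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        string pattern = @"<a\s+href=[""']?(?<url>[^""'\s>]+)[""']?[^>]*>(?<content>.+?)</a>";
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        foreach (Match m in Regex.Matches(html, pattern, options))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["content"].Value));
        }
        return result;
    }

    public static void Main()
    {
        // One unquoted and one quoted href, to exercise both cases.
        string htmlstring = "<a href=aaa.html>A页</a><a href=\"bbb.html\">B页</a>";
        foreach (var link in Extract(htmlstring))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

This sketch still requires href to be the first attribute of the tag, just like the original pattern.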

2.

Matching hyperlinks with a regular expression
Given the following text:
<a href="/sort/172_1.htm">系统相关</a>  | &nbsp;<a href="/sort/173_1.htm">软件教程</a>  | &nbsp;<a href="/sort/174_1.htm">程序设计</a>  | &nbsp;<a href="/sort/175_1.htm">网络编程</a>  | &nbsp;<a href="/sort/176_1.htm">图形图像</a>  | &nbsp;<a href="/sort/177_1.htm">数据库类</a>  | &nbsp;<a href="/sort/178_1.htm">网络安全</a>

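I want to match the URLs in it, but with the following regular expression:
(?<URL><a\s*href=".*">.*</a>)
the whole line is matched as a single result. How can I get each of the A tags matched separately?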

------Solution--------------------
(?is)(?<URL><a\s*href="[^"]*"[^>]*>.*?</a>)
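The reason the original pattern swallows the whole line is that .* is greedy: it runs to the end of the input and backtracks only as far as the last "> and the last </a>. The fix replaces it with [^"]* (which cannot cross a quotation mark) and the lazy .*?. A small sketch comparing the two patterns on a shortened version of the sample text (the class GreedyVsLazy and the helper CountMatches are hypothetical names used only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: number of matches a pattern produces on the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Three links on one line, shortened from the sample text in the question.
        string text = "<a href=\"/sort/172_1.htm\">系统相关</a>  | &nbsp;"
                    + "<a href=\"/sort/173_1.htm\">软件教程</a>  | &nbsp;"
                    + "<a href=\"/sort/174_1.htm\">程序设计</a>";

        string greedy = @"(?<URL><a\s*href="".*"">.*</a>)";
        string lazy = @"(?is)(?<URL><a\s*href=""[^""]*""[^>]*>.*?</a>)";

        Console.WriteLine(CountMatches(text, greedy)); // 1: one match spanning all three links
        Console.WriteLine(CountMatches(text, lazy));   // 3: one match per link
    }
}
```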
------Solution--------------------
(?is)(?<URL><a[^>]*?>.*?</a>)
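This last pattern matches each whole A tag regardless of where href sits among the attributes, but the URL group then holds the entire tag rather than the address. As a rough sketch (the class HrefCollector and the method CollectHrefs are hypothetical names, and double-quoted href values are assumed, as in the sample text), the address can be pulled out of each matched tag in a second step:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefCollector
{
    // Hypothetical helper: finds every <a ...>...</a> tag, then reads its href attribute.
    // Assumes href values are wrapped in double quotes.
    public static List<string> CollectHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match tag in Regex.Matches(html, @"(?is)<a[^>]*?>.*?</a>"))
        {
            Match href = Regex.Match(tag.Value, @"(?i)\bhref\s*=\s*""(?<url>[^""]*)""");
            if (href.Success)
            {
                hrefs.Add(href.Groups["url"].Value);
            }
        }
        return hrefs;
    }

    public static void Main()
    {
        // href appears in a different attribute position in each tag.
        string text = "<a class=\"x\" href=\"/sort/172_1.htm\">系统相关</a>  | &nbsp;"
                    + "<a href=\"/sort/173_1.htm\" target=\"_blank\">软件教程</a>";
        foreach (string url in CollectHrefs(text))
        {
            Console.WriteLine(url);
        }
    }
}
```

Note that <a[^>]*?> also accepts other tags whose names begin with "a" (for example <abbr>); writing <a\b[^>]*> is one way to tighten it.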

0 0