ASP.NET: stripping HTML tags to get the text they contain

Source: Internet | Published by: 新理念外语网络教学 | Editor: 程序博客网 | Date: 2024/06/06 06:45

A small program I wrote myself:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;
    using System.Text.RegularExpressions;

    public partial class Default2 : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            string str = "<html><head></head><body><div>1111111111111111</div></body></html>";
            string a = getHtmlMark(str);
            Response.Write(a);   // writes 1111111111111111
        }

        // Removes every <...> tag and returns the remaining text.
        public string getHtmlMark(string s)
        {
            string temp = Regex.Replace(s, "<[^>]*>", "");
            return temp;
        }
    }

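Today, while writing a program for the public teaching department, I needed to strip the HTML markup from content pulled out of the database.
Heh, so I collected a detailed set of solutions.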
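1. Get the link addresses on a page:

      string matchString = @"<a[^>]+href=\s*(?:'(?<href>[^']+)'|""(?<href>[^""]+)""|(?<href>[^>\s]+))\s*[^>]*>";

2. Get the page title (when the title sits on a single line):

      string matchString = @"<title>(?<title>.*)</title>";

3. Remove all HTML tags from a page:

      string temp = Regex.Replace(html, "<[^>]*>", "");   // html is the document whose HTML tags are to be removed

   In Java, remove all tags from a page:

      str = str.replaceAll("<[^>]*>", "");

4. Get the page title with a non-greedy match that also works when the title spans several lines (the \t is redundant, since \s already covers tabs):

      string matchString = @"<title>([\S\s\t]*?)</title>";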

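5. JavaScript function that removes all HTML tags:

      function delHtmlTag(str)
      {
          return str.replace(/<\/?.+?>/g, "");   // remove all HTML tags
      }

   This one may fail in IE5; if so, use this version from 梅花 (Meihua):

      function delHtmlTag(str)
      {
          return str.replace(/<[^>]+>/g, "");    // remove all HTML tags
      }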
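The link and title patterns in items 1 and 4 can be carried over to JavaScript as well. The following is only a sketch under a few assumptions: the helper names `extractLinks` and `extractTitle` are made up for illustration, the `i` flag (case-insensitive matching) is an addition not present in the original patterns, and the repeated .NET group name `href` is replaced by three numbered groups.

```javascript
// Hypothetical helpers; the patterns are JavaScript ports of items 1 and 4.

// Item 1: capture the href value whether it is single-quoted, double-quoted, or bare.
// Numbered groups 1..3 stand in for the repeated .NET group name "href".
// The "i" flag is an assumption added for this sketch.
const LINK_PATTERN = /<a[^>]+href=\s*(?:'([^']+)'|"([^"]+)"|([^>\s]+))\s*[^>]*>/gi;

// Item 4: non-greedy [\s\S]*? so the title may contain line breaks.
const TITLE_PATTERN = /<title>([\s\S]*?)<\/title>/i;

// Returns every href value found in the given HTML string.
function extractLinks(html) {
  const links = [];
  for (const match of html.matchAll(LINK_PATTERN)) {
    // Exactly one of the three alternatives matched, so take whichever is set.
    links.push(match[1] || match[2] || match[3]);
  }
  return links;
}

// Returns the trimmed title text, or null when there is no <title> element.
function extractTitle(html) {
  const match = TITLE_PATTERN.exec(html);
  return match ? match[1].trim() : null;
}
```

For example, `extractLinks('<a href="a.html">A</a>')` returns an array holding `a.html`. Like the originals, these patterns are a quick text-level approach and can be fooled by unusual markup such as a `>` inside an attribute value.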

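document methods:
getElementById(id)                 returns a reference to the node with the given id
getElementsByTagName(name)         returns a collection of all matching elements in the document
createElement(name)                creates a new node of the given type
createTextNode(text)               creates a plain text node
element methods:
getAttribute(name)                 returns the value of the given attribute
setAttribute(name, value)          assigns a value to the attribute
removeAttribute(name)              removes the given attribute and its value
getElementsByTagName(name)         returns a collection of all matching elements inside the node
node methods:
appendChild(child)                 adds a new child node to the given node
removeChild(child)                 removes a child node from the given node
replaceChild(newChild, oldChild)   replaces a child node of the given node
insertBefore(newChild, refChild)   inserts a new node before refChild, a child of the given node
hasChildNodes()                    returns true if the node has child nodes
node properties:
nodeName                           the node's name, as a string
nodeType                           the node's type, as an integer
nodeValue                          the node's value (the text for a text node, null for an element)
parentNode                         reference to the node's parent
childNodes                         collection of references to the child nodes
firstChild                         reference to the first node in the childNodes collection
lastChild                          reference to the last node in the childNodes collection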
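The node properties above are enough to get the text out of a parsed page without any regular expression. The sketch below is illustrative only: `collectText` and `makeDiv` are hypothetical helper names, and `doc` is assumed to be a browser `document` object (or anything offering the same methods).

```javascript
// Node type value defined by the DOM: 3 = text node.
const TEXT_NODE = 3;

// Hypothetical helper: gathers the text of a node and all of its descendants
// using only nodeType, nodeValue, and childNodes.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  let text = "";
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// Hypothetical helper: builds <div>text</div> with createElement,
// createTextNode, and appendChild. `doc` is assumed to be a browser document.
function makeDiv(doc, text) {
  const div = doc.createElement("div");
  div.appendChild(doc.createTextNode(text));
  return div;
}
```

In a browser, something like `collectText(document.getElementById("content"))` would return the text inside that element, where the id `content` is just a placeholder. Modern browsers also expose this directly through the `textContent` property, so the hand-written walk is mainly useful for seeing how the node properties fit together.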

Reference: http://www.qionghi.com/bbs/5040.aspx