An ASP.NET routine for replacing HTML tags

Source: Internet  Editor: 程序博客网  Time: 2024/06/09 21:33

 

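HTML code in the content was causing a display problem, but that is really a bug in the program: back when I wrote ASP I always wrote an HtmlEncode function, and this time I overlooked it. I dug up one of my old programs, which looks like this: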
function HTMLEncode(fString)
' Escape ";" first so the semicolons of the entities added below are not escaped again
fString=replace(fString,";","&#59;")
fString=replace(fString,"<","&lt;")
fString=replace(fString,">","&gt;")
fString=replace(fString,"\","&#92;")
fString=replace(fString,"--","&#45;&#45;")
fString=replace(fString,CHR(9),"&#9;")
fString=replace(fString,CHR(10),"<br>")
fString=replace(fString,CHR(13),"")
fString=replace(fString,CHR(22),"&#22;")
fString=replace(fString,CHR(32),"&#32;")
fString=replace(fString,CHR(34),"&#34;") ' double quote
fString=replace(fString,CHR(39),"&#39;") ' single quote
HTMLEncode=fString
end function
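In PHP, however, there is no need to write this kind of routine yourself; it has already been written for you.
I figured ASP.NET would not be far behind, and after a quick search I found the equivalent in System.Web.HttpUtility.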
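As a minimal sketch of what that built-in looks like in use: the class name HtmlEncodeDemo and the wrapper method Encode are made up for illustration, while HttpUtility.HtmlEncode and HttpUtility.HtmlDecode are the framework methods. On .NET Framework this assumes the project references System.Web.dll.

```csharp
using System;
using System.Web;

public static class HtmlEncodeDemo
{
    // Hypothetical thin wrapper so the call site reads like the old ASP helper.
    public static string Encode(string input)
    {
        return HttpUtility.HtmlEncode(input);
    }

    public static void Main()
    {
        string raw = "<b>a & b</b>";
        string encoded = Encode(raw);
        Console.WriteLine(encoded);                         // &lt;b&gt;a &amp; b&lt;/b&gt;
        Console.WriteLine(HttpUtility.HtmlDecode(encoded)); // <b>a & b</b>
    }
}
```

Unlike the hand-written ASP function, the built-in only escapes characters that are significant in HTML; it does not turn line breaks into <br> or spaces into entities.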
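PHP also ships with some fairly powerful regular-expression functions, such as ereg(). At first I did not notice a function-for-function equivalent in .NET, although the Regex class in System.Text.RegularExpressions (used in the code below) offers the same capability. Writing the routine yourself is no bad thing anyway: it is not hard, and you get to understand why it works, just as with the ASP function above, where you can see exactly what it does.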
Here, by the way, is the ASP.NET routine for replacing HTML tags:
// Requires: using System.Text.RegularExpressions; using System.Web;
public static string NoHTML(string Htmlstring)
    {
        // Remove scripts (Singleline lets ".*?" span line breaks inside the script body)
        Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove HTML tags
        Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
        // Remove a line break together with the whitespace that follows it
        Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
        // Remove comment remnants
        Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);

        // Decode common entities
        Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", " ", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\xa1", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\xa2", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\xa3", RegexOptions.IgnoreCase);
        Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\xa9", RegexOptions.IgnoreCase);
        // Drop any remaining numeric entities
        Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "", RegexOptions.IgnoreCase);

        // Strings are immutable, so the result of Replace must be assigned back
        Htmlstring = Htmlstring.Replace("<", "");
        Htmlstring = Htmlstring.Replace(">", "");
        Htmlstring = Htmlstring.Replace("\r\n", "");

        // Encode whatever is left before returning it
        Htmlstring = HttpContext.Current.Server.HtmlEncode(Htmlstring).Trim();

        return Htmlstring;
    }
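The approach above is not the best one, and it can be written more concisely. Still, once you have mastered it you can build other things on top of it, such as automatic tidying of the text or keeping selected tags, for example links and images.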
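One way to keep selected tags, sketched here under assumptions: the class TagFilter, the method StripExcept and the allow-list of "a" and "img" are all illustrative names, not part of the original code. A single regex finds every tag, and a match evaluator decides per tag whether to keep it. This is only a tag filter: attributes on the kept tags are left untouched, so it should not be treated as a security sanitizer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagFilter
{
    // Hypothetical allow-list: tags to keep; every other tag is stripped.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "img" };

    // Matches an opening or closing tag and captures its name in group 1.
    private static readonly Regex Tag =
        new Regex(@"</?([a-z][a-z0-9]*)\b[^>]*>", RegexOptions.IgnoreCase);

    public static string StripExcept(string html)
    {
        // Keep the tag text when its name is allowed, otherwise drop the tag.
        return Tag.Replace(html, m => Allowed.Contains(m.Groups[1].Value) ? m.Value : "");
    }

    public static void Main()
    {
        string html = "<p>See <a href=\"x.html\">this</a> <b>now</b><img src=\"a.png\"></p>";
        Console.WriteLine(StripExcept(html));
        // See <a href="x.html">this</a> now<img src="a.png">
    }
}
```

Capturing the whole tag name before testing it avoids the prefix problem of patterns such as "<a", which would also match tags like abbr.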

Code attached:
public string CleanWord(string strHtml)
        {
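            string[] aryReg ={
          // 1. Remove scripts ((?s) lets ".*?" span line breaks), then turn the tags to keep into [..] placeholders
          @"(?s)<script[^>]*?>.*?</script>",
          @"<p[^>]*>",
          @"</p[^>]*>",
          @"<div[^>]*>",
          @"</div[^>]*>",
          @"<br[^>]*>",
          @"<img",
          @"<a",
          @"</a[^>]*>",
          // 2. Strip every remaining tag, collapse line breaks, decode entities
          @"<[^>]+>|</[^>]+>",
          @"([\r\n])[\s]+",
          @"&(quot|#34);",
          @"&(amp|#38);",
          @"&(lt|#60);",
          @"&(gt|#62);",
          @"&(nbsp|#160);",
          @"&(iexcl|#161);",
          @"&(cent|#162);",
          @"&(pound|#163);",
          @"&(copy|#169);",
          @"&#(\d+);",
          @"-->",
          @"<!--.*\n",
          // 3. Leftovers from Word markup ("?" is escaped so the pattern matches a literal "<?xml")
          @"<FONT.*?>",
          @"<SPAN.*?>",
          @"<\?xml.*?/>",
          @"</?",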
          @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
          @"\[p\]",
          @"\[\/p\]",
          @"\[div\]",
          @"\[\/div\]",
          @"\[br\]",
          @"\[img",
          @"\[a",
          @"\[\/a\]",
         };
            string[] aryRep = {
           "",
           "[p]",
           "[/p]",
           "[div]",
           "[/div]",
           "[br]",
           "[img",
           "[a",
           "[/a]",
           "",
           "",
           "\"",
           "&",
           "<",
           ">",
           " ",
           "\xa1",
           "\xa2",
           "\xa3",
           "\xa9",
           "",
           "\r\n",
           "",
           "",
           "",
           "",
           "",
           "",
           "<div style='text-indent:2em'>",
           "</div>",
           "<div style='text-indent:2em'>",
           "</div>",
           "<div style='text-indent:2em'>&nbsp;</div>",
           "<img",
           "<a",
           "</a>"
          };
            string strOutput = strHtml;
            // aryReg and aryRep are parallel arrays: pattern i is replaced by aryRep[i], in order
            for (int i = 0; i < aryReg.Length; i++)
            {
                Regex regex = new Regex(aryReg[i], RegexOptions.IgnoreCase);
                strOutput = regex.Replace(strOutput, aryRep[i]);
            }
            //strOutput.Replace("[p]", "<p>");
            //strOutput.Replace("[/p]>", "</p>");
            //strOutput.Replace("[br]", "<br />");
            //strOutput.Replace("[img", "<img");
            return strOutput;
        }
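
CleanWord relies on a placeholder trick: the tags to keep are first rewritten as bracket placeholders, every remaining tag is stripped, and the placeholders are then turned back into markup. The sketch below isolates that idea for the p tag only. The names PlaceholderDemo, Rules and Clean are hypothetical, and pairing each pattern with its replacement in a tuple is a swapped-in alternative to the two parallel arrays, chosen so that a pattern and its replacement cannot drift out of alignment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // Each rule pairs a pattern with its replacement; rules are applied in order.
    private static readonly (string Pattern, string Replacement)[] Rules =
    {
        (@"<p[^>]*>",  "[p]"),                            // 1. protect paragraph tags as placeholders
        (@"</p[^>]*>", "[/p]"),
        (@"<[^>]+>",   ""),                               // 2. strip every remaining tag
        (@"\[p\]",     "<div style='text-indent:2em'>"),  // 3. restore placeholders as indented divs
        (@"\[/p\]",    "</div>"),
    };

    public static string Clean(string html)
    {
        string output = html;
        foreach (var rule in Rules)
        {
            output = Regex.Replace(output, rule.Pattern, rule.Replacement, RegexOptions.IgnoreCase);
        }
        return output;
    }

    public static void Main()
    {
        Console.WriteLine(Clean("<p class=\"x\"><span>Hello</span> <b>world</b></p>"));
        // <div style='text-indent:2em'>Hello world</div>
    }
}
```

As in the original, the pattern for the opening p tag is a prefix match, so it would also catch tags such as pre; a stricter pattern would be needed if that matters.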

Original article: http://www.sciencenet.cn/m/user_content.aspx?id=39127
