An example of using GetElementsByTagName to extract data from HTML in C#

Source: Internet  Editor: 程序博客网  Time: 2024/05/01 02:28

Part of the code is shown below:

HtmlElement he = doc.GetElementById(i.ToString());   // look the element up once and reuse it
if (he != null)
{
    String str_content = "";
    Console.WriteLine(he.InnerHtml);                  // the raw HTML to be parsed: he.InnerHtml
    for (int j = 0; j < 7; j++)
    {
        if (j == 0)                                   // text inside the first DIV
        {
            str_content = he.GetElementsByTagName("div")[0].InnerText;
            gdata.setid(str_content);
            // Console.WriteLine(j + "           " + str_content);
            continue;
        }
        if (j == 1)                                   // src attribute of the first IMG inside the element
        {
            str_content = he.GetElementsByTagName("img")[0].GetAttribute("src");
            gdata.setimgsrc(str_content);
            // Console.WriteLine(j + "           " + str_content);
            continue;
        }
        if (j == 5)                                   // text inside the sixth DIV (index 5)
        {
            str_content = he.GetElementsByTagName("div")[5].InnerText;
            gdata.setcontent(str_content);
            // Console.WriteLine(j + "           " + str_content);
            continue;
        }
        if (j == 6)                                   // value of the _productId attribute on the second A (link)
        {
            str_content = he.GetElementsByTagName("A")[1].GetAttribute("_productId");
            gdata.setproductid(str_content);
            // Console.WriteLine(j + "           " + str_content);
            continue;
        }
    }
}
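
The indexed lookups above assume that the element always contains at least six DIVs, one IMG and two A tags. If the markup differs, an out-of-range index on the collection returned by GetElementsByTagName is expected to throw. A minimal sketch of a guard follows; the names `HtmlElementExtensions`, `NthByTag` and `ReadProductId` are hypothetical and not part of the original code, and only `GetElementsByTagName`, `Count`, the indexer and `GetAttribute` come from System.Windows.Forms.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper class, not part of the original post.
static class HtmlElementExtensions
{
    // Returns the index-th descendant of parent with the given tag name,
    // or null when parent is null or has fewer matching descendants.
    public static HtmlElement NthByTag(this HtmlElement parent, string tagName, int index)
    {
        if (parent == null)
        {
            return null;
        }
        HtmlElementCollection matches = parent.GetElementsByTagName(tagName);
        return (index >= 0 && index < matches.Count) ? matches[index] : null;
    }

    // Example use: read the _productId attribute of the second A tag,
    // falling back to an empty string when that link is missing.
    public static string ReadProductId(HtmlElement he)
    {
        HtmlElement link = he.NthByTag("A", 1);
        return link != null ? link.GetAttribute("_productId") : "";
    }
}
```

With a helper like this, a missing tag yields an empty value instead of an exception, at the cost of silently skipping malformed entries; whether that trade-off is acceptable depends on how the parsed data is used.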

 

For example, the HTML to be parsed is shown below; it consists entirely of nested DIVs:

<DIV class=orderNum>1</DIV>
<DIV class=pic id=yui-gen0><B class=picRind><IMG class=picCore height=75 alt="promotional gifts" src="http://img.vip.summ.jpg" width=75 onload=setImgSizeWH(this.src,this,75,75); border=0></B> </DIV>
<DIV style="CLEAR: both; MARGIN: 0px auto; WIDTH: 100px; POSITION: relative">
<DIV id=MP_shopWindow_214293296 style="BORDER-RIGHT: #ffb64b 1px solid; BORDER-TOP: #ffb64b 1px solid; DISPLAY: none; FONT-SIZE: 11px; RIGHT: 16px; BACKGROUND: #fff; BORDER-LEFT: #ffb64b 1px solid; CURSOR: pointer; COLOR: #7b2e00; BOTTOM: 0px; LINE-HEIGHT: 10px; BORDER-BOTTOM: #ffb64b 1px solid; POSITION: absolute; HEIGHT: 10px">MP</DIV>
<DIV id=AD_shopWindow_214293296 style="BORDER-RIGHT: #ffb64b 1px solid; BORDER-TOP: #ffb64b 1px solid; DISPLAY: none; FONT-SIZE: 11px; RIGHT: 0px; BACKGROUND: #fff; BORDER-LEFT: #ffb64b 1px solid; CURSOR: pointer; COLOR: #7b2e00; BOTTOM: 0px; LINE-HEIGHT: 10px; BORDER-BOTTOM: #ffb64b 1px solid; POSITION: absolute; HEIGHT: 10px">AD</DIV></DIV>
<DIV class=productname><A href="http://sh.com/" target=_blank>promotional gifts </A></DIV>
<DIV class=commandArea><A class="btnRemove noVisited underline" href="javascript:vd()" _productId="88888">移除</A></DIV>

 

The data obtained after parsing is as follows (loop index j, then the extracted value):
0           1
1           http://img.vip.summ.jpg
5           promotional gifts
6           88888
