Treating HTML like XML using HtmlAgilityPack, and doing it inside of an XSLT too [Repost]
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 16:37
I was not able to post this on Simon Mourier's blog due to the HTML and XSLT tags, so here it is on mine:
Maybe someone has done this already, but I don't see it in the comments.
I created an XSLT extension object based on HtmlAgilityPack. The class is tiny:
using System;
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;
using System.Xml;
using System.Xml.XPath;
using System.IO;

namespace HtmlAgilityPack
{
    public class XslExtension
    {
        public XmlDocument loadhtmlasxml(string url)
        {
            // Create an instance of the HtmlWeb object
            HtmlWeb web = new HtmlWeb();

            // Declare necessary stream and writer objects
            MemoryStream m = new MemoryStream();
            XmlTextWriter xtw = new XmlTextWriter(m, null);

            // Load the content into the writer
            web.LoadHtmlAsXml(url, xtw);

            // Rewind the memory stream
            m.Position = 0;

            // Create, fill, and return the xml document
            XmlDocument xdoc = new XmlDocument();
            xdoc.LoadXml((new StreamReader(m)).ReadToEnd());
            return xdoc;
        }
    }
}
Then I used NXSLT from http://www.xmllab.net to register the custom extension object from the command line, so that the following XSL style sheet can call it directly:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hap="http://smourier.blogspot.com"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    version="1.0">
  <xsl:output method="html" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <h1>BEGIN TEST OF HtmlAgilityPack.XslExtension</h1>
    <h2>First, connect to http://www.cnn.com and load its node set into a local variable</h2>
    <xsl:variable name="cnn"><xsl:copy-of select="hap:loadhtmlasxml('http://www.cnn.com')" /></xsl:variable>
    <h3>CNN.com has this many nodes:</h3>
    <xsl:value-of select="count(msxsl:node-set($cnn)//*)" />
    <h2>Now, process all the A tags within the "Special Coverage" stories inside the div with class "cnnLSSpecialCovBoxContent" that have an HREF that starts with /2005/.</h2>
    <h3>Special Coverage</h3>
    <xsl:for-each select="msxsl:node-set($cnn)//div[@class='cnnLSSpecialCovBoxContent']//a[starts-with(@href, '/2005/')]">
      <div>
        <h3><xsl:copy-of select="." /></h3>
        <!-- Now get the images from each story if they exist -->
        <h5>Connecting to: <xsl:value-of select="concat('http://www.cnn.com', @href)" /> to retrieve image if it exists</h5>
        <xsl:copy-of select="hap:loadhtmlasxml(concat('http://www.cnn.com', @href))//img[@height = '168']" />
        <br /><br />
      </div>
    </xsl:for-each>
    <h1>END TEST OF HtmlAgilityPack.XslExtension</h1>
  </xsl:template>
</xsl:stylesheet>
The command for NXSLT to perform this is:
nxslt2.exe source.xml source.xsl -ext hap:HtmlAgilityPack.XslExtension xmlns:hap="http://smourier.blogspot.com" -af ./HtmlAgilityPackXslExtension.dll
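For anyone who would rather not go through a command-line tool, an extension object can also be bound to a namespace in code with the .NET XsltArgumentList class. What follows is only a minimal sketch under stated assumptions, not what NXSLT does internally: DemoExtension is a hypothetical stand-in for HtmlAgilityPack.XslExtension that returns a fixed in-memory document (so the sketch runs without network access or the HtmlAgilityPack assembly), the inline style sheet and the http://example.invalid/ URL are made up for illustration, and the function returns an XPathNodeIterator, a return type documented as mapping to an XSLT node set, instead of an XmlDocument.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical stand-in for HtmlAgilityPack.XslExtension (assumption):
// it ignores the URL and returns a small fixed document.
public class DemoExtension
{
    public XPathNodeIterator loadhtmlasxml(string url)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<html><body>" +
                    "<a href='/2005/story'>Story</a>" +
                    "<a href='/other'>Other</a>" +
                    "</body></html>");
        // Select("/") yields a node set that contains the document root.
        return doc.CreateNavigator().Select("/");
    }
}

public static class Program
{
    // Same namespace URI that the style sheet binds to the hap prefix.
    private const string HapNamespace = "http://smourier.blogspot.com";

    // Made-up style sheet: counts the links whose href starts with /2005/.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:hap='http://smourier.blogspot.com'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:value-of select=""count(hap:loadhtmlasxml('http://example.invalid/')//a[starts-with(@href, '/2005/')])""/>
  </xsl:template>
</xsl:stylesheet>";

    public static void Main()
    {
        // Compile the style sheet.
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xsl = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsl);
        }

        // Bind the extension object to the namespace used by the hap prefix.
        XsltArgumentList args = new XsltArgumentList();
        args.AddExtensionObject(HapNamespace, new DemoExtension());

        // The source document is a dummy; all data comes from the extension call.
        using (XmlReader input = XmlReader.Create(new StringReader("<source/>")))
        {
            transform.Transform(input, args, Console.Out);
        }
    }
}
```

The namespace URI passed to AddExtensionObject has to match the one declared for the hap prefix in the style sheet; the prefix itself is arbitrary.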
The style sheet connects to CNN.com using the syntax:
select="hap:loadhtmlasxml('http://www.cnn.com')"
Then, further down, for each of the selected A elements, it connects to the linked story and retrieves any images with a height of 168, copying them into the HTML result tree.
This could allow links to be followed to any depth. I haven't worked out the automatic form processor yet, but I think that could perhaps be an XSLT extension too...
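To make the idea of following links to a bounded depth a little more concrete, here is a rough sketch written in C# rather than XSLT. Everything in it is an assumption for illustration: the LinkFollower class, the Crawl method, and the load delegate are hypothetical names, and the demo feeds it two made-up in-memory pages under http://example.invalid/ instead of real sites. In real use the delegate could simply call the loadhtmlasxml method shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class LinkFollower
{
    // Visits startUrl, then the pages it links to, level by level, up to
    // maxDepth levels away. Returns the URLs in the order they were visited.
    // 'load' turns a URL into an XmlDocument (hypothetical hook).
    public static List<string> Crawl(string startUrl, int maxDepth,
                                     Func<string, XmlDocument> load)
    {
        HashSet<string> visited = new HashSet<string>();
        List<string> order = new List<string>();
        List<Uri> frontier = new List<Uri>();
        frontier.Add(new Uri(startUrl));

        for (int depth = 0; depth <= maxDepth && frontier.Count > 0; depth++)
        {
            List<Uri> next = new List<Uri>();
            foreach (Uri url in frontier)
            {
                // Skip pages that were already seen.
                if (!visited.Add(url.AbsoluteUri)) continue;
                order.Add(url.AbsoluteUri);

                // Pages on the last level are recorded but not expanded.
                if (depth == maxDepth) continue;

                XmlDocument page = load(url.AbsoluteUri);
                foreach (XmlNode href in page.SelectNodes("//a/@href"))
                {
                    // Resolve relative links against the current page
                    // and keep only http/https targets.
                    Uri target;
                    if (Uri.TryCreate(url, href.Value, out target) &&
                        (target.Scheme == Uri.UriSchemeHttp ||
                         target.Scheme == Uri.UriSchemeHttps))
                    {
                        next.Add(target);
                    }
                }
            }
            frontier = next;
        }
        return order;
    }

    public static void Main()
    {
        // Made-up pages so the sketch runs without network access.
        Dictionary<string, string> pages = new Dictionary<string, string>();
        pages["http://example.invalid/"] =
            "<html><body><a href='/a'>A</a><a href='/b'>B</a></body></html>";
        pages["http://example.invalid/a"] =
            "<html><body><a href='/'>Home</a><a href='/c'>C</a></body></html>";

        Func<string, XmlDocument> fakeLoad = delegate(string url)
        {
            XmlDocument doc = new XmlDocument();
            string xml;
            // Unknown URLs get an empty page.
            doc.LoadXml(pages.TryGetValue(url, out xml) ? xml : "<html/>");
            return doc;
        };

        foreach (string url in Crawl("http://example.invalid/", 2, fakeLoad))
        {
            Console.WriteLine(url);
        }
    }
}
```

The visited set keeps pages that link back to each other from being fetched twice, and the depth limit keeps the number of requests bounded.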
Let me know what you think...
http://blogs.wdevs.com/ultravioletconsulting/archive/2005/09/10/10506.aspx