Parsing HTML Files with XPath in Java

Source: Internet  Editor: 程序博客网  Time: 2024/04/28 10:13

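I have been scraping data from the web lately, and I started using XPath to make that easier. It was genuinely hard at first, but it gets comfortable with practice. Resources on this topic are scarce online, so I decided to write a blog post about it myself.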

First, download three jar files: nekohtml.jar, xalan.jar, and xercesImpl.jar.

The article below is a reference where you can look up the XPath syntax you need:

http://blog.csdn.net/dyoldfish/article/details/1774539

Straight to the code.

Sorry, this text editor is not great for code, so please bear with the formatting.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import org.apache.xpath.XPathAPI;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Hxpath {
    public static void main(String[] args) throws Exception {
        DOMParser parser = new DOMParser();
        // Set the default encoding of the page
        parser.setProperty("http://cyberneko.org/html/properties/default-encoding", "gb2312");
        parser.setFeature("http://xml.org/sax/features/namespaces", false);

        // The encoding used to read the page can be changed via the last argument.
        // The URL needs an explicit protocol, otherwise new URL(...) throws MalformedURLException.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL("http://www.baidu.com").openStream(), "gb2312"));
        try {
            parser.parse(new InputSource(in));
        } finally {
            in.close();
        }

        Document doc = parser.getDocument();
        String productsXpath = "//UL/LI/A[@class='orange']"; // XPath expression
        NodeList products = XPathAPI.selectNodeList(doc, productsXpath);

        // Print the trimmed text of every matching node
        for (int i = 0; i < products.getLength(); i++) {
            Node node = products.item(i);
            String temp = node.getTextContent().trim();
            System.out.println("i=" + i + " " + temp);
        }
    }
}

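That is basically the whole program. Change the URL and the XPath expression and you can analyze any HTML page. I have not tried it on a local HTML file; this version fetches the page directly from the network.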
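For the untested local-file case, here is a minimal sketch of what might change: only the Reader that feeds the InputSource. To keep it runnable without the three jars, it swaps in the JDK's built-in DocumentBuilder and javax.xml.xpath instead of NekoHTML and XPathAPI, so it only handles well-formed markup. The class name LocalFileDemo, the helper names, the temporary sample file, and the UTF-8 encoding are all assumptions made for illustration. With NekoHTML, the same InputSource would presumably be handed to parser.parse(...), but that is not verified here.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LocalFileDemo {
    // Writes a small, well-formed sample page to a temporary file (hypothetical test data).
    static Path writeSample() {
        try {
            Path file = Files.createTempFile("sample", ".html");
            file.toFile().deleteOnExit();
            String markup = "<HTML><BODY><UL>"
                    + "<LI><A class='orange'> Apple </A></LI>"
                    + "</UL></BODY></HTML>";
            Files.write(file, markup.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (Exception e) {
            throw new IllegalStateException("could not write sample file", e);
        }
    }

    // Reads the file with the given encoding and returns the trimmed text of the first match.
    static String firstMatchText(Path htmlFile, String encoding, String xpath) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(htmlFile), encoding))) {
            // JDK parser used here as a stand-in for NekoHTML's DOMParser (assumption).
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse " + htmlFile, e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSample();
        // prints "Apple"
        System.out.println(firstMatchText(file, "UTF-8", "//UL/LI/A[@class='orange']"));
    }
}
```

The encoding is a parameter because, as with the network version, it has to match how the file was actually saved.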

//UL/LI/A[@class='orange'] selects the a tags under li tags under a ul tag whose class attribute is orange; the program then reads the text content of each match.
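To see the class predicate in isolation, here is a small sketch that runs the same expression against an in-memory snippet. It uses the JDK's javax.xml.xpath and DocumentBuilder rather than XPathAPI and NekoHTML so that it runs without extra jars; the class name ClassPredicateDemo, the helper selectTexts, and the sample markup (written with uppercase tags to mimic the parsed DOM) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ClassPredicateDemo {
    // Hypothetical sample: two links with class 'orange', one with class 'blue'.
    static final String SAMPLE = "<HTML><BODY><UL>"
            + "<LI><A class='orange' href='/a'> Apple </A></LI>"
            + "<LI><A class='blue' href='/b'>Berry</A></LI>"
            + "<LI><A class='orange' href='/c'>Cherry</A></LI>"
            + "</UL></BODY></HTML>";

    // Evaluates the XPath expression and returns the trimmed text of each matching node.
    static List<String> selectTexts(String markup, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // prints [Apple, Cherry]
        System.out.println(selectTexts(SAMPLE, "//UL/LI/A[@class='orange']"));
    }
}
```

The link with class 'blue' is skipped because the predicate in square brackets filters on the attribute value.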

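//UL/LI/A/attribute::href selects the href attribute of the a tags.
Put | between two expressions to match either of them.
HTML tag names must be written in uppercase in the XPath expression, because NekoHTML converts element names to uppercase by default. Attribute names such as class and href stay lowercase.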
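The three notes above can be sketched together as follows. As before, this is an illustration under assumptions: it uses the JDK's javax.xml.xpath on a well-formed, uppercase sample instead of NekoHTML and XPathAPI, and the names XPathNotesDemo, SAMPLE, and select are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathNotesDemo {
    // Hypothetical sample with links in both an unordered and an ordered list.
    static final String SAMPLE = "<HTML><BODY>"
            + "<UL><LI><A class='orange' href='/a'>Apple</A></LI>"
            + "<LI><A class='blue' href='/b'>Berry</A></LI></UL>"
            + "<OL><LI><A href='/c'>Cherry</A></LI></OL>"
            + "</BODY></HTML>";

    // Returns the text content of each selected node; for an attribute node this is its value.
    static List<String> select(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Attribute axis: the href values of the links inside the UL
        System.out.println(select("//UL/LI/A/attribute::href"));   // prints [/a, /b]
        // Union: links from either the UL or the OL
        System.out.println(select("//UL/LI/A | //OL/LI/A"));       // prints [Apple, Berry, Cherry]
        // Case matters: lowercase names do not match the uppercase elements
        System.out.println(select("//ul/li/a"));                   // prints []
    }
}
```

The last query is the one that usually trips people up: the expression is valid, it simply matches nothing because element names are compared case-sensitively.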
If you have other questions, leave a comment. Overall I find this approach quite handy.


                                                     0        0           	
					
					   Java下使用xpath解析html文件
	  	   java 使用xpath解析xml和html
	  	   使用XPath解析html
	  	   使用XPath解析html
	  	   使用 XPath 解析 HTML 文档
	  	   python 使用xpath解析html
	  	   使用 XPath 解析 HTML 文档
	  	   使用xPath读取html文件
	  	   使用XPath解析XML文件
	  	   在java中使用xpath来解析xml文件
	  	   iPhone：使用hpple解析html，Xpath导航
	  	   使用XPath解析HTML获取网页内容
	  	   iPhone：使用hpple解析html，Xpath导航
	  	   python 使用lxml解析html（xpath）
	  	   xpath解析HTML
	  	   XPath--解析Html
	  	   使用 XPATH 和 HTML Cleaner 解析 HTML/XML
	  	   libxml中使用xpath解析xml文件
	     		  
	  	   Excel当中Index函数和offset函数的…
	  	   Excel引用函数-实例1
	  	   Excel获取行数和列数
	  	   使用OFFSET函数完成二级城市菜单
	  	   Excel雷达图
	  	   Java下使用xpath解析html文件
	  	   Excel当中Rank函数
	  	   Excel自定义函数VBA
	  	   VBA条件选择结构
	  	   VBA循环结构
	  	   Excel点击按钮弹出对话框
	  	   ps制作透明气泡效果教程
	  	   英寸厘米对换关系
	  	   Word文件的保存类型