Getting Started with Jsoup (Java)
Source: Internet  Editor: 程序博客网  Time: 2024/05/01 03:17
1. Parsing a local HTML file.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png
Element masthead = doc.select("div.masthead").first(); // div with class=masthead
Elements resultLinks = doc.select("h3.r > a"); // direct a after h3
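To try the snippet above without an existing /tmp/input.html, the following is a minimal sketch that writes a small page to a temporary file and parses it the same way. The class name ParseLocalFileDemo, the helper methods, and the sample markup are all hypothetical and exist only for illustration; it assumes jsoup is on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseLocalFileDemo {

    // Hypothetical sample page, invented for this sketch.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div class=\"masthead\"><img src=\"logo.png\"><img src=\"photo.jpg\"></div>"
        + "<h3 class=\"r\"><a href=\"/first\">First</a></h3>"
        + "<p><a href=\"/second\">Second</a> <a name=\"anchor\">No href</a></p>"
        + "</body></html>";

    // Writes the sample to a temp file, then parses it with the File overload.
    static Document parseSample() {
        try {
            Path tmp = Files.createTempFile("jsoup-input", ".html");
            try {
                Files.write(tmp, SAMPLE_HTML.getBytes(StandardCharsets.UTF_8));
                // The third argument is the base URI used to resolve relative links.
                return Jsoup.parse(tmp.toFile(), "UTF-8", "http://example.com/");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of elements in the sample page matching a CSS query.
    static int count(String cssQuery) {
        return parseSample().select(cssQuery).size();
    }

    // Absolute URL of the first "h3.r > a" link, resolved against the base URI.
    static String firstResultLink() {
        Element a = parseSample().select("h3.r > a").first();
        return a == null ? "" : a.absUrl("href");
    }

    public static void main(String[] args) {
        System.out.println("a[href]: " + count("a[href]"));
        System.out.println("img[src$=.png]: " + count("img[src$=.png]"));
        System.out.println("div.masthead: " + count("div.masthead"));
        System.out.println("first result link: " + firstResultLink());
    }
}
```

The base URI matters only when resolving relative links: here the relative href /first should resolve against http://example.com/ when read through absUrl("href").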
2. Selector syntax reference
Selector overview
tagname: find elements by tag, e.g. a
ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
#id: find elements by ID, e.g. #logo
.class: find elements by class name, e.g. .masthead
[attribute]: elements with attribute, e.g. [href]
[^attr]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
[attr=value]: elements with attribute value, e.g. [width=500] (also quotable, like [data-name="launch sequence"])
[attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
[attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
*: all elements, e.g. *
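As a rough illustration of the basic and attribute selectors listed above, the sketch below runs several of them against a small in-memory document. The class name SelectorOverviewDemo, the count helper, and the sample markup are hypothetical; only Jsoup.parse(String) and Document.select(String) come from jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorOverviewDemo {

    // Hypothetical markup, invented so that each selector has something to match.
    static final String HTML =
        "<a href=\"/path/to/page\" id=\"logo\">Home</a>"
        + "<a href=\"https://example.com/x.pdf\" class=\"masthead\">PDF</a>"
        + "<img src=\"a.PNG\" width=\"500\">"
        + "<img src=\"b.jpeg\" data-id=\"7\">"
        + "<img src=\"c.gif\">"
        + "<span data-name=\"launch sequence\">x</span>";

    static final Document DOC = Jsoup.parse(HTML);

    // Number of elements matching the given CSS query.
    static int count(String cssQuery) {
        return DOC.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "a",                                 // by tag
            "#logo",                             // by ID
            ".masthead",                         // by class
            "[href]",                            // attribute present
            "[^data-]",                          // attribute name prefix
            "[width=500]",                       // exact attribute value
            "[data-name=\"launch sequence\"]",   // quoted value containing a space
            "[href^=https]",                     // value starts with
            "[href$=.pdf]",                      // value ends with
            "[href*=/path/]",                    // value contains
            "img[src~=(?i)\\.(png|jpe?g)]"       // value matches a regex
        };
        for (String q : queries) {
            System.out.println(q + " -> " + count(q));
        }
    }
}
```

Note that the backslash in the regular expression has to be doubled inside a Java string literal.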
Selector combinations
el#id: elements with ID, e.g. div#logo
el.class: elements with class, e.g. div.masthead
el[attr]: elements with attribute, e.g. a[href]
- Any combination, e.g. a[href].highlight
ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
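The difference between the descendant, child, and sibling combinators is easiest to see side by side. The following sketch is one way to check them; SelectorCombinationDemo, its count helper, and the markup are hypothetical names and data chosen for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorCombinationDemo {

    // Hypothetical markup with nested and sibling elements.
    static final String HTML =
        "<div class=\"head\" id=\"logo\">Head</div>"
        + "<div class=\"masthead\">Mast</div>"
        + "<div class=\"body\"><div><p>deep</p></div><p>shallow</p></div>"
        + "<div class=\"content\"><p>direct</p><div><p>nested</p></div></div>"
        + "<h1>Title</h1><p>one</p><span>gap</span><p>two</p>"
        + "<a href=\"/x\" class=\"highlight\">x</a><a href=\"/y\">y</a>";

    static final Document DOC = Jsoup.parse(HTML);

    // Number of elements matching the given CSS query.
    static int count(String cssQuery) {
        return DOC.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "div#logo",                 // tag plus ID
            "a[href].highlight",        // tag plus attribute plus class
            ".body p",                  // any descendant p
            "div.content p",            // descendant: includes the nested p
            "div.content > p",          // direct child only
            "body > *",                 // direct children of body
            "div.head + div",           // the div immediately after div.head
            "h1 ~ p",                   // every later p sibling of h1
            "div.masthead, div.head"    // union of two selectors
        };
        for (String q : queries) {
            System.out.println(q + " -> " + count(q));
        }
    }
}
```

In this sample, div.content p and div.content > p differ because one p sits inside an inner div, so it is a descendant but not a direct child.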
Pseudo selectors
:lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
:gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
:eq(n): find elements whose sibling index is equal to n; e.g. form input:eq(1)
:has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
:not(selector): find elements that do not match the selector; e.g. div:not(.logo)
:contains(text): find elements that contain the given text. The search is case-insensitive; e.g. p:contains(jsoup)
:containsOwn(text): find elements that directly contain the given text
:matches(regex): find elements whose text matches the specified regular expression; e.g. div:matches((?i)login)
:matchesOwn(regex): find elements whose own text matches the specified regular expression
- Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.
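The 0-based indexing and the difference between the plain and "Own" variants can be checked with a small experiment. This is only a sketch: PseudoSelectorDemo, its helpers, and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PseudoSelectorDemo {

    // Hypothetical markup: one table row with five cells, plus three divs.
    static final String HTML =
        "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td><td>e</td></tr></table>"
        + "<div class=\"logo\"><p>Jsoup is a parser</p></div>"
        + "<div><span>Please LOGIN here</span></div>"
        + "<div>plain</div>";

    static final Document DOC = Jsoup.parse(HTML);

    // Number of elements matching the given CSS query.
    static int count(String cssQuery) {
        return DOC.select(cssQuery).size();
    }

    // Combined text of the elements matching the given CSS query.
    static String text(String cssQuery) {
        return DOC.select(cssQuery).text();
    }

    public static void main(String[] args) {
        // Indexed selectors are 0-based: td:eq(1) is the second cell.
        System.out.println("td:lt(3) -> " + count("td:lt(3)"));
        System.out.println("td:gt(2) -> " + count("td:gt(2)"));
        System.out.println("td:eq(1) -> " + text("td:eq(1)"));

        System.out.println("div:has(p) -> " + count("div:has(p)"));
        System.out.println("div:not(.logo) -> " + count("div:not(.logo)"));

        // :contains looks at descendant text too; :containsOwn only at direct text.
        System.out.println("div:contains(jsoup) -> " + count("div:contains(jsoup)"));
        System.out.println("div:containsOwn(jsoup) -> " + count("div:containsOwn(jsoup)"));

        // The same distinction applies to :matches and :matchesOwn.
        System.out.println("div:matches((?i)login) -> " + count("div:matches((?i)login)"));
        System.out.println("div:matchesOwn((?i)login) -> " + count("div:matchesOwn((?i)login)"));
        System.out.println("span:matchesOwn((?i)login) -> " + count("span:matchesOwn((?i)login)"));
    }
}
```

In this sample the text "Jsoup is a parser" belongs to the p, not to the surrounding div, so the div matches :contains(jsoup) but not :containsOwn(jsoup).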