Java Web Crawler crawler4j Study Notes <24>: The PageFetchResult Class
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 05:07
Source code
package edu.uci.ics.crawler4j.fetcher;

import java.io.EOFException;
import java.io.IOException;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.util.EntityUtils;

import edu.uci.ics.crawler4j.crawler.Page;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @author Yasser Ganjisaffar [lastname at gmail dot com]
 */
// Holds the result of fetching a page with the HttpClient library
public class PageFetchResult {

    protected static final Logger logger = LoggerFactory.getLogger(PageFetchResult.class);

    protected int statusCode;                   // HTTP status code
    protected HttpEntity entity = null;         // HttpEntity object (the response body)
    protected Header[] responseHeaders = null;  // response headers
    protected String fetchedUrl = null;         // URL that was fetched
    protected String movedToUrl = null;

    public int getStatusCode() {
        return statusCode;
    }

    public void setStatusCode(int statusCode) {
        this.statusCode = statusCode;
    }

    public HttpEntity getEntity() {
        return entity;
    }

    public void setEntity(HttpEntity entity) {
        this.entity = entity;
    }

    public Header[] getResponseHeaders() {
        return responseHeaders;
    }

    public void setResponseHeaders(Header[] responseHeaders) {
        this.responseHeaders = responseHeaders;
    }

    public String getFetchedUrl() {
        return fetchedUrl;
    }

    public void setFetchedUrl(String fetchedUrl) {
        this.fetchedUrl = fetchedUrl;
    }

    public boolean fetchContent(Page page) {
        try {
            // Load the fetched entity into the Page object
            page.load(entity);
            page.setFetchResponseHeaders(responseHeaders);
            return true;
        } catch (Exception e) {
            logger.info("Exception while fetching content for: {} [{}]",
                        page.getWebURL().getURL(), e.getMessage());
        }
        return false;
    }

    // Discard the content without processing it
    public void discardContentIfNotConsumed() {
        try {
            if (entity != null) {
                EntityUtils.consume(entity);
            }
        } catch (IOException e) {
            // We can get an EOFException (extends IOException) here. It can happen on
            // compressed streams which are not repeatable.
            // We can ignore this exception. It can happen if the stream is closed.
        } catch (Exception e) {
            logger.warn("Unexpected error occurred while trying to discard content", e);
        }
    }

    public String getMovedToUrl() {
        return movedToUrl;
    }

    public void setMovedToUrl(String movedToUrl) {
        this.movedToUrl = movedToUrl;
    }
}
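To make the roles of fetchContent and discardContentIfNotConsumed easier to see, here is a minimal sketch of the same pattern using only the JDK. The FetchResultSketch class and its byte-array return value are hypothetical stand-ins introduced for illustration, not part of crawler4j: a plain InputStream plays the role of the HttpEntity, and the real class loads content into a Page instead of returning bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for PageFetchResult; NOT part of crawler4j.
class FetchResultSketch {
    private final int statusCode;
    private final InputStream body; // plays the role of the HttpEntity content

    FetchResultSketch(int statusCode, InputStream body) {
        this.statusCode = statusCode;
        this.body = body;
    }

    int getStatusCode() {
        return statusCode;
    }

    // Mirrors fetchContent(Page): read the body and report failure instead of throwing.
    byte[] fetchContent() {
        if (body == null) {
            return null;
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            return null;
        }
    }

    // Mirrors discardContentIfNotConsumed(): drain and close, ignoring IOException.
    void discardContentIfNotConsumed() {
        if (body == null) {
            return;
        }
        try (InputStream in = body) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // drop the bytes
            }
        } catch (IOException ignored) {
            // The stream may already be closed or truncated; nothing left to clean up.
        }
    }

    public static void main(String[] args) {
        FetchResultSketch ok = new FetchResultSketch(200,
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        FetchResultSketch missing = new FetchResultSketch(404,
                new ByteArrayInputStream("not found".getBytes(StandardCharsets.UTF_8)));
        for (FetchResultSketch r : new FetchResultSketch[] {ok, missing}) {
            try {
                if (r.getStatusCode() == 200) {
                    byte[] content = r.fetchContent();
                    System.out.println("fetched " + (content == null ? 0 : content.length) + " bytes");
                } else {
                    System.out.println("skipping body, status " + r.getStatusCode());
                }
            } finally {
                // Always release the body, whether or not it was read.
                r.discardContentIfNotConsumed();
            }
        }
    }
}
```

The point of the finally block is that the body is released on every path, including the ones where the caller never reads it. With a real HttpClient connection, an unconsumed entity would presumably keep the connection from being returned to the pool, which is why the original class exposes a separate discard method.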