Java Web Crawler Tutorial 01
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 09:28
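Before starting this tutorial, you should already be familiar with the HTTP protocol.
I started this on a whim, just to build something.
This tutorial is based on httpcomponents-client and mostly follows the official documentation: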
http://hc.apache.org/httpcomponents-client-5.0.x/examples.html
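I am using version 4.5.3, so the code in this tutorial uses the 4.x package names (org.apache.http...); the 5.0.x examples linked above use different package names. Download page: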
http://hc.apache.org/downloads.cgi
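A simple crawler
Main steps:
1. CloseableHttpClient httpclient = HttpClients.createDefault(); obtains an HttpClient instance, i.e. the client, which plays the role of the browser.
2. HttpGet httpget = new HttpGet("http://bkjw.sxu.edu.cn/"); builds a request. The two main request types are HttpGet and HttpPost.
3. ResponseHandler<String> responseHandler = new ResponseHandler<String>() defines the response handler, which turns the HTTP response into a String.
4. String responseBody = httpclient.execute(httpget, responseHandler); uses the client to execute the request and returns whatever the handler produced.
5. httpclient.close(); closes the client.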
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * This example demonstrates the use of the {@link ResponseHandler} to simplify
 * the process of processing the HTTP response and releasing associated resources.
 */
public class ClientWithResponseHandler {

    public final static void main(String[] args) throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            HttpGet httpget = new HttpGet("http://bkjw.sxu.edu.cn/");

            System.out.println("Executing request " + httpget.getRequestLine());

            // Create a custom response handler
            ResponseHandler<String> responseHandler = new ResponseHandler<String>() {

                @Override
                public String handleResponse(
                        final HttpResponse response) throws ClientProtocolException, IOException {
                    int status = response.getStatusLine().getStatusCode();
                    if (status >= 200 && status < 300) {
                        HttpEntity entity = response.getEntity();
                        return entity != null ? EntityUtils.toString(entity) : null;
                    } else {
                        throw new ClientProtocolException("Unexpected response status: " + status);
                    }
                }

            };
            String responseBody = httpclient.execute(httpget, responseHandler);
            System.out.println("----------------------------------------");
            System.out.println(responseBody);
        } finally {
            httpclient.close();
        }
    }

}
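For readers who want to try the same flow before adding the HttpClient jar to the classpath, here is a minimal sketch of the steps using only the JDK. It is not the Apache API used in this tutorial: it swaps in java.net.http.HttpClient (this assumes Java 11 or later) and follows the same sequence of client, request, response handling and execution. The class name JdkFetchSketch and the helpers fetch and startLocalServer are hypothetical names chosen for illustration, and the request goes to a throwaway local server instead of a real site so that the sketch runs without network access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch; not part of Apache HttpClient.
class JdkFetchSketch {

    // Fetches a URL and returns the body for 2xx responses, mirroring the
    // status check done by the ResponseHandler in the listing above.
    static String fetch(String url) throws IOException, InterruptedException {
        // Step 1: obtain a client (plays the role of the browser).
        HttpClient client = HttpClient.newHttpClient();
        // Step 2: build a GET request.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // Steps 3 and 4: the body handler turns the response into a String,
        // and send() executes the request.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Step 5 (close) is not needed here: on Java 11 this client has no close().
        int status = response.statusCode();
        if (status >= 200 && status < 300) {
            return response.body();
        }
        throw new IOException("Unexpected response status: " + status);
    }

    // Starts a tiny local HTTP server on a free port that always answers
    // 200 with the given non-empty body. Used only to make the sketch
    // runnable offline.
    static HttpServer startLocalServer(String body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startLocalServer("hello crawler");
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println(fetch(url));
        } finally {
            server.stop(0);
        }
    }
}
```

To crawl a real page, pass its URL to fetch instead of the local address. The Apache version above remains the one the rest of this tutorial builds on.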