java爬虫教程01

来源:互联网 发布:有公积金借钱软件 编辑:程序博客网 时间:2024/06/05 09:28

学习本教程前,先得了解http协议.
心血来潮,来搞点事情.
本教程基于: httpcomponents-client,主要根据官方文档讲解.
http://hc.apache.org/httpcomponents-client-5.0.x/examples.html

本人使用的是4.5.3版,下载地址:
http://hc.apache.org/downloads.cgi

一个简单的爬虫

主要步骤:
1. CloseableHttpClient httpclient = HttpClients.createDefault();获取一个httpclient,即客户端,相当于浏览器
2. HttpGet httpget = new HttpGet(“http://bkjw.sxu.edu.cn/“);构建一个请求,主要有HttpGet和HttpPost两种
3. ResponseHandler< String > responseHandler = new ResponseHandler< String >()构建响应处理
4. String responseBody = httpclient.execute(httpget, responseHandler);使用客户端执行请求信息
5. httpclient.close();关闭客户端

import java.io.IOException;import org.apache.http.HttpEntity;import org.apache.http.HttpResponse;import org.apache.http.client.ClientProtocolException;import org.apache.http.client.ResponseHandler;import org.apache.http.client.methods.HttpGet;import org.apache.http.impl.client.CloseableHttpClient;import org.apache.http.impl.client.HttpClients;import org.apache.http.util.EntityUtils;/** * This example demonstrates the use of the {@link ResponseHandler} to simplify * the process of processing the HTTP response and releasing associated resources. */public class ClientWithResponseHandler {    public final static void main(String[] args) throws Exception {        CloseableHttpClient httpclient = HttpClients.createDefault();        try {            HttpGet httpget = new HttpGet("http://bkjw.sxu.edu.cn/");            System.out.println("Executing request " + httpget.getRequestLine());            // Create a custom response handler            ResponseHandler<String> responseHandler = new ResponseHandler<String>() {                @Override                public String handleResponse(                        final HttpResponse response) throws ClientProtocolException, IOException {                    int status = response.getStatusLine().getStatusCode();                    if (status >= 200 && status < 300) {                        HttpEntity entity = response.getEntity();                        return entity != null ? EntityUtils.toString(entity) : null;                    } else {                        throw new ClientProtocolException("Unexpected response status: " + status);                    }                }            };            String responseBody = httpclient.execute(httpget, responseHandler);            System.out.println("----------------------------------------");            System.out.println(responseBody);        } finally {            httpclient.close();        }    }}
原创粉丝点击