A Simple Crawler: Fetching Web Page Content and Writing It to a Text File

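Source: Internet  Editor: 程序博客网  Time: 2024/06/08 00:32
I have recently been learning Lucene, which involves web crawling. The code below fetches a web page using the Commons HttpClient library and saves it to a local file: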
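import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class Dsfa {
    public static void main(String[] args) {
        HttpClient client = new HttpClient();
        GetMethod getMethod = new GetMethod("http://blog.csdn.net/luo_da/article/details/76135572");
        // Use the default retry handler for recoverable I/O errors
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler());
        try {
            int statusCode = client.executeMethod(getMethod);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("Fetch failed: " + getMethod.getStatusLine());
                return; // do not save an error page; finally still releases the connection
            }
            byte[] responseBody = getMethod.getResponseBody();
            // Write the response body to a local text file
            try (FileOutputStream fileOutputStream = new FileOutputStream("content.txt")) {
                fileOutputStream.write(responseBody);
            }
        } catch (HttpException e) {
            e.printStackTrace();
            System.out.println("Fetch failed, please try again...");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            getMethod.releaseConnection();
        }
    }
}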

Fetching network resources this way requires the following JARs:

commons-codec-1.10.jar
commons-httpclient-3.1.jar
commons-logging-1.1.1.jar

Download link for the resources: http://download.csdn.net/detail/luo_da/9912913
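
The code above calls `getResponseBody()`, which buffers the whole page in memory before writing it out. For larger pages it may be preferable to stream the response to disk instead; in Commons HttpClient 3.1 the stream would come from `getMethod.getResponseBodyAsStream()`. The following is only a minimal sketch of the stream-to-file step using the JDK alone: the class name `StreamToFile`, the helper `copyToFile`, and the in-memory stand-in for the HTTP response are all assumptions made for illustration, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToFile {

    // Hypothetical helper: copies everything from the stream into the target file,
    // overwriting it if it already exists, and returns the number of bytes written.
    static long copyToFile(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real HTTP response body (assumption for this sketch);
        // with HttpClient 3.1 this would be getMethod.getResponseBodyAsStream().
        byte[] fakePage = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        Path target = Files.createTempFile("content", ".txt");
        try (InputStream in = new ByteArrayInputStream(fakePage)) {
            long written = copyToFile(in, target);
            System.out.println("Wrote " + written + " bytes to " + target);
        }
    }
}
```

When adapting this to the crawler, the stream should still be consumed and closed before `releaseConnection()` is called in the `finally` block.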
