A Simple Crawler That Fetches a Web Page and Writes Its Content to a Text File

Source: Internet | Published by: java流程管理系统 | Editor: 程序博客网 | Time: 2024/06/08 00:32

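I have recently been learning Lucene, which involves some web crawling. The code below uses the Apache Commons HttpClient 3.1 library to fetch a web page and save the response to a local text file: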

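import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class Dsfa {
    public static void main(String[] args) {
        HttpClient client = new HttpClient();
        GetMethod getMethod = new GetMethod("http://blog.csdn.net/luo_da/article/details/76135572");
        // Retry recoverable I/O failures using the default retry handler
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler());
        try {
            int statusCode = client.executeMethod(getMethod);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("Fetch failed... " + getMethod.getStatusLine());
                return; // do not save an error page; the finally block still runs
            }
            byte[] responseBody = getMethod.getResponseBody();
            // Write the response body to a local text file; the stream is closed even on error
            try (FileOutputStream fileOutputStream = new FileOutputStream("content.txt")) {
                fileOutputStream.write(responseBody);
            }
        } catch (HttpException e) {
            e.printStackTrace();
            System.out.println("Fetch failed, please try again...");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always return the connection to the client
            getMethod.releaseConnection();
        }
    }
}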

To fetch web resources this way, you need the following JARs:

commons-codec-1.10.jar
commons-httpclient-3.1.jar
commons-logging-1.1.1.jar

Download link for these JARs: http://download.csdn.net/detail/luo_da/9912913
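If you would rather not depend on the three JARs above (Commons HttpClient 3.x is end-of-life and was superseded by Apache HttpComponents), the same fetch-and-save step can be sketched with the HTTP client built into the JDK since Java 11, `java.net.http.HttpClient`. This is a swapped-in library, not the author's method, and assumes Java 11 or later. The class `FetchToFile`, the method `fetchToFile`, and the throwaway local server on `127.0.0.1` (used only so the demo runs offline) are hypothetical illustrations.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

public class FetchToFile {

    // Hypothetical helper: GET the URL and, only on HTTP 200, write the raw
    // body bytes to `out`. Returns the status code so the caller can react.
    static int fetchToFile(HttpClient client, String url, Path out)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() == 200) {
            // Bytes are saved as-is, with no charset conversion, like the original
            Files.write(out, response.body());
        }
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Throwaway local server so the demo works offline; in practice,
        // point fetchToFile at a real page instead.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/page", exchange -> {
            byte[] body = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofSeconds(5))
                    .build();
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/page";
            Path out = Path.of("content.txt");
            int status = fetchToFile(client, url, out);
            System.out.println("status=" + status);
            System.out.println(Files.readString(out));
        } finally {
            server.stop(0);
        }
    }
}
```

Like the corrected Commons version, this sketch skips writing the file when the status is not 200, so an error page never overwrites `content.txt`. Returning the status code instead of printing inside the helper leaves retry or logging policy to the caller.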
