Welcome to the CSDN-markdown editor

Source: Internet | Published by: 兴业银行淘宝信用卡 | Editor: 程序博客网 | Time: 2024/06/06 02:11

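I have recently been using JSOUP as a crawler to scrape data. Once I got used to it I grew very fond of its chained (fluent) style, so I wanted to use it to call API endpoints as well, since building request headers with it is so convenient. It certainly ought to support this, because under the hood it still relies on HttpConnection to do the work. The code looked like this: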

    Document doc = Jsoup
            .connect(Constant.DATA_URL)
            .header("Accept", "*/*")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3")
            .header("Content-Type", "application/json;charset=UTF-8")
            .header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0")
            .timeout(10000)
            .get();
    Element body = doc.body();
    JSONObject json = JSONObject.fromObject(body.text());

But there was a problem: the request failed with this error:

    org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml. Mimetype=application/json;charset=UTF-8, URL=http://www.baidu.com/
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:600)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)

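The message makes the cause obvious: the response's content type, application/json, is not one of the types jsoup handles by default. I found the following solution: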
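To make the rule in that error message concrete, here is a small sketch of the kind of check it describes. This is not jsoup's actual source: the class name `ContentTypeCheck` and the method `isParsedByDefault` are hypothetical, and the list of accepted types is taken only from the exception text above.

```java
import java.util.Locale;

// Hypothetical illustration of the rule stated in the exception message;
// it is NOT jsoup's real implementation.
final class ContentTypeCheck {

    // Returns true if the MIME type is text/*, application/xml,
    // or application/xhtml+xml (the types named in the error message).
    static boolean isParsedByDefault(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as ";charset=UTF-8" before comparing.
        int semi = contentType.indexOf(';');
        String mime = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(isParsedByDefault("text/html; charset=UTF-8"));       // true
        System.out.println(isParsedByDefault("application/json;charset=UTF-8")); // false
    }
}
```

The second call uses the exact Mimetype value from the stack trace, which is why the request was rejected.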

    Response res = Jsoup.connect(Constant.DATA_URL)
            .header("Accept", "*/*")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3")
            .header("Content-Type", "application/json;charset=UTF-8")
            .header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0")
            .timeout(10000)
            .ignoreContentType(true)
            .execute(); // instead of .get()
    String body = res.body();
    JSONObject json = JSONObject.fromObject(body);

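With this change the request succeeds.
The key point is ignoreContentType(true), which tells jsoup to ignore the content type of the response. I recommend running the request with execute(): if you use get(), jsoup parses the response as a document and what comes back is the JSON wrapped in an HTML page, which is a bit more awkward to handle.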
In the end, though, I switched to a plain java.net.URLConnection instead:

    // "data" is a JSONObject declared elsewhere.
    try {
        URL url = new URL(Constant.DATA_URL);
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(1000); // connect timeout in milliseconds
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
            String line; // one line of the response
            StringBuilder content = new StringBuilder();
            // readLine() already strips line terminators
            while ((line = in.readLine()) != null) {
                content.append(line);
            }
            if (StringUtils.isNotBlank(content)) {
                data = JSONObject.fromObject(content.toString());
            }
        }
    } catch (SocketTimeoutException e) {
        System.out.println("Connection timed out!");
    } catch (JSONException e) {
        System.out.println("The response is not JSON and cannot be converted to a JSONObject!");
    } catch (Exception e) {
        System.out.println("Bad URL, or an error occurred while reading the stream!");
    }
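The read loop above can be pulled into a small helper so it can be exercised without a network connection. This is only a sketch: the class name `StreamText` and the method `readUtf8` are made up for illustration, and a ByteArrayInputStream stands in for connection.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads an entire stream as UTF-8 text.
final class StreamText {

    static String readUtf8(InputStream in) {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader and the wrapped stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            // Read in chunks so line breaks inside the body are preserved
            while ((n = reader.read(buf)) != -1) {
                content.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a server would return
        byte[] fake = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(fake))); // {"ok":true}
    }
}
```

In real use the argument would be connection.getInputStream(), and the returned string could be passed to JSONObject.fromObject as in the code above.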

For requests that need cookie-based authentication, I recommend using jsoup.
