A Multi-threaded JD.com Product Search Crawler in Java
Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 22:58
Step 1
First, let's analyze the parameters we need for this crawl.
The entry point is:
https://search.jd.com/Search?keyword=%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91&enc=utf-8&wq=%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91&pvid=0b09350ac3df4f24886bb7a35d3b69ff
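Locating the data
id="J_goodsList": the container that holds all the products
data-sku="5025518": the product ID (SKU)
class="p-price": the product price
class="p-name p-name-type-2": the product name
class="err-product" src: the img element whose src attribute holds the product image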
We also need to fetch the total number of pages.
The entry point is:
https://search.jd.com/Search?keyword=笔记本电&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&wq=笔记本电脑&page=3&s=57&click=0
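Parameter breakdown
keyword = 笔记本电脑 (laptop): the search keyword
enc = utf-8: the character encoding
wq = 笔记本电脑: the search keyword again
qrst = 1: no idea what this is; the request works without it
rt = 1
stop = 1
vt = 2: my guess is that this is some kind of step size
page = 3: the page values are always odd; I'm not sure why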
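Putting the table together, the JDK-only sketch below assembles a search URL from a keyword and a visible page number. It assumes that visible page n is requested as page=2n-1. That assumption rests only on the observation above that page is always odd, and it matches the stepping the controller loop uses later. The class and method names are hypothetical, and qrst, rt, stop and vt are left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles a JD search URL from the parameters in the table above.
class JdSearchUrl {
    static final String BASE_URL = "https://search.jd.com/Search";

    // Assumption: visible result page n (1-based) is requested as page = 2n - 1 (always odd).
    static int pageParam(int visiblePage) {
        if (visiblePage < 1) {
            throw new IllegalArgumentException("visiblePage must be >= 1");
        }
        return 2 * visiblePage - 1;
    }

    static String build(String keyword, int visiblePage) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("keyword", keyword);
        params.put("enc", "utf-8");
        params.put("wq", keyword);
        params.put("page", String.valueOf(pageParam(visiblePage)));

        StringBuilder sb = new StringBuilder(BASE_URL);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(e.getKey()).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    // Percent-encodes a value as UTF-8, so a Chinese keyword becomes %E7%AC%94...
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        // The escapes below spell 笔记本电脑 (laptop)
        System.out.println(build("\u7b14\u8bb0\u672c\u7535\u8111", 2));
    }
}
```

The encoded keyword produced this way matches the `keyword=%E7%AC%94...` value in the first entry URL.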
Step 2
Now for the code.
1. Create the parent project, which mainly manages the versions of the jar dependencies and plugins.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.jianqiao.clawer</groupId>
    <artifactId>clawer-system</artifactId>
    <packaging>pom</packaging>
    <version>1.0-SNAPSHOT</version>
    <modules>
        <module>clawer-jd-product</module>
    </modules>
    <name>clawer-system Maven Webapp</name>
    <url>http://maven.apache.org</url>

    <!-- Dependency versions, defined in one place -->
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <junit.version>4.12</junit.version>
        <spring.version>4.1.3.RELEASE</spring.version>
        <mybatis.version>3.4.1</mybatis.version>
        <mybatis.spring.version>1.3.1</mybatis.spring.version>
        <mybatis.paginator.version>1.2.15</mybatis.paginator.version>
        <mysql.version>5.1.32</mysql.version>
        <slf4j.version>1.6.4</slf4j.version>
        <jackson.version>2.4.2</jackson.version>
        <druid.version>1.0.9</druid.version>
        <jolbox.version>0.8.0.RELEASE</jolbox.version>
        <jstl.version>1.2</jstl.version>
        <servlet-api.version>2.5</servlet-api.version>
        <jsp-api.version>2.0</jsp-api.version>
        <joda-time.version>2.5</joda-time.version>
        <commons-lang3.version>3.3.2</commons-lang3.version>
        <commons-io.version>1.3.2</commons-io.version>
        <commons-net.version>3.3</commons-net.version>
        <pagehelper.version>5.0.3</pagehelper.version>
        <mapper.version>2.3.4</mapper.version>
        <jsqlparser.version>0.9.1</jsqlparser.version>
        <commons-fileupload.version>1.3.1</commons-fileupload.version>
        <commons-codec.version>1.9</commons-codec.version>
        <jedis.version>2.7.2</jedis.version>
        <solrj.version>4.10.3</solrj.version>
        <dubbo.version>2.5.3</dubbo.version>
        <zookeeper.version>3.4.7</zookeeper.version>
        <zkclient.version>0.1</zkclient.version>
        <activemq.version>5.12.0</activemq.version>
        <freemarker.version>2.3.23</freemarker.version>
        <!-- Quartz -->
        <quartz.version>2.2.2</quartz.version>
        <uediter.version>1.1.1</uediter.version>
        <json.version>20160212</json.version>
        <fastdfs_client.version>1.25</fastdfs_client.version>
        <spring-rabbit.version>1.4.0.RELEASE</spring-rabbit.version>
        <httpclient.version>4.3.5</httpclient.version>
        <rabbitmq.version>3.4.1</rabbitmq.version>
        <jsoup.version>1.10.3</jsoup.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <!-- Unit testing -->
            <dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>${junit.version}</version><scope>test</scope></dependency>
            <!-- Spring -->
            <dependency><groupId>org.springframework</groupId><artifactId>spring-webmvc</artifactId><version>${spring.version}</version></dependency>
            <dependency><groupId>org.springframework</groupId><artifactId>spring-jdbc</artifactId><version>${spring.version}</version></dependency>
            <dependency><groupId>org.springframework</groupId><artifactId>spring-aspects</artifactId><version>${spring.version}</version></dependency>
            <dependency><groupId>org.springframework</groupId><artifactId>spring-context-support</artifactId><version>${spring.version}</version></dependency>
            <!-- Generic Mapper -->
            <dependency><groupId>com.github.abel533</groupId><artifactId>mapper</artifactId><version>${mapper.version}</version></dependency>
            <!-- MyBatis -->
            <dependency><groupId>org.mybatis</groupId><artifactId>mybatis</artifactId><version>${mybatis.version}</version></dependency>
            <dependency><groupId>org.mybatis</groupId><artifactId>mybatis-spring</artifactId><version>${mybatis.spring.version}</version></dependency>
            <!-- Pagination helper -->
            <dependency><groupId>com.github.pagehelper</groupId><artifactId>pagehelper</artifactId><version>${pagehelper.version}</version></dependency>
            <dependency><groupId>com.github.jsqlparser</groupId><artifactId>jsqlparser</artifactId><version>${jsqlparser.version}</version></dependency>
            <!-- MySQL -->
            <dependency><groupId>mysql</groupId><artifactId>mysql-connector-java</artifactId><version>${mysql.version}</version></dependency>
            <!-- Logging -->
            <dependency><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId><version>${slf4j.version}</version></dependency>
            <!-- Jackson JSON processing -->
            <dependency><groupId>com.fasterxml.jackson.core</groupId><artifactId>jackson-databind</artifactId><version>${jackson.version}</version></dependency>
            <!-- Connection pool -->
            <dependency><groupId>com.jolbox</groupId><artifactId>bonecp-spring</artifactId><version>${jolbox.version}</version></dependency>
            <!-- JSP -->
            <dependency><groupId>jstl</groupId><artifactId>jstl</artifactId><version>${jstl.version}</version></dependency>
            <dependency><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId><version>${servlet-api.version}</version><scope>provided</scope></dependency>
            <dependency><groupId>javax.servlet</groupId><artifactId>jsp-api</artifactId><version>${jsp-api.version}</version><scope>provided</scope></dependency>
            <!-- Date/time utilities -->
            <dependency><groupId>joda-time</groupId><artifactId>joda-time</artifactId><version>${joda-time.version}</version></dependency>
            <!-- Apache Commons utilities -->
            <dependency><groupId>org.apache.commons</groupId><artifactId>commons-lang3</artifactId><version>${commons-lang3.version}</version></dependency>
            <dependency><groupId>org.apache.commons</groupId><artifactId>commons-io</artifactId><version>${commons-io.version}</version></dependency>
            <!-- File upload -->
            <dependency><groupId>commons-fileupload</groupId><artifactId>commons-fileupload</artifactId><version>${commons-fileupload.version}</version></dependency>
            <!-- Dubbo -->
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>dubbo</artifactId>
                <version>${dubbo.version}</version>
                <exclusions>
                    <exclusion><groupId>org.springframework</groupId><artifactId>spring</artifactId></exclusion>
                    <exclusion><groupId>org.jboss.netty</groupId><artifactId>netty</artifactId></exclusion>
                </exclusions>
            </dependency>
            <dependency><groupId>org.apache.zookeeper</groupId><artifactId>zookeeper</artifactId><version>${zookeeper.version}</version></dependency>
            <dependency><groupId>com.github.sgroschupf</groupId><artifactId>zkclient</artifactId><version>${zkclient.version}</version></dependency>
            <!-- Encryption/decryption helpers -->
            <dependency><groupId>commons-codec</groupId><artifactId>commons-codec</artifactId><version>${commons-codec.version}</version></dependency>
            <!-- Scheduled jobs (Quartz) -->
            <dependency><groupId>org.quartz-scheduler</groupId><artifactId>quartz</artifactId><version>${quartz.version}</version></dependency>
            <!-- ActiveMQ -->
            <dependency><groupId>org.apache.activemq</groupId><artifactId>activemq-all</artifactId><version>${activemq.version}</version></dependency>
            <dependency><groupId>org.springframework</groupId><artifactId>spring-jms</artifactId><version>${spring.version}</version></dependency>
            <!-- RabbitMQ -->
            <dependency><groupId>org.springframework.amqp</groupId><artifactId>spring-rabbit</artifactId><version>${spring-rabbit.version}</version></dependency>
            <dependency><groupId>com.rabbitmq</groupId><artifactId>amqp-client</artifactId><version>${rabbitmq.version}</version></dependency>
            <!-- Static page generation (FreeMarker) -->
            <dependency><groupId>org.freemarker</groupId><artifactId>freemarker</artifactId><version>${freemarker.version}</version></dependency>
            <!-- Redis client -->
            <dependency><groupId>redis.clients</groupId><artifactId>jedis</artifactId><version>${jedis.version}</version></dependency>
            <!-- Solr client -->
            <dependency><groupId>org.apache.solr</groupId><artifactId>solr-solrj</artifactId><version>${solrj.version}</version></dependency>
            <!-- Baidu UEditor -->
            <dependency><groupId>com.baidu</groupId><artifactId>ueditor</artifactId><version>${uediter.version}</version></dependency>
            <dependency><groupId>org.json</groupId><artifactId>json</artifactId><version>${json.version}</version></dependency>
            <dependency><groupId>com.alibaba.fastdfs</groupId><artifactId>fastdfs_client</artifactId><version>${fastdfs_client.version}</version></dependency>
            <!-- httpclient -->
            <dependency><groupId>org.apache.httpcomponents</groupId><artifactId>httpclient</artifactId><version>${httpclient.version}</version></dependency>
            <dependency><groupId>org.jsoup</groupId><artifactId>jsoup</artifactId><version>${jsoup.version}</version></dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <finalName>${project.artifactId}</finalName>
        <plugins>
            <!-- Resource copying plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- Java compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
        <pluginManagement>
            <plugins>
                <!-- Tomcat plugin -->
                <plugin>
                    <groupId>org.apache.tomcat.maven</groupId>
                    <artifactId>tomcat7-maven-plugin</artifactId>
                    <version>2.2</version>
                </plugin>
            </plugins>
        </pluginManagement>
        <resources>
            <!-- When deploying with Maven, also deploy the xml and properties config files to Tomcat -->
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                    <include>**/*.cnf</include>
                </includes>
                <filtering>false</filtering>
            </resource>
            <!-- The default resource directory -->
            <resource>
                <directory>src/main/resources</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                    <include>**/*.cnf</include>
                </includes>
                <filtering>false</filtering>
            </resource>
        </resources>
    </build>
</project>
2. Create a child module, clawer-jd-product
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <artifactId>clawer-system</artifactId>
        <groupId>com.jianqiao.clawer</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>clawer-jd-product</artifactId>
    <packaging>war</packaging>
    <name>clawer-jd-product Maven Webapp</name>

    <!-- Versions come from the parent's dependencyManagement; each dependency is declared only once -->
    <dependencies>
        <dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>
        <!-- httpclient -->
        <dependency><groupId>org.apache.httpcomponents</groupId><artifactId>httpclient</artifactId></dependency>
        <!-- Apache Commons utilities -->
        <dependency><groupId>org.apache.commons</groupId><artifactId>commons-lang3</artifactId></dependency>
        <dependency><groupId>org.apache.commons</groupId><artifactId>commons-io</artifactId></dependency>
        <!-- File upload -->
        <dependency><groupId>commons-fileupload</groupId><artifactId>commons-fileupload</artifactId></dependency>
        <!-- Jackson JSON processing -->
        <dependency><groupId>com.fasterxml.jackson.core</groupId><artifactId>jackson-databind</artifactId></dependency>
        <!-- Spring -->
        <dependency><groupId>org.springframework</groupId><artifactId>spring-webmvc</artifactId></dependency>
        <dependency><groupId>org.springframework</groupId><artifactId>spring-jdbc</artifactId></dependency>
        <dependency><groupId>org.springframework</groupId><artifactId>spring-aspects</artifactId></dependency>
        <dependency><groupId>org.springframework</groupId><artifactId>spring-context-support</artifactId></dependency>
        <!-- Generic Mapper -->
        <dependency><groupId>com.github.abel533</groupId><artifactId>mapper</artifactId></dependency>
        <!-- MyBatis -->
        <dependency><groupId>org.mybatis</groupId><artifactId>mybatis</artifactId></dependency>
        <dependency><groupId>org.mybatis</groupId><artifactId>mybatis-spring</artifactId></dependency>
        <!-- Pagination helper -->
        <dependency><groupId>com.github.pagehelper</groupId><artifactId>pagehelper</artifactId></dependency>
        <dependency><groupId>com.github.jsqlparser</groupId><artifactId>jsqlparser</artifactId></dependency>
        <!-- MySQL -->
        <dependency><groupId>mysql</groupId><artifactId>mysql-connector-java</artifactId></dependency>
        <!-- Connection pool -->
        <dependency><groupId>com.jolbox</groupId><artifactId>bonecp-spring</artifactId></dependency>
        <!-- JSP -->
        <dependency><groupId>jstl</groupId><artifactId>jstl</artifactId></dependency>
        <dependency><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId><scope>provided</scope></dependency>
        <dependency><groupId>javax.servlet</groupId><artifactId>jsp-api</artifactId><scope>provided</scope></dependency>
        <!-- Logging -->
        <dependency><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></dependency>
        <!-- HTML parser -->
        <dependency><groupId>org.jsoup</groupId><artifactId>jsoup</artifactId></dependency>
    </dependencies>

    <build>
        <finalName>clawer-jd-product</finalName>
        <plugins>
            <!-- Resource copying plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- Java compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- Tomcat plugin (version managed by the parent) -->
            <plugin>
                <groupId>org.apache.tomcat.maven</groupId>
                <artifactId>tomcat7-maven-plugin</artifactId>
                <configuration>
                    <port>8081</port>
                    <path>/</path>
                </configuration>
            </plugin>
        </plugins>
        <resources>
            <!-- When deploying with Maven, also deploy the xml and properties config files to Tomcat -->
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                    <include>**/*.cnf</include>
                </includes>
                <filtering>false</filtering>
            </resource>
            <!-- The default resource directory -->
            <resource>
                <directory>src/main/resources</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                    <include>**/*.cnf</include>
                </includes>
                <filtering>false</filtering>
            </resource>
        </resources>
    </build>
</project>
3.1 Prepare the basic utility class: my own thin wrapper around HttpClient
package com.jianqiao.util;

import com.jianqiao.pojo.HttpResult;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.*;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @Author: Alone_XuXu
 * @Description: A small helper for sending HTTP requests
 * @Date: Created in 19:24 - 27 - 10 - 2017
 */
@Component
public class HttpClientUtilImpl {

    @Autowired
    private CloseableHttpClient httpClient;

    @Autowired
    private RequestConfig config;

    /**
     * GET request whose parameters are appended to the URL as-is (the values are NOT URL-encoded).
     *
     * @param url    request URL
     * @param params query parameters, may be null
     * @return the page content for a 200 response, otherwise null
     */
    public String doGet1(String url, Map<String, Object> params) throws Exception {
        // If the caller passed parameters, append them as key=value pairs
        if (params != null && !params.isEmpty()) {
            StringBuilder sb = new StringBuilder(url).append('?');
            for (Map.Entry<String, Object> entry : params.entrySet()) {
                sb.append(entry.getKey()).append('=').append(entry.getValue()).append('&');
            }
            url = sb.substring(0, sb.length() - 1); // drop the trailing '&'
        }
        return executeGet(new URIBuilder(url).build());
    }

    /**
     * GET request with parameters; URIBuilder URL-encodes the parameter values.
     *
     * @param url    request URL
     * @param params query parameters, may be null
     * @return the page content for a 200 response, otherwise null
     */
    public String doGet(String url, Map<String, Object> params) throws Exception {
        URIBuilder uriBuilder = new URIBuilder(url);
        if (params != null && !params.isEmpty()) {
            for (Map.Entry<String, Object> entry : params.entrySet()) {
                uriBuilder.setParameter(entry.getKey(), String.valueOf(entry.getValue()));
            }
        }
        return executeGet(uriBuilder.build());
    }

    /**
     * GET request without parameters.
     *
     * @param url request URL
     * @return the page content for a 200 response, otherwise null
     */
    public String doGet(String url) throws Exception {
        return doGet(url, null);
    }

    /**
     * POST request with form parameters.
     */
    public HttpResult doPost(String url, Map<String, Object> params) throws IOException, ClientProtocolException {
        HttpPost httpPost = new HttpPost(url);
        httpPost.setConfig(config);
        // Send the parameters as a URL-encoded form body
        httpPost.setEntity(new UrlEncodedFormEntity(getNameValuePairs(params), "utf-8"));
        return executePostOrPutOrDeleteMethod(httpPost);
    }

    /**
     * POST request without parameters.
     */
    public HttpResult doPost(String url) throws Exception {
        return doPost(url, null);
    }

    /**
     * POST request with a JSON body.
     *
     * @param url  request URL
     * @param json request body, may be null
     * @return the status code and the response body
     */
    public HttpResult doPostJson(String url, String json) throws IOException {
        HttpPost httpPost = new HttpPost(url);
        httpPost.setConfig(config);
        if (json != null) {
            // Mark the body as application/json
            httpPost.setEntity(new StringEntity(json, ContentType.APPLICATION_JSON));
        }
        return executePostOrPutOrDeleteMethod(httpPost);
    }

    /**
     * PUT request with form parameters.
     *
     * @return the status code and the response body
     */
    public HttpResult doPut(String url, Map<String, Object> params) throws IOException {
        HttpPut httpPut = new HttpPut(url);
        httpPut.setConfig(config);
        httpPut.setEntity(new UrlEncodedFormEntity(getNameValuePairs(params), "utf-8"));
        return executePostOrPutOrDeleteMethod(httpPut);
    }

    /**
     * PUT request without parameters.
     */
    public HttpResult doPut(String url) throws IOException {
        return doPut(url, null);
    }

    /**
     * DELETE sent as a POST; the _method parameter carries the real HTTP method.
     *
     * @return the status code and the response body
     */
    public HttpResult doDelete(String url, Map<String, Object> param) throws Exception {
        Map<String, Object> params = (param == null) ? new HashMap<String, Object>() : param;
        params.put("_method", "DELETE");
        return doPost(url, params);
    }

    /**
     * A real HTTP DELETE request.
     *
     * @return the status code and the response body
     */
    public HttpResult doDelete(String url) throws Exception {
        HttpDelete httpDelete = new HttpDelete(url);
        httpDelete.setConfig(config);
        return executePostOrPutOrDeleteMethod(httpDelete);
    }

    /**
     * Executes a GET and returns the body for a 200 response, otherwise null.
     */
    private String executeGet(URI uri) throws IOException {
        HttpGet httpGet = new HttpGet(uri);
        httpGet.setConfig(config); // apply the same timeouts as the other request types
        CloseableHttpResponse response = null;
        try {
            response = httpClient.execute(httpGet);
            if (response.getStatusLine().getStatusCode() == 200) {
                return EntityUtils.toString(response.getEntity(), "UTF-8");
            }
        } finally {
            if (response != null) {
                response.close();
            }
        }
        return null;
    }

    /**
     * Executes a POST, PUT or DELETE request and returns the status code plus body.
     */
    private HttpResult executePostOrPutOrDeleteMethod(HttpUriRequest request) throws IOException {
        CloseableHttpResponse response = null;
        try {
            response = httpClient.execute(request);
            int status = response.getStatusLine().getStatusCode();
            if (response.getEntity() != null) {
                return new HttpResult(status, EntityUtils.toString(response.getEntity(), "utf-8"));
            }
            // No body: return only the status code
            return new HttpResult(status, null);
        } finally {
            if (response != null) {
                response.close();
            }
        }
    }

    /**
     * Converts the POST/PUT parameters into name/value pairs for a form entity.
     */
    private List<NameValuePair> getNameValuePairs(Map<String, Object> params) {
        List<NameValuePair> parameterList = new ArrayList<>();
        if (params != null) {
            for (Map.Entry<String, Object> entry : params.entrySet()) {
                parameterList.add(new BasicNameValuePair(entry.getKey(), String.valueOf(entry.getValue())));
            }
        }
        return parameterList;
    }
}
3.2 Manage all HttpClient connections through the connection pool, with a thread that cleans up stale connections
package com.jianqiao.util;

import org.apache.http.conn.HttpClientConnectionManager;

import java.util.concurrent.TimeUnit;

/**
 * @Author: Alone_XuXu
 * @Description: A background thread that cleans up unused connections in the pool
 * @Date: Created in 19:53 - 27 - 10 - 2017
 */
public class IdleConnectionEvictor extends Thread {

    // The connection manager (pool) to clean up
    private final HttpClientConnectionManager httpClientConnectionManager;

    // Stop flag
    private volatile boolean shutdown;

    public IdleConnectionEvictor(HttpClientConnectionManager httpClientConnectionManager) {
        this.httpClientConnectionManager = httpClientConnectionManager;
        this.start();
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000);
                    // Close connections that have expired
                    httpClientConnectionManager.closeExpiredConnections();
                    // Also close connections idle for more than 30 seconds (example threshold)
                    httpClientConnectionManager.closeIdleConnections(30, TimeUnit.SECONDS);
                }
            }
        } catch (InterruptedException e) {
            // Interrupted: stop evicting and let the thread end
        }
    }

    public void shutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}
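The evictor uses a common stop-signal pattern. The loop sleeps with wait(timeout) inside a synchronized block, so shutdown() can wake it at once with notifyAll() instead of waiting out the full 5 seconds. Below is a JDK-only sketch of the same pattern, with a plain Runnable standing in for the connection-manager calls. PeriodicWorker is a hypothetical name, not part of HttpClient.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for IdleConnectionEvictor: runs a task every periodMillis until shut down.
class PeriodicWorker extends Thread {
    private final Runnable task;
    private final long periodMillis;
    private final Object lock = new Object(); // private monitor instead of the Thread object itself
    private volatile boolean shutdown;

    PeriodicWorker(Runnable task, long periodMillis) {
        this.task = task;
        this.periodMillis = periodMillis;
        setDaemon(true); // housekeeping should not keep the JVM alive
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (lock) {
                    lock.wait(periodMillis); // wakes after the period, or early via shutdown()
                }
                if (!shutdown) {
                    task.run(); // in the real evictor: closeExpiredConnections() and friends
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status and let the thread end
        }
    }

    void shutdown() {
        shutdown = true;
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger runs = new AtomicInteger();
        PeriodicWorker worker = new PeriodicWorker(new Runnable() {
            @Override
            public void run() {
                runs.incrementAndGet();
            }
        }, 10);
        worker.start();
        TimeUnit.MILLISECONDS.sleep(100);
        worker.shutdown();
        worker.join(1000);
        System.out.println("runs=" + runs.get() + ", alive=" + worker.isAlive());
    }
}
```

This sketch differs from the evictor in two deliberate ways. First, it waits on a private lock object, because Thread.join() itself waits on the Thread instance's monitor, and the Thread javadoc recommends against calling wait/notify on Thread instances. Second, the caller starts the thread explicitly rather than the constructor doing it.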
4 Prepare the POJO and VO objects (the VO is the object exchanged with the page)
4.1 The VO
package com.jianqiao.vo;

/**
 * @Author: Alone_XuXu
 * @Description: The search parameters sent from the page (mainly the keyword)
 * @Date: Created in 6:41 - 27 - 11 - 2017
 */
public class KeyWord {

    private String keyword;
    private String enc;
    private String wq;
    private String page;

    public String getKeyword() { return keyword; }
    public void setKeyword(String keyword) { this.keyword = keyword; }

    public String getEnc() { return enc; }
    public void setEnc(String enc) { this.enc = enc; }

    public String getWq() { return wq; }
    public void setWq(String wq) { this.wq = wq; }

    public String getPage() { return page; }
    public void setPage(String page) { this.page = page; }
}
4.2 The POJOs
package com.jianqiao.pojo;

import java.io.Serializable;

public class Product implements Serializable {

    private Long id;
    private String title;
    private String sellpoint;
    private String price;
    private Integer num;
    private String image;
    private Long cid;
    // Meant to be ignored when mapping to the database table; with the abel533 Mapper that needs @javax.persistence.Transient
    private Boolean status = true;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getSellpoint() { return sellpoint; }
    public void setSellpoint(String sellpoint) { this.sellpoint = sellpoint; }

    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }

    public Integer getNum() { return num; }
    public void setNum(Integer num) { this.num = num; }

    public String getImage() { return image; }
    public void setImage(String image) { this.image = image; }

    public Long getCid() { return cid; }
    public void setCid(Long cid) { this.cid = cid; }

    public Boolean getStatus() { return status; }
    public void setStatus(Boolean status) { this.status = status; }

    @Override
    public String toString() {
        return "Product [id=" + id + ", title=" + title + ", sellPoint=" + sellpoint + ", price=" + price
                + ", num=" + num + ", image=" + image + ", cid=" + cid + ", status=" + status + "]";
    }
}
package com.jianqiao.pojo;

public class HttpResult {

    // Status code
    private Integer code;

    // Response body
    private String body;

    public HttpResult() {
        super();
    }

    public HttpResult(Integer code, String body) {
        this.code = code;
        this.body = body;
    }

    public Integer getCode() { return code; }
    public void setCode(Integer code) { this.code = code; }

    public String getBody() { return body; }
    public void setBody(String body) { this.body = body; }
}
5.0 Database-related code
package com.jianqiao.mapper;

import com.github.abel533.mapper.Mapper;
import com.jianqiao.pojo.Product;

public interface ProductMapper extends Mapper<Product> {
}
6.0 The service layer
package com.jianqiao.service;

import com.github.abel533.entity.Example;
import com.github.abel533.mapper.Mapper;
import com.github.pagehelper.PageHelper;
import org.springframework.beans.factory.annotation.Autowired;

import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class BaseServiceImpl<T> {

    @Autowired
    protected Mapper<T> mapper;

    // The entity class bound by the subclass, e.g. Product for ProductServiceImpl
    Class<T> clazz;

    @SuppressWarnings("unchecked")
    public BaseServiceImpl() {
        // Read T from the subclass's generic superclass, e.g. BaseServiceImpl<Product>
        Type type = this.getClass().getGenericSuperclass();
        ParameterizedType ptype = (ParameterizedType) type;
        this.clazz = (Class<T>) ptype.getActualTypeArguments()[0];
    }

    public T queryById(Long id) {
        return this.mapper.selectByPrimaryKey(id);
    }

    public List<T> queryAll() {
        // A null record means "no conditions", so this selects every row
        return this.mapper.select(null);
    }

    public List<T> queryByWhere(T t) {
        return this.mapper.select(t);
    }

    public Integer queryByWhereCount(T t) {
        return this.mapper.selectCount(t);
    }

    public List<T> queryByPage(Integer page, Integer rows) {
        // First argument: current page; second argument: rows per page
        PageHelper.startPage(page, rows);
        return this.mapper.select(null);
    }

    public T queryOne(T t) {
        return this.mapper.selectOne(t);
    }

    public void save(T t) {
        this.mapper.insert(t);
    }

    public void saveSelective(T t) {
        this.mapper.insertSelective(t);
    }

    public void update(T t) {
        this.mapper.updateByPrimaryKey(t);
    }

    public void updateSelective(T t) {
        this.mapper.updateByPrimaryKeySelective(t);
    }

    public void deleteById(Long id) {
        this.mapper.deleteByPrimaryKey(id);
    }

    public void deleteByIds(List<Object> ids) {
        Example example = new Example(this.clazz);
        example.createCriteria().andIn("id", ids);
        // Batch delete
        this.mapper.deleteByExample(example);
    }
}
package com.jianqiao.service;

import com.jianqiao.pojo.Product;
import org.springframework.stereotype.Service;

@Service
public class ProductServiceImpl extends BaseServiceImpl<Product> {
}
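The BaseServiceImpl constructor works because ProductServiceImpl binds the type argument in its extends clause (BaseServiceImpl<Product>). That binding is recorded in the class file, so it survives erasure and can be read back with getGenericSuperclass(). Here is a stripped-down, JDK-only sketch of the trick, with hypothetical class names:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical minimal version of the trick used in BaseServiceImpl's constructor.
abstract class TypedBase<T> {
    protected final Class<T> clazz;

    @SuppressWarnings("unchecked")
    protected TypedBase() {
        // getClass() is the concrete subclass, e.g. StringService below
        Type superType = getClass().getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalStateException("Direct subclass must bind T, e.g. extends TypedBase<Product>");
        }
        this.clazz = (Class<T>) ((ParameterizedType) superType).getActualTypeArguments()[0];
    }
}

// Binding T in the extends clause is what makes String recoverable at runtime.
class StringService extends TypedBase<String> {
}

class TypedBaseDemo {
    public static void main(String[] args) {
        System.out.println(new StringService().clazz); // class java.lang.String
    }
}
```

One caveat: this only works when the direct superclass is the parameterized one. Suppose ProductServiceImpl were itself subclassed. getGenericSuperclass() would then return the plain ProductServiceImpl class, and the cast in the original constructor would fail with ClassCastException. The explicit check in the sketch turns that case into a clearer error.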
This is our main service:
package com.jianqiao.service;

import com.jianqiao.constant.AppConstants;
import com.jianqiao.pojo.Product;
import com.jianqiao.util.HttpClientUtilImpl;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * @Author: Alone_XuXu
 * @Description: Fetches JD search pages and extracts the product data
 * @Date: Created in 6:59 - 27 - 11 - 2017
 */
@Service
public class ClawerService {

    private static final Logger logger = LoggerFactory.getLogger(ClawerService.class);

    @Autowired
    private HttpClientUtilImpl httpClientUtil;

    /**
     * Gets the total number of result pages.
     *
     * @param url search URL
     * @return the total page count, or 0 if it cannot be determined
     */
    public Integer getTotalPage(String url) {
        try {
            String html = httpClientUtil.doGet(url);
            if (html != null) {
                Document document = Jsoup.parse(html);
                // id="J_topPage" holds the pager text, e.g. "1/100"
                String topPageText = document.select("#J_topPage").text();
                // Split on runs of non-digits: "1/100" -> ["1", "100"]
                String[] parts = topPageText.split("\\D+");
                logger.info("Total pages: {}", parts[1]);
                return Integer.parseInt(parts[1]);
            }
        } catch (Exception e) {
            logger.error("Failed to get the total page count from " + url, e);
        }
        return 0;
    }

    /**
     * Fetches one result page and extracts its products.
     *
     * @param url    base search URL
     * @param params query parameters for this page
     * @return products keyed by SKU (empty if the page could not be fetched)
     */
    public Map<String, Product> findProductByPage(final String url, final Map<String, Object> params) {
        Map<String, Product> maps = new ConcurrentHashMap<>();
        try {
            String html = httpClientUtil.doGet1(url, params);
            if (html == null) {
                return maps; // non-200 response
            }
            // Strip line breaks, tabs, backspace/form-feed characters and '~'
            html = html.replaceAll("\r\n|\r|\n|\t|\b|~|\f", "");
            getProductList(maps, html);
        } catch (Exception e) {
            logger.error("Failed to fetch page with params " + params, e);
        }
        return maps;
    }

    /**
     * Extracts the product information from the page HTML into the map.
     *
     * @param maps products keyed by SKU
     * @param html page HTML
     */
    private void getProductList(Map<String, Product> maps, String html) {
        Document rootDocument = Jsoup.parse(html);
        // Get the whole product list: each product is a .gl-item inside the list
        Elements items = rootDocument.select("ul[class=gl-warp clearfix]").select(".gl-item");
        for (Element item : items) {
            String dataSku = item.attr("data-sku");
            if (dataSku.isEmpty() || item.children().isEmpty()) {
                continue; // skip entries without a SKU
            }
            Element childDiv = item.child(0);
            String name = childDiv.select(".p-name").text();
            String imageSrc = item.select(".p-img").select("a img").attr("src");
            String price = item.select(".p-price strong").select("i").text();

            Product product = new Product();
            product.setId(Long.parseLong(dataSku));
            product.setTitle(name);
            product.setImage(AppConstants.HTTPS + imageSrc);
            product.setPrice(price);
            // Add the product to the result map
            maps.put(dataSku, product);
        }
    }
}
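getTotalPage takes the second number produced by split("\\D+"). That works for pager text like "1/100". It picks the wrong number if the text ever starts with a non-digit, because split then yields a leading empty string. The JDK-only sketch below matches the "current/total" pair with a regular expression instead. The pager format is an assumption based on what the split expects, and PagerText is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the total from pager text such as "1/100".
class PagerText {
    // "current / total", allowing whitespace around the slash
    private static final Pattern CURRENT_OF_TOTAL = Pattern.compile("(\\d+)\\s*/\\s*(\\d+)");

    static int totalPages(String pagerText) {
        if (pagerText == null) {
            return 0;
        }
        Matcher m = CURRENT_OF_TOTAL.matcher(pagerText);
        if (!m.find()) {
            return 0; // same fallback as getTotalPage: no pager text means 0 pages
        }
        return Integer.parseInt(m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(totalPages("1/100 < >")); // 100
        System.out.println(totalPages(" 1/100"));    // 100 (split("\\D+")[1] would give "1")
        System.out.println(totalPages(""));          // 0
    }
}
```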
7.0 The controller layer
package com.jianqiao.controller;

import com.jianqiao.constant.AppConstants;
import com.jianqiao.pojo.Product;
import com.jianqiao.service.ClawerService;
import com.jianqiao.service.ProductServiceImpl;
import com.jianqiao.vo.KeyWord;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/**
 * @Auther: Alone_XuXu
 * @Description: crawls JD search results page by page on a thread pool
 * @Date: Created in 6:39 - 27 - 11 -2017
 * @Modified By:
 */
@Controller
public class JDClawerController {

    @Autowired
    private ThreadPoolTaskExecutor threadPoolTaskExecutor;
    @Autowired
    private ClawerService clawerService;
    @Autowired
    private ProductServiceImpl productService;

    /**
     * Crawls JD search results and saves them to the database.
     *
     * @param keyWord the bound request parameters (keyword, enc, wq, page)
     */
    @RequestMapping("/jd/clawer")
    @ResponseBody // return a plain-text summary instead of resolving a JSP view named "jd/clawer"
    public String clawerJD(KeyWord keyWord) {
        // Fill in the URL template that is used to read the total page count
        String url = "https://search.jd.com/Search?keyword={keyword}&enc={enc}&qrst=1&rt=1&stop=1&vt=2&wq={wq}&page={page}&s=57&click=0";
        String operationUrl = url.replace("{keyword}", keyWord.getKeyword());
        operationUrl = operationUrl.replace("{enc}", keyWord.getEnc());
        operationUrl = operationUrl.replace("{wq}", keyWord.getWq());
        operationUrl = operationUrl.replace("{page}", keyWord.getPage() != null ? keyWord.getPage() : "1");

        // Per-request state is kept in local variables: the controller is a singleton
        // shared by all requests, so (static) fields would leak results between requests
        int totalPage = clawerService.getTotalPage(operationUrl);
        final Map<String, Product> finalMaps = new ConcurrentHashMap<>();
        // JD's "page" parameter advances in steps of 2 (1, 3, 5, ...), so logical
        // page n is requested as page=2n-1 and the loop below runs totalPage times
        int vtPage = totalPage * 2;
        // One count per page; the request thread waits until every page task has finished
        final CountDownLatch countDownLatch = new CountDownLatch(totalPage);
        long startTime = System.currentTimeMillis();
        for (int i = 1; i < vtPage; i += 2) {
            System.out.println("JD page parameter: " + i);
            final Map<String, Object> params = new ConcurrentHashMap<>();
            params.put("keyword", keyWord.getKeyword());
            params.put("enc", keyWord.getEnc());
            params.put("wq", keyWord.getWq()); // JD's parameter name is wq
            params.put("page", String.valueOf(i));
            threadPoolTaskExecutor.submit(new Runnable() {
                @Override
                public void run() {
                    try {
                        Map<String, Product> productByPage = clawerService.findProductByPage(AppConstants.BASE_URL, params);
                        finalMaps.putAll(productByPage);
                    } finally {
                        countDownLatch.countDown(); // count down even if this page fails
                    }
                }
            });
        }
        // Make the request thread wait for all page tasks
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return "interrupted";
        }
        long endTime = System.currentTimeMillis();
        // Persist everything that was collected
        for (Product product : finalMaps.values()) {
            productService.saveSelective(product);
        }
        System.out.println("Elapsed time: " + (endTime - startTime) + " ms");
        // Elapsed time: 19094 ms with 3 threads
        // Elapsed time: 6337 ms with 10 threads
        return "saved " + finalMaps.size() + " products";
    }
}
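The controller combines three pieces: odd-numbered JD page parameters, a thread pool, and a `CountDownLatch` that makes the request thread wait for every page. The self-contained sketch below reproduces that fan-out/fan-in pattern with a plain `ExecutorService`, so it runs without Spring or network access. `PageFanOutDemo` and `fetchPage` are hypothetical names; `fetchPage` only stands in for `clawerService.findProductByPage`. The `2n - 1` mapping mirrors the `i += 2` loop in the controller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PageFanOutDemo {

    // Hypothetical stand-in for clawerService.findProductByPage(...):
    // returns one fake "sku -> name" entry for the given JD page parameter.
    static Map<String, String> fetchPage(int jdPage) {
        Map<String, String> result = new HashMap<>();
        result.put("sku-" + jdPage, "product on page " + jdPage);
        return result;
    }

    // Logical page n (1-based) -> JD "page" parameter 2n - 1.
    static int toJdPage(int logicalPage) {
        return 2 * logicalPage - 1;
    }

    static Map<String, String> crawl(int totalPage, int threads) {
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch latch = new CountDownLatch(totalPage);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int n = 1; n <= totalPage; n++) {
                final int jdPage = toJdPage(n);
                pool.submit(() -> {
                    try {
                        results.putAll(fetchPage(jdPage));
                    } finally {
                        latch.countDown(); // always count down, even if fetchPage throws
                    }
                });
            }
            latch.await(); // fan-in: block until every page task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and stop waiting
            throw new IllegalStateException("interrupted while waiting for pages", e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> products = crawl(5, 3);
        System.out.println(products.size());               // 5
        System.out.println(products.containsKey("sku-9")); // true: logical page 5 -> page=9
    }
}
```

Counting down in `finally` is the important detail. If one page throws and the latch is never decremented, `await()` blocks the request thread forever.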
System constants class
package com.jianqiao.constant;

/**
 * @Auther: Alone_XuXu
 * @Description: application-wide constants
 * @Date: Created in 6:46 - 27 - 11 -2017
 * @Modified By:
 */
public interface AppConstants {
    // Default character encoding
    String DEFAULT_CHARSET = "utf-8";
    // Entry point of the site to crawl, for example:
    // https://search.jd.com/Search?keyword=笔记本电脑&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&wq=笔记本电脑&page=3&s=57&click=0
    String BASE_URL = "https://search.jd.com/Search";
    String HTTPS = "https:";

    /**
     * Browser request header names
     */
    interface Header {
        String ACCEPT = "Accept";
        String ACCEPT_ENCODING = "Accept-Encoding";
        String ACCEPT_LANGUAGE = "Accept-Language";
        String CACHE_CONTROL = "Cache-Control";
        String COOKIE = "Cookie";
        String HOST = "Host";
        String PROXY_CONNECTION = "Proxy-Connection";
        String REFERER = "Referer";
        String USER_AGENT = "User-Agent";
    }
}
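`BASE_URL` holds only the path, and the controller splices raw Chinese keywords into the query string. The minimal sketch below percent-encodes each value with `URLEncoder` before appending it. It assumes the same parameter names as the search URL analysed in step one, and `SearchUrlBuilder` and `buildUrl` are hypothetical names. It uses the `encode(String, String)` overload so that it also compiles on Java 8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SearchUrlBuilder {

    // Same value as AppConstants.BASE_URL.
    static final String BASE_URL = "https://search.jd.com/Search";

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this branch is not expected to run.
            throw new IllegalStateException(e);
        }
    }

    // Builds a search URL for a keyword and a JD "page" parameter (1, 3, 5, ...).
    static String buildUrl(String keyword, int jdPage) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("keyword", keyword);
        params.put("enc", "utf-8");
        params.put("wq", keyword);
        params.put("page", String.valueOf(jdPage));
        StringBuilder sb = new StringBuilder(BASE_URL).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("笔记本电脑", 3));
    }
}
```

For 笔记本电脑 this produces the same `keyword=%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91` encoding that appears in the entry URL from step one.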
8.0 Preparing the configuration files
8.1 web.xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1">
    <display-name>Archetype Created Web Application</display-name>

    <!-- Listener that starts the root Spring container -->
    <context-param>
        <param-name>contextConfigLocation</param-name>
        <param-value>classpath*:spring/spring-*.xml</param-value>
    </context-param>
    <listener>
        <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
    </listener>

    <!-- Front controller -->
    <servlet>
        <servlet-name>DispatcherServlet</servlet-name>
        <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>classpath:spring/springmvc-*.xml</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>DispatcherServlet</servlet-name>
        <url-pattern>/</url-pattern>
    </servlet-mapping>

    <!-- Encoding filter that prevents garbled POST parameters.
         forceEncoding works on every Spring 4.x release; the separate forceRequestEncoding
         and forceResponseEncoding parameters are only recognized from Spring 4.3 onward -->
    <filter>
        <filter-name>CharacterEncodingFilter</filter-name>
        <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>utf-8</param-value>
        </init-param>
        <init-param>
            <param-name>forceEncoding</param-name>
            <param-value>true</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

    <!-- Spring MVC REST filter: turns a hidden _method form field into PUT, DELETE, etc. -->
    <filter>
        <filter-name>HiddenHttpMethodFilter</filter-name>
        <filter-class>org.springframework.web.filter.HiddenHttpMethodFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>HiddenHttpMethodFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>
8.2 mybatis
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
    <!-- Plugins: PageHelper and the generic Mapper -->
    <!-- When the generic Mapper and PageHelper are used together, PageHelper must be configured first, otherwise startup fails -->
    <plugins>
        <plugin interceptor="com.github.pagehelper.PageInterceptor">
            <!-- Enable "reasonable" paging: out-of-range page numbers are clamped to the first or last page -->
            <property name="reasonable" value="true"/>
        </plugin>
        <plugin interceptor="com.github.abel533.mapperhelper.MapperInterceptor">
            <!-- How auto-increment primary keys are written back; default MYSQL, see the documentation for details -->
            <property name="IDENTITY" value="MYSQL"/>
            <!-- Generic Mapper interfaces; separate multiple interfaces with commas -->
            <property name="mappers" value="com.github.abel533.mapper.Mapper"/>
        </plugin>
    </plugins>
</configuration>
8.3 properties
JDBC settings
jdbc.username=root
jdbc.password=1230
jdbc.url=jdbc:mysql://localhost:3306/clawerDB?rewriteBatchedStatements=true&useUnicode=true&characterEncoding=utf8
jdbc.driver=com.mysql.jdbc.Driver

HttpClient settings
httpclient.maxTotal = 200
httpclient.DefaultMaxPerRoute = 20
httpclient.connectTimeout =1000
httpclient.connectionRequestTimeout =500
httpclient.socketTimeout =10000
httpclient.staleConnectionCheckEnabled = true
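Spring's `context:property-placeholder` loads these files through `java.util.Properties`, so each `key=value` pair has to sit on its own line. This stdlib-only check illustrates two details. First, the spaces around `=` in the HttpClient entries are harmless. Second, entries accidentally joined onto one line collapse into a single key with a wrong value. `HttpClientPropsDemo` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class HttpClientPropsDemo {

    // Parses properties text the same way a .properties file is parsed.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        // One entry per line: whitespace around '=' is skipped by Properties.load
        Properties ok = load("httpclient.maxTotal = 200\nhttpclient.socketTimeout =10000\n");
        System.out.println(Integer.parseInt(ok.getProperty("httpclient.maxTotal"))); // 200

        // Two entries on one line: only the first '=' separates key and value
        Properties glued = load("jdbc.username=rootjdbc.password=1230");
        System.out.println(glued.getProperty("jdbc.username")); // rootjdbc.password=1230
        System.out.println(glued.getProperty("jdbc.password")); // null
    }
}
```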
8.4 Spring configuration
8.4.1 spring-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- Service-layer beans -->
    <context:component-scan base-package="com.jianqiao.util"/>
    <context:component-scan base-package="com.jianqiao.service"/>

    <!-- Thread pool for the asynchronous page crawls -->
    <bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
        <!-- Number of core threads the pool keeps -->
        <property name="corePoolSize" value="10" />
        <!-- Maximum number of threads; threads beyond the core size are only created once the queue is full -->
        <property name="maxPoolSize" value="100" />
        <!-- Capacity of the task queue -->
        <property name="queueCapacity" value="1000" />
        <!-- Idle time, in seconds, after which threads above the core size are released -->
        <property name="keepAliveSeconds" value="3000" />
        <!-- Policy for rejected tasks (no free thread and a full queue): CallerRunsPolicy runs the task
             on the submitting thread, whereas the default AbortPolicy throws RejectedExecutionException -->
        <property name="rejectedExecutionHandler">
            <bean class="java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy" />
        </property>
    </bean>
</beans>
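The number of threads that actually run depends on how `ThreadPoolTaskExecutor` drives its underlying `java.util.concurrent.ThreadPoolExecutor`. New tasks go to the core threads first and then to the queue. Extra threads up to `maxPoolSize` are only created once the 1000-slot queue is full, and after that `CallerRunsPolicy` runs tasks on the submitting thread. The sketch below rebuilds a pool with the same numbers using only the JDK and reports how many threads were created. `PoolSizingDemo` and `largestPoolSizeFor` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingDemo {

    // Plain-JDK pool with the same numbers as the taskExecutor bean:
    // core 10, max 100, queue 1000, keep-alive 3000 s, CallerRunsPolicy.
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                10, 100, 3000, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits `tasks` tasks that all block until released, then reports the
    // largest number of threads the pool created. Keep tasks <= 1100
    // (100 threads + 1000 queued): beyond that, CallerRunsPolicy would run a
    // blocking task on this thread and it would never reach release.countDown().
    static int largestPoolSizeFor(int tasks) {
        ThreadPoolExecutor pool = newPool();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // hold the worker so later tasks pile up
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getLargestPoolSize();
        } finally {
            release.countDown(); // let every blocked task finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(largestPoolSizeFor(50));   // 10: the other 40 tasks wait in the queue
        System.out.println(largestPoolSizeFor(1050)); // 50: queue full, so 40 extra threads are added
    }
}
```

With these settings, a crawl of a few dozen search pages therefore runs on at most the 10 core threads. To get more parallelism for small batches, raise `corePoolSize` rather than `maxPoolSize`.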
Data source and transaction manager configuration
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- Load the properties files -->
    <context:property-placeholder location="classpath:properties/*.properties"/>

    <!-- DAO layer: data source -->
    <bean class="com.jolbox.bonecp.BoneCPDataSource" id="dataSource" destroy-method="close">
        <!-- JDBC driver -->
        <property name="driverClass" value="${jdbc.driver}"/>
        <!-- JDBC URL for that driver -->
        <property name="jdbcUrl" value="${jdbc.url}"/>
        <!-- Database user name -->
        <property name="username" value="${jdbc.username}"/>
        <!-- Database password -->
        <property name="password" value="${jdbc.password}"/>
        <!-- Interval, in minutes, between checks of idle pooled connections; default 240, 0 disables the check -->
        <property name="idleConnectionTestPeriod" value="60"/>
        <!-- Maximum age, in minutes, of an unused connection; default 60, 0 keeps connections forever -->
        <property name="idleMaxAge" value="30"/>
        <!-- Maximum number of connections per partition -->
        <property name="maxConnectionsPerPartition" value="150"/>
        <!-- Minimum number of connections per partition -->
        <property name="minConnectionsPerPartition" value="5"/>
    </bean>

    <!-- Transaction manager for the data source -->
    <bean class="org.springframework.jdbc.datasource.DataSourceTransactionManager" id="transactionManager">
        <property name="dataSource" ref="dataSource"/>
    </bean>
</beans>
spring-httpclient.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- Load the external properties files -->
    <context:property-placeholder location="classpath:properties/*.properties" />

    <!-- Pooling connection manager -->
    <bean id="connectionManager" class="org.apache.http.impl.conn.PoolingHttpClientConnectionManager">
        <!-- Maximum total number of connections -->
        <property name="maxTotal" value="${httpclient.maxTotal}" />
        <!-- Maximum concurrent connections per route (target host) -->
        <property name="defaultMaxPerRoute" value="${httpclient.DefaultMaxPerRoute}" />
    </bean>

    <!-- HttpClientBuilder -->
    <bean id="httpClientBuilder" class="org.apache.http.impl.client.HttpClientBuilder">
        <!-- Use the pooling connection manager -->
        <property name="connectionManager" ref="connectionManager" />
    </bean>

    <!-- HttpClient: prototype scope, so each injection gets a new client built on the shared connection pool -->
    <bean id="httpClient" class="org.apache.http.impl.client.CloseableHttpClient"
          factory-bean="httpClientBuilder" factory-method="build" scope="prototype">
    </bean>

    <bean id="requestConfigBuilder" class="org.apache.http.client.config.RequestConfig.Builder">
        <!-- Maximum time to establish a connection (ms) -->
        <property name="connectTimeout" value="${httpclient.connectTimeout}"/>
        <!-- Maximum time to obtain a connection from the pool (ms) -->
        <property name="connectionRequestTimeout" value="${httpclient.connectionRequestTimeout}"/>
        <!-- Maximum time to wait for data (ms) -->
        <property name="socketTimeout" value="${httpclient.socketTimeout}"/>
        <!-- Check that a pooled connection is still usable before sending a request -->
        <property name="staleConnectionCheckEnabled" value="${httpclient.staleConnectionCheckEnabled}"/>
    </bean>

    <!-- Request configuration -->
    <bean id="requestConfig" class="org.apache.http.client.config.RequestConfig"
          factory-bean="requestConfigBuilder" factory-method="build"></bean>

    <!-- Periodically clean up idle connections -->
    <bean class="com.jianqiao.util.IdleConnectionEvictor" destroy-method="shutdown">
        <constructor-arg index="0" ref="connectionManager"/>
    </bean>
</beans>
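The `com.jianqiao.util.IdleConnectionEvictor` class itself is not listed in this section. Below is a hypothetical stand-in that follows the same contract: it starts cleaning on construction and stops in `shutdown()`, which matches `destroy-method="shutdown"`. It uses a `ScheduledExecutorService` instead of a hand-rolled thread. To stay runnable without HttpClient on the classpath, the cleanup step is passed in as a `Runnable`, so this is not a drop-in replacement for the bean definition above. In the real bean, that step would call `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)` on the `PoolingHttpClientConnectionManager`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IdleEvictorSketch {

    private final ScheduledExecutorService scheduler;

    // `cleanup` is where real code would call, for example,
    // connectionManager.closeExpiredConnections() and
    // connectionManager.closeIdleConnections(30, TimeUnit.SECONDS).
    public IdleEvictorSketch(Runnable cleanup, long periodMillis) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true); // never keeps the JVM alive by itself
            return t;
        });
        scheduler.scheduleWithFixedDelay(cleanup, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Called by Spring through destroy-method="shutdown".
    public void shutdown() {
        scheduler.shutdownNow();
    }

    // Runs an evictor for waitMillis and returns how many times cleanup ran.
    static int countRuns(long periodMillis, long waitMillis) {
        AtomicInteger runs = new AtomicInteger();
        IdleEvictorSketch evictor = new IdleEvictorSketch(runs::incrementAndGet, periodMillis);
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            evictor.shutdown();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup runs: " + countRuns(50, 300));
    }
}
```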
Spring and MyBatis integration
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Integrate Spring with MyBatis -->
    <!-- SqlSessionFactory -->
    <bean class="org.mybatis.spring.SqlSessionFactoryBean" id="sqlSessionFactory">
        <property name="dataSource" ref="dataSource"/>
        <!-- MyBatis global configuration file -->
        <property name="configLocation" value="classpath:mybatis/mybatis-config.xml"/>
        <!-- Scan every XML file under the mappers directory and its subdirectories; not needed here because the generic Mapper is used -->
        <!-- <property name="mapperLocations" value="classpath:mappers/**/*.xml"/> -->
        <!-- Type aliases -->
        <property name="typeAliasesPackage" value="com.jianqiao.pojo"/>
    </bean>

    <!-- Scan for mapper interfaces -->
    <bean class="org.mybatis.spring.mapper.MapperScannerConfigurer">
        <!-- Package that contains the mapper interfaces -->
        <property name="basePackage" value="com.jianqiao.mapper"/>
    </bean>
</beans>
Spring MVC configuration
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc.xsd">

    <!-- Why do the properties only reach the controllers when the placeholder is declared in this file?
         A property placeholder only resolves ${...} inside the container that declares it. Controllers are
         created by the DispatcherServlet's child container, so that container needs its own placeholder;
         the one in the root container started by ContextLoaderListener does not apply here -->
    <context:property-placeholder location="classpath:properties/*.properties"/>

    <!-- Spring MVC settings -->
    <!-- Package scanned for controllers -->
    <context:component-scan base-package="com.jianqiao.controller"/>

    <!-- View resolver -->
    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
        <property name="prefix" value="/WEB-INF/views/"/>
        <property name="suffix" value=".jsp"/>
    </bean>

    <!-- Annotation-driven request handling -->
    <mvc:annotation-driven/>
    <!-- Hand static resources to the container's default servlet -->
    <mvc:default-servlet-handler/>

    <!-- File upload resolver; 5242880 bytes = 5 MB -->
    <bean class="org.springframework.web.multipart.commons.CommonsMultipartResolver" id="multipartResolver">
        <property name="defaultEncoding" value="utf-8"/>
        <property name="maxUploadSize" value="5242880"/>
    </bean>
</beans>
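Author's notes:
This code performs a basic product search and saves the results to the database.
Its weak point is that a lot of the code still needs adjusting.
Most importantly, when parsing with jsoup, the product image sometimes cannot be extracted. I'd be grateful if anyone who reads this article could help me fix that bug. Thanks.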
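One plausible cause of the missing images is lazy loading. This is an assumption about JD's search-page markup at the time, not something verified in this article. Only the first few result images carry a real `src`; the others keep their URL in a `data-lazy-img` attribute, which is set to `done` once the browser has loaded the image. The URLs are also protocol-relative (`//...`). The sketch below picks the first usable attribute and prefixes `https:`, the same value as `AppConstants.HTTPS`. It takes a `Function<String, String>` so it runs without jsoup. With jsoup you could pass `imgElement::attr`, because `Element.attr(String)` returns an empty string for a missing attribute. `LazyImageResolver` and `resolve` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LazyImageResolver {

    // Same value as AppConstants.HTTPS.
    static final String HTTPS = "https:";

    // `attr` returns an attribute value, or "" when absent (like jsoup's Element.attr).
    // Assumption: lazy images store their URL in data-lazy-img, and loaded ones
    // have a real src plus data-lazy-img="done".
    static String resolve(Function<String, String> attr) {
        String url = attr.apply("src");
        if (url == null || url.isEmpty()) {
            String lazy = attr.apply("data-lazy-img");
            url = (lazy == null || "done".equals(lazy)) ? "" : lazy;
        }
        if (url.startsWith("//")) {
            url = HTTPS + url; // protocol-relative -> absolute
        }
        return url; // "" means no image URL was found
    }

    public static void main(String[] args) {
        Map<String, String> img = new HashMap<>();
        // Hypothetical lazily loaded image: no src, URL only in data-lazy-img
        img.put("data-lazy-img", "//img10.360buyimg.com/n7/example.jpg");
        System.out.println(resolve(name -> img.getOrDefault(name, "")));
        // https://img10.360buyimg.com/n7/example.jpg
    }
}
```

If the page markup differs, inspect one `<img>` inside `J_goodsList` and adjust the attribute names accordingly.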