A Simple Crawler Program 2

Source: Internet · Editor: 程序博客网 · Time: 2024/05/17 03:57

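I've been unemployed and stuck in my dorm for 6 days now. It's the end of the year, and jobs are hard to find. This is my first New Year since I started working, and with no experience I assumed finding a job would be easy. Awkward.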


Bored with nothing to do, I wrote a simple crawler.

Crawler exercise: Tencent city news

http://fj.qq.com/dc_column_article/TagsList.htm?tags=福州


Capturing and analyzing the crawler's request


It is a plain GET request, so all you need to do is fill in the corresponding parameters.
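The captured request goes to tags.open.qq.com/interface/tag/articles.php with the query parameters callback, p, l, tag, oe, ie, site and _. Since only a few of them change between requests, the URL can be rebuilt from its parts instead of pasted by hand. Below is a minimal sketch under some assumptions: the class and method names (TagArticlesUrl, build, encode) are hypothetical; reading p as the page number and l as the page size is inferred from the captured request, not from any documentation; and callback and _ look like the JSONP callback name and cache-busting timestamp that jQuery generates, so the sketch just takes them as arguments without knowing whether the server checks them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that rebuilds the captured articles.php URL for one page. */
public class TagArticlesUrl {

    private static final String BASE = "http://tags.open.qq.com/interface/tag/articles.php";

    /** Percent-encodes a value as UTF-8 (the captured request declares ie=utf-8). */
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds the request URL. Assumption: p = page number, l = articles per page;
     * oe/ie are copied unchanged from the captured request.
     */
    public static String build(String callback, int page, int pageSize,
                               String tag, String site, long timestamp) {
        return BASE
                + "?callback=" + encode(callback)
                + "&p=" + page
                + "&l=" + pageSize
                + "&tag=" + encode(tag)
                + "&oe=gbk&ie=utf-8"
                + "&site=" + encode(site)
                + "&_=" + timestamp;
    }

    public static void main(String[] args) {
        // "\u798F\u5DDE" is 福州 (Fuzhou), written as Unicode escapes so the
        // source compiles the same way regardless of the file encoding
        String url = build("jQuery18202606978673085389_1417534641913", 1, 20,
                "\u798F\u5DDE", "fj", 1417534648230L);
        // prints the captured URL with p=1, including tag=%E7%A6%8F%E5%B7%9E
        System.out.println(url);
    }
}
```

Encoding the tag with URLEncoder yields exactly the %E7%A6%8F%E5%B7%9E seen in the captured request, so switching to another city tag only means changing the tag and site arguments.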

Core code:

import org.apache.http.client.methods.HttpGet;

// Fuzhou news; i is the page number, passed as the p parameter
HttpGet httpGet = new HttpGet("http://tags.open.qq.com/interface/tag/articles.php?callback=jQuery18202606978673085389_1417534641913&p="
        + i
        + "&l=20&tag=%E7%A6%8F%E5%B7%9E&oe=gbk&ie=utf-8&site=fj&_=1417534648230");
// Set the HttpGet request headers
httpGet.setHeader("Accept", "application/javascript, */*;q=0.8");
httpGet.setHeader("Accept-Charset", "GB2312,utf-8;q=0.7,*;q=0.7");
// gzip/deflate: the client must decompress the body; clients built with HttpClientBuilder (4.3+) do so by default
httpGet.setHeader("Accept-Encoding", "gzip, deflate");
httpGet.setHeader("Accept-Language", "zh-CN");
httpGet.setHeader("Connection", "Keep-Alive");
httpGet.setHeader("DNT", "1");
httpGet.setHeader("Cookie",
        "pgv_info=ssid=s7418086336; ac=1,019,001; pt2gguin=o1023746826; RK=2dlDJvBBFu; "
        + "ptcz=a7000fdd9a7d79d08c3356b93cd78e526ae2be2327ee843f9d94a415e5fb4a7f; "
        + "pgv_pvid=4808498400; uin_cookie=1023746826; "
        + "euin_cookie=9628486DA91468B319D8EB65692F106CB376CEC227892C37; o_cookie=1023746826");
// httpGet.setHeader("Host", "tags.open.qq.com");
// Referer: the Fuzhou tag page the request originates from
httpGet.setHeader("Referer", "http://fj.qq.com/dc_column_article/TagsList.htm?tags=%E7%A6%8F%E5%B7%9E");
httpGet.setHeader("User-Agent", "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)");
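Because the request passes a callback parameter, the server is expected to answer in JSONP form, i.e. callbackName({...}); rather than plain JSON, and the wrapper has to be removed before the body can be parsed. The post does not show the response, so this is only a sketch under that assumption: JsonpUnwrapper and its methods are hypothetical names, and the sample body in main is made up. It unwraps only when the text before the first '(' looks like a JavaScript identifier, so plain JSON that happens to contain parentheses is returned unchanged.

```java
/** Hypothetical helper: strips a JSONP wrapper such as cb({...}); from a response body. */
public class JsonpUnwrapper {

    /** True if name looks like a JS callback name, e.g. jQuery123_456 or a.b.c */
    static boolean isCallbackName(String name) {
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            return false;
        }
        for (int k = 0; k < name.length(); k++) {
            char c = name.charAt(k);
            if (!Character.isJavaIdentifierPart(c) && c != '.') {
                return false;
            }
        }
        return true;
    }

    /** Returns the JSON inside callbackName(...), or the trimmed body if it is not wrapped. */
    public static String unwrap(String body) {
        String s = body.trim();
        int open = s.indexOf('(');
        int close = s.lastIndexOf(')');
        if (open < 0 || close < open || !isCallbackName(s.substring(0, open).trim())) {
            // No recognizable wrapper: assume the body is already plain JSON
            return s;
        }
        // Everything between the first '(' and the last ')'; a trailing ';' is dropped
        return s.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        // Made-up sample; the real response fields are not shown in the post
        String sample = "jQuery18202606978673085389_1417534641913({\"data\":[]});";
        System.out.println(unwrap(sample)); // prints {"data":[]}
    }
}
```

The unwrapped string can then be handed to whatever JSON library the project already uses. Since the request also sends oe=gbk, it seems likely the body is GBK-encoded, so it is worth decoding the response bytes with that charset before unwrapping.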


