A simple text crawler with requests

Source: Internet | Published by: mysql 修改数据库时间 | Editor: 程序博客网 | Time: 2024/06/10 05:12

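import re
import requests

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Request headers. To see what a browser sends, open the page in Chrome, press F12
# to open the developer tools, select the Network tab, press F5 to reload the page,
# and click the first request to view its headers.
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)  # send the GET request; returns a Response object
content = response.text  # the text attribute holds the decoded HTML
content = re.sub('<br/>', '\n', content)  # replace <br/> line-break tags with '\n'
# Regex that captures each joke in the HTML; re.S lets '.' match newlines too.
pattern = re.compile('content">.*?<span>(.*?)</span>.*?</div>', re.S)
outputs = re.findall(pattern, content)  # list of the captured joke texts
for i in outputs:
    print(i + "\n\n")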
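The extraction step above can be tried without touching the network. The following is a minimal sketch, assuming the page wraps each joke in a `<div class="content">` that contains a `<span>`. The `SAMPLE_HTML` string and the `extract_jokes` helper are hypothetical names made up for illustration, and the real page markup may differ.

```python
import re

# Hypothetical markup, assumed to resemble what the pattern expects.
SAMPLE_HTML = '''
<div class="content">
<span>First joke<br/>second line</span>
</div>
<div class="content">
<span>Second joke</span>
</div>
'''

# Same pattern as in the script; re.S lets '.' match newlines.
PATTERN = re.compile(r'content">.*?<span>(.*?)</span>.*?</div>', re.S)


def extract_jokes(html):
    """Return the text of every joke found in html (hypothetical helper)."""
    html = re.sub('<br/>', '\n', html)  # turn <br/> tags into real line breaks
    return [match.strip() for match in PATTERN.findall(html)]


if __name__ == '__main__':
    for joke in extract_jokes(SAMPLE_HTML):
        print(joke + "\n")
```

Keeping the parsing in a function that takes a string makes it possible to check the regular expression against saved HTML before pointing the script at the live site.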

1 0

requests实现简单文本爬虫
requests简单爬虫试手
简单接口实现 requests
python简单文本爬虫
Python Beautiful Soup+requests实现爬虫
简单的实现爬虫爬取网页文本和图片
python简单爬虫开发（urllib2、requests + BeautifulSoup）
使用requests+beautifulsoup模块实现python网络爬虫功能
python pip下安装Requests；实现单线程爬虫
requests‐bs4路线实现中国大学排名定向爬虫
Python爬虫实例——基于BeautifulSoup和requests实现
爬虫学习3.2 HTTP请求的python实现--Requests
使用requests+beautifulsoup模块实现python网络爬虫功能
scala 实现简单爬虫
python 简单爬虫实现
php 实现简单爬虫
Python实现简单爬虫
Python实现简单爬虫
Python基础2
CDN+P2P直播应用
CF19b： Checkout Assistant（类01背包）
ExtJs的api文档该怎么看
Android苦手的App之旅（4）
requests实现简单文本爬虫
Varnish 4.0 实战
ModelMapper:从对象到对象的映射库
<java并发编程实战>阅读总结(a)
代码
中英文对照介绍Play Framework 框架安全模块
1093. Count PAT's (25)
python相拟度算法（一）-欧几里得距离评介
java 两种方式遍历文件夹及文件