Python Web Scraping in Practice (9): Scraping Dynamic Web Pages
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 00:59
#coding=utf-8
# Example 1: POST to the Sina Finance quote endpoint (node 'cyb') and print the rows as a table.
import re
import json
import requests
from prettytable import PrettyTable

def getHtml(url):
    # Form fields the page sends when it loads its data dynamically.
    data = {
        'page': 1,
        'num': 40,
        'sort': 'symbol',
        'asc': 1,
        'node': 'cyb',
        'symbol': '',
        '_s_r_a': 'page'
    }
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0'}
    try:
        page = requests.post(url, data=data, headers=headers, timeout=10)
        page.encoding = 'gbk'
        html = page.text
        return html
    except requests.RequestException:
        # Network errors yield an empty string; the caller checks for it.
        return ""

def getdata(html):
    # The response uses unquoted object keys, so it is not valid JSON yet.
    # These replacements add the missing quotes around the keys.
    data = html.replace(':', '":')
    data = data.replace(',', ',"')
    data = data.replace('{', '{"')
    data = data.replace('"{', '{')
    # Time values such as 15:00:00 are mangled by the step above; blank them out.
    data = re.sub(r'\d+":\d+":\d+', '', data)
    data = json.loads(data)
    row = PrettyTable()
    row.field_names = ["Code", "Name", "Latest", "Change", "Change %", "Bid", "Ask",
                       "Prev Close", "Open", "High", "Low", "Volume (lots)", "Turnover (10k)"]
    for item in data:
        row.add_row((item['symbol'], item['name'], item['trade'], item['pricechange'], item['changepercent'],
                     item['buy'], item['sell'], item['settlement'], item['open'], item['high'],
                     item['low'], item['volume'], item['amount']))
    print(row)

if __name__ == '__main__':
    url = 'http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodeData?'
    html = getHtml(url)
    if html:
        getdata(html)

#coding=utf-8
# Example 2: request a specific page of a paginated report by POSTing the page number.
import requests

def getHtml(url):
    # The page number and page size travel in the form data, not in the URL.
    data = {
        'page.pageNo': 2,
        'tempPageSize': 40,
    }
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0'}
    page = requests.post(url, headers=headers, data=data, timeout=10)
    html = page.text
    print(html)

if __name__ == '__main__':
    url = 'http://datacenter.mep.gov.cn:8099/ths-report/report!list.action?xmlname=1465594312346'
    getHtml(url)
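
The first script turns the response into JSON with a chain of string replacements, which also damages any value that contains a colon or a comma. As a possible alternative, here is a minimal sketch that quotes only the bare keys with one regular expression. The function name parse_loose_json and the sample string are made up for illustration; the sample only imitates the shape the script above expects and is not real output from the site.

```python
import json
import re

# Matches an unquoted key that directly follows "{" or "," and precedes ":".
_KEY = re.compile(r'([{,])\s*([A-Za-z_]\w*)\s*:')

def parse_loose_json(text):
    """Quote bare object keys, then parse the result as JSON."""
    quoted = _KEY.sub(r'\1"\2":', text)
    return json.loads(quoted)

# Hypothetical sample: unquoted keys, a comma inside a value, a time value.
sample = '[{symbol:"sz300001",name:"A,B",ticktime:"15:00:00"}]'
rows = parse_loose_json(sample)
print(rows[0]["ticktime"])  # prints 15:00:00
```

With this approach the time field survives, so no extra re.sub cleanup is needed. It is still a heuristic: a string value that itself contains something like ,word: would be rewritten incorrectly, so it is worth checking against the actual response before relying on it.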