Scraping Weibo with Python (Ajax + Mongo)
Source: Internet | Editor: 程序博客网 | Date: 2024/05/20 21:19
```python
import requests
from urllib.parse import urlencode
from pyquery import PyQuery as pq
from pymongo import MongoClient

# Ajax endpoint used by the mobile Weibo site (m.weibo.cn)
base_url = 'https://m.weibo.cn/api/container/getIndex?'
headers = {
    'Host': 'm.weibo.cn',
    'Referer': 'https://m.weibo.cn/u/2145291155',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    # Marks the request as an Ajax (XHR) request
    'X-Requested-With': 'XMLHttpRequest',
}
client = MongoClient()  # connects to localhost:27017 by default
db = client['weibo']
collection = db['weibo']
max_page = 14


def get_page(page):
    """Fetch one page of the user's timeline as JSON; return None on failure."""
    params = {
        'type': 'uid',
        'value': '2145291155',
        'containerid': '1076032145291155',
        'page': page,
    }
    url = base_url + urlencode(params)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)
    return None


def parse_page(data):
    """Yield one dict per post found in the JSON returned by get_page()."""
    if data:
        # Some API versions nest 'cards' under 'data'; fall back to the top level
        container = data.get('data') or data
        for item in container.get('cards') or []:
            item = item.get('mblog')
            if item is None:  # skip cards that are not posts
                continue
            weibo = {}  # one post
            weibo['id'] = item.get('id')
            # 'text' is an HTML fragment; pyquery extracts the plain text
            weibo['text'] = pq(item.get('text')).text()
            weibo['attitudes'] = item.get('attitudes_count')
            weibo['comments'] = item.get('comments_count')
            weibo['reposts'] = item.get('reposts_count')
            yield weibo


def save_to_mongo(result):
    # Collection.insert() was removed in PyMongo 4; insert_one() replaces it
    if collection.insert_one(result):
        print('Saved to Mongo')


if __name__ == '__main__':
    for page in range(1, max_page + 1):
        data = get_page(page)
        for result in parse_page(data):
            print(result)
            save_to_mongo(result)
```
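The `parse_page` function above relies on pyquery to strip the HTML out of each post's `text`. The following is a minimal stdlib-only sketch of the same extraction that swaps pyquery for `html.parser`, so its whitespace handling may differ slightly. It skips cards with no `mblog` entry. It also assumes, without confirmation from the original, that some versions of the endpoint nest `cards` under a `data` key. The sample payload is made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; are decoded
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Return the plain text of an HTML fragment (stdlib stand-in for pyquery)."""
    parser = _TextExtractor()
    parser.feed(fragment or '')
    parser.close()  # flush any buffered text
    return ''.join(parser.parts).strip()


def parse_cards(data):
    """Yield simplified post dicts from one getIndex response (a dict)."""
    if not data:
        return
    # Assumption: newer responses may nest 'cards' under 'data'
    container = data.get('data') or data
    for card in container.get('cards') or []:
        mblog = card.get('mblog')
        if mblog is None:  # cards without 'mblog' are not posts
            continue
        yield {
            'id': mblog.get('id'),
            'text': html_to_text(mblog.get('text')),
            'attitudes': mblog.get('attitudes_count'),
            'comments': mblog.get('comments_count'),
            'reposts': mblog.get('reposts_count'),
        }


# Hypothetical sample payload: one post card and one non-post card
SAMPLE = {
    'cards': [
        {'mblog': {'id': '1', 'text': 'Hello <a href="#">#tag#</a> world',
                   'attitudes_count': 3, 'comments_count': 1, 'reposts_count': 0}},
        {'card_type': 11},
    ]
}

if __name__ == '__main__':
    print(list(parse_cards(SAMPLE)))
```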
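The main loop above always requests a fixed `max_page = 14` pages, one immediately after another. Below is a sketch of a more defensive loop. It stops at the first page that yields no posts and sleeps between requests. The function name `crawl`, the one-second default delay, and the choice to treat an empty or failed page as the end of the timeline are all assumptions of this sketch. None of them describe documented behavior of the Weibo API.

```python
import time


def crawl(fetch_page, parse, max_page=14, delay=1.0, sleep=time.sleep):
    """Walk pages 1..max_page, stopping early at the first page with no posts.

    fetch_page(page) -> dict or None    (e.g. get_page from the script)
    parse(data)      -> iterable of posts (e.g. parse_page from the script)
    """
    for page in range(1, max_page + 1):
        posts = list(parse(fetch_page(page)))
        if not posts:  # empty or failed page: assume the timeline has ended
            break
        yield from posts
        if page < max_page:
            sleep(delay)  # pause between requests to reduce the risk of rate limiting


if __name__ == '__main__':
    # Hypothetical stub: pages 1 and 2 have posts, page 3 has none
    fake_pages = {1: [{'id': '1'}], 2: [{'id': '2'}]}
    for post in crawl(lambda p: fake_pages.get(p), lambda d: d or [], delay=0):
        print(post)
```

With the original functions, it would be used as `for weibo in crawl(get_page, parse_page): save_to_mongo(weibo)`. Passing `fetch_page`, `parse`, and `sleep` in as arguments keeps the loop testable without network access. One trade-off is that a transient network error ends the crawl early instead of being retried.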
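`save_to_mongo` inserts every post it receives, so running the script twice stores each post twice. A common alternative is an upsert keyed on the post's `id`, sketched below with PyMongo's `update_one(filter, update, upsert=True)`. The names `save_weibo` and `InMemoryCollection` are hypothetical. The in-memory class is not part of PyMongo. It imitates only this one upsert path so that the logic can be tried without a MongoDB server.

```python
def save_weibo(collection, weibo):
    """Insert the post, or update it if a document with the same id exists.

    `collection` is expected to behave like a pymongo Collection; only
    update_one(filter, update, upsert=True) is called.
    """
    return collection.update_one({'id': weibo['id']}, {'$set': weibo}, upsert=True)


class InMemoryCollection:
    """Hypothetical stand-in for a pymongo Collection (illustration only)."""

    def __init__(self):
        self.docs = {}

    def update_one(self, query, update, upsert=False):
        key = query['id']
        if key in self.docs:
            self.docs[key].update(update['$set'])  # existing post: overwrite fields
        elif upsert:
            self.docs[key] = dict(update['$set'])  # new post: insert it


if __name__ == '__main__':
    coll = InMemoryCollection()
    save_weibo(coll, {'id': '1', 'attitudes': 3})
    save_weibo(coll, {'id': '1', 'attitudes': 5})  # same post seen again
    print(len(coll.docs), coll.docs['1']['attitudes'])
```

To use this against a real database, pass the `collection` object from the script, as in `save_weibo(collection, result)`. Also creating an index with `collection.create_index('id', unique=True)` makes MongoDB itself reject any duplicate ids that bypass this function.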