python3 [Scraping in Practice] Scraping the JD Customer Service (京东客服) Sina Weibo account with requests
Source: the Internet | Editor: 程序博客网 | Date: 2024/05/20 00:15
Target: the Weibo posts of @京东客服 (JD Customer Service) and the comments on them.
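Approach: call the JSON API used by the mobile version of Sina Weibo, then filter the returned data. The account's mobile page looks like this: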
类似于这样的:https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%40%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320
This is the URL of the account's page as seen when logged in.
You can also log in to Sina Weibo at
https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F
and then open the account's page.
What gets scraped: each post (说说) and the comment entries beneath it.
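It is simple in principle, but frankly the scraping was a rough ride, and the recurring problem was IP bans. From my own testing, sleeping 30 seconds between requests worked no better than 10 seconds, so 10 seconds is enough: if Weibo decides to ban your IP, it will ban it either way.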
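A fixed `time.sleep` after every request is what the article uses. A slightly tidier option is a small throttle that only waits for whatever is left of the minimum gap. The sketch below is hypothetical (the `Throttle` class is not from the original code); the 10-second default mirrors the interval recommended above, and the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; the article settled on ~10 s
        self.clock = clock                # injectable for testing
        self.sleep = sleep                # injectable for testing
        self._last = None                 # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self._last = now
```

Call `throttle.wait()` right before each `requests.get(...)`. As the author notes, this only spaces requests out; it does not guarantee the IP will not be banned.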
The method is to fetch JSON from the mobile API and then extract the fields of interest.
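Because a blocked request tends to come back as something other than the expected JSON, it helps to decode responses defensively before extracting anything. A minimal sketch follows; `decode_api_response` is a hypothetical name, and treating any `"ok"` value other than `1` as a failure is an assumption based only on the `"ok": 1` seen in the sample response further down. Note that `json.loads(text, 'utf-8')`, as written in the original code, raises `TypeError` on Python 3.6 and later, so the body is decoded without that argument.

```python
import json


def decode_api_response(body):
    """Parse an m.weibo.cn API response body; return the dict, or None on failure.

    Assumption: a non-JSON body (e.g. an HTML error page when the IP is blocked)
    or an "ok" value other than 1 is treated as a failed request.
    """
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, dict) or data.get("ok") != 1:
        return None
    return data
```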
A Firefox add-on can be used here.
Main APIs:
Posts API:
Page 1 of posts:
https://m.weibo.cn/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478
Page 2 of posts:
https://m.weibo.cn/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478&page=2
and so on, incrementing the page parameter.
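The two URLs above differ only in the `page` parameter, so they can be generated instead of pasted by hand. The sketch below copies the query parameters from the article's URLs (with `lfid` written in decoded form) and lets `urllib.parse.urlencode` do the percent-encoding; with these inputs it reproduces the page-1 and page-2 URLs exactly. `posts_page_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "https://m.weibo.cn/api/container/getIndex"
# Query parameters copied from the article's getIndex URLs; lfid is shown decoded.
PARAMS = {
    "uid": "5650743478",
    "luicode": "10000011",
    "lfid": "100103type=1&q=京东客服",
    "featurecode": "20000320",
    "type": "uid",
    "value": "5650743478",
    "containerid": "1076035650743478",
}


def posts_page_url(page):
    """Return the getIndex URL for one page of posts; page 1 carries no page parameter."""
    params = dict(PARAMS)
    if page > 1:
        params["page"] = str(page)
    return BASE + "?" + urlencode(params)
```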
Comment API:
Each post carries an idstr, for example 4137390568546147,
which leads to the post's detail page:
https://m.weibo.cn/status/4137390568546147
The comment-page URLs are built like this:
https://m.weibo.cn/api/comments/show?id=4137390568546147&page=1
https://m.weibo.cn/api/comments/show?id=4137390568546147&page=2
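Given a post's `idstr` and its `comments_count`, the list of comment pages can be computed up front. The sketch assumes 10 comments per page, which is what the author's code comments state; `comment_page_urls` is a hypothetical helper. Rounding up with `math.ceil` avoids requesting an extra empty page when the count is an exact multiple of 10.

```python
import math


def comment_page_urls(idstr, comments_count, per_page=10):
    """List the comments/show URLs needed to cover one post's comments.

    per_page=10 follows the article's note that each comment page holds 10 entries.
    """
    pages = math.ceil(comments_count / per_page)
    return [
        "https://m.weibo.cn/api/comments/show?id={}&page={}".format(idstr, p)
        for p in range(1, pages + 1)
    ]
```

For the post above, `comment_page_urls("4137390568546147", 19)` yields the two URLs listed.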
Here is what the posts API (getIndex, page 2) returns, in JSON:
{ "cardlistInfo": { "containerid": "1076035650743478", "v_p": 42, "show_style": 1, "total": 3264, "page": 2 }, "cards": [ { "card_type": 9, "itemid": "1076035650743478_-_4137858652321796", "scheme": "https://m.weibo.cn/status/FfSSl9K0k?mblogid=FfSSl9K0k&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "2小时前", "id": "4137858652321796", "mid": "4137858652321796", "idstr": "4137858652321796", "text": "明天又要上班了,用四个字描述下你现在的心情吧<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span> ", "textLength": 50, "source": "微博 weibo.com", "favorited": false, "thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg", "bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg", "original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg", "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 0, "comments_count": 4, "attitudes_count": 2, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "bid": "FfSSl9K0k", "pics": [ { "pid": "006apWvQgy1fi7tkjguy4j309q09qt8q", "url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg", "size": "orj360", "geo": { "width": "350", "height": "350", 
"croped": false }, "large": { "size": "large", "url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg", "geo": { "width": "350", "height": "350", "croped": false } } } ] }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4137692553365577", "scheme": "https://m.weibo.cn/status/FfOyre7xv?mblogid=FfOyre7xv&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "13小时前", "id": "4137692553365577", "mid": "4137692553365577", "idstr": "4137692553365577", "text": "你觉得举办哪种《中国有_____》比赛,你能进入决赛? ", "textLength": 49, "source": "微博 weibo.com", "favorited": false, "thumbnail_pic": "http://wx2.sinaimg.cn/thumbnail/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg", "bmiddle_pic": "http://wx2.sinaimg.cn/bmiddle/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg", "original_pic": "http://wx2.sinaimg.cn/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg", "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 0, "comments_count": 13, "attitudes_count": 1, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "bid": "FfOyre7xv", "pics": [ { "pid": "006apWvQgy1fi7ul9n9rfj30k00lsgnj", "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg", "size": "orj360", "geo": { "width": 360, "height": 392, "croped": false }, "large": { "size": "large", 
"url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg", "geo": { "width": "720", "height": "784", "croped": false } } } ] }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4137390568546147", "scheme": "https://m.weibo.cn/status/FfGHmzRf5?mblogid=FfGHmzRf5&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "昨天 14:24", "id": "4137390568546147", "mid": "4137390568546147", "idstr": "4137390568546147", "text": "周末就是买买买,吃吃吃<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_huaixiao-bb5966dcc6.png\" style=\"width:1em;height:1em;\" alt=\"[坏笑]\"></span> ", "textLength": 28, "source": "微博 weibo.com", "favorited": false, "thumbnail_pic": "http://wx2.sinaimg.cn/thumbnail/006apWvQgy1fi7taijr9pg307e05kgvl.gif", "bmiddle_pic": "http://wx2.sinaimg.cn/bmiddle/006apWvQgy1fi7taijr9pg307e05kgvl.gif", "original_pic": "http://wx2.sinaimg.cn/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif", "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 0, "comments_count": 19, "attitudes_count": 1, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "bid": "FfGHmzRf5", "pics": [ { "pid": "006apWvQgy1fi7taijr9pg307e05kgvl", "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi7taijr9pg307e05kgvl.gif", "size": 
"orj360", "geo": { "width": "266", "height": "200", "croped": false }, "large": { "size": "large", "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif", "geo": { "width": "266", "height": "200", "croped": false } } } ] }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4137278329132849", "scheme": "https://m.weibo.cn/status/FfDMkCjS1?mblogid=FfDMkCjS1&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "昨天 06:58", "id": "4137278329132849", "mid": "4137278329132849", "idstr": "4137278329132849", "text": "周六早呀,今天有比我起的还早的吗<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_wabishi-f5765407f7.png\" style=\"width:1em;height:1em;\" alt=\"[挖鼻]\"></span> ", "textLength": 47, "source": "微博 weibo.com", "favorited": false, "thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg", "bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg", "original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg", "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 0, "comments_count": 8, "attitudes_count": 2, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "bid": "FfDMkCjS1", "pics": [ { "pid": 
"006apWvQgy1fi7tiv5e5qj30dc0d5dfz", "url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg", "size": "orj360", "geo": { "width": 273, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg", "geo": { "width": "480", "height": "473", "croped": false } } } ] }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4137054743266182", "scheme": "https://m.weibo.cn/status/FfxXIdHGm?mblogid=FfxXIdHGm&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "08-04", "id": "4137054743266182", "mid": "4137054743266182", "idstr": "4137054743266182", "text": "就问一句,这样人美心善的90后小哥你们要不要?<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span>", "source": "微博 weibo.com", "favorited": false, "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "retweeted_status": { "created_at": "08-04", "id": "4137016583280831", "mid": "4137016583280831", "idstr": "4137016583280831", "text": "<span class=\"url-icon\"><img 
src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span> <a data-url=\"http://t.cn/R9S6VWV\" href=\"http://media.weibo.cn/article?object_id=1022%3A2309404137016584472707&url_type=39&object_type=article&pos=1&luicode=10000011&lfid=1076035650743478&featurecode=20000320&id=2309404137016584472707&ep=FfwYadLuD%2C1717871843%2CFfwYadLuD%2C1717871843\" data-hide=\"\"><span class=\"url-icon\"><img src=\"https://h5.sinaimg.cn/upload/2015/09/25/3/timeline_card_small_article_default.png\"></span></i><span class=\"surl-text\">90后小哥征婚启事</a> ", "textLength": 38, "source": "微博 weibo.com", "favorited": false, "user": { "id": 1717871843, "screen_name": "京东", "profile_image_url": "https://tvax4.sinaimg.cn/crop.0.0.480.480.180/6664a4e3ly8fffaxrnv8fj20dc0dcmy4.jpg", "profile_url": "https://m.weibo.cn/u/1717871843?uid=1717871843&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 19903, "verified": true, "verified_type": 2, "verified_type_ext": 50, "verified_reason": "京东网上商城", "description": "中国最大的自营电商企业京东商城集团在线销售家电、数码通讯、电脑、家居百货、服装服饰、母婴、图书、食品等13大类数万个品牌上千万种优质商品。", "gender": "m", "mbtype": 12, "urank": 43, "mbrank": 5, "follow_me": false, "following": false, "followers_count": 4025036, "follow_count": 258, "cover_image_phone": "https://wx1.sinaimg.cn/crop.0.0.640.640.640/6664a4e3ly1fffb8torrtj20ku0ku409.jpg" }, "reposts_count": 12, "comments_count": 24, "attitudes_count": 16, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "page_info": { "page_pic": { "url": "https://wx3.sinaimg.cn/crop.0.0.617.347.1000/6664a4e3ly1fi7khoua7dj20hk09nn45.jpg" }, "page_url": 
"http://media.weibo.cn/article?object_id=1022%3A2309404137016584472707&url_type=39&object_type=article&pos=2&luicode=10000011&lfid=1076035650743478&featurecode=20000320&id=2309404137016584472707", "page_title": "京东", "content1": "90后小哥征婚启事", "content2": "", "icon": "https://h5.sinaimg.cn/upload/2016/12/28/14/feed_headlines_icon_flash20161228_2.png", "type": "article" }, "bid": "FfwYadLuD" }, "reposts_count": 0, "comments_count": 30, "attitudes_count": 1, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "raw_text": "就问一句,这样人美心善的90后小哥你们要不要?[舔屏][舔屏]", "bid": "FfxXIdHGm" }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4136952959746775", "scheme": "https://m.weibo.cn/status/FfvjxETA3?mblogid=FfvjxETA3&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "08-04", "id": "4136952959746775", "mid": "4136952959746775", "idstr": "4136952959746775", "text": "周五早上上班的你和下班的你<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_xiaoku-7430606cb7.png\" style=\"width:1em;height:1em;\" alt=\"[笑cry]\"></span> ", "textLength": 33, "source": "微博 weibo.com", "favorited": false, "thumbnail_pic": "http://wx1.sinaimg.cn/thumbnail/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg", "bmiddle_pic": "http://wx1.sinaimg.cn/bmiddle/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg", "original_pic": "http://wx1.sinaimg.cn/large/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg", "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, 
"followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 0, "comments_count": 14, "attitudes_count": 1, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "bid": "FfvjxETA3", "pics": [ { "pid": "006apWvQgy1fi7fkqpatfj30j60j6jsg", "url": "https://wx1.sinaimg.cn/orj360/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg", "size": "orj360", "geo": { "width": 360, "height": 360, "croped": false }, "large": { "size": "large", "url": "https://wx1.sinaimg.cn/large/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg", "geo": { "width": "690", "height": "690", "croped": false } } }, { "pid": "006apWvQgy1fi7fkuj1tvg308c0fkmxy", "url": "https://wx1.sinaimg.cn/orj360/006apWvQgy1fi7fkuj1tvg308c0fkmxy.gif", "size": "orj360", "geo": { "width": "300", "height": "560", "croped": false }, "large": { "size": "large", "url": "https://wx1.sinaimg.cn/large/006apWvQgy1fi7fkuj1tvg308c0fkmxy.gif", "geo": { "width": "300", "height": "560", "croped": false } } } ] }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4136663145262324", "scheme": "https://m.weibo.cn/status/FfnM6m4Yc?mblogid=FfnM6m4Yc&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "08-03", "id": "4136663145262324", "mid": "4136663145262324", "idstr": "4136663145262324", "text": "输入法,你们喜欢用哪种?<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-d903433c82.png\" style=\"width:1em;height:1em;\" alt=\"[doge]\"></span> ", "textLength": 30, "source": "微博 weibo.com", "favorited": false, "thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg", "bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg", "original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg", "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": 
"https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 4, "comments_count": 40, "attitudes_count": 6, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "bid": "FfnM6m4Yc", "pics": [ { "pid": "006apWvQgy1fi6i8tkspqj30ku0i7mz4", "url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg", "size": "orj360", "geo": { "width": 309, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg", "geo": { "width": "750", "height": "655", "croped": false } } }, { "pid": "006apWvQgy1fi6i8z010xj30ku0h6jte", "url": "https://wx3.sinaimg.cn/orj360/006apWvQgy1fi6i8z010xj30ku0h6jte.jpg", "size": "orj360", "geo": { "width": 327, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx3.sinaimg.cn/large/006apWvQgy1fi6i8z010xj30ku0h6jte.jpg", "geo": { "width": "750", "height": "618", "croped": false } } }, { "pid": "006apWvQgy1fi6i988w7pj30kt0hbgms", "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi6i988w7pj30kt0hbgms.jpg", "size": "orj360", "geo": { "width": 324, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi6i988w7pj30kt0hbgms.jpg", "geo": { "width": "749", "height": "623", "croped": false } } }, { "pid": "006apWvQgy1fi6i9bnkgfj30ku0gwgmj", "url": 
"https://wx2.sinaimg.cn/orj360/006apWvQgy1fi6i9bnkgfj30ku0gwgmj.jpg", "size": "orj360", "geo": { "width": 333, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi6i9bnkgfj30ku0gwgmj.jpg", "geo": { "width": "750", "height": "608", "croped": false } } } ] }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4136613988263792", "scheme": "https://m.weibo.cn/status/FfmuOyFMY?mblogid=FfmuOyFMY&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "08-03", "id": "4136613988263792", "mid": "4136613988263792", "idstr": "4136613988263792", "text": "<a class='k' href='https://m.weibo.cn/k/%E5%BC%A0%E8%8B%A5%E6%98%80%E5%94%90%E8%89%BA%E6%98%95%E5%85%AC%E5%BC%80%E6%81%8B%E6%83%85?from=feed'>#张若昀唐艺昕公开恋情#</a> 恭喜呀<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span>,大家就默默干了这碗狗粮吧,狗粮够吃吗?不够吃的话,你(jing)们(dong)懂(you)的(shou)<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_wabishi-f5765407f7.png\" style=\"width:1em;height:1em;\" alt=\"[挖鼻]\"></span>", "source": "微博 weibo.com", "favorited": false, "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": 
"f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "retweeted_status": { "created_at": "08-02", "id": "4136423907632073", "mid": "4136423907632073", "idstr": "4136423907632073", "text": "时光赐给我们盗不走的爱人,而你赐给我时光。<a href='https://m.weibo.cn/n/唐艺昕'>@唐艺昕</a> ", "textLength": 49, "source": "iPhone 6s", "favorited": false, "thumbnail_pic": "http://wx1.sinaimg.cn/thumbnail/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg", "bmiddle_pic": "http://wx1.sinaimg.cn/bmiddle/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg", "original_pic": "http://wx1.sinaimg.cn/large/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg", "user": { "id": 1827683445, "screen_name": "张若昀", "profile_image_url": "https://tva3.sinaimg.cn/crop.9.0.494.494.180/6cf03c75jw8fajncv51lvj20e80dq74i.jpg", "profile_url": "https://m.weibo.cn/u/1827683445?uid=1827683445&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 1199, "verified": true, "verified_type": 0, "verified_type_ext": 1, "verified_reason": "演员张若昀", "description": "Per Aspera Ad Astra 循此苦旅,以达天际。 工作邮箱:ruoyunwork@126.com", "gender": "m", "mbtype": 12, "urank": 37, "mbrank": 6, "follow_me": false, "following": false, "followers_count": 13527839, "follow_count": 195, "cover_image_phone": "https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg" }, "picStatus": "0:1,1:1", "reposts_count": 283896, "comments_count": 325438, "attitudes_count": 2380726, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "cardid": "star_183", "bid": "Ffhyew1rX", "pics": [ { "pid": "6cf03c75ly1fi5qtg3z8fj20hs0nqq46", "url": "https://wx1.sinaimg.cn/orj360/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg", "size": "orj360", "geo": { "width": 360, "height": 480, "croped": false }, "large": { "size": "large", "url": "https://wx1.sinaimg.cn/large/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg", 
"geo": { "width": "640", "height": "854", "croped": false } } }, { "pid": "6cf03c75ly1fi5qtfv90rj20c80c6dgs", "url": "https://wx1.sinaimg.cn/orj360/6cf03c75ly1fi5qtfv90rj20c80c6dgs.jpg", "size": "orj360", "geo": { "width": 271, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx1.sinaimg.cn/large/6cf03c75ly1fi5qtfv90rj20c80c6dgs.jpg", "geo": { "width": "440", "height": "438", "croped": false } } } ] }, "reposts_count": 3, "comments_count": 13, "attitudes_count": 6, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "raw_text": "#张若昀唐艺昕公开恋情# 恭喜呀[心][心][心],大家就默默干了这碗狗粮吧,狗粮够吃吗?不够吃的话,你(jing)们(dong)懂(you)的(shou)[挖鼻]", "bid": "FfmuOyFMY" }, "show_type": 0, "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4136598981629551", "scheme": "https://m.weibo.cn/status/Ffm6C6PV5?mblogid=Ffm6C6PV5&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "08-03", "id": "4136598981629551", "mid": "4136598981629551", "idstr": "4136598981629551", "text": "仿佛看到了自己<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span>", "source": "微博 weibo.com", "favorited": false, "user": { "id": 5650743478, "screen_name": "京东客服", "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "retweeted_status": { "created_at": 
"08-02", "id": "4136434165892638", "mid": "4136434165892638", "idstr": "4136434165892638", "text": "我在张若昀和唐艺昕公开恋情的微博里看到了你唉~~<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-d903433c82.png\" style=\"width:1em;height:1em;\" alt=\"[doge]\"></span> ", "textLength": 54, "source": "", "favorited": false, "thumbnail_pic": "http://wx3.sinaimg.cn/thumbnail/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg", "bmiddle_pic": "http://wx3.sinaimg.cn/bmiddle/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg", "original_pic": "http://wx3.sinaimg.cn/large/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg", "user": { "id": 3147292215, "screen_name": "草图君", "profile_image_url": "https://tva4.sinaimg.cn/crop.0.0.511.511.180/bb97de37jw8f57ewfuqt9j20e70e8q37.jpg", "profile_url": "https://m.weibo.cn/u/3147292215?uid=3147292215&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 5980, "verified": true, "verified_type": 0, "verified_type_ext": 1, "verified_reason": "直播红人 微博知名综艺博主", "description": "一个得罪了半个娱乐圈的少年", "gender": "m", "mbtype": 12, "urank": 44, "mbrank": 6, "follow_me": false, "following": false, "followers_count": 6192418, "follow_count": 433, "cover_image_phone": "https://tva2.sinaimg.cn/crop.0.0.640.640.640/bb97de37jw1ewysfmiioyj20yi0ykqe7.jpg" }, "picStatus": "0:1,1:1,2:1,3:1", "reposts_count": 3832, "comments_count": 7349, "attitudes_count": 65785, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "bid": "FfhOMoIWy", "pics": [ { "pid": "bb97de37ly1fi5s0g76jrj20yi0p1n0m", "url": "https://wx3.sinaimg.cn/orj360/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg", "size": "orj360", "geo": { "width": 372, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx3.sinaimg.cn/large/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg", "geo": { "width": "1242", "height": "901", "croped": false } } }, { "pid": "bb97de37ly1fi5s0goz0nj20hs0nq0tw", "url": "https://wx4.sinaimg.cn/orj360/bb97de37ly1fi5s0goz0nj20hs0nq0tw.jpg", "size": "orj360", "geo": { "width": 
360, "height": 480, "croped": false }, "large": { "size": "large", "url": "https://wx4.sinaimg.cn/large/bb97de37ly1fi5s0goz0nj20hs0nq0tw.jpg", "geo": { "width": "640", "height": "854", "croped": false } } }, { "pid": "bb97de37ly1fi5s0h69g3j20c80c7juk", "url": "https://wx1.sinaimg.cn/orj360/bb97de37ly1fi5s0h69g3j20c80c7juk.jpg", "size": "orj360", "geo": { "width": 270, "height": 270, "croped": false }, "large": { "size": "large", "url": "https://wx1.sinaimg.cn/large/bb97de37ly1fi5s0h69g3j20c80c7juk.jpg", "geo": { "width": "440", "height": "439", "croped": false } } }, { "pid": "bb97de37ly1fi5s0fg68mj202g02g3yo", "url": "https://wx1.sinaimg.cn/orj360/bb97de37ly1fi5s0fg68mj202g02g3yo.jpg", "size": "orj360", "geo": { "width": "88", "height": "88", "croped": false }, "large": { "size": "large", "url": "https://wx1.sinaimg.cn/large/bb97de37ly1fi5s0fg68mj202g02g3yo.jpg", "geo": { "width": "88", "height": "88", "croped": false } } } ] }, "reposts_count": 2, "comments_count": 21, "attitudes_count": 7, "isLongText": false, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "raw_text": "仿佛看到了自己[二哈]", "bid": "Ffm6C6PV5" }, "show_type": 0, "openurl": "" }, { "card_type": 11, "show_type": 0, "card_group": [], "openurl": "" }, { "card_type": 9, "itemid": "1076035650743478_-_4136407577953610", "scheme": "https://m.weibo.cn/status/Ffh7Txn62?mblogid=Ffh7Txn62&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "mblog": { "created_at": "08-02", "id": "4136407577953610", "mid": "4136407577953610", "idstr": "4136407577953610", "text": "<a class='k' href='https://m.weibo.cn/k/%E4%B8%80%E4%B8%AA%E6%84%9F%E4%BA%BA%E7%9A%84%E6%95%85%E4%BA%8B?from=feed'>#一个感人的故事#</a>去年暑假,8岁的小明特意坐了三个多小时车去奶奶家;奶奶为了小明也愿意去县城的超市买小明爱的薯片和巧克力等零食,但是奶奶家没有WiFi和智能手机,奶奶可以陪他一起看古装电视剧;讲他最爱听的神话故事,唱小曲哄他睡觉……奶奶家有吃不完的零食,也不会"太无聊了"<br/>今年,奶奶提前做 ...<a href=\"/status/4136407577953610\">全文</a>", "textLength": 393, "source": "微博 weibo.com", "favorited": false, "user": { "id": 5650743478, "screen_name": "京东客服", 
"profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg", "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "statuses_count": 3245, "verified": true, "verified_type": 2, "verified_type_ext": 0, "verified_reason": "北京京东世纪贸易有限公司", "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服", "gender": "f", "mbtype": 2, "urank": 29, "mbrank": 2, "follow_me": false, "following": false, "followers_count": 18427, "follow_count": 235, "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg" }, "reposts_count": 6, "comments_count": 17, "attitudes_count": 2, "isLongText": true, "visible": { "type": 0, "list_id": 0 }, "mblogtype": 0, "page_info": { "page_pic": { "url": "https://ww3.sinaimg.cn/thumb180/74f67c55jw9ey0hrixq57j2050050t92.jpg" }, "page_url": "https://m.weibo.cn/p/index?containerid=100808f50fb5741ffd610570b92baf2cc3b342&extparam=%E4%B8%80%E4%B8%AA%E6%84%9F%E4%BA%BA%E7%9A%84%E6%95%85%E4%BA%8B&luicode=10000011&lfid=1076035650743478&featurecode=20000320", "page_title": "#一个感人的故事#", "content1": "", "content2": "3人关注", "type": "topic" }, "bid": "Ffh7Txn62" }, "show_type": 0, "openurl": "" } ], "ok": 1, "showAppTips": 0, "scheme": "sinaweibo://cardlist?containerid=1076035650743478&luicode=10000011&lfid=100103type=1&q=京东客服&featurecode=20000320"}
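That is just one page of posts. Whoever writes the mobile front end against this must get a headache; it is nasty, especially when a field comes back null or empty.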
You can paste the JSON above into a JSON viewer such as JSONView to format it.
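If no browser viewer is at hand, Python's standard `json` module can re-indent the same response. This is a small alternative sketch, not part of the original workflow; `ensure_ascii=False` keeps the Chinese text readable instead of escaping it to `\uXXXX`.

```python
import json


def pretty(body):
    """Re-indent a raw JSON string; ensure_ascii=False keeps Chinese readable."""
    return json.dumps(json.loads(body), indent=2, ensure_ascii=False)
```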
The fields scraped are text and idstr (used to build the comment-page URL), both under cards → mblog.
Comment entries: https://m.weibo.cn/api/comments/show?id=4137390568546147&page=2
Here id is the post's idstr.
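A minimal sketch of pulling those fields out of a decoded getIndex page. It skips cards that have no `mblog` (the sample above contains a `card_type` 11 card with only an empty `card_group`) and strips the HTML tags, including the emoticon `<span>`/`<img>` markup, from `text`. The tag-stripping regex is an assumption that is adequate for this markup, not a general HTML parser; `extract_posts` is a hypothetical name.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def extract_posts(page):
    """Yield (text, idstr, comments_count) for every card that contains an mblog."""
    for card in page.get("cards") or []:
        mblog = card.get("mblog")
        if not mblog:
            continue  # e.g. card_type 11 has no post
        text = TAG_RE.sub("", mblog.get("text") or "").strip()
        yield text, mblog.get("idstr"), mblog.get("comments_count", 0)
```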
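The detail data is the JSON returned by the comment-entry URL above. It is also large and structurally similar to the posts response, so I won't paste it here; the listing above already takes up enough space.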
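The author's code reads each comment page from a `data` list whose items carry `text` and, for replies, `reply_text`. The sketch below follows those field names only; the sample in the test is invented for illustration, and the endpoint's format may well have changed since the article was written, so anything other than a list under `data` is treated as "no comments".

```python
def extract_comments(page):
    """Return (text, reply_text) pairs from one comments/show response.

    Field names follow the article's code ('data', 'text', 'reply_text').
    """
    items = page.get("data") if isinstance(page, dict) else None
    if not isinstance(items, list):
        return []
    return [
        (item.get("text") or "", item.get("reply_text") or "")
        for item in items
        if isinstance(item, dict)
    ]
```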
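Because Weibo bans IP addresses, my first run collected a little over 40,000 records and then died. The next night I slept 30 seconds per request and found it made no difference at all, so I simply kept going: once the IP was unbanned I switched cookies, changed the startUrl, and resumed from a different page index, sleeping 10 seconds per request, since sleeping longer did not help anyway. In the end I had roughly 220,000 entries of messy raw data; after deduplication and dropping what I did not need, perhaps 4,000 remain, though I never actually counted.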
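Since the output file is opened in append mode and runs were restarted several times, the same lines show up repeatedly. A minimal order-preserving deduplication pass over the lines might look like this (a sketch, not the author's cleanup step):

```python
def dedupe_lines(lines):
    """Drop blank lines and repeats, keeping the first occurrence's order."""
    seen = set()
    result = []
    for line in lines:
        key = line.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        result.append(key)
    return result
```

It can be fed an open file directly, e.g. `dedupe_lines(open(path, encoding="utf-8"))`. Be aware that the scraper's own label lines (page and comment-page markers) are repeated by design and will also collapse to a single copy, so filter them out first if the structure matters.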
A few screenshots taken while the scraper was running:
And finally, a screenshot of the resulting Weibo data:
The code has been uploaded to GitHub, so you can download it there if you need it. I also wrote a similar simple crawler: 爬取新浪微博京东客服 (@京东客服).
One last gripe: the JSON is bulky and unstable, and responses are inconsistent from one request to the next.
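One way to cope with fields that are sometimes missing, sometimes `null`, and sometimes of a different type is a single helper that walks a path of keys and indices and falls back to a default at the first gap. This is a hypothetical utility, shown as a sketch; the original code uses `x['k'] if 'k' in x else ""` checks instead.

```python
def dig(obj, *path, default=None):
    """Follow dict keys / list indices, returning default on any gap.

    Missing keys, out-of-range indices, wrong types and JSON null are all
    treated the same way.
    """
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
        if obj is None:
            return default
    return obj
```

For example, `dig(card, "mblog", "pics", 0, "large", "url")` returns the first large image URL, or `None` for a text-only post.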
Here is part of the code:
# encoding=utf8
import json
import re
import time

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
    # Replace with the Cookie header from your own logged-in m.weibo.cn session
    'Cookie': 'PASTE_YOUR_OWN_COOKIE_HERE',
    'Host': 'm.weibo.cn',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    # 'br' dropped: requests can only decode Brotli when an extra package is installed
    'Accept-Encoding': 'gzip, deflate',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%40%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320',
}

# Post texts
textList = []
# Comment count of each post (the comment API returns 10 comments per page)
numSizeList = []


def getJsonData(url):
    req = requests.get(url, headers=headers, timeout=10)
    return req.text


def parseDetailListdata(listdata, f):
    # Write each comment and, if present, its reply text
    for detailData in listdata:
        text = detailData.get('text', '')
        reply_text = detailData.get('reply_text', '')
        print(text)
        print(reply_text)
        f.write(text + '\n')
        f.write(reply_text + '\n')


def getTextInfo(textStr):
    # Strip HTML tags (e.g. the <span><img></span> emoji markup) and keep the plain text
    return re.sub(r'<[^>]+>', '', str(textStr)).strip()


def getpageSize(comments, idstr):
    # Comment-page links for one post: 10 comments per page, rounded up.
    # A fresh list per post, so links from earlier posts are not fetched again.
    pages = (comments + 9) // 10
    return ['https://m.weibo.cn/api/comments/show?id=' + str(idstr) + '&page=' + str(i)
            for i in range(1, pages + 1)]


def parseJsonData(jsonData, f):
    # Python 3.6+ rejects a positional encoding argument to json.loads
    jsondata = json.loads(jsonData)
    for datainfo in jsondata.get('cards', []):
        mblog = datainfo.get('mblog')
        if not mblog:
            # Not a post card: skip it instead of abandoning the rest of the page
            continue
        descText = getTextInfo(mblog['text'])
        f.write('Post start:\n')
        f.write('Post text: ' + descText + '\n')
        print('Post start:')
        print('Post text: ' + descText)
        textList.append(descText)
        comments = mblog['comments_count']  # number of comments
        numSizeList.append(comments)
        for pagedetail, detaillink in enumerate(getpageSize(comments, mblog['idstr']), start=1):
            f.write('Comment page ' + str(pagedetail) + ' .......\n')
            print('Comment page ' + str(pagedetail) + ' .......')
            jsonDatadetail = json.loads(getJsonData(detaillink))
            parseDetailListdata(jsonDatadetail.get('data') or [], f)
        print('End of post...')
        f.write('End of post...\n')
    print('Page done......')


def main_start():
    # Output file ("Weibo JD posts and comments"), closed once after the whole loop
    with open('微博京东说说跟评论.txt', 'a', encoding='utf-8') as f:
        # The author's run started at index page 11; adjust the range as needed
        for inde in range(11, 50):
            startUrl = ('https://m.weibo.cn/api/container/getIndex?uid=5650743478&luicode=10000011'
                        '&lfid=100103type%3D1%26q%3D%40%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D'
                        '&featurecode=20000320&type=uid&value=5650743478'
                        '&containerid=1076035650743478&page=' + str(inde))
            print('startUrl index ' + str(inde) + ' ' + startUrl)
            f.write('Page: ' + str(inde) + '\n')
            parseJsonData(getJsonData(startUrl), f)
            time.sleep(2)


if __name__ == '__main__':
    main_start()
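The two pure helpers in the crawler, HTML stripping and comment-page link building, can be checked offline before any request is sent. The sketch below is a standalone version under two assumptions: that the comments API serves 10 comments per page (as the code's own comment says), and that post text carries emoji as `<span><img></span>` markup, as in the JSON sample above. `strip_tags` and `comment_page_urls` are hypothetical names, not part of the original code.

```python
import re

COMMENTS_PER_PAGE = 10  # assumption, taken from the original code's comment


def strip_tags(html_text):
    """Remove HTML tags such as the <span><img></span> emoji markup."""
    return re.sub(r'<[^>]+>', '', html_text).strip()


def comment_page_urls(idstr, comments_count, per_page=COMMENTS_PER_PAGE):
    """One comments/show URL per page; the page count is rounded up."""
    pages = -(-comments_count // per_page)  # ceiling division
    return ['https://m.weibo.cn/api/comments/show?id={}&page={}'.format(idstr, i)
            for i in range(1, pages + 1)]


# Fields copied from the first card of the sample getIndex response
mblog = {
    'idstr': '4137858652321796',
    'comments_count': 4,
    'text': '明天又要上班了,用四个字描述下你现在的心情吧'
            '<span class="url-icon"><img src="//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png" '
            'style="width:1em;height:1em;" alt="[二哈]"></span> ',
}

print(strip_tags(mblog['text']))
print(comment_page_urls(mblog['idstr'], mblog['comments_count']))
```

Ceiling division requests exactly as many pages as needed: a post with 0 comments triggers no requests and one with exactly 10 comments triggers one page, whereas a page count of `int(comments / 10) + 1` fetches an extra, empty page whenever the count is a multiple of 10.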
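For now you can borrow this code: just swap in your own URLs and the cookie from your own account. Also, anyone writing crawlers should learn a packet-capture tool such as Fiddler; it really is an excellent tool for capturing traffic.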
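Rather than pasting the cookie into the source file, one option is to copy the Cookie request header that Fiddler (or the browser's developer tools) shows for an m.weibo.cn request into an environment variable and parse it at runtime. A minimal sketch; the variable name `WEIBO_COOKIE` and both helper names are hypothetical.

```python
import os


def parse_cookie_header(raw):
    """Turn a raw 'k1=v1; k2=v2' Cookie header into a dict."""
    cookies = {}
    for part in raw.split(';'):
        # Split at the first '=' only, so values containing '=' survive
        name, sep, value = part.strip().partition('=')
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def load_weibo_cookies(env_var='WEIBO_COOKIE'):
    """Read the Cookie header captured from your own logged-in session (hypothetical variable name)."""
    raw = os.environ.get(env_var, '')
    if not raw:
        raise RuntimeError('set {} to the Cookie header of your own m.weibo.cn session'.format(env_var))
    return parse_cookie_header(raw)
```

With this in place, the hard-coded 'Cookie' entry can be dropped from `headers` and the request made as `requests.get(url, headers=headers, cookies=load_weibo_cookies())`.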
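My company runs a 996 schedule (9 a.m. to 9 p.m., six days a week), and my skills are limited, so this is as deep as my studies go for now. Later I want to learn more about things like cookie pools and proxy pools.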
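As a starting point for the cookie-pool and proxy-pool idea, here is a minimal round-robin sketch using only the standard library: each request takes the next cookie/proxy pair, followed by a randomized pause, in line with the earlier observation that sleeping around 10 seconds worked better than hammering the API. `RequestPool`, the proxy addresses, and the cookie values are placeholders; real pools usually also evict entries that start failing or get banned, which this sketch does not attempt.

```python
import itertools
import random


class RequestPool:
    """Hand out cookie/proxy pairs in round-robin order, plus a jittered delay."""

    def __init__(self, cookies, proxies, base_delay=10.0, jitter=3.0):
        if not cookies or not proxies:
            raise ValueError('need at least one cookie dict and one proxy URL')
        self._cookies = itertools.cycle(cookies)
        self._proxies = itertools.cycle(proxies)
        self.base_delay = base_delay
        self.jitter = jitter

    def next_kwargs(self):
        """Keyword arguments meant for requests.get(url, headers=..., **kwargs)."""
        proxy = next(self._proxies)
        return {
            'cookies': next(self._cookies),
            'proxies': {'http': proxy, 'https': proxy},  # scheme -> proxy URL
            'timeout': 10,
        }

    def next_delay(self):
        """Seconds to sleep before the next request: base delay plus random jitter."""
        return self.base_delay + random.uniform(0, self.jitter)


# Placeholder values: cookies from your own accounts, proxies you are allowed to use
pool = RequestPool(
    cookies=[{'SUB': 'cookie-of-account-1'}, {'SUB': 'cookie-of-account-2'}],
    proxies=['http://127.0.0.1:8001', 'http://127.0.0.1:8002'],
)
print(pool.next_kwargs())
print(round(pool.next_delay(), 1))
```

In the crawler, `getJsonData` would then call `requests.get(url, headers=headers, **pool.next_kwargs())` and `main_start` would sleep for `pool.next_delay()` seconds instead of a fixed 2; the hard-coded 'Cookie' entry should be removed from `headers` so it does not conflict with the rotating cookies.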
GitHub repository:
https://github.com/643435675/PyStudy
end