python3 [Crawler in Practice] Scraping the Sina Weibo account 京东客服 (JD Customer Service) with requests

Source: Internet | Editor: 程序博客网 | Time: 2024/05/20 00:15

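The target is the posts and comments of the 京东客服 (JD Customer Service) Weibo account.

Approach: access Sina Weibo's API endpoints as a mobile client, then filter the returned data.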

The mobile page looks like this: https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%40%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320

This is the URL of the account's Weibo page as seen after logging in.

You can also log in to Sina Weibo at
https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F

The page you will see:

[screenshot missing from the source]

The content to scrape is mainly:

the posts, and the comment entries under each post

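It sounds simple, but honestly the crawl was a frustrating, bumpy ride, and right now I am still stuck on the IP issue. From my own tests, sleeping 30 seconds between requests did not work noticeably better; 10 seconds is enough, and your IP may well get blocked anyway. My problem is most likely an IP ban.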

The scraping method is to fetch JSON data from the mobile API and then extract the fields needed.
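The fetch-then-pause loop described above could be sketched roughly as follows. This is only a sketch under assumptions: `fetch_json` is a hypothetical helper name, `session` is assumed to be a `requests.Session` (or anything with a compatible `.get()`), and the User-Agent string and 15-second timeout are illustrative choices that the post does not specify. The 10-second default pause comes from the experiment mentioned above.

```python
import time


def fetch_json(session, url, delay=10.0, sleep=time.sleep):
    """Fetch one API URL, return the decoded JSON, then pause.

    `session` is assumed to be a requests.Session; any object with a
    compatible .get(url, headers=..., timeout=...) method works.
    """
    # Assumption: a mobile User-Agent; the post only says the mobile
    # endpoints are used, not which headers were sent.
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X)"
    }
    response = session.get(url, headers=headers, timeout=15)
    # Raise on HTTP errors instead of trying to parse an error page.
    response.raise_for_status()
    data = response.json()
    # Pause after every request to reduce the chance of an IP ban.
    sleep(delay)
    return data
```

With requests installed, a call such as `fetch_json(requests.Session(), url)` would return the parsed dictionary. The `sleep` parameter exists only so the pause can be swapped out when trying the function offline.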

A Firefox add-on can be used to inspect these responses.

Main APIs:

Post list API:

First page of posts:
https://m.weibo.cn/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478

Second page of posts:
https://m.weibo.cn/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478&page=2

Later pages follow the same pattern; only the page parameter changes.
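Building these paged URLs in code might look like the sketch below. `post_list_url` is a hypothetical helper name. Two assumptions are made: the `uid`, `luicode`, `lfid` and `featurecode` parameters are left out on the guess that they are optional, and the `containerid` is formed as `107603` plus the uid, which matches the two URLs above but is not confirmed as a general rule.

```python
from urllib.parse import urlencode

API_ROOT = "https://m.weibo.cn/api/container/getIndex"


def post_list_url(uid, page=1):
    """Return the post-list API URL for one page of a user's posts."""
    params = {
        "type": "uid",
        "value": uid,
        # Assumption: containerid is "107603" followed by the uid,
        # as seen in the example URLs.
        "containerid": "107603" + str(uid),
    }
    # The first-page URL in the post carries no page parameter.
    if page > 1:
        params["page"] = page
    return API_ROOT + "?" + urlencode(params)


print(post_list_url("5650743478", page=2))
```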

Comment API:

Each post carries an idstr, for example 4137390568546147.

The post's detail page is then:
https://m.weibo.cn/status/4137390568546147
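Pulling the idstr out of a post-list response could be sketched as below. The field names (`cards`, `mblog`, `idstr`, `text`, `comments_count`) are taken from the sample response in this post. The helper name `extract_posts` and the regex-based tag stripping are illustrative assumptions, and stripping tags this way also discards the emoticon alt text.

```python
import re

# Matches any HTML tag, e.g. the <span>/<img> emoticon markup in "text".
TAG = re.compile(r"<[^>]+>")


def extract_posts(payload):
    """Collect idstr, plain text, comment count and detail URL per post."""
    posts = []
    for card in payload.get("cards", []):
        mblog = card.get("mblog")
        if not mblog:
            # Skip cards that do not carry a post.
            continue
        posts.append({
            "idstr": mblog["idstr"],
            "text": TAG.sub("", mblog.get("text", "")).strip(),
            "comments_count": mblog.get("comments_count", 0),
            "detail_url": "https://m.weibo.cn/status/" + mblog["idstr"],
        })
    return posts
```

Each returned dictionary then holds what is needed to request that post's comments.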

Comment pages are built by appending the post id and a page number:
https://m.weibo.cn/api/comments/show?id=4137390568546147&page=1
https://m.weibo.cn/api/comments/show?id=4137390568546147&page=2
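Generating these comment URLs for a given idstr might look like this minimal sketch. `comment_page_urls` is a hypothetical helper, and `last_page` has to be supplied by the caller because the post does not say how the number of comment pages is determined.

```python
from urllib.parse import urlencode

COMMENTS_API = "https://m.weibo.cn/api/comments/show"


def comment_page_urls(status_id, last_page):
    """Return the comment API URLs for pages 1..last_page of one post."""
    return [
        COMMENTS_API + "?" + urlencode({"id": status_id, "page": page})
        for page in range(1, last_page + 1)
    ]


for url in comment_page_urls("4137390568546147", 2):
    print(url)
```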

Here is the data the post list API returns for page 2, in JSON format:

{    "cardlistInfo": {        "containerid": "1076035650743478",        "v_p": 42,        "show_style": 1,        "total": 3264,        "page": 2    },    "cards": [        {            "card_type": 9,            "itemid": "1076035650743478_-_4137858652321796",            "scheme": "https://m.weibo.cn/status/FfSSl9K0k?mblogid=FfSSl9K0k&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "2小时前",                "id": "4137858652321796",                "mid": "4137858652321796",                "idstr": "4137858652321796",                "text": "明天又要上班了,用四个字描述下你现在的心情吧<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span> ​​​",                "textLength": 50,                "source": "微博 weibo.com",                "favorited": false,                "thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",                "bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",                "original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,          
          "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "reposts_count": 0,                "comments_count": 4,                "attitudes_count": 2,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "bid": "FfSSl9K0k",                "pics": [                    {                        "pid": "006apWvQgy1fi7tkjguy4j309q09qt8q",                        "url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",                        "size": "orj360",                        "geo": {                            "width": "350",                            "height": "350",                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg",                            "geo": {                                "width": "350",                                "height": "350",                                "croped": false                            }                        }                    }                ]            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4137692553365577",            "scheme": "https://m.weibo.cn/status/FfOyre7xv?mblogid=FfOyre7xv&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "13小时前",                "id": "4137692553365577",                "mid": "4137692553365577",                "idstr": 
"4137692553365577",                "text": "你觉得举办哪种《中国有_____》比赛,你能进入决赛? ​​​",                "textLength": 49,                "source": "微博 weibo.com",                "favorited": false,                "thumbnail_pic": "http://wx2.sinaimg.cn/thumbnail/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",                "bmiddle_pic": "http://wx2.sinaimg.cn/bmiddle/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",                "original_pic": "http://wx2.sinaimg.cn/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "reposts_count": 0,                "comments_count": 13,                "attitudes_count": 1,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "bid": "FfOyre7xv",                "pics": [                    {                        "pid": 
"006apWvQgy1fi7ul9n9rfj30k00lsgnj",                        "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",                        "size": "orj360",                        "geo": {                            "width": 360,                            "height": 392,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg",                            "geo": {                                "width": "720",                                "height": "784",                                "croped": false                            }                        }                    }                ]            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4137390568546147",            "scheme": "https://m.weibo.cn/status/FfGHmzRf5?mblogid=FfGHmzRf5&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "昨天 14:24",                "id": "4137390568546147",                "mid": "4137390568546147",                "idstr": "4137390568546147",                "text": "周末就是买买买,吃吃吃<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_huaixiao-bb5966dcc6.png\" style=\"width:1em;height:1em;\" alt=\"[坏笑]\"></span> ​​​",                "textLength": 28,                "source": "微博 weibo.com",                "favorited": false,                "thumbnail_pic": "http://wx2.sinaimg.cn/thumbnail/006apWvQgy1fi7taijr9pg307e05kgvl.gif",                "bmiddle_pic": "http://wx2.sinaimg.cn/bmiddle/006apWvQgy1fi7taijr9pg307e05kgvl.gif",                "original_pic": "http://wx2.sinaimg.cn/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif",                "user": {                    "id": 5650743478,                 
   "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "reposts_count": 0,                "comments_count": 19,                "attitudes_count": 1,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "bid": "FfGHmzRf5",                "pics": [                    {                        "pid": "006apWvQgy1fi7taijr9pg307e05kgvl",                        "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi7taijr9pg307e05kgvl.gif",                        "size": "orj360",                        "geo": {                            "width": "266",                            "height": "200",                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif",                            "geo": {        
                        "width": "266",                                "height": "200",                                "croped": false                            }                        }                    }                ]            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4137278329132849",            "scheme": "https://m.weibo.cn/status/FfDMkCjS1?mblogid=FfDMkCjS1&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "昨天 06:58",                "id": "4137278329132849",                "mid": "4137278329132849",                "idstr": "4137278329132849",                "text": "周六早呀,今天有比我起的还早的吗<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_wabishi-f5765407f7.png\" style=\"width:1em;height:1em;\" alt=\"[挖鼻]\"></span> ​​​​",                "textLength": 47,                "source": "微博 weibo.com",                "favorited": false,                "thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg",                "bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg",                "original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg",                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": 
"订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "reposts_count": 0,                "comments_count": 8,                "attitudes_count": 2,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "bid": "FfDMkCjS1",                "pics": [                    {                        "pid": "006apWvQgy1fi7tiv5e5qj30dc0d5dfz",                        "url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg",                        "size": "orj360",                        "geo": {                            "width": 273,                            "height": 270,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg",                            "geo": {                                "width": "480",                                "height": "473",                                "croped": false                            }                        }                    }                ]            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4137054743266182",            "scheme": "https://m.weibo.cn/status/FfxXIdHGm?mblogid=FfxXIdHGm&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            
"mblog": {                "created_at": "08-04",                "id": "4137054743266182",                "mid": "4137054743266182",                "idstr": "4137054743266182",                "text": "就问一句,这样人美心善的90后小哥你们要不要?<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span>",                "source": "微博 weibo.com",                "favorited": false,                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "retweeted_status": {                    "created_at": "08-04",                    "id": "4137016583280831",                    "mid": "4137016583280831",                    "idstr": "4137016583280831",                    "text": "<span class=\"url-icon\"><img 
src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span> <a data-url=\"http://t.cn/R9S6VWV\" href=\"http://media.weibo.cn/article?object_id=1022%3A2309404137016584472707&url_type=39&object_type=article&pos=1&luicode=10000011&lfid=1076035650743478&featurecode=20000320&id=2309404137016584472707&ep=FfwYadLuD%2C1717871843%2CFfwYadLuD%2C1717871843\" data-hide=\"\"><span class=\"url-icon\"><img src=\"https://h5.sinaimg.cn/upload/2015/09/25/3/timeline_card_small_article_default.png\"></span></i><span class=\"surl-text\">90后小哥征婚启事</a> ​​​",                    "textLength": 38,                    "source": "微博 weibo.com",                    "favorited": false,                    "user": {                        "id": 1717871843,                        "screen_name": "京东",                        "profile_image_url": "https://tvax4.sinaimg.cn/crop.0.0.480.480.180/6664a4e3ly8fffaxrnv8fj20dc0dcmy4.jpg",                        "profile_url": "https://m.weibo.cn/u/1717871843?uid=1717871843&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                        "statuses_count": 19903,                        "verified": true,                        "verified_type": 2,                        "verified_type_ext": 50,                        "verified_reason": "京东网上商城",                        "description": "中国最大的自营电商企业京东商城集团在线销售家电、数码通讯、电脑、家居百货、服装服饰、母婴、图书、食品等13大类数万个品牌上千万种优质商品。",                        "gender": "m",                        "mbtype": 12,                        "urank": 43,                        "mbrank": 5,                        "follow_me": false,                        "following": false,   
                     "followers_count": 4025036,                        "follow_count": 258,                        "cover_image_phone": "https://wx1.sinaimg.cn/crop.0.0.640.640.640/6664a4e3ly1fffb8torrtj20ku0ku409.jpg"                    },                    "reposts_count": 12,                    "comments_count": 24,                    "attitudes_count": 16,                    "isLongText": false,                    "visible": {                        "type": 0,                        "list_id": 0                    },                    "page_info": {                        "page_pic": {                            "url": "https://wx3.sinaimg.cn/crop.0.0.617.347.1000/6664a4e3ly1fi7khoua7dj20hk09nn45.jpg"                        },                        "page_url": "http://media.weibo.cn/article?object_id=1022%3A2309404137016584472707&url_type=39&object_type=article&pos=2&luicode=10000011&lfid=1076035650743478&featurecode=20000320&id=2309404137016584472707",                        "page_title": "京东",                        "content1": "90后小哥征婚启事",                        "content2": "",                        "icon": "https://h5.sinaimg.cn/upload/2016/12/28/14/feed_headlines_icon_flash20161228_2.png",                        "type": "article"                    },                    "bid": "FfwYadLuD"                },                "reposts_count": 0,                "comments_count": 30,                "attitudes_count": 1,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "raw_text": "就问一句,这样人美心善的90后小哥你们要不要?[舔屏][舔屏]",                "bid": "FfxXIdHGm"            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4136952959746775",            "scheme": 
"https://m.weibo.cn/status/FfvjxETA3?mblogid=FfvjxETA3&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "08-04",                "id": "4136952959746775",                "mid": "4136952959746775",                "idstr": "4136952959746775",                "text": "周五早上上班的你和下班的你<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_xiaoku-7430606cb7.png\" style=\"width:1em;height:1em;\" alt=\"[笑cry]\"></span> ​​​",                "textLength": 33,                "source": "微博 weibo.com",                "favorited": false,                "thumbnail_pic": "http://wx1.sinaimg.cn/thumbnail/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg",                "bmiddle_pic": "http://wx1.sinaimg.cn/bmiddle/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg",                "original_pic": "http://wx1.sinaimg.cn/large/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg",                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"         
       },                "reposts_count": 0,                "comments_count": 14,                "attitudes_count": 1,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "bid": "FfvjxETA3",                "pics": [                    {                        "pid": "006apWvQgy1fi7fkqpatfj30j60j6jsg",                        "url": "https://wx1.sinaimg.cn/orj360/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg",                        "size": "orj360",                        "geo": {                            "width": 360,                            "height": 360,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx1.sinaimg.cn/large/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg",                            "geo": {                                "width": "690",                                "height": "690",                                "croped": false                            }                        }                    },                    {                        "pid": "006apWvQgy1fi7fkuj1tvg308c0fkmxy",                        "url": "https://wx1.sinaimg.cn/orj360/006apWvQgy1fi7fkuj1tvg308c0fkmxy.gif",                        "size": "orj360",                        "geo": {                            "width": "300",                            "height": "560",                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx1.sinaimg.cn/large/006apWvQgy1fi7fkuj1tvg308c0fkmxy.gif",                            "geo": {                                "width": "300",                                "height": "560",                                "croped": false                         
   }                        }                    }                ]            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4136663145262324",            "scheme": "https://m.weibo.cn/status/FfnM6m4Yc?mblogid=FfnM6m4Yc&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "08-03",                "id": "4136663145262324",                "mid": "4136663145262324",                "idstr": "4136663145262324",                "text": "输入法,你们喜欢用哪种?<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-d903433c82.png\" style=\"width:1em;height:1em;\" alt=\"[doge]\"></span> ​​​",                "textLength": 30,                "source": "微博 weibo.com",                "favorited": false,                "thumbnail_pic": "http://wx4.sinaimg.cn/thumbnail/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg",                "bmiddle_pic": "http://wx4.sinaimg.cn/bmiddle/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg",                "original_pic": "http://wx4.sinaimg.cn/large/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg",                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": 
false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "reposts_count": 4,                "comments_count": 40,                "attitudes_count": 6,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "bid": "FfnM6m4Yc",                "pics": [                    {                        "pid": "006apWvQgy1fi6i8tkspqj30ku0i7mz4",                        "url": "https://wx4.sinaimg.cn/orj360/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg",                        "size": "orj360",                        "geo": {                            "width": 309,                            "height": 270,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx4.sinaimg.cn/large/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg",                            "geo": {                                "width": "750",                                "height": "655",                                "croped": false                            }                        }                    },                    {                        "pid": "006apWvQgy1fi6i8z010xj30ku0h6jte",                        "url": "https://wx3.sinaimg.cn/orj360/006apWvQgy1fi6i8z010xj30ku0h6jte.jpg",                        "size": "orj360",                        "geo": {                            "width": 327,                            "height": 270,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": 
"https://wx3.sinaimg.cn/large/006apWvQgy1fi6i8z010xj30ku0h6jte.jpg",                            "geo": {                                "width": "750",                                "height": "618",                                "croped": false                            }                        }                    },                    {                        "pid": "006apWvQgy1fi6i988w7pj30kt0hbgms",                        "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi6i988w7pj30kt0hbgms.jpg",                        "size": "orj360",                        "geo": {                            "width": 324,                            "height": 270,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi6i988w7pj30kt0hbgms.jpg",                            "geo": {                                "width": "749",                                "height": "623",                                "croped": false                            }                        }                    },                    {                        "pid": "006apWvQgy1fi6i9bnkgfj30ku0gwgmj",                        "url": "https://wx2.sinaimg.cn/orj360/006apWvQgy1fi6i9bnkgfj30ku0gwgmj.jpg",                        "size": "orj360",                        "geo": {                            "width": 333,                            "height": 270,                            "croped": false                        },                        "large": {                            "size": "large",                            "url": "https://wx2.sinaimg.cn/large/006apWvQgy1fi6i9bnkgfj30ku0gwgmj.jpg",                            "geo": {                                "width": "750",                                "height": "608",                                "croped": false                            }                        }      
              }                ]            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4136613988263792",            "scheme": "https://m.weibo.cn/status/FfmuOyFMY?mblogid=FfmuOyFMY&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "08-03",                "id": "4136613988263792",                "mid": "4136613988263792",                "idstr": "4136613988263792",                "text": "<a class='k' href='https://m.weibo.cn/k/%E5%BC%A0%E8%8B%A5%E6%98%80%E5%94%90%E8%89%BA%E6%98%95%E5%85%AC%E5%BC%80%E6%81%8B%E6%83%85?from=feed'>#张若昀唐艺昕公开恋情#</a> 恭喜呀<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span><span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span>,大家就默默干了这碗狗粮吧,狗粮够吃吗?不够吃的话,你(jing)们(dong)懂(you)的(shou)<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/default/d_wabishi-f5765407f7.png\" style=\"width:1em;height:1em;\" alt=\"[挖鼻]\"></span>",                "source": "微博 weibo.com",                "favorited": false,                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,               
     "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "retweeted_status": {                    "created_at": "08-02",                    "id": "4136423907632073",                    "mid": "4136423907632073",                    "idstr": "4136423907632073",                    "text": "时光赐给我们盗不走的爱人,而你赐给我时光。<a href='https://m.weibo.cn/n/唐艺昕'>@唐艺昕</a> ​​​",                    "textLength": 49,                    "source": "iPhone 6s",                    "favorited": false,                    "thumbnail_pic": "http://wx1.sinaimg.cn/thumbnail/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg",                    "bmiddle_pic": "http://wx1.sinaimg.cn/bmiddle/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg",                    "original_pic": "http://wx1.sinaimg.cn/large/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg",                    "user": {                        "id": 1827683445,                        "screen_name": "张若昀",                        "profile_image_url": "https://tva3.sinaimg.cn/crop.9.0.494.494.180/6cf03c75jw8fajncv51lvj20e80dq74i.jpg",                        "profile_url": "https://m.weibo.cn/u/1827683445?uid=1827683445&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                        "statuses_count": 1199,                        "verified": true,                        "verified_type": 0,                        "verified_type_ext": 1,                        "verified_reason": "演员张若昀",                        "description": "Per Aspera Ad Astra 循此苦旅,以达天际。 
工作邮箱:ruoyunwork@126.com",                        "gender": "m",                        "mbtype": 12,                        "urank": 37,                        "mbrank": 6,                        "follow_me": false,                        "following": false,                        "followers_count": 13527839,                        "follow_count": 195,                        "cover_image_phone": "https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg"                    },                    "picStatus": "0:1,1:1",                    "reposts_count": 283896,                    "comments_count": 325438,                    "attitudes_count": 2380726,                    "isLongText": false,                    "visible": {                        "type": 0,                        "list_id": 0                    },                    "cardid": "star_183",                    "bid": "Ffhyew1rX",                    "pics": [                        {                            "pid": "6cf03c75ly1fi5qtg3z8fj20hs0nqq46",                            "url": "https://wx1.sinaimg.cn/orj360/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg",                            "size": "orj360",                            "geo": {                                "width": 360,                                "height": 480,                                "croped": false                            },                            "large": {                                "size": "large",                                "url": "https://wx1.sinaimg.cn/large/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg",                                "geo": {                                    "width": "640",                                    "height": "854",                                    "croped": false                                }                            }                        },                        {                            "pid": "6cf03c75ly1fi5qtfv90rj20c80c6dgs",                          
  "url": "https://wx1.sinaimg.cn/orj360/6cf03c75ly1fi5qtfv90rj20c80c6dgs.jpg",                            "size": "orj360",                            "geo": {                                "width": 271,                                "height": 270,                                "croped": false                            },                            "large": {                                "size": "large",                                "url": "https://wx1.sinaimg.cn/large/6cf03c75ly1fi5qtfv90rj20c80c6dgs.jpg",                                "geo": {                                    "width": "440",                                    "height": "438",                                    "croped": false                                }                            }                        }                    ]                },                "reposts_count": 3,                "comments_count": 13,                "attitudes_count": 6,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "raw_text": "#张若昀唐艺昕公开恋情# 恭喜呀[心][心][心],大家就默默干了这碗狗粮吧,狗粮够吃吗?不够吃的话,你(jing)们(dong)懂(you)的(shou)[挖鼻]",                "bid": "FfmuOyFMY"            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4136598981629551",            "scheme": "https://m.weibo.cn/status/Ffm6C6PV5?mblogid=Ffm6C6PV5&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "08-03",                "id": "4136598981629551",                "mid": "4136598981629551",                "idstr": "4136598981629551",                "text": "仿佛看到了自己<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span>",                "source": "微博 weibo.com",      
          "favorited": false,                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "retweeted_status": {                    "created_at": "08-02",                    "id": "4136434165892638",                    "mid": "4136434165892638",                    "idstr": "4136434165892638",                    "text": "我在张若昀和唐艺昕公开恋情的微博里看到了你唉~~<span class=\"url-icon\"><img src=\"//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-d903433c82.png\" style=\"width:1em;height:1em;\" alt=\"[doge]\"></span> ​​​",                    "textLength": 54,                    "source": "",                    "favorited": false,                    "thumbnail_pic": "http://wx3.sinaimg.cn/thumbnail/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg",                    "bmiddle_pic": "http://wx3.sinaimg.cn/bmiddle/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg",                    "original_pic": "http://wx3.sinaimg.cn/large/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg",                    
"user": {                        "id": 3147292215,                        "screen_name": "草图君",                        "profile_image_url": "https://tva4.sinaimg.cn/crop.0.0.511.511.180/bb97de37jw8f57ewfuqt9j20e70e8q37.jpg",                        "profile_url": "https://m.weibo.cn/u/3147292215?uid=3147292215&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                        "statuses_count": 5980,                        "verified": true,                        "verified_type": 0,                        "verified_type_ext": 1,                        "verified_reason": "直播红人 微博知名综艺博主",                        "description": "一个得罪了半个娱乐圈的少年",                        "gender": "m",                        "mbtype": 12,                        "urank": 44,                        "mbrank": 6,                        "follow_me": false,                        "following": false,                        "followers_count": 6192418,                        "follow_count": 433,                        "cover_image_phone": "https://tva2.sinaimg.cn/crop.0.0.640.640.640/bb97de37jw1ewysfmiioyj20yi0ykqe7.jpg"                    },                    "picStatus": "0:1,1:1,2:1,3:1",                    "reposts_count": 3832,                    "comments_count": 7349,                    "attitudes_count": 65785,                    "isLongText": false,                    "visible": {                        "type": 0,                        "list_id": 0                    },                    "bid": "FfhOMoIWy",                    "pics": [                        {                            "pid": "bb97de37ly1fi5s0g76jrj20yi0p1n0m",                            "url": "https://wx3.sinaimg.cn/orj360/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg",                            "size": "orj360",                            "geo": {                                "width": 372,                                "height": 270,                                "croped": false                           
 },                            "large": {                                "size": "large",                                "url": "https://wx3.sinaimg.cn/large/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg",                                "geo": {                                    "width": "1242",                                    "height": "901",                                    "croped": false                                }                            }                        },                        {                            "pid": "bb97de37ly1fi5s0goz0nj20hs0nq0tw",                            "url": "https://wx4.sinaimg.cn/orj360/bb97de37ly1fi5s0goz0nj20hs0nq0tw.jpg",                            "size": "orj360",                            "geo": {                                "width": 360,                                "height": 480,                                "croped": false                            },                            "large": {                                "size": "large",                                "url": "https://wx4.sinaimg.cn/large/bb97de37ly1fi5s0goz0nj20hs0nq0tw.jpg",                                "geo": {                                    "width": "640",                                    "height": "854",                                    "croped": false                                }                            }                        },                        {                            "pid": "bb97de37ly1fi5s0h69g3j20c80c7juk",                            "url": "https://wx1.sinaimg.cn/orj360/bb97de37ly1fi5s0h69g3j20c80c7juk.jpg",                            "size": "orj360",                            "geo": {                                "width": 270,                                "height": 270,                                "croped": false                            },                            "large": {                                "size": "large",                                "url": 
"https://wx1.sinaimg.cn/large/bb97de37ly1fi5s0h69g3j20c80c7juk.jpg",                                "geo": {                                    "width": "440",                                    "height": "439",                                    "croped": false                                }                            }                        },                        {                            "pid": "bb97de37ly1fi5s0fg68mj202g02g3yo",                            "url": "https://wx1.sinaimg.cn/orj360/bb97de37ly1fi5s0fg68mj202g02g3yo.jpg",                            "size": "orj360",                            "geo": {                                "width": "88",                                "height": "88",                                "croped": false                            },                            "large": {                                "size": "large",                                "url": "https://wx1.sinaimg.cn/large/bb97de37ly1fi5s0fg68mj202g02g3yo.jpg",                                "geo": {                                    "width": "88",                                    "height": "88",                                    "croped": false                                }                            }                        }                    ]                },                "reposts_count": 2,                "comments_count": 21,                "attitudes_count": 7,                "isLongText": false,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "raw_text": "仿佛看到了自己[二哈]",                "bid": "Ffm6C6PV5"            },            "show_type": 0,            "openurl": ""        },        {            "card_type": 11,            "show_type": 0,            "card_group": [],            "openurl": ""        },        {            "card_type": 9,            "itemid": "1076035650743478_-_4136407577953610",            "scheme": 
"https://m.weibo.cn/status/Ffh7Txn62?mblogid=Ffh7Txn62&luicode=10000011&lfid=1076035650743478&featurecode=20000320",            "mblog": {                "created_at": "08-02",                "id": "4136407577953610",                "mid": "4136407577953610",                "idstr": "4136407577953610",                "text": "<a class='k' href='https://m.weibo.cn/k/%E4%B8%80%E4%B8%AA%E6%84%9F%E4%BA%BA%E7%9A%84%E6%95%85%E4%BA%8B?from=feed'>#一个感人的故事#</a>去年暑假,8岁的小明特意坐了三个多小时车去奶奶家;奶奶为了小明也愿意去县城的超市买小明爱的薯片和巧克力等零食,但是奶奶家没有WiFi和智能手机,奶奶可以陪他一起看古装电视剧;讲他最爱听的神话故事,唱小曲哄他睡觉……奶奶家有吃不完的零食,也不会&quot;太无聊了&quot;<br/>今年,奶奶提前做 ​​​...<a href=\"/status/4136407577953610\">全文</a>",                "textLength": 393,                "source": "微博 weibo.com",                "favorited": false,                "user": {                    "id": 5650743478,                    "screen_name": "京东客服",                    "profile_image_url": "https://tva4.sinaimg.cn/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg",                    "profile_url": "https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "statuses_count": 3245,                    "verified": true,                    "verified_type": 2,                    "verified_type_ext": 0,                    "verified_reason": "北京京东世纪贸易有限公司",                    "description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服",                    "gender": "f",                    "mbtype": 2,                    "urank": 29,                    "mbrank": 2,                    "follow_me": false,                    "following": false,                    "followers_count": 18427,                    "follow_count": 235,                    "cover_image_phone": "https://tva4.sinaimg.cn/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"                },                "reposts_count": 6,                "comments_count": 17,                "attitudes_count": 2,                
"isLongText": true,                "visible": {                    "type": 0,                    "list_id": 0                },                "mblogtype": 0,                "page_info": {                    "page_pic": {                        "url": "https://ww3.sinaimg.cn/thumb180/74f67c55jw9ey0hrixq57j2050050t92.jpg"                    },                    "page_url": "https://m.weibo.cn/p/index?containerid=100808f50fb5741ffd610570b92baf2cc3b342&extparam=%E4%B8%80%E4%B8%AA%E6%84%9F%E4%BA%BA%E7%9A%84%E6%95%85%E4%BA%8B&luicode=10000011&lfid=1076035650743478&featurecode=20000320",                    "page_title": "#一个感人的故事#",                    "content1": "",                    "content2": "3人关注",                    "type": "topic"                },                "bid": "Ffh7Txn62"            },            "show_type": 0,            "openurl": ""        }    ],    "ok": 1,    "showAppTips": 0,    "scheme": "sinaweibo://cardlist?containerid=1076035650743478&luicode=10000011&lfid=100103type=1&q=京东客服&featurecode=20000320"}

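That is just one page of posts. Whoever writes the mobile front end must get dizzy dealing with this; it turns nasty as soon as a field comes back as null or empty.

The JSON above can be pretty-printed directly with JSONView:

[image]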

The fields to scrape sit under cards → mblog: text, and idstr (used to build the comment-page URL).

Comment entries: https://m.weibo.cn/api/comments/show?id=4137390568546147&page=2

The id here is the idstr.
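
A minimal sketch of that extraction, assuming the response has the shape shown in the JSON dump above (cards → mblog → text / idstr / comments_count) and that the comment endpoint serves 10 comments per page. The function names and the per_page parameter are made up for illustration.

```python
import math

COMMENTS_API = "https://m.weibo.cn/api/comments/show?id={idstr}&page={page}"


def extract_posts(timeline):
    """Return (idstr, text, comments_count) for every card that carries an mblog."""
    posts = []
    for card in timeline.get("cards", []):
        mblog = card.get("mblog")
        if not mblog:  # e.g. card_type 11 entries have no mblog
            continue
        posts.append((mblog["idstr"], mblog.get("text", ""), mblog.get("comments_count", 0)))
    return posts


def comment_page_urls(idstr, comments_count, per_page=10):
    """Build one URL per comment page; per_page=10 is an assumption."""
    pages = math.ceil(comments_count / per_page)
    return [COMMENTS_API.format(idstr=idstr, page=p) for p in range(1, pages + 1)]


sample = {
    "cards": [
        {"card_type": 9, "mblog": {"idstr": "4136598981629551", "text": "...", "comments_count": 21}},
        {"card_type": 11, "card_group": []},
    ]
}
for idstr, text, count in extract_posts(sample):
    print(comment_page_urls(idstr, count))
```

A post with 21 comments yields three comment-page URLs; a post with no comments yields none, so no request is wasted on it.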

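The detail page is simply the JSON returned by the comment URL above. Dumped out, it is just as long, and its structure is much like the timeline response, so I won't paste it here; the dump above has already taken up enough space.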

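Because Weibo bans IP addresses, the first run died after a little over 40,000 records. The following evening I slept 30 seconds between requests and found that it did not help at all, so I simply kept crawling: once the IP was unbanned I swapped the cookie, the start URL and the page index and carried on, this time sleeping 10 seconds, since longer sleeps made no difference anyway. In the end I collected roughly 220,000 lines of junk data; after de-duplication and dropping what I don't need, perhaps 4,000 remain, though I haven't actually counted.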
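
Restarting from different page indexes is what produces so many repeats, and they can be dropped in one pass over the output. A minimal sketch, assuming one record per line; the helper name is hypothetical, and blank lines are discarded as well.

```python
def unique_lines(lines):
    """Yield each non-empty line once, keeping first-seen order."""
    seen = set()
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            yield key


raw = ["thanks\n", "thanks\n", "\n", "where is my order\n", "thanks\n"]
print(list(unique_lines(raw)))
```

Because the function accepts any iterable of strings, an open file object can be passed in directly and the whole file never has to be held in memory at once; only the set of distinct lines is.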

A few screenshots taken while the crawler was running:

[image]

[image]

[image]

And a screenshot of the resulting Weibo data:

[image]

The code is on GitHub; download it from there if you need it. I also wrote a similar simple crawler for Weibo posts that mention @京东客服.

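One gripe: the JSON responses are numerous, large and unstable, and their structure is not consistent from one response to the next.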
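
One way to cope with keys that come and go is to read nested fields through a small helper that returns a default instead of raising. This is only a sketch; the helper name dig is made up, and the sample card is a cut-down version of the structure shown earlier.

```python
def dig(obj, *keys, default=None):
    """Follow keys/indices into nested dicts and lists; return default on any miss."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


card = {"mblog": {"user": {"screen_name": "京东客服"}, "pics": []}}
print(dig(card, "mblog", "user", "screen_name"))
print(dig(card, "mblog", "retweeted_status", "text", default=""))
print(dig(card, "mblog", "pics", 0, "url"))
```

The second lookup falls back to an empty string because this post is not a repost, and the third returns None because the pics list is empty.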

Part of the code:

# encoding=utf8
import json
import time

import requests

# Timeline API for the 京东客服 account; {} is filled in with the page number.
startUrl = 'https://m.weibo.cn/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D@%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478&page={}'

# Replace the Cookie (and Referer) with the values from your own logged-in session.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
    'Cookie': 'ALF=1504709445; SCF=Ag0epa_4tyFCglnCwHJiaRDznUy645wpqEhg-dG3Sv0cbfGX1wNmqXPnHQroard1FW2nn3RdCnmux4VZ7bFRuMo.; SUHB=0ebt4qVvtKU1d7; _T_WM=22bb4d80315608a0e9bd3bf92b3c1dac; SUB=_2A250jA4VDeRhGeBN6FsT8i7MyTyIHXVXjpJdrDV6PUJbktBeLXjBkW1oTOqmqg0rff3UmekP4TzhMFYtsw..; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WFNrBkhSeVrfPGckwnaFCcy5JpX5o2p5NHD95Qce0e4eoz7ehz7Ws4DqcjBIcHVdr.peoepeoefeK5Ee5tt; M_WEIBOCN_PARAMS=luicode%3D10000011%26lfid%3D100103type%253D1%2526q%253D%2540%25E4%25BA%25AC%25E4%25B8%259C%25E5%25AE%25A2%25E6%259C%258D%26featurecode%3D20000320%26fid%3D1076035650743478%26uicode%3D10000011',
    'Host': 'm.weibo.cn',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    # 'br' is left out: requests only decodes Brotli when an extra package is installed.
    'Accept-Encoding': 'gzip, deflate',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'https://m.weibo.cn/u/5650743478?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%40%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320',
}

# Post texts
textList = []
# Comment count per post; the comment API returns 10 comments per page
numSizeList = []


def getJsonData(url):
    req = requests.get(url, headers=headers)
    return req.text


def parseDetailListdata(listdata):
    # Write every comment on one comment page, plus the text it replies to, if any.
    for detailData in listdata:
        text = detailData.get('text', '')
        reply_text = detailData.get('reply_text', '')
        print(text)
        print(reply_text)
        f.write(text + '\r\n')
        f.write(reply_text + '\r\n')


def getTextInfo(textStr):
    # Keep only the text before the first <span ...> (the emoticon markup).
    # find() returns -1 when there is no <span; return the whole string then.
    textStr = str(textStr)
    pos = textStr.find('<span')
    return textStr if pos == -1 else textStr[:pos]


def getpageSize(comments, idstr):
    # Build the links to the comment pages of one post, 10 comments per page.
    detaiLinks = []
    for i in range(1, int(comments / 10) + 2):
        detaiLinks.append('https://m.weibo.cn/api/comments/show?id=' + str(idstr) + '&page=' + str(i))
    return detaiLinks


def parseJsonData(jsonData):
    jsondata = json.loads(jsonData)
    for datainfo in jsondata.get('cards', []):
        mblog = datainfo.get('mblog')
        if not mblog:
            # Cards without an mblog (e.g. card_type 11) carry no post; skip them.
            continue
        descText = getTextInfo(mblog['text'])
        f.write('Post begins:\r\n')
        f.write('Post text: ' + descText + '\r\n')
        print('Post begins:')
        print('Post text: ' + descText)
        textList.append(descText)
        comments = mblog['comments_count']  # number of comments
        numSizeList.append(comments)
        idstr = mblog['idstr']
        for pagedetail, detaillink in enumerate(getpageSize(comments, idstr), 1):
            jsonData2 = getJsonData(detaillink)
            f.write('Comment page: ' + str(pagedetail) + '      .......\r\n')
            print('Comment page: ' + str(pagedetail) + '      .......')
            print(jsonData2)
            jsonDatadetail = json.loads(jsonData2)
            parseDetailListdata(jsonDatadetail.get('data', []))
        print('Post ends...')
        f.write('Post ends...\r\n')
    print('Finished this page......')


f = open('微博京东说说跟评论.txt', 'a', encoding='utf-8')


def main_start():
    for inde in range(11, 50):
        pageUrl = startUrl.format(inde)
        print('startUrl   ' + 'index ' + str(inde) + '     ' + pageUrl)
        f.write('Page: ' + str(inde) + '\r\n')
        data = getJsonData(pageUrl)
        parseJsonData(data)
        time.sleep(2)
    f.close()


main_start()
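
The text field is HTML, so cutting it at the first <span loses anything that follows an emoticon. An alternative sketch (not what the code above does) replaces each emoticon image with its alt text, removes the remaining tags and decodes entities. It relies on regular expressions, which is adequate for the small fragments seen in the JSON above but is not a general HTML parser; the function name is hypothetical.

```python
import html
import re

IMG_ALT = re.compile(r'<img[^>]*\balt="([^"]*)"[^>]*>')
BR = re.compile(r"<br\s*/?>")
TAG = re.compile(r"<[^>]+>")


def plain_text(fragment):
    """Turn a Weibo `text` HTML fragment into plain text."""
    fragment = IMG_ALT.sub(r"\1", fragment)   # <img alt="[二哈]"> becomes [二哈]
    fragment = BR.sub("\n", fragment)         # keep line breaks
    fragment = TAG.sub("", fragment)          # drop every other tag
    return html.unescape(fragment).strip()


sample = ('仿佛看到了自己<span class="url-icon"><img src="//h5.sinaimg.cn/m/emoticon/icon/others/'
          'd_erha-0d2bea3a7d.png" style="width:1em;height:1em;" alt="[二哈]"></span>')
print(plain_text(sample))
```

For this sample the result matches the raw_text value that the API returns for the same post, which is a quick way to check the cleaning on posts where raw_text is present.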

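For now the code can be reused as it is: swap in your own URL and the cookie from your own account. Anyone writing crawlers should also learn a packet-capture tool such as Fiddler; it really is the right tool for finding these APIs.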

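My company works 996 and my own ability is limited, so this is as deep as my learning goes for now. Later I want to look into cookie pools, proxy pools and similar techniques.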
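
As a rough idea of what such pools look like, the sketch below cycles round-robin through a list of cookie strings and a list of proxy addresses and builds the keyword arguments for requests.get. Every cookie value and proxy address is a placeholder and the class name is made up; a real pool would also need to detect and retire banned entries.

```python
import itertools


class RotatingPool:
    """Round-robin over cookies and proxies; both lists are placeholders here."""

    def __init__(self, cookies, proxies):
        self._cookies = itertools.cycle(cookies)
        self._proxies = itertools.cycle(proxies)

    def request_kwargs(self, base_headers):
        """Return kwargs suitable for requests.get(url, **kwargs)."""
        headers = dict(base_headers, Cookie=next(self._cookies))
        proxy = next(self._proxies)
        return {"headers": headers, "proxies": {"http": proxy, "https": proxy}, "timeout": 10}


pool = RotatingPool(
    cookies=["SUB=placeholder-a", "SUB=placeholder-b"],
    proxies=["http://127.0.0.1:8001", "http://127.0.0.1:8002", "http://127.0.0.1:8003"],
)
for _ in range(3):
    print(pool.request_kwargs({"User-Agent": "Mozilla/5.0"}))
```

Using lists of different lengths means the cookie and proxy pairings shift on every cycle, so the same account does not always appear behind the same address.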

GitHub repository:
https://github.com/643435675/PyStudy

end
