A Simple Python Scraper in Practice: CSDN


This post is just for sharing <(o゜▽゜)o☆[BINGO!]

  • The implementation is very simple: the scraper just requests the article pages over and over.
  • It mainly relies on the requests library.
  • Don't overdo it :-O
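Aside from the repeated GETs, the only real work is turning the relative hrefs scraped from the blog index into absolute article URLs. A minimal sketch using the standard library's urljoin (the path below is a made-up example, not a real article):

```python
from urllib.parse import urljoin

BASE = 'http://blog.csdn.net'

def to_absolute(paths):
    # urljoin avoids the double slash that naive string concatenation
    # ('http://blog.csdn.net/' + path) produces when the scraped path
    # already starts with '/'
    return [urljoin(BASE, p) for p in paths]

links = to_absolute(['/vonsdite/article/details/1234567'])
print(links[0])  # http://blog.csdn.net/vonsdite/article/details/1234567
```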

Sample code:

# -*- coding: utf-8 -*-
# @Author   : Sdite
# @DateTime : 2017-07-16 14:17:22

import re
import time

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
}

# Preparation: fetch the blog index and collect the article links
url = "http://blog.csdn.net/vonsdite"
res = requests.get(url=url, headers=headers)
part = re.compile(r'<span class="link_title"><a href="(/vonsdite/article/details/.+?)"')
urls = part.findall(res.text)
# The hrefs already start with '/', so don't add another slash here
urls = ['http://blog.csdn.net' + tmp for tmp in urls]

# View-count boosting loop: request every article repeatedly
while True:
    for u in urls:
        res = requests.get(url=u, headers=headers)
        soup = BeautifulSoup(res.text, 'lxml')
        rank = soup.select('#blog_rank')
        # "访问:<span>N次</span>" is the visit counter in the old CSDN sidebar
        part = re.compile(r'<li>(访问:)<span>(\d+次)</span></li>')
        rank = part.findall(str(rank[0]))
        rank = rank[0][0] + rank[0][1]
        print('Blog: ' + rank)
    time.sleep(2)
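The view-count extraction hinges on the regex over the `#blog_rank` block. A self-contained sketch of that step, run against a canned HTML fragment (an assumption mimicking what CSDN's old sidebar markup looked like, so no network access is needed):

```python
import re

# Hypothetical fragment in the shape of CSDN's old #blog_rank sidebar
sample = '<ul id="blog_rank"><li>访问:<span>12345次</span></li></ul>'

# Same pattern as in the script: captures the "访问:" label and "N次" count
part = re.compile(r'<li>(访问:)<span>(\d+次)</span></li>')
matches = part.findall(sample)
visits = matches[0][0] + matches[0][1]
print(visits)  # 访问:12345次
```

If CSDN ever changed that markup, `findall` would return an empty list and the `matches[0]` lookup would raise an IndexError, so a real script should check for an empty result before indexing.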