Web Scraping with Python, Part 5 (translator: 哈雷)

Source: Internet | Published by: linux系统调用过程 | Editor: 程序博客网 | Time: 2024/04/27 18:09

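Chapter 3: Starting to Crawl
We use the Kevin Bacon page on Wikipedia as the starting point of the crawl, then pick one of the links on it that points to another article page and crawl that page in turn. An example follows:
[python] code snippet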

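from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import re

# Random seed. A float timestamp is passed because Python 3.11+ rejects datetime objects as seeds.
random.seed(datetime.datetime.now().timestamp())

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org" + articleUrl)
    # Name the parser explicitly so the result does not depend on what is installed
    bsObj = BeautifulSoup(html, "html.parser")
    # Keep only links inside the article body that start with /wiki/ and contain no colon
    return bsObj.find("div", {"id": "bodyContent"}).find_all(
        "a", href=re.compile("^(/wiki/)((?!:).)*$"))

links = getLinks("/wiki/Kevin_Bacon")
while len(links) > 0:
    # Pick one of the article links at random and crawl it next
    newArticle = links[random.randint(0, len(links) - 1)].attrs["href"]
    print(newArticle)
    links = getLinks(newArticle)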
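The regular expression passed to the link search does most of the filtering, so it may help to see it in isolation. The sketch below applies the same pattern to a handful of made-up href values (the sample list and the helper name `is_article_link` are assumptions for illustration, not part of the original example). It needs no network access.

```python
import re

# Same pattern as in the crawler: the href must start with /wiki/
# and must not contain a colon anywhere after that prefix.
ARTICLE_LINK = re.compile(r"^(/wiki/)((?!:).)*$")

def is_article_link(href):
    """Return True if href looks like a link to an ordinary article page."""
    return ARTICLE_LINK.match(href) is not None

# Hypothetical href values of the kind found on a Wikipedia page
sample_hrefs = [
    "/wiki/Kevin_Bacon",
    "/wiki/Category:American_actors",
    "/wiki/File:Kevin_Bacon.jpg",
    "#cite_note-1",
    "/wiki/Footloose_(1984_film)",
    "//en.wikipedia.org/wiki/Main_Page",
]

article_links = [h for h in sample_hrefs if is_article_link(h)]
print(article_links)
```

The negative lookahead `(?!:)` is checked before every character after the prefix, which is what excludes namespaced pages such as `Category:` or `File:` links.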

Crawling data with Scrapy is not covered for now.

0 0

Python网络数据采集5(译者：哈雷)
Python网络数据采集1(译者：哈雷)
Python网络数据采集4(译者：哈雷)
Python网络数据采集6(译者：哈雷)
Python网络数据采集7(译者：哈雷)
Python网络数据采集8(译者：哈雷)
Python网络数据采集9(译者：哈雷)
Python网络数据采集10(译者：哈雷)
Python网络数据采集11(译者：哈雷)
python网络数据采集2(译者：哈雷）
python网络数据采集3(译者：哈雷）
python网络数据采集
Python网络数据采集
Python网络数据采集
Python网络数据采集
Python网络数据采集
Python网络数据采集
Python网络数据采集
linux命令笔记
设计模式之行为型模式
Android异常--bitmapUtils加载图片不显示
mybatis的xml没有提示
phpmailer使用美橙互联企邮发送邮件
Python网络数据采集5(译者：哈雷)
使用sqlite3打开.db3的SQLite文件
【js实例】js发送验证码后倒计时60秒
遍历磁盘
有哪些一般人不知道的数据获取方式
数组逆序
SDUT oj 二叉排序树
Linux内核设计与实现（三） linux进程管理之进程创建-2
99. Recover Binary Search Tree