Scraping Wikipedia entries

Source: Internet | Published by: mac格式化u盘 | Editor: 程序博客网 | Time: 2024/05/29 23:24

# Import the BeautifulSoup package
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import re

# Request the URL and decode the response as UTF-8
resp = urlopen("https://en.wikipedia.org/wiki/Main_page").read().decode("utf-8")
# Parse the HTML with BeautifulSoup
soup = bs(resp, "html.parser")
# Get all <a> tags whose href attribute starts with /wiki/
listUrls = soup.find_all("a", href=re.compile("^/wiki/"))
# Print each URL
for url in listUrls:
    # print(url)  # this would print the entire <a> tag
    # The matches above include .jpg images, so exclude them by href
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        # Print the link text together with "https://en.wikipedia.org" + the href attribute
        print(url.get_text(), "<---->", "https://en.wikipedia.org" + url["href"])
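The filtering and URL-building step can also be pulled out into small helpers that run without a network request. The following is only a sketch under a few assumptions: the names `is_entry_link` and `absolute_entry_urls` are hypothetical, the list of image extensions is broader than the `.jpg` check in the post, and treating any colon in the title as a namespace page (such as `File:` or `Special:`) is a rough heuristic that will also skip the occasional real article whose title contains a colon.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"  # same host as in the post

# Assumption: skip common image extensions, case-insensitively.
IMAGE_RE = re.compile(r"\.(jpe?g|png|gif|svg)$", re.IGNORECASE)


def is_entry_link(href):
    """Return True for /wiki/ links that look like article pages."""
    if not href.startswith("/wiki/"):
        return False
    if IMAGE_RE.search(href):
        return False
    # Heuristic (assumption): a colon in the title marks a namespace page.
    return ":" not in href[len("/wiki/"):]


def absolute_entry_urls(hrefs):
    """Filter hrefs, drop duplicates (keeping order), and make them absolute."""
    seen = set()
    result = []
    for href in hrefs:
        if is_entry_link(href) and href not in seen:
            seen.add(href)
            result.append(urljoin(BASE_URL, href))
    return result
```

With the loop from the post, this could be used as `absolute_entry_urls(url["href"] for url in listUrls)`, which keeps the parsing and the filtering rules separate and makes the rules easy to check on their own.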

0 0

爬取wikipedia词条
Wikipedia词条翻译：Python
Wikipedia词条翻译:Martin Fowler
Python+MongoDB 爬取百度词条
Python 爬取百度词条Python Demo
Python爬虫，爬取百度百科词条
Python爬虫爬取百度百科词条
python爬取百度百科词条内容
java爬取百度百科词条
爬取百度词条内链接
按条件爬取百度百科词条及其相关词条的ID
Java爬虫爬取python百度百科词条及相关词条页面
简单的python爬虫（爬取百度百科词条）
Python3爬取百科词条+导入MySQL数据库
Python 爬虫的实践运用(1)--爬取百度百科的词条
Python网络爬虫（三）：连续爬取百度百科词条数据
Python开发爬虫爬取百度百科词条信息(源码下载)
利用scrapy框架爬取互动百科的词条--存成json
react-native 性能优化，处理卡顿
MySQL之aborted connections和aborted clients
台湾清华大学彭明辉教授的研究生手册
windows使用SQLite
python 爬取网页中的图片到本地
爬取wikipedia词条
9、matplotlib 基础入门
两数组的交
【LeetCode】357. Count Numbers with Unique Digits
数据库基础手札
基于itop4412的EC20在Android4.4的PPP拨号联网
BJ模拟 Cut (最小割树+最小生成树)
常见的排序算法
物联网操作系统微软 Windows IoT Core 与华为 LiteOS 对比