Fetching Wikipedia Entry Information with urllib and BeautifulSoup

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 03:07

Notes: as the figure shows, the Request Method is GET, so when reproducing the request in Postman, make sure GET is the selected method.
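The same point applies when building the request in Python. As a minimal sketch (the helper name build_request is hypothetical and only for illustration), urllib's Request object reports GET when no request body is supplied and POST when one is, which can be checked without sending anything over the network:

```python
from urllib.request import Request

URL = "https://en.wikipedia.org/wiki/Main_Page"


def build_request(url, data=None):
    """Build a Request object without sending it (hypothetical helper)."""
    return Request(url, data=data)


# No body supplied: urllib treats the request as GET.
get_req = build_request(URL)
print(get_req.get_method())

# A body supplied: urllib switches the request to POST.
post_req = build_request(URL, data=b"q=test")
print(post_req.get_method())
```

Because urlopen is called with only a URL in this post, it issues a GET, matching what the browser's network panel shows.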

A simple example: using urllib and BeautifulSoup to fetch Wikipedia entry information

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Request the URL and decode the response as UTF-8
resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(resp, "html.parser")
# Get all <a> tags whose href attribute starts with /wiki/
listUrls = soup.find_all("a", href=re.compile(r"^/wiki/"))
# Print the name and URL of every entry
for url in listUrls:
    # Skip URLs ending in .jpg or .JPG
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        # print(url["href"])
        # .string returns only a single string; get_text() returns all text under the tag
        print(url.get_text(), "<---->", "https://en.wikipedia.org" + url["href"])
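To try the filtering logic without hitting the network, the parsing step can be pulled into a function that takes an HTML string. This is a sketch rather than the original author's code: the function name extract_wiki_links and the SAMPLE_HTML snippet are made up for illustration, and urljoin is used in place of string concatenation.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org"
WIKI_HREF = re.compile(r"^/wiki/")           # entry links start with /wiki/
IMAGE_SUFFIX = re.compile(r"\.(jpg|JPG)$")   # same image filter as the example above


def extract_wiki_links(html, base_url=BASE_URL):
    """Return (title, absolute_url) pairs for /wiki/ links, skipping .jpg/.JPG targets."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=WIKI_HREF):
        href = a["href"]
        if IMAGE_SUFFIX.search(href):
            continue
        # get_text() collects all text under the tag, including nested tags
        links.append((a.get_text().strip(), urljoin(base_url, href)))
    return links


# Made-up HTML used only to exercise the function offline.
SAMPLE_HTML = """
<a href="/wiki/Python_(programming_language)">Python</a>
<a href="/wiki/File:Logo.JPG">logo</a>
<a href="https://example.org/">external</a>
<a href="/wiki/Web_scraping"><b>Web</b> scraping</a>
"""

for title, url in extract_wiki_links(SAMPLE_HTML):
    print(title, "<---->", url)
```

The last sample link shows why get_text() is used instead of .string: the anchor contains a nested tag plus plain text, so .string would return None while get_text() returns the full label.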
