04 A small BeautifulSoup example

Source: Internet | Published by: 网络互动活动 | Editor: 程序博客网 | Time: 2024/04/28 20:40

'''
Created on June 6, 2017

@author: v_huxiaoting
'''
import re
from bs4 import BeautifulSoup
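# The "three sisters" sample document from the Beautiful Soup documentation.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""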
soup = BeautifulSoup(
    html_doc,
    "html.parser"
)
print("Get all links")

# find_all() returns every matching tag in document order.
links = soup.find_all('a')

for link in links:
    print(link.name, link['href'], link.get_text())

print("Get the link for Lacie")

# find() returns only the first tag whose href equals the given URL.
link_node = soup.find('a', href="http://example.com/lacie")
print(link_node.name, link_node['href'], link_node.get_text())

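Up to this point I kept running into pitfalls, because the tutorial I was following targets Python 2.x. This particular one cost me a while, because I had written the constructor call the Python 2 way: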

soup = BeautifulSoup(
    html_doc,
    "html.parser",
    from_encoding="utf-8"
)

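I kept assuming the problem was somewhere else, but it turned out to be this character-encoding argument.

The reason: in Python 3 a str is already Unicode text, and html_doc here is a str. from_encoding only matters when the markup is passed as bytes, so for a str Beautiful Soup ignores it and emits a warning. That message is easy to mistake for an error in the rest of the code. Dropping the from_encoding argument makes it go away.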
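To make the difference concrete, here is a minimal sketch, assuming bs4 is installed. The sample markup and the variable names (markup_text, markup_bytes, ignored_warnings) are made up for this illustration and are not part of the original script. It passes the same markup once as a str and once as bytes, and records any warning raised in the str case.

```python
import warnings
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
markup_text = "<p class='title'>The Dormouse's story</p>"
markup_bytes = markup_text.encode("utf-8")

# Case 1: str input. The text is already Unicode, so there is nothing
# for from_encoding to decode; any warning raised is recorded here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soup_from_text = BeautifulSoup(markup_text, "html.parser", from_encoding="utf-8")
ignored_warnings = [str(w.message) for w in caught]

# Case 2: bytes input. Here from_encoding tells the parser how to decode.
soup_from_bytes = BeautifulSoup(markup_bytes, "html.parser", from_encoding="utf-8")

print(ignored_warnings)
print(soup_from_text.p.get_text())
print(soup_from_bytes.p.get_text())
print(soup_from_bytes.original_encoding)
```

Both soups end up with the same parsed text. The practical rule is to pass from_encoding only when you hand Beautiful Soup raw bytes, for example the body of an HTTP response.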

Matching with a regular expression

print('Regex match')
link_node1 = soup.find('a', href=re.compile(r"ill"))

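re.compile() builds a regular-expression object; find() then returns the first a tag whose href contains a match for the pattern.

The r prefix marks a raw string, so backslashes in the pattern are not treated as escape sequences.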

print(link_node1.name, link_node1['href'], link_node1.get_text())
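The two notes above about re.compile() and the r prefix can be checked with the standard library alone. This is a small sketch, not part of the original script: the hrefs list is an assumption, with the Elsie and Tillie URLs guessed to follow the same pattern as the Lacie URL used earlier. Beautiful Soup applies a compiled pattern with its search() method, which is what the list comprehension imitates.

```python
import re

# A raw string keeps backslashes literally instead of letting Python
# interpret them as escape sequences.
newline_len = len("\n")     # a single newline character
raw_len = len(r"\n")        # a backslash followed by the letter n
same_pattern = (r"\d+" == "\\d+")

pattern = re.compile(r"ill")

# Assumed URLs, modelled on the Lacie link from the example above.
hrefs = [
    "http://example.com/elsie",
    "http://example.com/lacie",
    "http://example.com/tillie",
]

# search() matches anywhere in the string, not only at the start.
matching = [h for h in hrefs if pattern.search(h)]
print(newline_len, raw_len, same_pattern)
print(matching)  # ['http://example.com/tillie']
```

The pattern "ill" contains no backslash, so the r prefix changes nothing here. It starts to matter as soon as the pattern uses sequences such as \d or \b.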

Getting the text of a p paragraph

print("Get the text of the p paragraph")
# class is a reserved word in Python, so the keyword argument is class_.
p_node = soup.find('p', class_='title')
print(p_node.name, p_node.get_text())
