Python Practical Plan (Python实战计划) Homework 1-2

Source: Internet  Editor: 程序博客网  Time: 2024/05/23 01:12

The code is as follows:

from bs4 import BeautifulSoup

# Local copy of the homework page
html_path = "/Users/reed/Documents/dev/Plan-for-combating/week1/1_2/1_2answer_of_homework/index.html"

with open(html_path, 'r') as wb_data:
    soup = BeautifulSoup(wb_data, "lxml")
    # One CSS selector per field; each call returns a list in page order
    images = soup.select("body > div > div > div.col-md-9 > div > div > div > img")
    names = soup.select("body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a")
    prices = soup.select("body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right")
    reviews = soup.select("body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right")
    # The whole ratings block of each product, searched again below
    stars_block = soup.find_all("div", class_="ratings")

# zip pairs up the i-th entry of every list, i.e. the fields of one product
for name, image, price, review, star_block in zip(names, images, prices, reviews, stars_block):
    # Count the star spans inside this product's ratings block
    stars_num = len(star_block.find_all("span", class_="glyphicon glyphicon-star"))
    print(name.get_text(), "\n    ", image.get('src'), "\n    ", price.get_text(), "\n    ", review.get_text(),
          "\n    ", stars_num, " stars\n")

Output:

/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/reed/PycharmProjects/web01/web_parse2.py
EarPod
      img/pic_0000_073a9256d9624c92a05dc680fc28865f.jpg
      $24.99
      65 reviews
      5  stars

New Pocket
      img/pic_0005_828148335519990171_c234285520ff.jpg
      $64.99
      12 reviews
      4  stars

New sunglasses
      img/pic_0006_949802399717918904_339a16e02268.jpg
      $74.99
      31 reviews
      4  stars

Art Cup
      img/pic_0008_975641865984412951_ade7a767cfc8.jpg
      $84.99
      6 reviews
      3  stars

iphone gamepad
      img/pic_0001_160243060888837960_1c3bcd26f5fe.jpg
      $94.99
      18 reviews
      4  stars

Best Bed
      img/pic_0002_556261037783915561_bf22b24b9e4e.jpg
      $214.5
      18 reviews
      4  stars

iWatch
      img/pic_0011_1032030741401174813_4e43d182fce7.jpg
      $500
      35 reviews
      4  stars

Park tickets
      img/pic_0010_1027323963916688311_09cc2d7648d9.jpg
      $15.5
      8 reviews
      4  stars

Process finished with exit code 0

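Summary: in this exercise I learned BeautifulSoup's find and find_all functions, which are very handy. After using find_all to get the HTML of each specific block, you can loop over the results with for ... in and call find_all again on each block to search inside it.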
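
The nested lookup described in the summary can be sketched roughly as follows. This is only a minimal illustration, not the homework solution itself: the inline HTML, the "thumbnail" class on the product block, and the count_stars helper are all made up for the example, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the course's index.html
SAMPLE_HTML = """
<div class="thumbnail">
  <div class="caption"><h4><a href="#">EarPod</a></h4></div>
  <div class="ratings">
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </div>
</div>
<div class="thumbnail">
  <div class="caption"><h4><a href="#">Art Cup</a></h4></div>
  <div class="ratings">
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </div>
</div>
"""


def count_stars(html):
    """Return a list of (name, star_count) pairs, one per product block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Outer find_all: one Tag per product block ("thumbnail" is an assumed class)
    for block in soup.find_all("div", class_="thumbnail"):
        # find returns the first match inside this block only
        name = block.find("a").get_text()
        ratings = block.find("div", class_="ratings")
        # Inner find_all: matches spans that have the class token "glyphicon-star";
        # "glyphicon-star-empty" is a different token and is not counted
        stars = ratings.find_all("span", class_="glyphicon-star")
        results.append((name, len(stars)))
    return results


for name, stars in count_stars(SAMPLE_HTML):
    print(name, stars, "stars")
```

Searching inside each block keeps a product's name and star count together, so the result does not depend on several separately selected lists happening to line up when zipped.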
