Python Web Scraping Notes (Part 1)

Source: the Internet | Editor: 程序博客网 | Date: 2024/06/05 00:59

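I have just started learning web scraping with Python. To make sure I keep at it, I am recording what I learn along the way on this blog. Posts will usually contain only code, unless I run into a particularly tricky problem. I am using Python 3 and the PyCharm IDE. I hope I can stick with it and build up more and more over time.
Below is the first code snippet: a very simple script that downloads a page and prints the first <div> inside its <body>. The URL comes from the book Web Scraping with Python (Chinese edition: 《Python网络数据采集》).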

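from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

try:
    # urlopen raises HTTPError for HTTP error codes (404, 500, ...)
    # and URLError when the server cannot be reached at all
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
except HTTPError as e:
    print(e)
except URLError as e:
    print("Server could not be reached:", e.reason)
else:
    # Parse the response body with the lxml parser (requires the lxml package)
    bsObj = BeautifulSoup(html.read(), "lxml")
    try:
        # find() returns None if there is no <body>, so .div would raise AttributeError
        content = bsObj.find("body").div
    except AttributeError:
        print("Tag was not found")
    else:
        # .div is None when <body> contains no <div>
        if content is None:
            print("Content is None")
        else:
            print(content)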
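
The script above mixes fetching, parsing and error reporting in one nested try/else chain. Below is a minimal sketch of the same logic split into two small functions, so the parsing step can be tried on an HTML string without a network connection. The names fetch_html and get_first_div are hypothetical, and the sketch assumes BeautifulSoup's built-in "html.parser" instead of lxml so that only bs4 needs to be installed; it is an illustration, not the book's code.

```python
from typing import Optional, Union
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup
from bs4.element import Tag


def fetch_html(url: str) -> Optional[bytes]:
    """Hypothetical helper: download a page, or return None on HTTP/connection errors."""
    try:
        with urlopen(url) as response:
            # Return raw bytes and let BeautifulSoup detect the encoding
            return response.read()
    except HTTPError as e:
        # The server answered with an error status (404, 500, ...)
        print("HTTP error:", e.code)
    except URLError as e:
        # The server could not be reached (bad host name, no network, ...)
        print("Server could not be reached:", e.reason)
    return None


def get_first_div(markup: Union[str, bytes]) -> Optional[Tag]:
    """Hypothetical helper: return the first <div> inside <body>, or None if either is missing."""
    # "html.parser" ships with Python, so no lxml install is needed
    soup = BeautifulSoup(markup, "html.parser")
    body = soup.find("body")
    if body is None:
        return None
    # body.div is shorthand for body.find("div"): the first <div> descendant, or None
    return body.div
```

Checking for None instead of catching AttributeError keeps both failure cases ("no <body>" and "no <div>") in one place. Note that, unlike lxml, html.parser does not insert a missing <body>, so a fragment without one yields None here. To reproduce the original script, call fetch_html with the pythonscraping.com URL and, if the result is not None, pass it to get_first_div.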
