Python 3: Scraping Selected Information with requests and BeautifulSoup

Source: Internet  Editor: 程序博客网  Date: 2024/06/05 09:09
import os
import uuid

import requests
from bs4 import BeautifulSoup

# Folder where the downloaded images are saved
imgPath = r'D:\Users\Quincy_C\PycharmProjects\S6\bs模块\汽车图片'
# Create the folder once, up front, so that the first image is not skipped
os.makedirs(imgPath, exist_ok=True)

response = requests.get(url='http://www.autohome.com.cn/news/')
# Decode the page with the encoding detected from its content
response.encoding = response.apparent_encoding
bs = BeautifulSoup(response.text, features='html.parser')
bs_obj = bs.find(id="auto-channel-lazyload-article")
li_list = bs_obj.find_all('li')
for i in li_list:
    a = i.find('a')
    if a:
        txt = a.find('h3').text
        src = a.find('img').attrs.get('src')
        print(src)
        # requests.get(url).content returns bytes
        imgContent = requests.get(src).content
        # Random file name, so that images never overwrite each other
        imgUrl = str(uuid.uuid4()) + '.jpg'
        with open(os.path.join(imgPath, imgUrl), 'wb') as f:
            f.write(imgContent)
To save the images into a specific folder, you can do this:
with open(os.path.join(imgPath, imgUrl), 'wb') as f:
    f.write(imgContent)

Or change the working directory first, then open the file by its bare name:

os.chdir(imgPath)

Either approach works. I had done this before and forgotten how, so I am writing it down.
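The two ways of choosing the target folder can be compared side by side. The following is a minimal sketch, not the author's code: the function names `save_with_join` and `save_with_chdir` are hypothetical, and restoring the previous working directory after `os.chdir` is an added precaution that the article does not mention.

```python
import os
import uuid


def save_with_join(data, folder):
    """Write data to a randomly named .jpg in folder, using a full path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, str(uuid.uuid4()) + '.jpg')
    with open(path, 'wb') as f:
        f.write(data)
    return path


def save_with_chdir(data, folder):
    """Switch the working directory to folder, then write by bare file name."""
    os.makedirs(folder, exist_ok=True)
    previous = os.getcwd()
    os.chdir(folder)
    try:
        name = str(uuid.uuid4()) + '.jpg'
        with open(name, 'wb') as f:
            f.write(data)
    finally:
        # Restore the old directory so later relative paths are unaffected
        os.chdir(previous)
    return os.path.join(folder, name)
```

`os.path.join` leaves the process state untouched, whereas `os.chdir` changes where every later relative path resolves, so the first form is usually the safer default inside a loop.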
Summary:

requests

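requests.get('url', headers=headers)  # send a GET request
response.encoding = response.apparent_encoding  # decode with the encoding detected from the content
requests.get('url').text  # the page content as text (str)
requests.get('url').content  # the raw bytes, e.g. of an image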
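The four calls above can be wrapped into two small helpers. This is a sketch under stated assumptions: `fetch_text`, `fetch_bytes`, and `DEFAULT_HEADERS` are hypothetical names, the User-Agent value is only a placeholder because the article never defines `headers`, and the `timeout` argument and `raise_for_status()` call are additions that the original script does not use.

```python
import requests

# Hypothetical header set; the article does not say what `headers` contains
DEFAULT_HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_text(url, headers=None, timeout=10):
    """Return the page at url decoded to str."""
    response = requests.get(url, headers=headers or DEFAULT_HEADERS,
                            timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    # Use the encoding guessed from the body instead of the HTTP header
    response.encoding = response.apparent_encoding
    return response.text


def fetch_bytes(url, headers=None, timeout=10):
    """Return the undecoded body, ready to be written with mode 'wb'."""
    response = requests.get(url, headers=headers or DEFAULT_HEADERS,
                            timeout=timeout)
    response.raise_for_status()
    return response.content
```

With these, `fetch_text('http://www.autohome.com.cn/news/')` would supply the HTML for parsing, and `fetch_bytes(src)` the image data for each `src` found in it.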

BeautifulSoup

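bs = BeautifulSoup(requests.get('url').text, features='html.parser')  # parse the page
bs.find('div', id='')  # the first matching tag, or None if nothing matches
bs.find_all('div', id='')  # a list of all matching tags
bs.find_all('div', class_='')  # filter by CSS class; the keyword is class_ because class is reserved in Python
a.attrs  # a dictionary of the tag's attributes
a.attrs.get('')  # the value of one attribute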
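The BeautifulSoup calls listed above can be tried without any network access by parsing an inline HTML string. The markup below is invented for illustration and only mimics the shape the script expects (a container with `li` items, each holding an `a` with an `img` and an `h3`). It is not the real page source, and the example.com URLs are placeholders.

```python
from bs4 import BeautifulSoup

# Invented sample markup, not copied from the real site
html = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="/news/1.html"><img src="http://example.com/a.jpg"><h3>First</h3></a></li>
    <li><a href="/news/2.html"><img src="http://example.com/b.jpg"><h3>Second</h3></a></li>
    <li class="ad">no link here</li>
  </ul>
</div>
"""

bs = BeautifulSoup(html, features='html.parser')

# find returns the first match (or None); find_all returns every match
container = bs.find('div', id='auto-channel-lazyload-article')
items = container.find_all('li')
ads = bs.find_all('li', class_='ad')  # class_ because class is a keyword

first_link = items[0].find('a')
link_attrs = first_link.attrs                    # all attributes as a dict
src = first_link.find('img').attrs.get('src')    # one attribute value

# Skip items without a link, mirroring the `if a:` check in the script
titles = [li.find('a').find('h3').text for li in items if li.find('a')]
print(titles, src)
```

Because `find` returns None when nothing matches, chaining calls such as `a.find('h3').text` raises AttributeError on unexpected markup, which is why the `if a:` guard matters.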