Python Crawler 1: Housing Listings

Source: Internet | Published: electronic map database | Editor: 程序博客网 | Time: 2024/04/26 05:17

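Task description

Scrape 300 housing listings. The details collected from each listing page are shown below.

[Image: details on a listing page]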


Python code

#-*- coding: UTF-8 -*-
# 20170217: works well
from bs4 import BeautifulSoup
import requests

# Build the URLs of the first 10 search-result pages on Xiaozhu
urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(str(i)) for i in range(1, 11, 1)]


# The tag's class attribute differs by gender; use that difference
# to tell the landlord's gender (returns None for any other class)
def get_lorder_sex(class_name):
    if class_name == ['member_ico']:
        return '男'  # male
    elif class_name == ['member_ico1']:
        return '女'  # female


# Parse the details on a single listing page
def get_attar(url):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    titles = soup.select('div.pho_info > h4 > em')
    locations = soup.select('div.pho_info > p > span')
    prices = soup.select('div.day_l > span')
    images = soup.select('div.pho_show_big > div > img')
    lorder_names = soup.select('div.w_240 > h6 > a')
    lorder_images = soup.select('div.member_pic > a > img')
    lorder_genders = soup.select('div.member_pic > div')
    # zip stops at the shortest list, so a missing field drops the record
    for title, location, price, image, lorder_name, lorder_image, gender in zip(
            titles, locations, prices, images, lorder_names, lorder_images, lorder_genders):
        data = {
            'title': title.get_text(),
            'location': location.get_text(),
            'price': price.get_text(),
            'image': image.get('src'),
            'lorder_name': lorder_name.get_text(),
            'lorder_image': lorder_image.get('src'),
            "gender": get_lorder_sex(gender.get("class"))
        }
        print data  # Python 2 print statement


# Each of the 10 search pages links to many listing pages,
# and each listing page holds the details of one rental
for url in urls:
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    url_links = soup.select('a.resule_img_a')
    for url_link in url_links:
        get_attar(url_link.get('href'))
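The task asks for 300 listings, while the script above simply follows every link found on the first ten search pages. Below is a minimal sketch of one way to cap the crawl at a fixed count. The names `MAX_LISTINGS`, `search_page_urls`, `iter_listing_urls`, and `fake_fetch_links` are hypothetical and not part of the original script; `fake_fetch_links` stands in for the `requests` and `BeautifulSoup` calls so the example runs offline, and the figure of 40 links per page is an assumption for illustration only. The sketch is written for Python 3.

```python
from itertools import islice

MAX_LISTINGS = 300  # assumption: the target count from the task description


def search_page_urls(pages=10):
    """Build the search-result page URLs (pages 1..pages), as in the script."""
    return ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(i)
            for i in range(1, pages + 1)]


def iter_listing_urls(page_urls, fetch_links):
    """Lazily yield listing links, one search page at a time."""
    for page_url in page_urls:
        for link in fetch_links(page_url):
            yield link


def fake_fetch_links(page_url):
    # Hypothetical stand-in for soup.select('a.resule_img_a'):
    # pretends that every search page lists 40 links.
    return ['{}#listing-{}'.format(page_url, n) for n in range(40)]


# islice stops pulling from the generator once MAX_LISTINGS links are taken,
# so later search pages are never requested.
limited = list(islice(iter_listing_urls(search_page_urls(), fake_fetch_links),
                      MAX_LISTINGS))
print(len(limited))
```

Because the generator is lazy, a real `fetch_links` would only download as many search pages as are needed to reach the cap.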

Results

Only two of the scraped listings are shown here.

[Image: console output for two listings]


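Shortcomings

In the PyCharm console, Chinese characters appear only as their escape codes rather than as readable text. This is most likely because the Python 2 print statement shows the repr of each string inside the dict.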
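One possible workaround for the display problem, sketched here rather than taken from the original post, is to serialise each record with `json.dumps(..., ensure_ascii=False)` instead of printing the dict directly. The helper name `format_record` and the sample values are hypothetical. The sketch targets Python 3; on Python 2 the returned text may additionally need explicit encoding before it is printed.

```python
import json


def format_record(data):
    """Serialise a record as JSON text, keeping non-ASCII characters readable."""
    # ensure_ascii=False leaves Chinese characters as-is instead of \uXXXX escapes
    return json.dumps(data, ensure_ascii=False, sort_keys=True)


sample = {'title': '北京', 'gender': '女'}  # illustrative values, not scraped data
line = format_record(sample)
print(line)
```

Sorting the keys is optional; it only keeps the field order stable from one record to the next.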
