So cute are you Python 12

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 15:19

1. Installing BeautifulSoup:

**1.1 Download the BeautifulSoup source package

**1.2 Install

To install, run the following from the unpacked source directory:

           python setup.py build

           python setup.py install

To import the package, use:

        import bs4

        from bs4 import BeautifulSoup
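
Before going further, it can help to confirm that the install actually landed in the interpreter you are running. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of bs4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by this interpreter."""
    # find_spec returns None for a top-level module that cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("bs4"):
        import bs4
        print("bs4 version:", bs4.__version__)
    else:
        print("bs4 is not installed; run the setup.py steps first")
```

If the check reports that bs4 is missing even though the install succeeded, the `python` used for `setup.py install` is probably not the same interpreter that runs the script.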

2. A first BeautifulSoup example:

    #!/usr/bin/env python3
    # coding: utf-8
    # FileName: be_learn01.py
    # Function: first-time use of BeautifulSoup
    # History: 25-10-2013
    from bs4 import BeautifulSoup


    def bea_Demo():
        demoHtml = """<html><body><div class="icon_col">   <h1 class="h1user">Certtt</h1></div></body></html>"""
        # Name the parser explicitly so the result does not depend on
        # which optional parsers happen to be installed.
        soup = BeautifulSoup(demoHtml, "html.parser")
        print("type(soup)=", type(soup))
        print("soup=", soup)
        # Find the first <h1> whose class attribute is "h1user".
        h1userSoup = soup.find(name="h1", attrs={"class": "h1user"})
        print("h1userSoup=", h1userSoup)
        # .string returns the tag's single text child.
        h1userUnicodeStr = h1userSoup.string
        print("h1userUnicodeStr=", h1userUnicodeStr)


    if __name__ == '__main__':
        bea_Demo()
Result:

    # python be_learn01.py
    type(soup)= <class 'bs4.BeautifulSoup'>
    soup= <html><body><div class="icon_col"><h1 class="h1user">Certtt</h1></div></body></html>
    h1userSoup= <h1 class="h1user">Certtt</h1>
    h1userUnicodeStr= Certtt
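
The `find(name=..., attrs=...)` call used in this example is one of several ways to reach the same tag. The sketch below, which assumes bs4 is installed and uses Python's built-in `html.parser`, compares it with the `class_` keyword and with a CSS selector via `select_one`. The function name `find_h1_three_ways` is hypothetical. It also shows that each lookup returns `None` when nothing matches, so it is worth checking the result before reading `.string` or calling `get_text()`.

```python
from bs4 import BeautifulSoup

DEMO_HTML = (
    '<html><body><div class="icon_col">'
    '<h1 class="h1user">Certtt</h1>'
    '</div></body></html>'
)


def find_h1_three_ways(html):
    """Look up <h1 class="h1user"> three ways; return each tag's text or None."""
    soup = BeautifulSoup(html, "html.parser")
    by_attrs = soup.find(name="h1", attrs={"class": "h1user"})
    by_keyword = soup.find("h1", class_="h1user")  # class_ avoids the reserved word
    by_css = soup.select_one("h1.h1user")          # CSS selector form
    # Each lookup yields None when no tag matches, so guard before get_text().
    return [
        tag.get_text() if tag is not None else None
        for tag in (by_attrs, by_keyword, by_css)
    ]


if __name__ == "__main__":
    print(find_h1_three_ways(DEMO_HTML))
    print(find_h1_three_ways("<p>no heading here</p>"))
```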
3. A simple test on a real page:

    #!/usr/bin/env python3
    # coding: utf-8
    # FileName: bea_learn02.py
    # Function: fetch a page and print the links inside its <ul> elements
    # History: 25-10-2013
    import urllib.request

    from bs4 import BeautifulSoup


    def bea_Demo():
        url = 'http://home.51cto.com/index.php?s=/space/7743046'
        # Download the raw page bytes; BeautifulSoup works out the encoding.
        with urllib.request.urlopen(url) as ss:
            page = ss.read()
        soup = BeautifulSoup(page, "html.parser")
        print("type(soup)=", type(soup))
        # Collect every <ul>, then print each <a> found inside it.
        h1userSoup = soup.find_all(name="ul")
        for h in h1userSoup:
            res = h.find_all('a')
            for r in res:
                if r is not None:
                    print("***:", r.string, "::", r, "\n")


    if __name__ == '__main__':
        bea_Demo()
Result:

    $ python bea_learn02.py
    ***: 家园 :: <a href="http://home.51cto.com" target="_blank">家园</a>
    ***: 学院 :: <a href="http://edu.51cto.com" target="_blank">学院</a>
    ***: 博客 :: <a href="http://blog.51cto.com" target="_blank">博客</a>
    ***: 论坛 :: <a href="http://bbs.51cto.com" target="_blank">论坛</a>
    ***: 下载 :: <a href="http://down.51cto.com" target="_blank">下载</a>
    ***: 自测 :: <a href="http://selftest.51cto.com" target="_blank">自测</a>
    ***: 门诊 :: <a href="http://doctor.51cto.com" target="_blank">门诊</a>
    ***: 周刊 :: <a href="http://blog.51cto.com/newsletter/" target="_blank">周刊</a>
    ***: 读书 :: <a href="http://book.51cto.com" target="_blank">读书</a>
    ***: 技术圈 :: <a href="http://g.51cto.com" target="_blank">技术圈</a>
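
The loop in this test prints whole `<a>` tags, but in practice the link target is usually what you want. Here is a minimal sketch, assuming bs4 is installed, that returns `(text, href)` pairs instead. It runs on an inline HTML string so it needs no network access. `SAMPLE_HTML` is a hypothetical sample modeled on the output above, not the real page source, and `links_in_lists` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the printed output; not the real page source.
SAMPLE_HTML = """
<ul>
  <li><a href="http://home.51cto.com" target="_blank">家园</a></li>
  <li><a href="http://edu.51cto.com" target="_blank">学院</a></li>
  <li><a target="_blank">no href here</a></li>
</ul>
<p><a href="http://example.com/outside">outside any list</a></p>
"""


def links_in_lists(html):
    """Return (text, href) pairs for every <a> that sits inside a <ul>."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for ul in soup.find_all("ul"):
        for a in ul.find_all("a"):
            href = a.get("href")  # None when the attribute is missing
            if href is not None:
                pairs.append((a.get_text(strip=True), href))
    return pairs


if __name__ == "__main__":
    for text, href in links_in_lists(SAMPLE_HTML):
        print(text, "->", href)
```

Using `a.get("href")` rather than `a["href"]` avoids a `KeyError` on anchors without a target. Note that a `<ul>` nested inside another `<ul>` would have its links reported twice by this loop; the sample has no nested lists.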


