python核心编程学习(五)

来源:互联网 发布:合肥百度快照优化 编辑:程序博客网 时间:2024/06/05 15:38

核心编程中一个抓取网页的例子:

'''Created on 2012-3-7@author: Administrator'''#!/usr/bin/env pythonfrom urllib import urlretrievedef firstNonBlank(lines):    for eachLine in lines:        if not eachLine.strip():            continue        else:            return eachLine        def firstLast(webpage):    f=open(webpage)    lines=f.readlines()    f.close()    print firstNonBlank(lines)    lines.reverse()    print firstNonBlank(lines)    def download(url='http://www.baidu.com',process=firstLast):    try:        retval = urlretrieve(url)[0]    except IOError:        retval=None            if retval:        process(retval)if __name__=='__main__':    download()   

运行结果:

pydev debugger: starting<!doctype html><html><head><meta http-equiv="Content-Type" content="text/html;charset=gb2312"><title>�ٶ�һ�£����֪��      </title><style>html{overflow-y:auto}body{font:12px arial;text-align:center;background:#fff}body,p,form,ul,li{margin:0;padding:0;list-style:none}body,form,#fm{position:relative}td{text-align:left}img{border:0}a{color:#00c}a:active{color:#f60}#u{color:#999;padding:4px 10px 5px 0;text-align:right}#u a{margin:0 5px}#u .reg{margin:0}#m{width:680px;margin:0 auto;}#nv a,#nv b,.btn,#lk{font-size:14px}#fm{padding-left:90px;text-align:left}input{border:0;padding:0}#nv{height:19px;font-size:16px;margin:0 0 4px;text-align:left;text-indent:117px}.s_ipt_wr{width:418px;height:30px;display:inline-block;margin-right:5px;background:url(http://www.baidu.com/img/i-1.0.0.png) no-repeat -304px 0;border:1px solid #b6b6b6;border-color:#9a9a9a #cdcdcd #cdcdcd #9a9a9a;vertical-align:top}.s_ipt{width:405px;height:22px;font:16px/22px arial;margin:5px 0 0 7px;background:#fff;outline:none;-webkit-appearance:none}.s_btn{width:95px;height:32px;padding-top:2px\9;font-size:14px;background:#ddd url(http://www.baidu.com/img/i-1.0.0.png);cursor:pointer}.s_btn_h{background-position:-100px 0}.s_btn_wr{width:97px;height:34px;display:inline-block;background:url(http://www.baidu.com/img/i-1.0.0.png) no-repeat -202px 0;*position:relative;z-index:0;vertical-align:top}#lg img{vertical-align:top;margin-bottom:3px}#lk{margin:33px 0}#lk span{font:14px "����"}#lm{height:60px}#lh{margin:16px 0 5px;word-spacing:3px}.tools{position:absolute;top:-4px;*top:10px;right:-13px;}#mHolder{width:62px;position:relative;z-index:296;display:none}#mCon{height:18px;line-height:18px;position:absolute;cursor:pointer;padding:0 18px 0 0;background:url(http://www.baidu.com/img/bg-1.0.0.gif) no-repeat right -134px;background-position:right -136px\9}#mCon span{color:#00c;cursor:default;display:block}#mCon .hw{text-decoration:underline;cursor:pointer}#mMenu a{width:100%;height:100%;display:block;line-height:22px;text-indent:6px;text-decoration:none;filter:none\9}#mMenu,#user ul{box-shadow:1px 1px 2px #ccc;-moz-box-shadow:1px 1px 2px #ccc;-webkit-box-shadow:1px 1px 2px #ccc;filter: progid:DXImageTransform.Microsoft.Shadow(Strength=2, Direction=135, Color="#cccccc")\9;}#mMenu{width:56px;border:1px solid #9b9b9b;list-style:none;position:absolute;right:7px;top:28px;display:none;background:#fff}#mMenu a:hover{background:#ebebeb}#mMenu .ln{height:1px;background:#ebebeb;overflow:hidden;font-size:1px;line-height:1px;margin-top:-1px}#cp,#cp a{color:#77c}#seth{display:none;behavior:url(#default#homepage)}#setf{display:none}</style><!--4c0e8ed06c2bd0cc-->

需要注意:

1、

eachLine.strip():去掉前后空格
2、
urlretrieve(url) 读取到本地:c:\users\admini~1\appdata\local\temp\tmpbw_rzt