A Web Crawler Written in Python (Very Simple)

Source: Internet | Published by: linux awk script | Editor: 程序博客网 | Time: 2024/05/12 08:01

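This is a small web crawler that a classmate passed on to me. I found it quite interesting, so I am sharing it here. One thing to note: it must be run with Python 2.3. Under Python 3.4 it runs into problems, because the `print` statement and `urllib.urlopen` used below exist only in Python 2.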

The Python program is as follows:

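    import re, urllib

    strTxt = ""
    x = 1                                   # counter used to name the output files
    ff = open("wangzhi.txt", "r")           # one URL per line
    for line in ff.readlines():
        f = open(str(x) + ".txt", "w+")     # 1.txt, 2.txt, ...
        print line
        # Collect the contents of every <p>...</p> block on the page
        n = re.findall(r"<p>(.*?)<\/p>", urllib.urlopen(line.strip()).read(), re.M)
        for i in n:
            if len(i) != 0:
                # Drop spaces and <strong> tags
                i = i.replace(" ", "")
                i = i.replace("<strong>", "")
                i = i.replace("</strong>", "")
                strTxt = strTxt + i
                # Strip opening <a ...> tags, whole <span> elements, and closing </a> tags
                strTxt = re.sub(r"<a href=(.*?)>", r"", strTxt)
                strTxt = re.sub(r"<a(.*?)>", r"", strTxt)
                strTxt = re.sub(r"<span>(.*?)</span>", r"", strTxt)
                strTxt = re.sub(r"<\/[Aa]>", r"", strTxt)
            #print strTxt
            f.write(strTxt)
            strTxt = ""
        f.close()                           # the original wrote f.close, which never closed the file
        x = x + 1
    ff.close()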
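For readers on Python 3, the same cleanup logic can be sketched roughly as below. This is only an illustrative port, not the original author's code: the function names `extract_text` and `fetch` are made up for this example, and the `gbk` default encoding is an assumption about the target pages that may need changing. The regular expressions are the ones from the program above, except that the `<a href=...>` pattern is dropped, since `<a(.*?)>` already covers it.

```python
import re
import urllib.request


def extract_text(html):
    """Return the cleaned text of all <p>...</p> blocks in html (hypothetical helper)."""
    parts = []
    for para in re.findall(r"<p>(.*?)</p>", html, re.M):
        if not para:
            continue  # skip empty paragraphs
        # Same cleanup steps as the Python 2 program: drop spaces and <strong> tags
        para = para.replace(" ", "")
        para = para.replace("<strong>", "").replace("</strong>", "")
        # Strip opening <a ...> tags, whole <span> elements, and closing </a> tags
        para = re.sub(r"<a(.*?)>", "", para)
        para = re.sub(r"<span>(.*?)</span>", "", para)
        para = re.sub(r"</[Aa]>", "", para)
        parts.append(para)
    return "".join(parts)


def fetch(url, encoding="gbk"):
    """Download url and decode it; the gbk default is an assumption, not a given."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode(encoding, errors="replace")
```

With these two pieces, `extract_text(fetch(url))` would give the text that the original program writes to one output file. Keeping the download separate from the cleanup makes the cleanup easy to try on a hand-written HTML string without touching the network.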

The contents of wangzhi.txt are as follows:

http://sports.163.com/14/1126/22/AC0TVK4E00052UUC.html
http://sports.163.com/14/1126/22/AC0TGD4700052UUC.html
http://sports.163.com/14/1126/22/AC0TAHNK00052UUC.html

Result analysis:

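Running the program produces three output files, 1.txt, 2.txt and 3.txt, each containing the paragraph text extracted from the page at the corresponding URL.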
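How each URL in the list ends up in its own numbered file can be sketched as follows, in Python 3. The helper names `parse_urls` and `output_names` are hypothetical and only serve to illustrate the numbering. Skipping blank lines is an addition that the original program does not have.

```python
def parse_urls(lines):
    """Return the non-blank lines, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def output_names(urls):
    """Pair each URL with its output file name: 1.txt, 2.txt, ..."""
    return [(url, "%d.txt" % n) for n, url in enumerate(urls, start=1)]
```

Passing an open file works as well as a list, for example `with open("wangzhi.txt") as fh: pairs = output_names(parse_urls(fh))`, which for the three URLs above would pair them with 1.txt, 2.txt and 3.txt in order.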

0 0