Python Crawlers from Getting Started to Utterly Confused - 1

Source: Internet  Published by: 活动致辞知乎  Editor: 程序博客网  Time: 2024/06/03 22:56

Wrapping your own simple crawler framework

1. Wrapping the framework

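# -*- coding: utf-8 -*-
import urllib2


class my_crawler:
    # My crawler class. A Python class header ends with a colon; to inherit,
    # list the base classes in parentheses before the colon (multiple
    # inheritance is allowed).

    def __init__(self, url, path):
        # Python's constructor; self is the counterpart of Java's this.
        self._url = url
        self._path = path

    def read_resource(self):
        # Read the resource.
        # Percent-encode Chinese characters with quote, keeping the URL
        # delimiters intact (a quote followed by unquote would undo itself).
        url = urllib2.quote(self._url, safe=":/?&=%")
        # Open the page with a 5-second timeout and read its content.
        return urllib2.urlopen(url, timeout=5).read()

    def write_resource(self):
        try:
            # Fetch first, so a failed request does not leave an empty file.
            data = self.read_resource()
            # open in 'wb' mode writes binary; with closes the file for us.
            with open(self._path, 'wb') as f:
                f.write(data)
            print(self._url + " was crawled successfully and written to " + self._path)
        except Exception as e:
            # Catch the exception; str(e) is needed because a str and an
            # exception object cannot be concatenated directly.
            print("An exception occurred: " + str(e))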
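The class above targets Python 2, since urllib2 no longer exists in Python 3. As a rough sketch only, the same idea might look like this on Python 3. The names MyCrawler and encode_url, the timeout parameter, and the True/False return value are assumptions made for illustration and are not part of the original framework.

```python
# A minimal Python 3 sketch of the crawler above (illustrative, not the author's code).
import urllib.parse
import urllib.request


def encode_url(url):
    """Percent-encode non-ASCII characters while keeping URL delimiters intact."""
    # Hypothetical helper: the safe set keeps ':', '/', '?', '&', '=' and
    # already-encoded '%' sequences untouched.
    return urllib.parse.quote(url, safe=":/?&=%")


class MyCrawler:
    def __init__(self, url, path, timeout=5):
        self._url = url
        self._path = path
        self._timeout = timeout  # seconds; assumed default mirrors the original

    def read_resource(self):
        """Fetch the resource and return its body as bytes."""
        target = encode_url(self._url)
        with urllib.request.urlopen(target, timeout=self._timeout) as resp:
            return resp.read()

    def write_resource(self):
        """Fetch the resource and write it to self._path; return True on success."""
        try:
            data = self.read_resource()
            with open(self._path, "wb") as f:
                f.write(data)
        except (OSError, ValueError) as e:
            # URLError, HTTPError and socket timeouts are all OSError subclasses;
            # ValueError covers malformed URLs.
            print("An exception occurred: " + str(e))
            return False
        print(self._url + " was crawled successfully and written to " + self._path)
        return True
```

Returning a boolean instead of only printing lets the caller decide what to do after a failed fetch; that is a design choice of this sketch, not something the original class does.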

2. Using the framework

# -*- coding: utf-8 -*-
# Import the my_crawler class from the my_crawler_framework module.
from my_crawler_framework import my_crawler

url = "http://www.baidu.com/s?wd=肥猫下楼吃面包"  # the URL to crawl
path = "d:/img/crawlers/zp.html"  # the path the file is written to
framework = my_crawler(url, path)  # create a my_crawler object via the constructor
framework.write_resource()  # call the object's write_resource method
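The search URL above carries a Chinese keyword in its wd parameter. One possible alternative to encoding the finished URL is to build the query string from the keyword with the standard library. The sketch below assumes Python 3; build_search_url and its base parameter are hypothetical names introduced only for this illustration.

```python
# A minimal Python 3 sketch: build the search URL from a keyword (illustrative only).
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(keyword, base="http://www.baidu.com/s"):
    """Return base?wd=<keyword>, with the keyword percent-encoded as UTF-8."""
    # urlencode percent-encodes non-ASCII text, so the result is plain ASCII.
    return base + "?" + urlencode({"wd": keyword})


def keyword_of(url):
    """Recover the wd keyword from a URL produced by build_search_url."""
    return parse_qs(urlsplit(url).query)["wd"][0]
```

The resulting string could then be passed as the url argument when constructing the crawler object, in place of the hand-written URL.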
