07_Python web crawler: content and introduction

Source: Internet  Published: tftp服务器端软件  Editor: 程序博客网  Date: 2024/06/02 07:04

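Sometimes you come across animated GIFs you like, but saving them one by one is tedious, and some sites do not even allow right-click saving. Fetching the GIFs with Python instead makes for a fun little exercise.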

The site crawled this time is 居然搞笑网 (www.zbjuran.com).


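Approach:

Fetch the content of the current page

Find the URLs of the GIFs in the page

Save the content at each URL to a local file

To crawl multiple pages, wrap the steps above in a loop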
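
The four steps above can be sketched with the standard library alone. This is a minimal Python 3 sketch under stated assumptions, not the script from this post: `GifLinkParser`, `find_gif_urls`, `page_url` and `crawl` are hypothetical names, the URL pattern is the one used in the code of this post, and `fetch` and `save` are passed in as functions so the flow can be tried without touching the network.

```python
from html.parser import HTMLParser


class GifLinkParser(HTMLParser):
    """Collects the src of every <img> tag whose URL ends in .gif."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and src.lower().endswith(".gif"):
            self.urls.append(src)


def find_gif_urls(html):
    """Step 2: return the GIF URLs found in an HTML string."""
    parser = GifLinkParser()
    parser.feed(html)
    return parser.urls


def page_url(page_no):
    """Build the list-page URL (pattern taken from the script in this post)."""
    return "http://www.zbjuran.com/dongtai/list_4_%s.html" % page_no


def crawl(first_page, last_page, fetch, save):
    """Loop over pages: fetch each one, find its GIFs, save each GIF.

    fetch(url) returns an HTML string (or None on failure) and save(url)
    stores one GIF; both are assumed helpers supplied by the caller.
    Returns the number of GIFs handed to save.
    """
    saved = 0
    for page_no in range(first_page, last_page + 1):  # the multi-page loop
        html = fetch(page_url(page_no))                # step 1
        for gif_url in find_gif_urls(html or ""):      # step 2
            save(gif_url)                              # step 3
            saved += 1
    return saved
```

Passing `fetch` and `save` in keeps the parsing and looping logic separate from the network and the disk, so a failed request (returning `None`) simply yields no GIFs for that page.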


Code:

#!/usr/bin/env python3
# coding: utf-8

import os
import sys
import time
import uuid
import urllib.request

from bs4 import BeautifulSoup


# Fetch the page content; returns None if the request fails
def getHtml(url):
    try:
        print(url)
        # Raw bytes are returned; BeautifulSoup works out the encoding itself
        html = urllib.request.urlopen(url).read()
    except Exception:
        return None
    return html


# Collect the URL and title of every GIF on the page
def getImagUrl(html):
    ImagUrlList = []
    if not html:
        print('nothing can be found')
        return ImagUrlList
    soup = BeautifulSoup(html, 'lxml')
    main = soup.find("div", {"class": "main"})
    if main is None:
        return ImagUrlList
    # Get the list of items
    items = main.find_all('div', {'class': 'item'})
    for item in items:
        text = item.find('div', {"class": "text"})
        img = text.find('img') if text else None
        title = item.find('h3')
        # Ad items have no "text" div, so this check filters them out
        if img is not None and img.get('src') and title is not None:
            ImagUrlList.append({'url': img.get('src'), 'name': title.text})
    return ImagUrlList


# Download one image to a local folder
def download(author, imgurl, typename, pageNo):
    # The folder is named after today's date, e.g. 2017-3-16
    x = time.localtime()
    foldername = "%d-%d-%d" % (x.tm_year, x.tm_mon, x.tm_mday)

    picpath = 'Jimy/%s/%s/%s' % (foldername, typename, str(pageNo))
    # A UUID suffix keeps files with the same title from overwriting each other
    filename = author + str(uuid.uuid1())
    # The last three characters of the URL are taken as the extension
    pic_type = imgurl[-3:]

    if not os.path.exists(picpath):
        os.makedirs(picpath)
    target = picpath + "/%s.%s" % (filename, pic_type)
    print("动图存贮位置:" + target)
    # Download the image to the target path
    download_img = urllib.request.urlretrieve(imgurl, target)
    print("图片出处为:" + imgurl)
    return download_img


# Exit the program
def myquit():
    print("Bye Bye!")
    sys.exit(0)


# Crawl a single list page
def start(pageNo):
    targeturl = "http://www.zbjuran.com/dongtai/list_4_%s.html" % str(pageNo)
    html = getHtml(targeturl)
    urllist = getImagUrl(html)
    for imgurl in urllist:
        download(imgurl['name'], imgurl['url'], '搞笑动图', pageNo)


# Keep asking until the input is a number in 1..upper; Q or quit exits.
# The range enforced is 1..upper, whatever the prompt text says.
def askPageNo(prompt, upper):
    pageNo = input(prompt)
    while not pageNo.isdigit() or int(pageNo) > upper or int(pageNo) < 1:
        if pageNo in ('Q', 'quit'):
            myquit()
        print("Param is invalid , please try again.")
        pageNo = input("Input the page number you want to scratch >")
    return pageNo


if __name__ == '__main__':
    print('''
            *****************************************
            **    Welcome to Spider of GIF         **
            **      Created on 2017-3-16           **
            **      @author: Jimy                  **
            *****************************************''')
    # First run: crawl one page chosen by the user
    pageNo = askPageNo(
        "Input the page number you want to scratch (1-50),please input 'quit' if you want to quit\n"
        "请输入要爬取的页面,范围为(1-100),如果退出,请输入Q>\n>", 50)
    print(pageNo)
    start(pageNo)

    # Second run: ask how many pages to crawl in total
    pageNo = askPageNo(
        "Input the page number you want to scratch (1-50),please input 'quit' if you want to quit\n"
        "请输入总共需要爬取的页面,范围为(1-5000),如果退出,请输入Q>\n>", 5000)
    # Loop over pages 1..pageNo to crawl multiple pages
    for num in range(int(pageNo)):
        start(str(num + 1))
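
The `download` function above takes the last three characters of the image URL as the file extension and puts the post title straight into the file name. That works for URLs ending in `.gif`, but it would misfire on `.jpeg` or on a URL with a query string, and a title containing `/` would change the target path. One possible hardening is sketched below for Python 3. `extension_from_url` and `safe_name` are hypothetical helpers that are not part of the original script, and falling back to `gif` is an assumption based on the site serving GIFs.

```python
import os
import re
from urllib.parse import urlparse


def extension_from_url(url, default="gif"):
    """Return the lower-cased extension of the URL path, without the dot."""
    path = urlparse(url).path            # drops any ?query or #fragment
    ext = os.path.splitext(path)[1]      # e.g. ".gif", or "" if there is none
    return ext[1:].lower() if len(ext) > 1 else default


def safe_name(title):
    """Replace characters that are awkward in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or "untitled"


print(extension_from_url("http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif"))
print(safe_name("a/b: c"))
```

The two `print` calls show the helpers on a URL from this post and on a made-up title. In `download`, they could stand in for `imgurl[-3:]` and for the raw `author` value.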

The output looks like this:

            *****************************************
            **    Welcome to Spider of GIF         **
            **      Created on 2017-3-16           **
            **      @author: Jimy                  **
            *****************************************
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
请输入要爬取的页面,范围为(1-100),如果退出,请输入Q>
>1
1
http://www.zbjuran.com/dongtai/list_4_1.html
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/真是艰难的选择。3f0fe8f6-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/这么贱会被打死吧……3fa9da88-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F206135H35U.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/一看就是印度……4064e60c-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F20613543c50.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/新垣结衣的正经工作脸414b4f52-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F206135250553.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/妹子这是在摇什么的421afa86-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为:http://www.zbjuran.com/uploads/allimg/170206/10-1F20613493N03.gif
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
请输入总共需要爬取的页面,范围为(1-5000),如果退出,请输入Q>
>Q
Bye Bye!

In the end, the GIFs are saved to the local disk.
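
The page-number prompt in the script mixes reading input with checking it, and its text mentions both `quit` and `Q` as the way to exit. Pulling the check into a pure function makes the rule easy to try out on its own. The following is a Python 3 sketch under assumptions: `parse_page_number` is a hypothetical helper that is not in the original script, it accepts both `Q` and `quit`, and the upper bound is passed in rather than fixed.

```python
def parse_page_number(text, upper):
    """Classify one line of user input.

    Returns ("quit", None) for Q or quit, ("ok", n) for a whole number
    with 1 <= n <= upper, and ("invalid", None) for anything else.
    """
    text = text.strip()
    if text in ("Q", "quit"):
        return ("quit", None)
    # isascii() rules out digit-like characters such as superscripts,
    # which pass isdigit() but cannot be converted by int()
    if text.isascii() and text.isdigit() and 1 <= int(text) <= upper:
        return ("ok", int(text))
    return ("invalid", None)
```

A prompt loop would then call `input()`, pass the line to this function, and act on the first element of the result.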


(End)









































