CVPR2017_Papers Download Crawler

Source: Internet | Editor: 程序博客网 | Time: 2024/06/17 04:54

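Every year I end up reading quite a few CVPR papers, so I might as well download all of them first and then filter them one by one, which saves the hassle of searching online. So, the downloader is about as simple as a crawler can get. After all, a mountain need not be high to be famous if an immortal lives on it, water need not be deep to be magical if a dragon dwells in it, and code need not be complete as long as it works!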

#!/usr/bin/env python
# coding=utf-8
# Written for Python 2 (it uses urllib2 and the print statement).
import re
import urllib
import urllib2


def getHtml(url):
    # Fetch the index page and return its raw HTML.
    page = urllib.urlopen(url)
    html = page.read()
    return html


def download_file(download_url, file_name, count):
    # Download one PDF and write it to disk in binary mode.
    response = urllib2.urlopen(download_url)
    file = open(file_name, 'wb')
    file.write(response.read())
    file.close()
    print("Completed" + str(count).zfill(4))


save_path = '/home/nick/cvpr2017/'  # New folder (must already exist)
url = 'http://openaccess.thecvf.com/CVPR2017.py'
html = getHtml(url)
pattern = re.compile(r'\bcontent_cvpr_2017.*paper\.pdf\b')
url_list = pattern.findall(html)
print len(url_list)  # Should be 783
count = 0
breakpoint = 0  # Number of papers already downloaded
for url in url_list:
    count += 1
    # Downloads sometimes time out, so set breakpoint to resume where the last run stopped.
    if count > breakpoint:
        name = url.split('/')[-1]
        file_name = save_path + name
        download_file('http://openaccess.thecvf.com/' + url, file_name, count)
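The script above resumes after a timeout by editing the breakpoint counter by hand. As a rough sketch of an alternative, and not the original author's method, the Python 3 version below skips any file that is already on disk and retries failed downloads a few times. The function names (`extract_pdf_links`, `pending_links`, `download_all`), the `.part` suffix, the `cvpr2017` output folder, and the timeout and retry values are all assumptions made for illustration. Only the base URL and the `content_cvpr_2017 ... paper.pdf` link shape come from the script above.

```python
import os
import re
import urllib.request

BASE_URL = "http://openaccess.thecvf.com/"  # taken from the script above
# Assumption: links sit inside double-quoted href attributes, so stop at quotes or whitespace.
PDF_PATTERN = re.compile(r'content_cvpr_2017[^\s"]*?paper\.pdf')


def extract_pdf_links(html):
    """Return relative PDF paths in page order, dropping duplicates."""
    seen = set()
    links = []
    for link in PDF_PATTERN.findall(html):
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


def pending_links(links, save_dir):
    """Keep only the links whose target file does not exist in save_dir yet."""
    return [link for link in links
            if not os.path.exists(os.path.join(save_dir, link.split("/")[-1]))]


def download_file(url, file_name, timeout=30):
    """Write to a temporary name first so an interrupted run never leaves a file that looks finished."""
    tmp_name = file_name + ".part"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(tmp_name, "wb") as f:
        f.write(data)
    os.replace(tmp_name, file_name)


def download_all(html, save_dir, retries=3):
    """Download every paper that is still missing, retrying each one a few times."""
    os.makedirs(save_dir, exist_ok=True)
    for link in pending_links(extract_pdf_links(html), save_dir):
        target = os.path.join(save_dir, link.split("/")[-1])
        for attempt in range(1, retries + 1):
            try:
                download_file(BASE_URL + link, target)
                print("Completed " + target)
                break
            except OSError as err:  # URLError and socket timeouts are OSError subclasses
                print("Attempt %d failed for %s: %s" % (attempt, link, err))


if __name__ == "__main__":
    with urllib.request.urlopen(BASE_URL + "CVPR2017.py", timeout=30) as page:
        index_html = page.read().decode("utf-8", errors="replace")
    download_all(index_html, "cvpr2017")
```

With this layout, rerunning the script after a failure simply picks up the papers that are still missing, so no counter needs to be adjusted between runs.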