Python: Scraping the Algorithms and Data Structures Flash Animations

Source: Internet | Published: 搭建yum仓库 | Editor: 程序博客网 | Time: 2024/06/05 16:35

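Over the past few days I have been reading up on data structures and came across the BFPRT algorithm, which I did not really understand.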

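I looked at someone else's blog post, http://blog.csdn.net/hnzziafyz/article/details/51339968, which mentions a teaching video from Fuzhou University: http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id=82. For me, a video walkthrough is the easiest way to understand an algorithm.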

The "video" is actually a Flash file, and it works quite well. The site has 107 such pages in total, so I decided to download all of them.

http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id=82

The URL splits into two parts: http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id= and 82 (82 is the page number).

The template URL is: model_url = 'http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id='
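As a minimal sketch, the page URLs can be produced by appending each page number to the template. The helper name `page_urls` is hypothetical; the range 1 to 107 comes from the page count mentioned above.

```python
MODEL_URL = 'http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id='

def page_urls(first=1, last=107):
    """Return the page URLs for page numbers first..last (inclusive)."""
    return [MODEL_URL + str(i) for i in range(first, last + 1)]

urls = page_urls()
print(len(urls))   # 107
print(urls[81])    # http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id=82
```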

Clicking "View full screen" on the page takes the browser to: http://ds.fzu.edu.cn/fine/resources/TFlash/线性时间选择算法.swf

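So it is enough to fetch each page built from the template URL, match the .swf file name in its HTML with a regular expression, and download that file from the TFlash directory. The expression is: reg = r'value="TFlash/(.+?\.swf)">'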
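To see what the expression captures, here is a small offline sketch. The sample HTML string is made up for illustration (the real page markup is not reproduced here), and `find_swf_urls` is a hypothetical helper name.

```python
import re

FLASH_URL = 'http://ds.fzu.edu.cn/fine/resources/TFlash/'
SWF_PATTERN = re.compile(r'value="TFlash/(.+?\.swf)">')

def find_swf_urls(html):
    """Return the full download URL of every .swf name matched in html."""
    return [FLASH_URL + name for name in SWF_PATTERN.findall(html)]

# Made-up markup that merely contains the attribute the pattern looks for.
sample_html = '<object><param name="movie" value="TFlash/disarrayinsert.swf"></object>'

print(find_swf_urls(sample_html))
# ['http://ds.fzu.edu.cn/fine/resources/TFlash/disarrayinsert.swf']
print(find_swf_urls('<html><body></body></html>'))
# []
```

The lazy group `(.+?\.swf)` stops at the first `.swf"` it meets, so a page that references several files yields one match per file rather than one long match.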

#coding=utf-8
# Python 2 script (uses urllib.urlopen and the print statement).
import urllib
import re

model_url = 'http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id='
flash_url = 'http://ds.fzu.edu.cn/fine/resources/TFlash/'
#src="http://ds.fzu.edu.cn/fine/resources/TFlash/disarrayinsert.swf"

def getHtml(url):
    # Fetch the raw HTML of one page.
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getflash(html):
    # Find every .swf file name referenced in the page and download it.
    reg = r'value="TFlash/(.+?\.swf)">'
    flashre = re.compile(reg)
    flashname = re.findall(flashre, html)
    print flashname
    for x in flashname:
        flash = flash_url + x
        urllib.urlretrieve(flash, x)
        print flash

for i in range(1, 108):
    # Use the loop index as the page number (pages 1 to 107).
    real_url = model_url + str(i)
    print real_url
    htmls = getHtml(real_url)
    getflash(htmls)
    print('The %d page\'s flash are downloaded' % i)

Result of running the program: some pages are empty and contain no Flash file, so in the end 95 files were downloaded in total.
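Because some pages have no Flash file, a Python 3 port might skip those pages explicitly and count what was actually saved. The following is only a rough sketch under stated assumptions: `crawl`, `fetch` and `save` are hypothetical names, and decoding pages as GBK and percent-encoding non-ASCII file names as GBK are unverified guesses about the site. The demo at the bottom uses stand-in functions, so it runs without network access.

```python
import re
import urllib.parse
import urllib.request

MODEL_URL = 'http://ds.fzu.edu.cn/fine/resources/FlashContent.asp?id='
FLASH_URL = 'http://ds.fzu.edu.cn/fine/resources/TFlash/'
SWF_PATTERN = re.compile(r'value="TFlash/(.+?\.swf)">')

def fetch(url):
    """Download one page and decode it (GBK is an assumed encoding)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode('gbk', errors='ignore')

def save(name):
    """Download one .swf file (GBK percent-encoding is an assumption)."""
    url = FLASH_URL + urllib.parse.quote(name, encoding='gbk')
    urllib.request.urlretrieve(url, name)

def crawl(ids, fetch=fetch, save=save):
    """Visit each page number, save every .swf found, return the saved names."""
    saved = []
    for i in ids:
        names = SWF_PATTERN.findall(fetch(MODEL_URL + str(i)))
        if not names:
            print('Page %d has no flash file, skipping' % i)
            continue
        for name in names:
            save(name)
            saved.append(name)
    return saved

# Offline demo: stand-in pages and a stand-in save function.
fake_pages = {1: '<param value="TFlash/a.swf">', 2: '<html></html>'}
fake_fetch = lambda url: fake_pages[int(url.rsplit('=', 1)[1])]
downloaded = []
files = crawl([1, 2], fetch=fake_fetch, save=downloaded.append)
print('Downloaded %d files' % len(files))
```

Calling `crawl(range(1, 108))` with the default `fetch` and `save` would attempt the real download. Passing the two functions as parameters is just a convenience for trying the loop logic without touching the site.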

0 0

Python — 爬取 算法与数据结构 flash

Python — 爬取算法与数据结构 flash