Some issues when downloading files with the Python wget package

Source: Internet. Published by: 服务器端软件开发. Editor: 程序博客网. Time: 2024/06/06 14:24

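Installing wget

Installing with pip kept failing, so I downloaded the wget 3.2 package archive instead.

After unpacking it, run python setup.py install in the unpacked folder to complete the installation.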

Basic download usage

import wget

wget.download(downloadURL,filepathandname)
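
The second argument is the local path, including the file name. Below is a minimal sketch of one way to build it, assuming the file should keep the name found at the end of the URL. The helper names target_path and fetch, the fallback name download.bin and the example URLs are hypothetical; only the wget.download(url, out) call comes from the package.

```python
import os
from urllib.parse import urlsplit, unquote


def target_path(url, folder):
    """Build a local path from the last path segment of the URL (hypothetical helper)."""
    name = os.path.basename(unquote(urlsplit(url).path))
    if not name:
        # The URL ends with "/" or has no path, so fall back to a fixed name.
        name = "download.bin"
    return os.path.join(folder, name)


def fetch(url, folder):
    """Download url into folder with the wget package (hypothetical wrapper)."""
    # Third-party package; imported here so target_path works without it.
    import wget
    return wget.download(url, target_path(url, folder))
```

Calling fetch(downloadURL, some_folder) would then save the file under its original name; the folder has to exist beforehand.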

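Getting the download link

Current approach: fetch the page content with urllib, then extract the link with a regular expression.

Fetching with urllib (is there a better way, perhaps BeautifulSoup?):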

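import re
from urllib import request

# driver is an already created browser driver object; it is not defined in this snippet
pageRequest = request.urlopen(driver.current_url)
pageRead = pageRequest.read().decode('utf-8')
# Why is decode needed?
# In Python 3, pageRequest.read() returns bytes, while a str pattern in re needs a str to search.
# Without decode the error is: can't use a string pattern on a bytes-like object
for eachline in pageRead.split('\n'):
    # [^"]+ stops at the closing quote; the greedy (.+) would also swallow later attributes
    webDownloadURL = re.findall('src="([^"]+)"', eachline)
    if len(webDownloadURL) > 0 and re.search('iframe', eachline):
        wgetURL = webDownloadURL[0]
        print('%s' % wgetURL)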
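
As for the question of a better method: one option that needs no extra installation is the standard library's html.parser, which reads tags and attributes instead of matching lines. This is only a sketch, assuming the wanted link is the src attribute of an iframe tag, as in the regular expression version; the class name IframeSrcParser, the function iframe_sources and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases the tag name; attrs is a list of (name, value) pairs.
        if tag == "iframe":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def iframe_sources(html_text):
    """Return the src values of all iframes found in html_text."""
    parser = IframeSrcParser()
    parser.feed(html_text)
    parser.close()
    return parser.sources


sample = '<div><img src="logo.png"><iframe width="10" src="http://example.com/a.mp4"></iframe></div>'
print(iframe_sources(sample))  # ['http://example.com/a.mp4']
```

Unlike the line-by-line version, this does not depend on the iframe tag and its src sitting on the same line, and it ignores src attributes of other tags such as img.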

Other ways to download files

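# 1. Read the response and write it to a file yourself
# (urllib2 exists only in Python 2; in Python 3 the same function is urllib.request.urlopen)
from urllib import request

filedownload = request.urlopen(url)
urldata = filedownload.read()
with open(path, 'wb') as fwrite:
    fwrite.write(urldata)

# 2. urlretrieve (urllib.urlretrieve in Python 2, urllib.request.urlretrieve in Python 3)
request.urlretrieve(url, filename)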
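
Method 1 reads the whole response into memory before writing it out, which can be a problem for large files. A possible variation, shown here only as a sketch, copies the response to disk in chunks with shutil.copyfileobj. The function name save_url and the 64 KiB chunk size are assumptions, not something from the original notes.

```python
import shutil
from urllib import request


def save_url(url, path, chunk_size=64 * 1024):
    """Stream the response body to path without loading it all into memory."""
    with request.urlopen(url) as response, open(path, "wb") as out_file:
        # copyfileobj reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(response, out_file, chunk_size)
    return path
```

It would be called as save_url(url, path), with the same url and path as in method 1.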

Working with the file system

The main packages used for file handling are os and shutil. To check whether a file exists: os.path.exists('....')

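# Check whether the folder exists; create it if it does not
import os
import shutil

if not os.path.exists(prefixpathname):
    os.mkdir(prefixpathname)

# Deleting
os.rmdir(...)    # removes an empty folder
os.remove(...)   # removes a single file
# os.rmdir cannot delete a folder that still contains other folders or files, but shutil can
shutil.rmtree(...)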
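
The create and delete steps can be wrapped in two small helpers. This is a sketch under a few assumptions: os.makedirs with exist_ok=True replaces the explicit existence check and also creates missing parent folders, which os.mkdir does not do; the names ensure_dir and remove_dir are hypothetical; the demo runs inside a temporary directory so that nothing real gets deleted.

```python
import os
import shutil
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


def remove_dir(path):
    """Delete a folder with everything inside it; ignore a folder that is already gone."""
    if os.path.isdir(path):
        shutil.rmtree(path)


# Try both helpers inside a throwaway temporary directory.
base = tempfile.mkdtemp()
nested = ensure_dir(os.path.join(base, "a", "b"))
ensure_dir(nested)                    # second call is a no-op
remove_dir(os.path.join(base, "a"))   # removes "a" together with "a/b"
print(os.path.exists(nested))         # False
shutil.rmtree(base)
```

shutil.rmtree deletes without asking, so it is worth double-checking the path before calling it on real data.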

0 0