A Simple Python Web Crawler: Automatic Video Download from 极客学院 (Jikexueyuan)

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 10:17

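I. Background

I have been learning Python recently. After working through the basic tutorial on 菜鸟教程, I moved on to the teaching videos on Jikexueyuan to get some real practice.

Jikexueyuan videos can only be downloaded by annual members. The desktop client does support batch downloads, but the downloaded files have no directory structure, their names and extensions are hidden, and they can only be played inside the client. The client itself is not very user-friendly: it cannot group videos by course, every lesson sits in one flat list, and the course materials are incomplete.
I happened to be reading about web crawlers at the time, which fits this problem exactly: automate the download, bundle the materials with the videos, and store everything in a "type/stage/course/video" directory hierarchy.

Reasons for writing an automated script:
1 Downloading a single video takes at least two clicks
2 The file names are strings of codes
3 There is no batch download, so downloading a whole course is very tedious

That is about all the annual membership offers: apart from the download and caching privilege, the service is quite ordinary.

So let's get started.

Note: this example is based on Python 3 and was developed on Ubuntu 14.04, which ships with both Python 2 and Python 3. The python command on the console runs Python 2 by default; use the command "python3" to run Python 3.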

II. Fetching the Course Paths

Crawl every course path listed on the Jikexueyuan knowledge-system map and collect their links:
http://www.jikexueyuan.com/path/
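
The link extraction can be tried offline first. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the SAMPLE markup is a made-up, well-formed fragment that mimics the class names the crawler looks for (pathlist-one cf, pathlist-txt). The real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating the overview page; not copied from the site.
SAMPLE = """
<div>
  <a class="pathlist-one cf" href="http://www.jikexueyuan.com/path/html5/">
    <div class="pathlist-txt"><h2>HTML5</h2><p>intro text</p></div>
  </a>
  <a class="other" href="http://www.jikexueyuan.com/about/">
    <div class="pathlist-txt"><h2>ignored</h2></div>
  </a>
</div>
"""


def extract_paths(markup):
    """Return (name, url) pairs for every link carrying the path-list class."""
    root = ET.fromstring(markup)
    paths = []
    for link in root.findall(".//a[@class='pathlist-one cf']"):
        title = link.find("div[@class='pathlist-txt']/h2")
        paths.append((title.text, link.get("href")))
    return paths


if __name__ == "__main__":
    for name, url in extract_paths(SAMPLE):
        print(name, url)
```

ElementTree only accepts well-formed XML, which is why the actual crawler uses lxml's etree.HTML: it tolerates real-world HTML and supports full XPath.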

PathSpider.py

#!/usr/bin/python3
import requests
import SpiderUtil
from lxml import etree
from CoursePathSpider import CoursePathSpider


class PathSpider(object):
    """ Crawler for the course path overview page """
    URL_PATH = 'http://www.jikexueyuan.com/path/'
    XPATH_PATH_LINK = '//a[@class="pathlist-one cf"]'
    XPATH_PATH_NAME = 'div[@class="pathlist-txt"]/h2/text()'
    XPATH_PATH_INTRO = 'div[@class="pathlist-txt"]/p/text()'

    def __init__(self):
        super(PathSpider, self).__init__()
        self.path_info_list = []
        self.response = None
        self.selector = None

    def parse_html(self):
        print("正在获取课程路线列表...")
        self.response = requests.get(PathSpider.URL_PATH)
        self.selector = etree.HTML(self.response.text)
        for link_ele in self.selector.xpath(PathSpider.XPATH_PATH_LINK):
            self.path_info_list.append(_PathInfo(link_ele))

    def show(self):
        print("共有学习路线图:", len(self.path_info_list), "个,分别是:")
        for n, info in enumerate(self.path_info_list, start=1):
            print(str(n) + ".", info.name)

    def show_detail(self, index):
        # index is the 1-based number printed by show()
        pos = SpiderUtil.is_valid_index(index, len(self.path_info_list))
        if pos is None:
            return "error"
        self.path_info_list[pos].show()
        return "OK"

    def download(self, index, path):
        # path is the directory the course path is saved under
        pos = SpiderUtil.is_valid_index(index, len(self.path_info_list))
        if pos is None:
            return "error"
        print("开始下载", self.path_info_list[pos].name)
        self.path_info_list[pos].download(path)
        return "OK"


class _PathInfo(object):
    """ One entry of the overview page: name, introduction and link """

    def __init__(self, selector):
        super(_PathInfo, self).__init__()
        self.selector = selector
        self.name = selector.xpath(PathSpider.XPATH_PATH_NAME)[0]
        self.intro = selector.xpath(PathSpider.XPATH_PATH_INTRO)
        self.url = selector.xpath('@href')[0]

    def show(self):
        print("课程:", self.name)
        print("简介:", self.intro)
        print("链接:", self.url)

    def sub_spider(self):
        return CoursePathSpider(self.url, self.name)

    def download(self, path):
        # The sub-spider has to parse the path page before it can download
        spider = self.sub_spider()
        spider.parse_html()
        spider.download(path)

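III. Analyzing a Course Path

Analyze a specific course path, group it by chapter, and collect the lesson video links.
The result is a two-level directory structure like this: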

  1. Python快速入门
    1. Python语言集成开发环境搭建
    2. Python语言基本语法
    3. Python语言Web开发框架web2py
  2. Python初级课程
    ……..

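When downloading the videos, this structure is used as the storage path, which gives a hierarchical directory layout that is convenient for viewing.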
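
The mapping from such an outline to directories can be sketched as follows. OUTLINE and make_tree are hypothetical names used only for this illustration; the crawler itself builds the same kind of path lesson by lesson while it downloads.

```python
import os
import tempfile

# Outline taken from the example listing; chapters map to their lessons.
OUTLINE = {
    "1. Python快速入门": [
        "1. Python语言集成开发环境搭建",
        "2. Python语言基本语法",
        "3. Python语言Web开发框架web2py",
    ],
    "2. Python初级课程": [],
}


def make_tree(root, outline):
    """Create root/chapter/lesson for every lesson and return the created paths."""
    created = []
    for chapter, lessons in outline.items():
        for lesson in lessons:
            target = os.path.join(root, chapter, lesson)
            os.makedirs(target, exist_ok=True)  # no error if it already exists
            created.append(target)
    return created


if __name__ == "__main__":
    # Use a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as root:
        for path in make_tree(root, OUTLINE):
            print(os.path.relpath(path, root))
```

Using os.makedirs with exist_ok=True makes the step safe to repeat, so an interrupted download can be restarted without first cleaning up.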

The code is as follows:

CoursePathSpider.py

#!/usr/bin/python3
import os
import requests
import SpiderUtil
from lxml import etree
from LessonVideoSpider import VideoSpider


class CoursePathSpider(object):
    """ Parser for a course path page """
    # XPath of each chapter block
    XPATH_CHAPTER = '//*[@id="container"]/div/div[@class="pathstage mar-t30"]'
    # Chapter name
    xpath_chapter_name = 'div[@class="pathstage-txt"]/h2/text()'
    # Lesson list under a chapter
    xpath_chapter_lesson_list = 'div/div[@class="stagewidth lesson-list"]/ul[@class="cf"]/li'
    # Lesson name and link
    xpath_lesson_name = 'div[@class="lesson-infor"]/h2[@class="lesson-info-h2"]/a/text()'
    xpath_lesson_link = 'div[@class="lesson-infor"]/h2[@class="lesson-info-h2"]/a/@href'

    def __init__(self, url, simple_name):
        super(CoursePathSpider, self).__init__()
        self.url = url
        self.simple_name = simple_name
        self.response = None
        self.selector = None
        self.title = ''
        self.chapter_list = []

    def parse_html(self):
        print("正在打开网址:", self.url)
        self.response = requests.get(self.url)
        print("开始处理返回结果...")
        self.selector = etree.HTML(self.response.text)
        self.title = self.selector.xpath('//title/text()')[0]
        if self.simple_name == '':
            # Fall back to (at most) the first 10 characters of the page title
            self.simple_name = self.title[:10]
        print("课程名称:", self.title)
        for chapterEle in self.selector.xpath(CoursePathSpider.XPATH_CHAPTER):
            self.add_chapter(_Chapter(chapterEle))

    def add_chapter(self, chapter):
        if isinstance(chapter, _Chapter):
            self.chapter_list.append(chapter)
        else:
            raise ValueError("chapter is not an instance of _Chapter")

    def download(self, path, index='a'):
        # index 'a' downloads everything; otherwise it is a 1-based chapter number
        path = os.path.join(path, self.simple_name)
        if SpiderUtil.is_all(index):
            print("下载完整路线")
            self.download_all(path)
            return "OK"
        pos = SpiderUtil.is_valid_index(index, len(self.chapter_list))
        if pos is None:
            return "error"
        print("下载:" + self.chapter_list[pos].name)
        return self.chapter_list[pos].download(path)

    def download_all(self, path):
        for chapter in self.chapter_list:
            chapter.download(path)
        print("课程路线:", self.title, '保存成功')

    def show(self):
        print(self.title)
        for chapter in self.chapter_list:
            chapter.show()

    def lessons(self):
        lesson_list_2 = []
        for chapter in self.chapter_list:
            lesson_list_2.extend(chapter.lesson_list)
        return lesson_list_2


class _Chapter(object):
    """ Parses one chapter """

    def __init__(self, selector):
        super(_Chapter, self).__init__()
        self.selector = selector
        self.lesson_list = []
        self.name = ""
        self.parse_html()

    def parse_html(self):
        self.name = self.selector.xpath(CoursePathSpider.xpath_chapter_name)[0]
        print(self.name)
        lesson_eles = self.selector.xpath(CoursePathSpider.xpath_chapter_lesson_list)
        for index, lessonEle in enumerate(lesson_eles, start=1):
            self.add_lesson(_Lesson(lessonEle, pre_name=str(index) + ". "))

    def add_lesson(self, lesson):
        if isinstance(lesson, _Lesson):
            self.lesson_list.append(lesson)
        else:
            raise ValueError("lesson is not an instance of _Lesson")

    def download(self, parent, index='a'):
        if SpiderUtil.is_all(index):
            self.download_all(parent)
            return "OK"
        pos = SpiderUtil.is_valid_index(index, len(self.lesson_list))
        if pos is None:
            return "error"
        print("下载:" + self.lesson_list[pos].name)
        # Single lessons go under the chapter directory, like download_all does
        self.lesson_list[pos].download(os.path.join(parent, self.name))
        return "OK"

    def download_all(self, parent):
        path = os.path.join(parent, self.name)
        for lesson in self.lesson_list:
            lesson.download(path)

    def show(self):
        print(self.name)
        for lesson in self.lesson_list:
            lesson.show()


class _Lesson(object):
    """ Parses the information of one lesson """

    def __init__(self, selector, pre_name=''):
        self.selector = selector
        self.name = pre_name + selector.xpath(CoursePathSpider.xpath_lesson_name)[0]
        self.link = selector.xpath(CoursePathSpider.xpath_lesson_link)[0]
        self.path = ""
        self.sub_spider = VideoSpider()

    def download(self, parent):
        self.path = os.path.join(parent, self.name)
        if not os.path.exists(self.path):
            os.makedirs(self.path)
            self.save_inf()
        print("正在下载课程:", self.name)
        self.sub_spider.download(self.path, self.link)
        self.sub_spider.save_info(self.path)

    def save_inf(self):
        # Record the lesson name and link in readme.txt inside the lesson folder
        with open(os.path.join(self.path, "readme.txt"), "a+", encoding="utf-8") as file:
            file.write("#课程名称")
            file.write("\nname=" + self.name)
            file.write("\n#课程链接")
            file.write("\nlink=" + self.link)

    def show(self):
        print("-- ", self.name, "=", self.link)

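IV. Analyzing the Video Links

This is the key part. Downloading videos from Jikexueyuan is tedious because there is no batch download entry: while watching, you have to switch to each video and click "下载本节视频" (download this video), one at a time. Pass the three core parameters (the cookies, the request headers and the video playback link) to the module below, and the download can be automated.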
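
How the playback links turn into a download request can be sketched without any network access. The endpoint and the two query parameters (seq, course_id) are the ones used in LessonVideoSpider.py; the helper names and the shape of the video link (181_2.html) are assumptions made for this illustration.

```python
import re
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = "http://www.jikexueyuan.com/course/video_download"


def course_id_from(course_url):
    """Take the number before '.html', e.g. .../course/181.html gives '181'."""
    match = re.search(r"/(\d+)\.html$", course_url)
    return match.group(1) if match else ""


def seq_from(video_href):
    """Take the number after '_' (assumed link shape); default to '1'."""
    match = re.search(r"_(\d+)\.html$", video_href)
    return match.group(1) if match else "1"


def download_request_url(course_url, video_href):
    """Build the URL that asks the site for a video's real download address."""
    params = {"seq": seq_from(video_href), "course_id": course_id_from(course_url)}
    return DOWNLOAD_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(download_request_url(
        "http://www.jikexueyuan.com/course/181.html",
        "http://www.jikexueyuan.com/course/181_2.html",
    ))
```

In the crawler the same parameters are passed to requests.get together with the headers and cookies, since the request is only answered for a logged-in member.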

LessonVideoSpider.py

#!/usr/bin/python3
import os
import re
import requests
from lxml import etree
import SpiderUtil


class VideoSpider(object):
    """ VideoSpider: parses a lesson detail page, i.e. the video playback page """
    STATUE_SUCCEED = '成功:'
    STATUE_FAILED = '失败:'
    STATUE_JUMPED = '跳过:'
    url_download = 'http://www.jikexueyuan.com/course/video_download'
    video_ex_name = ".mp4"
    xpath_lesson_bg = '//div[@class="infor-content"]/text()'
    xpath_video_list = '//div[@class="lesson-box"]/ul/li'
    xpath_video_index = 'i[@class="lessonmbers"]/em/text()'
    xpath_video_name = 'div[@class="text-box"]/h2/a/text()'
    xpath_video_href = 'div[@class="text-box"]/h2/a/@href'

    def __init__(self):
        super(VideoSpider, self).__init__()
        self.url = ""
        self.course_id = ""
        self.video_list = []
        self.bg_txt = ""
        self.response = None
        self.selector = None

    def parse_html(self, url):
        if re.match(r"https?://", url) is None:
            raise ValueError("Invalid URL " + url)
        self.url = url
        print("正在获取下载地址:" + self.url)
        # The course id is the number before ".html", e.g. .../course/181.html
        course_id_list = re.findall(r'/(\d+)\.html', url)
        self.course_id = course_id_list[0] if len(course_id_list) == 1 else ""
        self.response = requests.get(self.url, headers=SpiderUtil.headers, cookies=SpiderUtil.cookies)
        self.selector = etree.HTML(self.response.text)
        self.bg_txt = self.selector.xpath(VideoSpider.xpath_lesson_bg)
        self.video_list = []
        for video_ele in self.selector.xpath(VideoSpider.xpath_video_list):
            video = Video()
            video.parse_html(video_ele, self.course_id)
            self.add_video(video)
        print("下载地址分析完成:" + self.url)

    def add_video(self, video):
        if isinstance(video, Video):
            self.video_list.append(video)

    def download(self, path, url):
        self.parse_html(url)
        print("开始下载视频")
        result = {VideoSpider.STATUE_SUCCEED: 0, VideoSpider.STATUE_FAILED: 0, VideoSpider.STATUE_JUMPED: 0}
        if not os.path.exists(path):
            os.makedirs(path)
        for video in self.video_list:
            result[video.download(path)] += 1
        with open(os.path.join(path, "readme.txt"), "a+", encoding="utf-8") as file:
            file.write("\n下载日志:总计" + str(len(self.video_list)) + str(result))
        print("该课程视频下载完成:总计", len(self.video_list), result)
        return result

    def save_info(self, path):
        if not os.path.exists(path):
            os.makedirs(path)
        with open(os.path.join(path, "readme.txt"), "a+", encoding="utf-8") as file:
            for text in self.bg_txt:
                file.write(text)
            for video in self.video_list:
                video.save_info(file)


class Video(object):
    """ Information about a single video """

    def __init__(self):
        super(Video, self).__init__()
        self.seq = "1"
        self.index = ""
        self.name = ""
        self.href = ""
        self.response = None
        self.result_dic = {}
        self.download_flag = False

    def parse_html(self, selector, course_id):
        self.index = selector.xpath(VideoSpider.xpath_video_index)[0]
        self.name = selector.xpath(VideoSpider.xpath_video_name)[0]
        self.href = selector.xpath(VideoSpider.xpath_video_href)[0]
        # \d+ so that videos numbered 10 and above are matched too
        temp = re.findall(r'_(\d+)\.html', self.href)
        if len(temp) == 1:
            self.seq = temp[0]
        params = {'seq': self.seq, 'course_id': course_id}
        self.response = requests.get(VideoSpider.url_download, params=params,
                                     headers=SpiderUtil.headers, cookies=SpiderUtil.cookies)
        # The reply is assumed to be JSON; parse it as JSON instead of calling eval()
        self.result_dic = self.response.json()
        if not self.result_dic.get("data"):
            raise ValueError("cookies are not valid")
        self.download_flag = True
        return "OK"

    def url(self):
        if self.result_dic["code"] == 200:
            return self.result_dic["data"]["urls"]
        else:
            return ""

    def file_name(self):
        # Build the file name once and cache it in result_dic
        if "filename" not in self.result_dic:
            title = SpiderUtil.replace_special(self.result_dic["data"]["title"])
            self.result_dic["filename"] = self.index + "." + title + VideoSpider.video_ex_name
        return self.result_dic["filename"]

    def download(self, path):
        if not self.download_flag:
            raise ValueError("cookies are not valid")
        file_name = os.path.join(path, self.file_name())
        if os.path.exists(file_name):
            print("【", file_name, "】已经存在了")
            return VideoSpider.STATUE_JUMPED
        try:
            print("正在下载视频:", self.file_name())
            response = requests.get(self.url())
            response.raise_for_status()
            with open(file_name, "wb") as code:
                code.write(response.content)
            print("【", file_name, "】下载完成")
        except (requests.RequestException, OSError):
            return VideoSpider.STATUE_FAILED
        return VideoSpider.STATUE_SUCCEED

    def save_info(self, file):
        file.write("\n" + self.seq + self.name)
        file.write("\n请求结果:" + str(self.result_dic))
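
The file-naming step can be tried without touching the network. The sketch below assumes the download endpoint replies with JSON shaped like the dictionary the code above reads (code, data.urls, data.title). SAMPLE_REPLY and video_file_name are made up for illustration, and replace_special is repeated here only so the block runs on its own.

```python
import json

SPECIAL = ('/', '\\', ':', '<', '>', '|', '*', '?', '"', ' ')

# Hypothetical reply; the real endpoint may return more or different fields.
SAMPLE_REPLY = (
    '{"code": 200, "data": {"urls": "http://example.com/v.mp4", '
    '"title": "HTML5: intro / setup?"}}'
)


def replace_special(text):
    """Drop characters that are not safe in file names."""
    for ch in SPECIAL:
        text = text.replace(ch, "")
    return text


def video_file_name(reply_text, index, ext=".mp4"):
    """Parse the reply as JSON and build '<index>.<clean title><ext>'."""
    reply = json.loads(reply_text)
    if reply.get("code") != 200 or not reply.get("data"):
        raise ValueError("no download data in reply; cookies may be invalid")
    return index + "." + replace_special(reply["data"]["title"]) + ext


if __name__ == "__main__":
    print(video_file_name(SAMPLE_REPLY, "1"))
```

json.loads only parses data, whereas eval executes whatever text the server sends back, so parsing is the safer choice for a reply that comes from the network.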

V. Client

Main.py

#!/usr/bin/python3
from PathSpider import PathSpider
import platform
import os
import sys
import requests


def int_input(info="请选择:", choices=()):
    """ Keep asking until the user enters one of the numbers in choices """
    while True:
        input_str = input(info)
        try:
            input_int = int(input_str)
        except ValueError:
            print("请输入数字!")
            continue
        if input_int not in choices:
            print("请根据提示输入!")
            continue
        return input_int


def path_input():
    """ Ask for the storage path.
        If the path exists, return it.
        If it does not exist, create the folder:
            on failure, ask again
            on success, return the path
    """
    default = default_path()
    while True:
        input_path = input('请输入存储路径【默认"' + default + '"】:')
        if input_path == "":
            input_path = default
        else:
            input_path = os.path.expanduser(input_path)
            if not os.path.isabs(input_path):
                # Relative paths are resolved against the default directory
                input_path = os.path.join(default, input_path)
        if check_path(input_path) or make_file_path(input_path):
            return input_path
        print("地址【", input_path, "】无效,请重新输入", flush=True)


def check_path(file_path):
    return os.path.exists(file_path)


def make_file_path(file_path):
    try:
        os.makedirs(file_path)
    except OSError as ose:
        print("创建文件夹失败", ose)
        return False
    else:
        return True


def default_path():
    sys_name = platform.system()
    if sys_name == "Windows":
        path_list = ["F://", "E://", "D://", "C://"]
    elif sys_name == "Linux":
        # "~" must be expanded, otherwise os.path.exists never finds it
        path_list = ["/mnt/hgfs/G/", os.path.expanduser("~")]
    else:
        return ''
    for dpath in path_list:
        if os.path.exists(dpath):
            return os.path.join(dpath, '极客学院视频')
    return ""


if __name__ == '__main__':
    try:
        path_spider = PathSpider()
        path_spider.parse_html()
        path_spider.show()
        _path_list = path_spider.path_info_list
        index = 0
        if len(_path_list) > 0:
            print("[1~", len(_path_list), "]选择课程路线/ 0 退出")
            index = int_input(choices=range(0, len(_path_list) + 1))
        if index == 0:
            sys.exit()
        _path_list[index - 1].show()
        course_spider = _path_list[index - 1].sub_spider()
        course_spider.parse_html()
        print("全部下载[0],按章下载请输入[1~", len(course_spider.chapter_list), "]")
        # Valid answers are 0 (everything) or a chapter number
        index = int_input(choices=range(0, len(course_spider.chapter_list) + 1))
        path = path_input()
        print("视频将下载到下载到:", path)
        if index == 0:
            course_spider.download(path)
        else:
            course_spider.download(path, index)
    except requests.exceptions.ConnectionError:
        print("连接超时,请检查网络!")
    else:
        print("下载结束!")

VI. Utility Module

SpiderUtil.py

#!/usr/bin/python3

headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
           'Accept-Encoding': 'gzip, deflate, sdch',
           'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',
           'Connection': 'keep-alive',
           'DNT': '1',
           'Host': 'www.jikexueyuan.com',
           'Upgrade-Insecure-Requests': '1',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36'}

# The actual values are not shared here, and they change on every login anyway.
# If you want a particular video, leave a comment and I will send it to your mailbox.
cookies = {'stat_uuid': '',
           'sensorsdata2015jssdkcross': '',
           'r_user_id': '',  # stat_fromWebUrl': '',
           'stat_ssid': '',
           'looyu_id': '',
           '_gat': '1',
           'uname': '<username>',
           'uid': '',
           'code': '',
           # Log in with your member account, then copy this value from the browser cookies
           'authcode': '<authcode>',
           'level_id': '2',
           'is_expire': '0',
           'domain': '',
           '_99_mon': '',
           'Hm_lvt_f3c68d41bda15331608595c98e9c3915': '',
           'Hm_lpvt_f3c68d41bda15331608595c98e9c3915': '',
           # 'undefined': '',
           'stat_isNew': '0',
           'looyu_20001269': '',  # does not seem to be needed
           '_ga': '', }


def is_ok(str1):
    if isinstance(str1, str):
        return str1.lower() == "ok" or str1.lower() == "y" or str1 == ''
    else:
        return False


def is_all(str1):
    if isinstance(str1, str):
        return str1.lower() == "a" or str1 == ''
    else:
        return False


def is_valid_index(index, length):
    """ Convert a 1-based choice into a 0-based list position.
        Returns None when the choice is not a number in 1..length.
    """
    if isinstance(index, str):
        try:
            index = int(index)
        except ValueError as e:
            print(e)
            return None
    if isinstance(index, int) and 1 <= index <= length:
        return index - 1
    return None


def replace_special(source_str):
    # Characters that are not allowed (or awkward) in file names
    special = ('/', '\\', ':', '<', '>', '|', '*', '?', '"', ' ')
    for s in special:
        source_str = source_str.replace(s, "")
    return source_str


if __name__ == '__main__':
    jieguo = is_valid_index("3", 10)
    print(jieguo)
    print(is_ok("Ok"))
    print(is_ok("oo"))
    print("特殊字符替换", replace_special('/ \\ " ? * | < > : '))

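This gives us a list of links with a directory structure. From it we can create the base directories and write each downloaded video file into the matching directory.
The course path pages above open without any authentication, but the links crawled from them require a logged-in user: some paid videos can only be watched by members, and only members can download them.
Next, the authentication information has to be submitted along with each request in order to obtain the video download links.
Prerequisite: a member account (new registrations come with one month of membership).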
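
One way to fill in the cookies dictionary is to log in, copy the whole Cookie request header from the browser's developer tools, and split it into name/value pairs. A minimal sketch: cookies_from_header is a hypothetical helper, and the sample string is made up.

```python
def cookies_from_header(raw):
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # partition keeps any further '=' characters inside the value
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    sample = "uname=alice; uid=42; authcode=abc=def; level_id=2"
    for name, value in cookies_from_header(sample).items():
        print(name, "=", value)
```

The resulting dictionary can be passed as the cookies= argument of requests.get, which is how SpiderUtil.cookies is used in LessonVideoSpider.py.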

VII. Final Result

Run the command: ./Main.py

正在获取课程路线列表...
共有学习路线图: 24 个,分别是:
1. Android
2. HTML5开发
3. Java语言
4. PHP语言
5. JavaWeb
6. iOS开发
7. Asp.Net
8. C语言
9. Python
10. Unity3D
11. Swift语言
12. C#
13. Bootstrap
14. Cocos2d-x游戏开发
15. 计算机一级
16. 计算机二级
17. GUI
18. WatchKit开发
19. Docker
20. Egret
21. Arduino
22. Android UiAutomator
23. JavaScript
24. Node.js
[1~ 24 ]选择课程路线/ 0 退出
请选择:2
课程: HTML5开发
简介: ['HTML5增加了很多新特性,包含Canvas元素、Video元素和Audio元素等。\r\n']
链接: http://www.jikexueyuan.com/path/html5/
正在打开网址: http://www.jikexueyuan.com/path/html5/
开始处理返回结果...
课程名称: Html5从入门到精通学习知识体系_极客学院
1. HTML5开发前准备
2. HTML5基础
3. CSS基础
4. CSS3 基础
5. CSS3 炫酷动画
6. JavaScript基础
7. HTML5新特性
8. 响应式布局
9. jQuery基础
10. jQuery UI基础
11. jQuery Mobile基础
12. 移动HTML5开发进阶
全部下载[0],按章下载请输入[1~ 12 ]
请选择:0
请输入存储路径【默认"/mnt/hgfs/G/极客学院视频"】:
视频将下载到下载到: /mnt/hgfs/G/极客学院视频
下载完整路线
正在下载课程: 1. HTML5开发前准备
正在获取下载地址:http://www.jikexueyuan.com/course/181.html
下载地址分析完成:http://www.jikexueyuan.com/course/181.html
开始下载视频
正在下载视频: 1.HTML5开发前的准备.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/1. HTML5开发前准备/1. HTML5开发前准备/1.HTML5开发前的准备.mp4 】下载完成
正在下载视频: 2.开发前的准备-快捷键.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/1. HTML5开发前准备/1. HTML5开发前准备/2.开发前的准备-快捷键.mp4 】下载完成
该课程视频下载完成:总计 2 {'跳过:': 0, '失败:': 0, '成功:': 2}
正在下载课程: 1. HTML5特性简介
正在获取下载地址:http://www.jikexueyuan.com/course/127.html
下载地址分析完成:http://www.jikexueyuan.com/course/127.html
开始下载视频
正在下载视频: 1.HTML5简介.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/1. HTML5特性简介/1.HTML5简介.mp4 】下载完成
正在下载视频: 2.HTML5集成开发环境搭建.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/1. HTML5特性简介/2.HTML5集成开发环境搭建.mp4 】下载完成
正在下载视频: 3.HTML5基础详解.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/1. HTML5特性简介/3.HTML5基础详解.mp4 】下载完成
该课程视频下载完成:总计 3 {'跳过:': 0, '失败:': 0, '成功:': 3}
正在下载课程: 2. HTML5元素、属性和格式化
正在获取下载地址:http://www.jikexueyuan.com/course/128.html
下载地址分析完成:http://www.jikexueyuan.com/course/128.html
开始下载视频
正在下载视频: 1.HTML5元素简介及使用方法.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/2. HTML5元素、属性和格式化/1.HTML5元素简介及使用方法.mp4 】下载完成
正在下载视频: 2.HTML5属性使用方法.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/2. HTML5元素、属性和格式化/2.HTML5属性使用方法.mp4 】下载完成
正在下载视频: 3.HTML5格式化及使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/2. HTML5元素、属性和格式化/3.HTML5格式化及使用.mp4 】下载完成
该课程视频下载完成:总计 3 {'跳过:': 0, '失败:': 0, '成功:': 3}
正在下载课程: 3. HTML5样式、链接和表格
正在获取下载地址:http://www.jikexueyuan.com/course/136.html
下载地址分析完成:http://www.jikexueyuan.com/course/136.html
开始下载视频
正在下载视频: 1.HTML5样式的使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/3. HTML5样式、链接和表格/1.HTML5样式的使用.mp4 】下载完成
正在下载视频: 2.HTML5链接属性及使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/3. HTML5样式、链接和表格/2.HTML5链接属性及使用.mp4 】下载完成
正在下载视频: 3.HTML5表格使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/3. HTML5样式、链接和表格/3.HTML5表格使用.mp4 】下载完成
该课程视频下载完成:总计 3 {'跳过:': 0, '失败:': 0, '成功:': 3}
正在下载课程: 4. HTML5列表、块和布局
正在获取下载地址:http://www.jikexueyuan.com/course/135.html
下载地址分析完成:http://www.jikexueyuan.com/course/135.html
开始下载视频
正在下载视频: 1.HTML5列表的使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/4. HTML5列表、块和布局/1.HTML5列表的使用.mp4 】下载完成
正在下载视频: 2.HTML5块元素标签的使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/4. HTML5列表、块和布局/2.HTML5块元素标签的使用.mp4 】下载完成
正在下载视频: 3.HTML5布局的使用.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/4. HTML5列表、块和布局/3.HTML5布局的使用.mp4 】下载完成
该课程视频下载完成:总计 3 {'跳过:': 0, '失败:': 0, '成功:': 3}
正在下载课程: 5. HTML5表单提交和PHP环境搭建
正在获取下载地址:http://www.jikexueyuan.com/course/139.html
下载地址分析完成:http://www.jikexueyuan.com/course/139.html
开始下载视频
正在下载视频: 1.HTML5表单的创建.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/5. HTML5表单提交和PHP环境搭建/1.HTML5表单的创建.mp4 】下载完成
正在下载视频: 2.PHP环境搭建.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/5. HTML5表单提交和PHP环境搭建/2.PHP环境搭建.mp4 】下载完成
正在下载视频: 3.HTML5表单与PHP交互.mp4
【 /mnt/hgfs/G/极客学院视频/HTML5开发/2. HTML5基础/5. HTML5表单提交和PHP环境搭建/3.HTML5表单与PHP交互.mp4 】下载完成
该课程视频下载完成:总计 3 {'跳过:': 0, '失败:': 0, '成功:': 3}

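That's it: running Main.py directly performs the download. The first three modules can each be used on their own. LessonVideoSpider depends on SpiderUtil; you could of course merge the two.

I have only just started learning, so please point out any mistakes.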
