Python Web Scraping Beginner Project 1 (80s Movie Site)

Source: Internet · Published: Java programming training · Editor: 程序博客网 · Time: 2024/06/05 21:48

I'm just getting started with web scraping in Python. This is my first blog post, and I'll keep updating it.

  • Scrape the TV series section of the 80s site
  • Scrape the download link of every episode of each series
  • Print the results in a simple format
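The first step in the list (collecting the series from the list pages) can be tried offline on a small piece of HTML. The sketch below assumes, as the selector `'h3 a'` in the scraper does, that each series title on a list page is an `<a>` inside an `<h3>`. The markup in `SAMPLE_LIST_HTML` is made up, and `list_page_urls` and `parse_list_page` are hypothetical helper names. It also swaps the string concatenation for `urllib.parse.urljoin` to build absolute links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup standing in for one list page; the real page is assumed
# to put each series title in an <h3><a href="..."> element.
SAMPLE_LIST_HTML = """
<ul>
  <li><h3><a href="/ju/101"> Series A </a></h3></li>
  <li><h3><a href="/ju/102">Series B</a></h3></li>
</ul>
"""

BASE_URL = 'http://www.80s.tw'


def list_page_urls(number, first_url='http://www.80s.tw/ju/list/----0--p'):
    """Build the URLs of list pages 1..number by appending the page index."""
    return [first_url + str(i) for i in range(1, number + 1)]


def parse_list_page(html, limit=25):
    """Return (title, absolute detail-page URL) pairs from one list page."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for link in soup.select('h3 a')[:limit]:
        # urljoin turns the relative href into an absolute URL on BASE_URL
        pairs.append((link.text.strip(), urljoin(BASE_URL, link['href'])))
    return pairs


print(list_page_urls(2))
print(parse_list_page(SAMPLE_LIST_HTML))
```

Separating "build the URLs" from "parse one page" means the parsing can be checked against saved HTML without sending any requests.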

The code is as follows:

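```python
import requests
from bs4 import BeautifulSoup


def url_open(url):
    """Download a page and return it parsed with BeautifulSoup."""
    res = requests.get(url, timeout=10)  # don't hang forever on a dead server
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    return soup


def search_80s(number, first_url='http://www.80s.tw/ju/list/----0--p'):
    """Return series names and detail-page URLs from list pages 1..number."""
    name = []
    page = []
    url = []
    # List pages are numbered by appending the page index: ...--p1, ...--p2, ...
    for i in range(1, number + 1):
        url.append(first_url + str(i))

    for list_url in url:
        soup = url_open(list_url)
        # Series titles are <a> tags inside <h3>; keep at most 25 per page
        name_list = soup.select('h3 a')[:25]
        for line in name_list:
            name.append(line.text.strip())
            page.append('http://www.80s.tw' + line['href'])
    return name, page


def get_download_url(page):
    """Return episode names and download links from one series page."""
    name = []
    url = []
    soup = url_open(page)
    every_name = soup.select('span a')
    for line in every_name:
        name.append(line.text.strip())
        url.append(line['href'])
    # Episode links start right after the '豆瓣短评' (Douban short reviews) link;
    # the code assumes the last 6 links on the page are not episodes and drops them
    marker = '豆瓣短评'
    if marker in name:
        tmp_index = name.index(marker)
        name = name[tmp_index + 1:-6]
        url = url[tmp_index + 1:-6]
    return name, url


if __name__ == '__main__':
    name, page = search_80s(1)
    for series_name, series_page in zip(name, page):
        print(series_name, series_page)
        dl_name, dl_url = get_download_url(series_page)
        for ep_name, ep_url in zip(dl_name, dl_url):
            print(ep_name, ep_url)
```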
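The trimming step in `get_download_url()` is the most fragile part: it assumes the episode links come right after the '豆瓣短评' (Douban short reviews) link and that exactly six non-episode links follow them. A possible refactor, sketched below with the hypothetical name `extract_episodes`, pulls that logic into a pure function over `(text, href)` pairs so it can be tested on made-up data without touching the site. The sample links are invented.

```python
def extract_episodes(links, marker='豆瓣短评', trailing=6):
    """Keep the (text, href) pairs between `marker` and the last `trailing` links.

    `links` is in page order, like the pairs get_download_url() builds
    from soup.select('span a').
    """
    texts = [text for text, _ in links]
    if marker not in texts:
        # Like the original code: without the marker, nothing is trimmed.
        return list(links)
    start = texts.index(marker) + 1
    # Compute the end explicitly so trailing=0 keeps everything after the marker
    end = max(start, len(links) - trailing)
    return links[start:end]


# Made-up links: navigation, the marker, two episodes, then six footer links.
sample = (
    [('Home', '/'), ('豆瓣短评', '/review')]
    + [('Episode 1', '/dl/1'), ('Episode 2', '/dl/2')]
    + [('Footer %d' % k, '/footer/%d' % k) for k in range(6)]
)

for ep_name, ep_url in extract_episodes(sample):
    print(ep_name, ep_url)
```

Because the marker text and the trailing count are parameters, a change in the page layout only means changing the arguments, and keeping names and URLs together as pairs avoids maintaining two parallel lists.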

I'm new to Python, so my scraper code probably isn't very good. Any suggestions are welcome.
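One common suggestion for a scraper like this is to send every request through a single `requests.Session` with a timeout, automatic retries for transient server errors, and an explicit User-Agent. The sketch below uses requests' `HTTPAdapter` together with urllib3's `Retry`; the helper names `make_session` and `fetch_text`, the retry settings, and the User-Agent string are illustrative assumptions, not anything the 80s site is known to require.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=3, backoff=0.5):
    """Build a Session that retries transient server errors on http and https."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # Hypothetical browser-like User-Agent; some sites may reject the default one.
    session.headers.update({'User-Agent': 'Mozilla/5.0 (beginner-scraper)'})
    return session


def fetch_text(session, url, timeout=10):
    """GET a page with a timeout, raise on HTTP errors, and decode it as UTF-8."""
    res = session.get(url, timeout=timeout)
    res.raise_for_status()
    res.encoding = 'utf-8'
    return res.text
```

With one session created at startup and passed around, `url_open()` could become `BeautifulSoup(fetch_text(session, url), 'html.parser')`; the session also reuses connections to the same host instead of opening a new one per page.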