Finding the albums of a Douban Music doulist on Xiami Music

Source: Internet. Editor: 程序博客网. Time: 2024/05/01 21:23

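A small tool that looks up the albums of a Douban Music doulist on Xiami Music and returns their player URLs. I put it on github.com; it is my first project there, hmm. I am not very fluent in Python yet, and this Python ended up looking like PHP...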

(Hardly anyone would do this kind of thing from the command line, so consider this a toy I made, which also let me get familiar with github.com.)

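Douban has no API for fetching doulist information, so the data has to be scraped from the doulist page itself; fortunately the BeautifulSoup library makes that simple. The doulist does expose each album's barcode, but the data returned by the Xiami API contains no barcode, so the only option is the crude one of matching on album title and artist name. An artist's name can be written in different ways, which means many albums that are in fact available on Xiami get missed. Matching on album title alone, however, gives an absurdly high error rate. As the lesser of two evils, the tool matches an album on both fields: album title and artist name.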
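To make the two-field matching concrete, here is a minimal Python 3 sketch of the idea. The field names `album_name`, `artist_name` and `album_id` come from the script; the helpers `normalize`, `is_same_album` and `find_album` are hypothetical names introduced only for illustration, and the NFKC/case folding step is an assumption of this sketch, not something the original script does. It only smooths over width, case and spacing differences, so it does not solve the problem of an artist being credited under a genuinely different name.

```python
import unicodedata


def normalize(name):
    """Fold character width and case, and collapse runs of whitespace.

    Assumption of this sketch: the original script compares raw strings.
    """
    folded = unicodedata.normalize('NFKC', name).casefold()
    return ' '.join(folded.split())


def is_same_album(wanted_title, wanted_artist, found_title, found_artist):
    """True only if BOTH the wanted title and the wanted artist
    occur as substrings of the candidate's title and artist."""
    return (normalize(wanted_title) in normalize(found_title)
            and normalize(wanted_artist) in normalize(found_artist))


def find_album(wanted_title, wanted_artist, results):
    """Return the album_id of the first matching result, or None."""
    for item in results:
        if is_same_album(wanted_title, wanted_artist,
                         item['album_name'], item['artist_name']):
            return item['album_id']
    return None


if __name__ == '__main__':
    # Made-up sample data shaped like one page of search results.
    sample = [
        {'album_name': 'Kid A', 'artist_name': 'Other Band', 'album_id': '10'},
        {'album_name': 'KID A (Deluxe)', 'artist_name': 'Radiohead', 'album_id': '11'},
    ]
    print(find_album('Kid A', 'radiohead', sample))
```

Requiring both fields is what rejects the first sample entry: its title matches, but its artist does not.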

If I can find a better way to match, I will turn this into an online tool.

Source code:

#! /usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 script (uses urllib2 and the print statement)
import sys, os, urllib, urllib2, json
from bs4 import BeautifulSoup

# User-Agent
ua = 'Mozilla/5.0'
# Douban music search API
dbsch = 'https://api.douban.com/v2/music/search?q='
# Xiami music search API
xmsch = 'http://www.xiami.com/app/nineteen/search/key/'
# Leading part of the Xiami player URL
xmplay_font = 'http://www.xiami.com/song/play?ids=/song/playlist/id/'
# Trailing part of the Xiami player URL
xmplay_end = '/type/1'

# Fetch the data behind a URL; returns None on any error
def get_content(url, user_agent = None):
    try:
        if user_agent:
            req = urllib2.Request(url, headers = {'User-Agent': user_agent})
        else:
            req = urllib2.Request(url)
        fd = urllib2.urlopen(req)
        data = None
        while 1:
            buf = fd.read(1024*9)
            if not len(buf):
                break
            else:
                if data is None:
                    data = buf
                else:
                    data += buf
        return data
    except:
        pass

# Look up on Xiami the albums of the doulist given by url
def work(url):
    # Parse the doulist page with BeautifulSoup4 to get the title and artist of every album
    soup = BeautifulSoup(get_content(url))
    titles = soup.find_all('div', class_='pl2')
    titles = [t.a.string for t in titles]
    artists = soup.find_all('p', class_='pl')
    artists = [a.text.split(':')[1][:-4] for a in artists]
    disk_list = zip(titles, artists)
    # Search for each album on Xiami
    jsd = json.JSONDecoder()
    for disk in disk_list:
        title = disk[0].strip().encode('utf-8')
        singer = disk[1].strip().encode('utf-8')
        print '-'*20
        print 'search %s-%s...' % (title, singer)
        # Query the Xiami API; requests to Xiami must carry a User-Agent, otherwise they get a 403 error
        page = 1
        while 1:
            url = '%s%s+%s/page/%s' % (xmsch, title, singer, page)
            page += 1
            xmlist = get_content(url, ua)
            try:
                # Parse the JSON response
                xmlist = jsd.decode(xmlist)
            except:
                break
            # Walk one page of the search results, looking for the doulist album
            if not xmlist['results']:
                print '%s-%s not found' % (title, singer)
                break
            else:
                album_id = None
                for item in xmlist['results']:
                    ctitle = urllib.unquote(repr(item['album_name'])).strip()
                    csinger = urllib.unquote(repr(item['artist_name'])).strip()
                    # Crude matching...
                    if title in ctitle and singer in csinger:
                        album_id = item['album_id'].encode('utf-8')
                        break
                else:
                    # No match on this page: move on to the next page
                    continue
                # Build the Xiami player URL
                print '%s-%s was found in album %s' % \
                       (title, singer, album_id)
                print '%s%s%s' % (xmplay_font, album_id, xmplay_end)
                break

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'Usage: python dl.py douban_list_url.'
        sys.exit(1)
    else:
        work(sys.argv[1])
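The script builds its search and player URLs by plain string formatting. The following Python 3 sketch shows the same URL construction in isolation. The URL fragments are copied from the script; the function names `build_search_url` and `build_player_url` are hypothetical, and percent-encoding the title and artist with `urllib.parse.quote` is an assumption of this sketch (the script inserts the UTF-8 bytes unescaped). Whether these endpoints still respond is not something the sketch checks.

```python
from urllib.parse import quote

# URL fragments taken from the script above.
XMSCH = 'http://www.xiami.com/app/nineteen/search/key/'
XMPLAY_FRONT = 'http://www.xiami.com/song/play?ids=/song/playlist/id/'
XMPLAY_END = '/type/1'


def build_search_url(title, singer, page):
    """Build the search URL for one result page.

    Title and artist are percent-encoded separately and joined with a
    literal '+', mirroring the '%s+%s' pattern used by the script.
    """
    keyword = '%s+%s' % (quote(title, safe=''), quote(singer, safe=''))
    return '%s%s/page/%d' % (XMSCH, keyword, page)


def build_player_url(album_id):
    """Build the player URL for an album id returned by the search."""
    return '%s%s%s' % (XMPLAY_FRONT, album_id, XMPLAY_END)


if __name__ == '__main__':
    print(build_search_url('Kid A', 'Radiohead', 1))
    print(build_player_url('123'))
```

Encoding each field on its own keeps the `+` separator intact while still escaping spaces, slashes and non-ASCII characters inside a title or artist name.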

PS: I recommend 豆皮, a Chrome extension that lets you listen to a doulist directly on the Douban page.