Python Application Series (1): Scraping App Rankings from aso100

Source: Internet | Published: software requirements management system | Editor: 程序博客网 | Time: 2024/05/18 02:19

 

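Background: With the New Year approaching, I need to review 2016 as a whole, which means comparing against others in the same industry and judging the trend.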

Purpose: Scrape the app ranking list for the News category (genre 6009 in the URL) from aso100.com and export it to a CSV file for use in Excel.

Note: This code is for learning and discussion only. Comments are welcome.

 

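Key points:

1. BeautifulSoup, which is genuinely pleasant to use. Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html

2. Reading and writing CSV files. Documentation: https://docs.python.org/3.5/library/csv.html

3. String manipulation with split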
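As a small illustration of the split item, the sketch below pulls an id out of a link path and separates a rank from a name. The sample path and title are made-up values shaped like what the script expects, not data taken from aso100.com, and `app_id_from_href` and `split_rank_title` are hypothetical helper names.

```python
def app_id_from_href(href):
    """Return the fifth '/'-separated piece of a link path."""
    # A leading '/' makes the first piece an empty string,
    # so index 4 is the fourth path segment.
    return href.split('/')[4]


def split_rank_title(title):
    """Split '<rank>.<name>' at the first dot only."""
    rank, _, name = title.partition('.')
    return rank, name


# Hypothetical sample values, for illustration only
sample_href = '/app/rank/appid/123456/country/cn'
sample_title = '3.Example App 2.0'

print(app_id_from_href(sample_href))
print(split_rank_title(sample_title))
print(sample_title.split('.'))
```

A plain `split('.')` cuts at every dot, so a name that itself contains a dot ends up in several pieces. `partition('.')` (or `split('.', 1)`) cuts only once, which may be the safer choice for titles of this shape.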

 

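import requests
from bs4 import BeautifulSoup

# Ranking page: country cn, device iphone, brand free, genre 6009
newsurl = 'https://aso100.com/rank/index/country/cn/device/iphone/brand/free/genre/6009'
res = requests.get(newsurl)
res.encoding = "utf-8"  # decode the response body as UTF-8
soup = BeautifulSoup(res.text, "html.parser")
# print(soup.prettify())  # uncomment to inspect the parsed HTML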
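To try the BeautifulSoup calls without hitting the network, here is a minimal sketch that parses an inline HTML string. The markup is a hypothetical stand-in that only mirrors the tags the script relies on (`div.thumbnail`, `a`, `h5`, `h6`); the real page structure may differ, and `extract_cards` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, for illustration only
SAMPLE_HTML = """
<div class="thumbnail">
  <a href="/app/rank/appid/123456/country/cn">
    <h5>1.Example App</h5>
    <h6>Example Co.</h6>
  </a>
</div>
<div class="thumbnail">
  <a href="/app/rank/appid/654321/country/cn">
    <h5>2.Another App</h5>
    <h6>Another Co.</h6>
  </a>
</div>
"""


def extract_cards(html):
    """Collect link, title and company text from each thumbnail block."""
    soup = BeautifulSoup(html, "html.parser")
    cards = []
    # class_ (with the underscore) filters on the CSS class attribute
    for card in soup.find_all("div", class_="thumbnail"):
        cards.append({
            "href": card.a["href"],     # attribute access on the first <a>
            "title": card.a.h5.text,    # first <h5> inside that <a>
            "company": card.a.h6.text,  # first <h6> inside that <a>
        })
    return cards


for card in extract_cards(SAMPLE_HTML):
    print(card)
```

Attribute-style navigation such as `card.a.h5` returns the first matching descendant, which keeps the code short but raises an `AttributeError` if a tag is missing, so real pages may need extra checks.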

 

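import csv

with open('C:/xxx.csv', 'w', newline='') as csvfile:
    # Space-delimited output; fields that contain a space are quoted with '|'
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['id', 'url', 'overall rank', 'category rank', 'app name', 'company'])
    for link in soup.find_all('div', class_="thumbnail"):
        # Overall rank sits two siblings after the first <h6>;
        # keep '-' unless that node has more than one child
        total = '-'
        if len(link.h6.next_sibling.next_sibling) > 1:
            total = link.h6.next_sibling.next_sibling.contents[1].text
        # The app id is the fifth '/'-separated piece of the link path
        app_id = link.a['href'].split('/')[4]
        url = 'https://aso100.com' + link.a['href']
        # The <h5> text has the form "<category rank>.<app name>"
        title_parts = link.a.h5.text.split('.')
        spamwriter.writerow([app_id, url, total,
                             ''.join(title_parts[0:1]), ''.join(title_parts[1:2]),
                             link.a.h6.text])

print('Scraping finished')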
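Since the stated goal is a file for Excel, one possible variant is to write a comma-delimited file with a UTF-8 byte order mark, which Excel generally uses to recognise the encoding. This is a sketch under that assumption, not the method used in this post; `write_rows_for_excel`, the sample row and the temporary output path are all hypothetical.

```python
import csv
import os
import tempfile


def write_rows_for_excel(path, header, rows):
    """Write a comma-delimited CSV with a UTF-8 BOM."""
    # 'utf-8-sig' prepends the BOM bytes EF BB BF to the file
    with open(path, 'w', newline='', encoding='utf-8-sig') as csvfile:
        writer = csv.writer(csvfile)  # default dialect: comma-delimited
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical sample data, for illustration only
header = ['id', 'url', 'overall rank', 'category rank', 'app name', 'company']
rows = [['123456', 'https://aso100.com/app/rank/appid/123456/country/cn',
         '-', '1', '示例应用', 'Example Co.']]

out_path = os.path.join(tempfile.gettempdir(), 'aso100_demo.csv')
write_rows_for_excel(out_path, header, rows)
print('wrote', out_path)
```

Reading the file back in Python with `encoding='utf-8-sig'` strips the BOM again, so the first header cell comes back unchanged.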

# View the result
with open('C:/xxx.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
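To see what the `delimiter=' '` and `quotechar='|'` options do to the data, the following sketch writes one row to an in-memory buffer and reads it back. The row is a made-up sample, and `to_space_delimited` and `from_space_delimited` are hypothetical helper names.

```python
import csv
import io


def to_space_delimited(rows):
    """Serialise rows with a space delimiter and '|' as the quote character."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def from_space_delimited(text):
    """Parse text produced by to_space_delimited back into rows."""
    reader = csv.reader(io.StringIO(text, newline=''),
                        delimiter=' ', quotechar='|')
    return list(reader)


rows = [['1', 'Example App', 'Example Co.']]  # hypothetical sample row
text = to_space_delimited(rows)
print(repr(text))
print(from_space_delimited(text))
```

With `QUOTE_MINIMAL`, only fields that contain the delimiter, the quote character or a line break get wrapped in `|`, so names with spaces survive the round trip as long as the reader uses the same options.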


Original article: http://blog.csdn.net/lanmao100/article/details/54025983

Please credit the source when reposting.
