Python Applications Series (1): Scraping App Rankings from the aso100 Website

Source: Internet. Published by: 软件需求管理系统. Editor: 程序博客网. Time: 2024/05/18 02:19


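Background: With the New Year approaching, I need to review the whole of 2016, which means comparing against others in the same industry and judging the trends.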

Purpose: Scrape the app ranking list for a category from aso100.com and export it to a CSV file that can be opened in Excel.

Note: This code is for learning and discussion only. Comments are welcome.

Key points:

1. BeautifulSoup, which is genuinely pleasant to use. Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html

2. Reading and writing CSV. Documentation: https://docs.python.org/3.5/library/csv.html

3. String handling with split
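As a small illustration of the split point above, here is a minimal sketch of the two ways split can be used when scraping: separating a rank from an app name, and pulling an id out of a link path. The helper names, the sample label, and the sample href are all made up for illustration; the real page's text and paths may look different.

```python
def split_rank_and_name(label):
    """Split a label such as '12.Some App' into (rank, name).

    Only the first '.' is used as the separator, so dots that are
    part of the app name are kept. (Hypothetical helper.)
    """
    parts = label.split('.', 1)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    # No '.' at all: treat the whole label as the name
    return '', label.strip()


def app_id_from_href(href):
    """Return the fifth '/'-separated piece of a path. (Hypothetical helper.)"""
    return href.split('/')[4]


# Made-up sample values, assumed for this sketch only
sample_label = '3.News.App Pro'
sample_href = '/app/rank/appid/123456789/country/cn'

print(split_rank_and_name(sample_label))
print(app_id_from_href(sample_href))
# For comparison: a plain split('.') followed by [1:2] keeps only one piece
print(sample_label.split('.')[1:2])
```

Passing 1 as the second argument to split limits it to a single cut, which matters when an app name itself contains a period.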

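import requests
from bs4 import BeautifulSoup

# Ranking page to scrape: cn, iphone, free chart, genre 6009 (all taken from the URL)
newsurl = 'https://aso100.com/rank/index/country/cn/device/iphone/brand/free/genre/6009'
res = requests.get(newsurl)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")
# print(soup.prettify())  # uncomment to inspect the page structure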

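import csv

with open('C:/xxx.csv', 'w', newline='') as csvfile:
    # Space-delimited output; fields that contain a space are quoted with '|'
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['id', 'url', 'overall rank', 'category rank', 'app name', 'company'])
    # Each app on the page sits in a <div class="thumbnail">
    for link in soup.find_all('div', class_="thumbnail"):
        total = '-'  # placeholder used when no overall rank is found
        if len(link.h6.next_sibling.next_sibling) > 1:
            total = link.h6.next_sibling.next_sibling.contents[1].text
        app_id = link.a['href'].split('/')[4]
        url = 'https://aso100.com' + link.a['href']
        # The h5 text is split on '.' into the category rank and the app name
        parts = link.a.h5.text.split('.')
        spamwriter.writerow([app_id, url, total, ''.join(parts[0:1]), ''.join(parts[1:2]), link.a.h6.text])

print('Scraping finished')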

# View the result
with open('C:/xxx.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
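To see what the delimiter and quotechar settings used above actually do, here is a minimal sketch that writes and reads rows in memory with io.StringIO instead of a file. The helper names and the sample row are made up for illustration and are not part of the original script.

```python
import csv
import io

# Made-up rows, assumed for this sketch only
rows = [
    ['id', 'url', 'app name'],
    ['123456789', 'https://aso100.com/app/rank/appid/123456789', 'Some News App'],
]


def write_rows(rows, delimiter, quotechar):
    """Serialize rows to CSV text in memory. (Hypothetical helper.)"""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, delimiter=delimiter,
                        quotechar=quotechar, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text, delimiter, quotechar):
    """Parse CSV text back into a list of rows. (Hypothetical helper.)"""
    reader = csv.reader(io.StringIO(text, newline=''),
                        delimiter=delimiter, quotechar=quotechar)
    return list(reader)


# Same settings as the script: space delimiter, '|' as the quote character
space_text = write_rows(rows, ' ', '|')
print(space_text)

# A comma-delimited variant for comparison
comma_text = write_rows(rows, ',', '"')
print(comma_text)

# Both variants read back to the original rows when the same settings are used
print(read_rows(space_text, ' ', '|') == rows)
print(read_rows(comma_text, ',', '"') == rows)
```

With a space delimiter, any field containing a space (such as an app name made of several words) gets wrapped in the quote character, so the reader must use the same delimiter and quotechar to recover the fields. If the file is meant for Excel, a comma delimiter may be the more convenient choice, although how Excel splits columns can depend on regional settings.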

Original article: http://blog.csdn.net/lanmao100/article/details/54025983

Please credit the source when reposting.

0 0