Scraping the 第一网贷 (p2p001.com) China P2P Lending Daily Yield Index with Python

Source: Internet  Editor: 程序博客网  Date: 2024/04/29 05:06

Index page: http://www.p2p001.com/licai/index/id/147.html

The pages that hold the data we need have URLs of this form: http://www.p2p001.com/licai/shownews/id/454.html
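The link pattern above can be picked out with a regular expression. The following is a minimal sketch, assuming every daily report follows the `licai/shownews/id/<number>.html` form seen in the example URL. The names `BASE_URL`, `DETAIL_LINK`, and `detail_urls` are hypothetical helpers for illustration and are not part of the original script.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'http://www.p2p001.com'
# Assumption: report pages end in licai/shownews/id/<number>.html
DETAIL_LINK = re.compile(r'licai/shownews/id/(\d+)\.html$')


def detail_urls(hrefs):
    """Keep only hrefs that look like daily reports and make them absolute."""
    return [urljoin(BASE_URL, h) for h in hrefs if DETAIL_LINK.search(h)]


# One report link and one pagination link, as they might appear on the index page.
sample_hrefs = ['/licai/shownews/id/454.html', '/licai/index/id/147/p/2.html']
print(detail_urls(sample_hrefs))
```

Using `urljoin` rather than plain string concatenation means the helper copes with both relative and already-absolute hrefs.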

Libraries:

requests ("HTTP for Humans")

re (regular expressions)

pandas (for processing the data)

BeautifulSoup (for parsing the page text)

The reasoning behind the scraping logic is explained after the code.

Here is the code:

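# coding: utf-8
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# The header value must not repeat the 'User-Agent:' field name.
user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)'
headers = {'User-Agent': user_agent}


# Collect the links to the daily reports and return them as a list.
def get_newsurl():
    newsurl = []
    url1 = 'http://www.p2p001.com/licai/index/id/147/p/'
    url2 = '.html'
    # Walk index pages 1 to 22. A for loop also avoids retrying a failed
    # page forever, which the original while loop would do.
    for num in range(1, 23):
        url = url1 + str(num) + url2
        try:
            r1 = requests.get(url, headers=headers)
        except requests.RequestException:
            print('wrong %s' % url)
        else:
            s1 = BeautifulSoup(r1.text, 'lxml')
            for x in s1.find_all(href=re.compile('licai/shownews')):
                newsurl.append(x['href'])
    # Return only after every index page has been visited.
    return newsurl


# Fetch each daily report and return the data as a dict of lists.
def get_info():
    url = 'http://www.p2p001.com'
    date = []
    zonghe = []
    one = []
    one_three = []
    three_six = []
    six_twelve = []
    twelve_most = []
    for y in get_newsurl():
        main_url = url + y
        try:
            r2 = requests.get(main_url, headers=headers)
        except requests.RequestException:
            print('wrong %s' % main_url)
        else:
            s2 = BeautifulSoup(r2.text, 'lxml')
            # Skip the first 5 characters (the '统计日期' label and,
            # presumably, its separator) and keep the date itself.
            date.append(s2.find(string=re.compile('统计日期'))[5:])
            # Cells 10-15 hold the composite rate and the five term buckets.
            rate = s2.find_all('td')
            zonghe.append(rate[10].string)
            one.append(rate[11].string)
            one_three.append(rate[12].string)
            three_six.append(rate[13].string)
            six_twelve.append(rate[14].string)
            twelve_most.append(rate[15].string)
    p = {'Date': date,
         '综合': zonghe,              # composite
         '1个月': one,                # 1 month
         '1-3个月': one_three,        # 1-3 months
         '3-6个月': three_six,        # 3-6 months
         '6-12个月': six_twelve,      # 6-12 months
         '12个月及以上': twelve_most}  # 12 months and above
    return p


# Store the data in a pandas DataFrame.
p = pd.DataFrame(get_info())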
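The scraped cells arrive as strings, so some cleanup is likely needed before any analysis. Below is a minimal sketch of one way to do it, assuming the rates look like `'9.85%'` and the dates are in a format `pd.to_datetime` can parse. Neither assumption is confirmed by the script above. The sample values are made up for illustration, and `clean` is a hypothetical helper.

```python
import pandas as pd

# Made-up sample in the shape get_info() returns; real values come from the site.
sample = {
    'Date': ['2017-01-04', '2017-01-03'],
    '综合': ['9.90%', '9.85%'],
    '1个月': ['8.20%', '8.10%'],
}


def clean(raw):
    """Turn scraped strings into a date-indexed frame of float rates."""
    df = pd.DataFrame(raw)
    # Assumption: dates are plain date strings, possibly padded with whitespace.
    df['Date'] = pd.to_datetime(df['Date'].str.strip())
    for col in [c for c in df.columns if c != 'Date']:
        # Assumption: rates carry a trailing '%'; unparseable cells become NaN.
        stripped = df[col].str.strip().str.rstrip('%')
        df[col] = pd.to_numeric(stripped, errors='coerce')
    return df.set_index('Date').sort_index()


cleaned = clean(sample)
print(cleaned)
```

With the real data, the same helper could be applied as `clean(get_info())`, and the result written out with `cleaned.to_csv(...)` if a file is wanted.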