Python case study

Source: Internet | Editor: 程序博客网 | Date: 2024/06/03 17:07
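I. Background
Following the methods in the book 《Python数据分析实战》 ("Python Data Analysis in Practice"), we collected the June 17 temperatures of ten cities in Shandong and analyzed how temperature relates to a city's distance from the coastline (hereafter s).
Libraries used:
matplotlib: plotting the figures
scikit-learn: regression analysis of the data
numpy: slicing and reshaping the data
pandas: reading and organizing the CSV data
Tool: PyCharm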

Data: temperature readings from ten locations, including Gaomi and Laiyang, taken at intervals throughout June 17.
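The code in the following sections reads one CSV file per city, named after the city (for example laixi.csv), and uses three columns: day (a timestamp string), temp (the temperature) and dist (the city's distance from the coastline, which the author reads from the first row). As a minimal sketch, assuming exactly that layout, the ten files can be loaded in one loop instead of one read_csv call per city. The helper name load_cities is hypothetical, and pd.to_datetime is swapped in here as a vectorized alternative to dateutil's parser.parse.

```python
import os
import pandas as pd

# City file names as they appear in the original paths (mouping.csv is Muping)
CITIES = ['weihai', 'yantai', 'mouping', 'zhaoyuan', 'longkou',
          'qixia', 'laiyang', 'laixi', 'pingdu', 'gaomi']


def load_cities(data_dir, cities=CITIES):
    """Hypothetical helper: read <data_dir>/<city>.csv for every city.

    Returns a dict mapping city name -> DataFrame, with the 'day' column
    converted from strings to pandas Timestamps.
    """
    frames = {}
    for city in cities:
        df = pd.read_csv(os.path.join(data_dir, city + '.csv'))
        # Vectorized string -> datetime conversion of the whole column
        df['day'] = pd.to_datetime(df['day'])
        frames[city] = df
    return frames
```

Usage would be `frames = load_cities('E:/wea/WeatherData')`, after which `frames['laixi']` plays the role of `df_laixi`.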


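II. Temperature visualization for a single city
We take Laixi as the example city, use pandas to load and organize its data, plot it with matplotlib, and save the figure as an SVG image.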

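import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser

# Read Laixi's records: 'day' holds the timestamp string, 'temp' the temperature
df_laixi = pd.read_csv('E:/wea/WeatherData/laixi.csv')
y1 = df_laixi['temp']
x1 = df_laixi['day']
# Convert the timestamp strings to datetime objects
dayoflaixi = [parser.parse(x) for x in x1]

fig, ax = plt.subplots()
plt.xticks(rotation=70)
# Show only hours and minutes on the x axis
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
ax.plot(dayoflaixi, y1, 'r')
# Save before show(): after the window is closed the saved figure may be blank
plt.savefig('E:/wea/WeatherData/laixi.svg')
plt.show()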

Figure: Laixi's temperature over the day


The daily maximum temperature, tm, is reached between 2 pm and 6 pm.
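Rather than reading the peak off the chart, the time at which tm occurs can be located directly with pandas' idxmax. A minimal sketch, assuming a DataFrame with the day and temp columns used above; the function name time_of_max is hypothetical, and the sample readings are invented purely to make the snippet runnable (they are not the Laixi measurements).

```python
import pandas as pd


def time_of_max(df):
    """Return (timestamp, temperature) of the row holding the daily maximum."""
    day = pd.to_datetime(df['day'])
    i = df['temp'].idxmax()  # index label of the first row with the highest temp
    return day[i], df['temp'][i]


# Invented sample readings, for illustration only
sample = pd.DataFrame({
    'day': ['2017-06-17 08:00', '2017-06-17 11:00', '2017-06-17 14:00',
            '2017-06-17 17:00', '2017-06-17 20:00'],
    'temp': [24, 28, 31, 30, 26],
})
when, tm = time_of_max(sample)
print(when.strftime('%H:%M'), tm)  # 14:00 31
```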
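III. Is there a clear relationship between the daily maximum tm and the distance s?
We pick the three cities closest to the coast (Weihai, Muping and Yantai) and the three farthest from it (Laixi, Pingdu and Gaomi), and compare the temperature curves of the two groups.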

#!/usr/bin/env python
# encoding: utf-8
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser

# Read the data files
df_weihai = pd.read_csv('E:/wea/WeatherData/weihai.csv')
df_mouping = pd.read_csv('E:/wea/WeatherData/mouping.csv')
df_yantai = pd.read_csv('E:/wea/WeatherData/yantai.csv')
df_pingdu = pd.read_csv('E:/wea/WeatherData/pingdu.csv')
df_gaomi = pd.read_csv('E:/wea/WeatherData/gaomi.csv')
df_laixi = pd.read_csv('E:/wea/WeatherData/laixi.csv')

# Read the y (temperature) and x (time) data
y1 = df_weihai['temp']
x1 = df_weihai['day']
y2 = df_mouping['temp']
x2 = df_mouping['day']
y3 = df_yantai['temp']
x3 = df_yantai['day']
y4 = df_laixi['temp']
x4 = df_laixi['day']
y5 = df_pingdu['temp']
x5 = df_pingdu['day']
y6 = df_gaomi['temp']
x6 = df_gaomi['day']

# Convert the times from strings to datetime
day_weihai = [parser.parse(x) for x in x1]
day_mouping = [parser.parse(x) for x in x2]
day_yantai = [parser.parse(x) for x in x3]
day_laixi = [parser.parse(x) for x in x4]
day_pingdu = [parser.parse(x) for x in x5]
day_gaomi = [parser.parse(x) for x in x6]

# Create the figure and axes with subplots()
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
# Draw six curves on one panel: red = three cities nearest the coast, green = three farthest
ax.plot(day_weihai, y1, 'r', day_mouping, y2, 'r', day_yantai, y3, 'r')
ax.plot(day_laixi, y4, 'g', day_pingdu, y5, 'g', day_gaomi, y6, 'g')
plt.show()

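The result is positive: the maximum temperatures of the three farthest cities are clearly higher than those of the three cities nearest the coast, so tm and s are clearly related.
IV. Quantifying the relationship between the maximum temperature and the distance s
We take the maximum temperature of each of the ten cities and draw a scatter plot of tm against s.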
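The group comparison can also be checked numerically instead of by eye. A minimal sketch, assuming each city's readings are in a DataFrame with a temp column; the function name compare_groups is hypothetical and the readings are invented for illustration only.

```python
import pandas as pd


def compare_groups(frames, near, far):
    """Return the daily maxima of the near-coast and far-from-coast groups.

    frames maps city name -> DataFrame with a 'temp' column.
    """
    near_max = {c: frames[c]['temp'].max() for c in near}
    far_max = {c: frames[c]['temp'].max() for c in far}
    return near_max, far_max


# Invented readings, for illustration only
frames = {
    'weihai': pd.DataFrame({'temp': [22, 25, 24]}),
    'yantai': pd.DataFrame({'temp': [23, 26, 25]}),
    'laixi': pd.DataFrame({'temp': [25, 31, 29]}),
    'gaomi': pd.DataFrame({'temp': [26, 32, 30]}),
}
near_max, far_max = compare_groups(frames, ['weihai', 'yantai'], ['laixi', 'gaomi'])
print(near_max, far_max)
# The claim holds if every far-city maximum exceeds every near-city maximum
print(min(far_max.values()) > max(near_max.values()))  # True
```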

#!/usr/bin/env python
# encoding: utf-8
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df_gaomi = pd.read_csv('E:/wea/WeatherData/gaomi.csv')
df_laixi = pd.read_csv('E:/wea/WeatherData/laixi.csv')
df_laiyang = pd.read_csv('E:/wea/WeatherData/laiyang.csv')
df_longkou = pd.read_csv('E:/wea/WeatherData/longkou.csv')
df_mouping = pd.read_csv('E:/wea/WeatherData/mouping.csv')
df_pingdu = pd.read_csv('E:/wea/WeatherData/pingdu.csv')
df_qixia = pd.read_csv('E:/wea/WeatherData/qixia.csv')
df_weihai = pd.read_csv('E:/wea/WeatherData/weihai.csv')
df_yantai = pd.read_csv('E:/wea/WeatherData/yantai.csv')
df_zhaoyuan = pd.read_csv('E:/wea/WeatherData/zhaoyuan.csv')

# dist: each city's distance s from the coastline, taken from the first row of its file
# (cities are listed roughly from the coast inland; section V splits this list in half)
dist = [df_weihai['dist'][0],
        df_yantai['dist'][0],
        df_mouping['dist'][0],
        df_zhaoyuan['dist'][0],
        df_longkou['dist'][0],
        df_qixia['dist'][0],
        df_laiyang['dist'][0],
        df_laixi['dist'][0],
        df_pingdu['dist'][0],
        df_gaomi['dist'][0]]
# temp_max: each city's maximum temperature
temp_max = [df_weihai['temp'].max(),
            df_yantai['temp'].max(),
            df_mouping['temp'].max(),
            df_zhaoyuan['temp'].max(),
            df_longkou['temp'].max(),
            df_qixia['temp'].max(),
            df_laiyang['temp'].max(),
            df_laixi['temp'].max(),
            df_pingdu['temp'].max(),
            df_gaomi['temp'].max()]
# temp_min: each city's minimum temperature
temp_min = [df_weihai['temp'].min(),
            df_yantai['temp'].min(),
            df_mouping['temp'].min(),
            df_zhaoyuan['temp'].min(),
            df_longkou['temp'].min(),
            df_qixia['temp'].min(),
            df_laiyang['temp'].min(),
            df_laixi['temp'].min(),
            df_pingdu['temp'].min(),
            df_gaomi['temp'].min()]
# Scatter plot of the maximum temperature against s
fig, ax = plt.subplots()
ax.plot(dist, temp_max, 'ro')
ax.set_xlabel('distance from coastline s (km)')
ax.set_ylabel('max temperature tm')
plt.show()

The scatter plot shows that within 100 km, tm is approximately linear in s; beyond 100 km the relationship changes.
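Whether tm is close to linear in s within a cutoff can be quantified with the Pearson correlation coefficient of the points in that range: a value near 1 means they lie close to a rising straight line. With only about five cities per range, r is a rough indicator, not a test. A minimal sketch, assuming lists shaped like the dist and temp_max lists built above; the function name linearity_within and the sample numbers are invented for illustration only.

```python
import numpy as np


def linearity_within(dist, temp_max, cutoff):
    """Pearson r between distance and max temperature for cities within cutoff km."""
    s = np.asarray(dist, dtype=float)
    t = np.asarray(temp_max, dtype=float)
    mask = s <= cutoff
    # corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(s, t)
    return np.corrcoef(s[mask], t[mask])[0, 1]


# Invented points that lie exactly on a line below 100 km, for illustration only
sample_dist = [10, 30, 50, 70, 90, 150, 250]
sample_tmax = [28, 29, 30, 31, 32, 30, 31]
print(round(linearity_within(sample_dist, sample_tmax, 100), 3))  # 1.0
```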
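V. Regression analysis with scikit-learn
Based on the analysis above, we assume two separate linear relationships, one for the cities near the sea and one for the cities farther inland, and use scikit-learn to fit the two lines.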
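A note on the model choice: SVR(kernel='linear') fits a line by minimizing an epsilon-insensitive loss (errors smaller than epsilon are ignored) plus a penalty on the slope, with C controlling the trade-off, so it is not ordinary least squares. If a least-squares line is what's wanted, sklearn.linear_model.LinearRegression gives it directly. A minimal sketch; the points are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented (distance km, max temperature) pairs, for illustration only
s_km = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)  # scikit-learn expects a 2-D X
tm_obs = np.array([28.0, 28.5, 29.0, 29.5, 30.0])

ols = LinearRegression().fit(s_km, tm_obs)
# For y = a*x + b: a is coef_[0], b is intercept_
print(round(ols.coef_[0], 3), round(ols.intercept_, 3))  # 0.05 27.5
```

Both models expose coef_ and intercept_ after fitting (SVR only for the linear kernel), so either can be read as y = ax + b.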

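# Continues from the previous snippet: uses dist, temp_max, np and plt
from sklearn.svm import SVR

# Split into two groups: dist1 near the sea, dist2 farther inland
dist1 = dist[0:5]
dist2 = dist[5:10]
# scikit-learn expects a 2-D feature array: one row per sample
dist1 = [[x] for x in dist1]
dist2 = [[x] for x in dist2]
temp_m1 = temp_max[0:5]
temp_m2 = temp_max[5:10]
# Fit each group with SVR using a linear kernel. C is the regularization parameter:
# regularization strength is inversely proportional to C, so a large C such as 1e3
# makes the line follow the training points more closely
svr_lin1 = SVR(kernel='linear', C=1e3)
svr_lin2 = SVR(kernel='linear', C=1e3)
svr_lin1.fit(dist1, temp_m1)
svr_lin2.fit(dist2, temp_m2)
# Evaluate the fitted lines: 10-90 km for the coastal group, 50-350 km for the inland group
xp1 = np.arange(10, 100, 10).reshape((9, 1))
xp2 = np.arange(50, 400, 50).reshape((7, 1))
yp1 = svr_lin1.predict(xp1)
yp2 = svr_lin2.predict(xp2)
fig, ax = plt.subplots()
ax.set_xlim(0, 400)
ax.plot(xp1, yp1, c='b', label='strong sea weather')
ax.plot(xp2, yp2, c='g', label='low sea weather')
ax.legend()
plt.show()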

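The two fitted lines cross at about 50 km, which suggests that the maritime climate influences the maximum temperature out to roughly 50 km from the coast.
We describe each line as y = ax + b, where a is the slope and b the intercept: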

print(svr_lin1.coef_)       # slope a
print(svr_lin1.intercept_)  # intercept b
print(svr_lin2.coef_)       # slope a
print(svr_lin2.intercept_)  # intercept b

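Output:
Conclusion:
On the Shandong Peninsula, within 50 km of the coastline the local maximum temperature tm is influenced by the maritime climate, and its relation to the distance s (km) from the coastline is approximately: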
tm=0.04794118s+27.65617647 
Beyond 50 km, the relation is approximately:
tm=0.00401274s+29.98745223
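The crossing point of the two lines follows from setting the equations equal: a1·s + b1 = a2·s + b2, so s = (b2 - b1) / (a1 - a2). A minimal sketch that plugs in the two coefficient pairs reported above; the function name crossing is hypothetical.

```python
def crossing(a1, b1, a2, b2):
    """Distance s where a1*s + b1 == a2*s + b2, and the shared value of tm there."""
    s = (b2 - b1) / (a1 - a2)
    return s, a1 * s + b1


# Coefficients of the two fitted equations in the conclusion
s, tm = crossing(0.04794118, 27.65617647, 0.00401274, 29.98745223)
print(round(s, 1), round(tm, 1))  # 53.1 30.2
```

This puts the crossing at about 53 km, where both lines give tm of about 30.2, consistent with the roughly 50 km boundary read off the plot.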


