用Python处理"大"XLS文件
来源:互联网 发布:百成玻璃优化排版软件 编辑:程序博客网 时间:2024/05/21 10:57
权当学习Python练手用的.
数据来data.gov.uk,大小有58.4MB
文件都是些什么内容?
- ’Accident_Index’,
- ‘Location_Easting_OSGR’,
- ‘Location_Northing_OSGR’,
- ‘Longitude’,
- ‘Latitude’,
- ‘Police_Force’,
- ‘Accident_Severity’,
- ‘Number_of_Vehicles’,
- ‘Number_of_Casualties’,
- ‘Date’,
- ‘Day_of_Week’,
- ‘Time’,
- ‘Local_Authority_(District)’,
- ‘Local_Authority_(Highway)’,
- ‘1st_Road_Class’, ‘1st_Road_Number’,
- ‘Road_Type’,
- ‘Speed_limit’,
- ‘Junction_Detail’,
- ‘Junction_Control’,
- ‘2nd_Road_Class’,
- ‘2nd_Road_Number’,
- ‘Pedestrian_Crossing-Human_Control’,
- ‘Pedestrian_Crossing_Physical_Facilities’,
- ’Light_Conditions’,
- ‘Weather_Conditions’,
- ‘Road_Surface_Conditions’,
- ‘Special_Conditions_at_Site’,
- ‘Carriageway_Hazards’,
- ‘Urban_or_Rural_Area’,
- ‘Did_Police_Officer_Attend_Scene_of_Accident’,
- ‘LSOA_of_Accident_Location’
LowMemory 方式读取文件
#read the filefiledir='/home/derek/Desktop/python-data-analyis/large-excel-files/Accidents_2013.csv'data = pd.read_csv(filedir,low_memory=False)print data.ix[:10]['Day_of_Week']
- SQL likes 提取数据信息
print 'Accidents'print '----------'#选择星期日发生的事故accidents_sunday = data[data.Day_of_Week==1]print 'Accidents which happended on a Sunday: ',len(accidents_sunday)#选择星期日发生的且涉事人数在十人以上的事故accidents_sunday_twenty_cars = data[(data.Day_of_Week==1) & (data.Number_of_Vehicles>10)]print'Accidents which happened on a Sunday involving > 10 cars: ' , len(accidents_sunday_twenty_cars)#选择星期日发生的且涉事人数在十人以上且天气情况是下雨的事故(2对应的是无风下雨)accidents_sunday_twenty_cars_rain = data[(data.Day_of_Week==1) & (data.Number_of_Vehicles>10) & (data.Weather_Conditions==2)]print'Accidents which happened on a Sunday involving > 10 cars with rainning: ' , len(accidents_sunday_twenty_cars_rain)#选择在伦敦的星期日发生的事故london_data = data[(data['Police_Force'] == 1) & (data.Day_of_Week==1)]print 'Accidents in London on a Sunday',len(london_data)#选择在2000年的伦敦的星期日发生的事故london_data_2000 = london_data[((pd.to_datetime('2000-1-1', errors='coerce')) > (pd.to_datetime(london_data['Date'],errors='coerce'))) & (pd.to_datetime(london_data['Date'],errors='coerce') < (pd.to_datetime('2000-12-31', errors='coerce')))]print 'Accidents in London on a Sunday in 2000:',len(london_data_2000)
给人的感觉是特别像SQL语句,DataFrame的这种切片,方式特别好用,对不对?
pd.to_datetime(london_data['Date'],errors='coerce')
这里是日期转换函数.
输出:
Accidents----------Accidents which happended on a Sunday: 14854Accidents which happened on a Sunday involving > 10 cars: 1Accidents which happened on a Sunday involving > 10 cars with rainning: 1Accidents in London on a Sunday 2374Accidents in London on a Sunday in 2000: 0
- 将部分DataFrame数据以XLSX文件存储下来
确保你安装了XlsxWriter
sudo pip install XlsxWriter
writer = pd.ExcelWriter('london_data.xlsx', engine='xlsxwriter')london_data.to_excel(writer, 'sheet1')writer.save()writer.close()
- 块读取,分析一个星期中那一天最有出事故的概率最大
代码.2013,2014,2015三年的事故记录,在’Accidents_2013.csv’,’Accidents_2014.csv’, ‘Accidents_2015.csv’这三个文件中
import pandas as pdfrom pandas import Seriesimport matplotlib.pyplot as plt#read the filedir='/home/derek/Desktop/python-data-analyis/large-excel-files/'filedir=['Accidents_2013.csv','Accidents_2014.csv', 'Accidents_2015.csv']tot = Series([])for i in range(3): #块读取文件, 每次读1000条记录 data = pd.read_csv(dir + filedir[i],chunksize=1000) for piece in data: tot = tot.add(piece['Day_of_Week'].value_counts(), fill_value=0)day_index = ['Sun', 'Mon', 'Tues', 'Wed', 'Thur', 'Fri', 'Sat']print 'data like:'#tot = tot.sort_values(ascending=False)print tot#重新构造一个Series,是为了给索引命名new_Series = Series(tot.values, index=day_index)new_Series.plot()plt.show()plt.close()
控制台输出:
data like:1 460522 609563 650064 640395 644456 693787 55162dtype: float64
图:
三年记录在案的有425038条记录.
结论: 看来,英国人在工作日出行要比在休息日造成更多的事故.星期五的出行造成的事故最多,或许,星期五急着回家,哈哈.相比起来,星期五不适合外出.
参考文章来源
文件没有提供,是因为:读者可以自己去下载,可能找到更想更好用Python分析的数据.
0 0
- 用Python处理"大"XLS文件
- python 处理xls
- python 处理xls
- Python处理大文件
- Python处理大文件
- Python处理大文件
- python读取xls文件
- python读取xls文件
- python 操作xls文件
- python读取xls文件
- 使用python读取xls文件
- python读取excel(Python处理xls)
- python读取excel(Python处理xls)
- 使用python对xls进行处理
- xls文件感染宏病毒的处理方法
- python处理xml大文件[xml.sax]
- 用python处理Excel文档(1)——用xlrd模块读取xls/xlsx文档
- 用python处理Excel文档(2)——用xlsxwriter模块写xls/xlsx文档
- 模拟点击事件动态增删元素
- poj 2109
- 操作系统原理:链接与ELF文件
- 1045. 快速排序(25)
- 概率图模型学习(3)——马尔科夫网表示1
- 用Python处理"大"XLS文件
- canvas简单API
- MAC下安装与配置MySQL
- Collection中的基本方法
- 数据挖掘-商品购买记录-商品推荐_亲和度分析
- apache .htaccess文件的语法
- Audio系列之音量键
- z-index什么意思
- Intel Galileo Debian Image Prequits_0