Pandas Study Notes
Source: Internet | Editor: 程序博客网 | Date: 2024/05/22 06:19
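Pandas is a data analysis package for Python. Development began at AQR Capital Management in April 2008, and the project was open-sourced at the end of 2009. It is now developed and maintained by the PyData development team, which focuses on Python data packages, and it is part of the PyData project. Pandas was originally built as a tool for financial data analysis, which is why it has strong support for time series analysis. The name comes from "panel data" and "Python data analysis". Panel data is an econometrics term for multidimensional data sets; pandas also used to provide a Panel data type (deprecated in pandas 0.20 and removed in 0.25).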
In [1]: import pandas

In [2]: food_info = pandas.read_csv("food_info.csv")  # read a local CSV file
   ...: print(type(food_info))                        # print the object's type
   ...: print("-------------------------")
   ...: print(food_info.dtypes)                       # data type of every column
   ...: help(pandas.read_csv)                         # show the help text for read_csv

Out[2]:
<class 'pandas.core.frame.DataFrame'>    # DataFrame is the main pandas data type
-------------------------
NDB_No           int64
Shrt_Desc       object    # object is roughly equivalent to string
Water_(g)      float64
Energ_Kcal       int64
Protein_(g)    float64
...

In [3]: food_info.head()     # first 5 rows
   ...: food_info.head(3)    # first 3 rows
   ...: food_info.tail(4)    # last 4 rows
   ...: food_info.columns    # all column names
   ...: food_info.shape      # size of the data as (rows, columns)

Out[3]: (result of food_info.head(), shown in two parts)
   NDB_No  Shrt_Desc                 Water_(g)  Energ_Kcal  Protein_(g)  Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  ...
0    1001  BUTTER WITH SALT              15.87         717         0.85          81.11     2.11            0.06           0.0           0.06  ...
1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85          81.11     2.11            0.06           0.0           0.06  ...
2    1003  BUTTER OIL ANHYDROUS           0.24         876         0.28          99.48     0.00            0.00           0.0           0.00  ...
3    1004  CHEESE BLUE                   42.41         353        21.40          28.74     5.11            2.34           0.0           0.50  ...
4    1005  CHEESE BRICK                  41.11         371        23.24          29.68     3.18            2.79           0.0           0.51  ...

   ...  Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)
0  ...    2499.0      684.0        2.32        1.5      60.0          7.0      51.368       21.021        3.043           215.0
1  ...    2499.0      684.0        2.32        1.5      60.0          7.0      50.489       23.426        3.012           219.0
2  ...    3069.0      840.0        2.80        1.8      73.0          8.6      61.924       28.732        3.694           256.0
3  ...     721.0      198.0        0.25        0.5      21.0          2.4      18.669        7.778        0.800            75.0
4  ...    1080.0      292.0        0.26        0.5      22.0          2.5      18.764        8.598        0.784            94.0

5 rows × 36 columns

In [4]: food_info.loc[0]                 # row with index label 0 (the first row)
   ...: food_info.loc[3:6]               # rows 3, 4, 5 and 6 (loc slices include both ends)
   ...: ndb_col = food_info["NDB_No"]    # the "NDB_No" column
   ...: columns = ["NDB_No", "Shrt_Desc"]
   ...: N_S = food_info[columns]         # the "NDB_No" and "Shrt_Desc" columns

In [5]: columns = food_info.columns.tolist()    # column names as a list
   ...: print(columns)
   ...: gram_columns = []
   ...: for c in columns:
   ...:     if c.endswith("(g)"):               # keep the columns whose names end with "(g)"
   ...:         gram_columns.append(c)
   ...: gram_df = food_info[gram_columns]
   ...: print(gram_df)

Out[5]:
['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)']
   Water_(g)  Protein_(g)  Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  \
0      15.87         0.85          81.11     2.11            0.06

   Fiber_TD_(g)  Sugar_Tot_(g)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)
0           0.0           0.06      51.368       21.021        3.043

In [6]: food_info.sort_values("Water_(g)", inplace=True)    # ascending by default; inplace=True sorts food_info itself instead of returning a new DataFrame
   ...: print(food_info["Water_(g)"])
   ...: food_info.sort_values("Water_(g)", inplace=True, ascending=False)    # descending order
   ...: print("-------------------------")
   ...: print(food_info["Water_(g)"])

Out[6]:
676    0.00
743    0.00
744    0.00
745    0.00
758    0.00
761    0.00
-------------------------
4209    100.00
4377    100.00
4378    100.00
4376    100.00
4348    100.00
4404     99.98
4372     99.98

In [7]: import pandas as pd
   ...: import numpy as np
   ...: titanic_survival = pd.read_csv("titanic_train.csv")    # survival records for Titanic passengers
   ...: titanic_survival.head()

Out[7]:
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0            1         0       3  Braund, Mr. Owen Harris                             male    22.0      1      0  A/5 21171         7.2500   NaN    S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833  C85    C
2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282  7.9250   NaN    S
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0      1      0  113803            53.1000  C123   S
4            5         0       3  Allen, Mr. William Henry                            male    35.0      0      0  373450            8.0500   NaN    S

In [8]: age = titanic_survival["Age"]
   ...: age.head(10)
   ...: age_is_null = pd.isnull(age)
   ...: print(age_is_null)                # True where the value is null
   ...: print("-------------------------")
   ...: age_null_true = age[age_is_null]
   ...: print(age_null_true)              # only the null entries
   ...: print("-------------------------")
   ...: age_null_count = len(age_null_true)
   ...: print(age_null_count)             # number of null entries

Out[8]:
0    False
1    False
2    False
3    False
4    False
5     True
6    False
-------------------------
5    NaN
17   NaN
19   NaN
26   NaN
28   NaN
29   NaN
31   NaN
-------------------------
177

In [9]: good_ages = titanic_survival["Age"][~age_is_null]    # drop the nulls, then compute the mean age by hand
   ...: mean_age = sum(good_ages) / len(good_ages)
   ...: print(mean_age)

Out[9]: 29.6991176471

In [10]: correct_mean_age = titanic_survival["Age"].mean()    # mean() skips null values, so it can be called directly
    ...: print(correct_mean_age)

Out[10]: 29.6991176471

pivot_table builds a pivot table. Here it computes the survival rate for each passenger class:

In [11]: passenger_survival = titanic_survival.pivot_table(index="Pclass", values="Survived", aggfunc=np.mean)
    ...: print(passenger_survival)

Out[11]:
        Survived
Pclass
1       0.629630
2       0.472826
3       0.242363

Mean age for each passenger class (the default aggfunc is the mean):

In [12]: passenger_mean_age = titanic_survival.pivot_table(index="Pclass", values="Age")
    ...: print(passenger_mean_age)

Out[12]:
              Age
Pclass
1       38.233441
2       29.877630
3       25.140620

Total "Fare" and "Survived" for each port of embarkation ("Embarked"):

In [13]: port_stats = titanic_survival.pivot_table(index="Embarked", values=["Fare", "Survived"], aggfunc=np.sum)
    ...: print(port_stats)

Out[13]:
                Fare  Survived
Embarked
C         10072.2962        93
Q          1022.2543        30
S         17439.3988       217

In [14]: drop_na_columns = titanic_survival.dropna(axis=1)    # drop every column that contains a null value
    ...: print(drop_na_columns)
    ...: new_titanic_survival = titanic_survival.dropna(axis=0, subset=["Age", "Sex"])    # drop every row with a null in "Age" or "Sex"
    ...: print(new_titanic_survival)

Out[14]: (only the tail of each printout is shown)
...
883  C.A./SOTON 34068  10.5000
884   SOTON/OQ 392076   7.0500
885            382652  29.1250
886            211536  13.0000
887            112053  30.0000
888        W./C. 6607  23.4500
889            111369  30.0000
890            370376   7.7500

[891 rows x 9 columns]
...
885  5  382652  29.1250   NaN  Q
886  0  211536  13.0000   NaN  S
887  0  112053  30.0000   B42  S
889  0  111369  30.0000  C148  C
890  0  370376   7.7500   NaN  Q

[714 rows x 12 columns]
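The column-selection and sorting steps from In [5] and In [6] can be tried without the CSV file. The sketch below is only an illustration: the `food` DataFrame is a small hand-built stand-in (three rows copied from the head() output, five columns), not the real food_info.csv, and the variable names are assumptions.

```python
import pandas as pd

# Hand-built stand-in for food_info.csv (assumption: three rows, five columns only).
food = pd.DataFrame({
    "NDB_No": [1001, 1003, 1004],
    "Shrt_Desc": ["BUTTER WITH SALT", "BUTTER OIL ANHYDROUS", "CHEESE BLUE"],
    "Water_(g)": [15.87, 0.24, 42.41],
    "Energ_Kcal": [717, 876, 353],
    "Protein_(g)": [0.85, 0.28, 21.40],
})

# Keep the columns whose names end with "(g)"; a list comprehension replaces the explicit loop.
gram_columns = [c for c in food.columns if c.endswith("(g)")]
gram_df = food[gram_columns]

# Without inplace=True, sort_values returns a new DataFrame and leaves `food` untouched.
by_water_desc = food.sort_values("Water_(g)", ascending=False)

print(gram_columns)
print(by_water_desc["Shrt_Desc"].tolist())
```

Returning a new DataFrame rather than sorting in place keeps the original row order available for later steps, which is often easier to reason about in a notebook.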
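The null-handling steps from In [8] to In [10] and In [14] can be checked on a tiny example. This is a minimal sketch under stated assumptions: the `people` DataFrame and its values are invented for illustration and are not taken from titanic_train.csv.

```python
import numpy as np
import pandas as pd

# Invented sample data (assumption): two columns with missing values, one complete column.
people = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Sex": ["male", "female", "female", None, "male"],
    "Pclass": [3, 1, 3, 1, 3],
})

age_is_null = people["Age"].isna()       # boolean mask, True where Age is missing
null_count = int(age_is_null.sum())      # number of missing ages

# Manual mean over the non-null values versus Series.mean(), which skips NaN by default.
good_ages = people["Age"][~age_is_null]
manual_mean = good_ages.sum() / len(good_ages)
builtin_mean = people["Age"].mean()

# axis=1 drops columns that contain any null; axis=0 with subset drops rows
# that have a null in any of the listed columns.
complete_columns = people.dropna(axis=1)
complete_rows = people.dropna(axis=0, subset=["Age", "Sex"])

print(null_count, manual_mean, builtin_mean)
print(list(complete_columns.columns), complete_rows.index.tolist())
```

Both means agree because `mean()` ignores NaN unless `skipna=False` is passed, which is why the manual filtering in In [9] is not needed.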
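The pivot_table calls in In [11] to In [13] group rows by the `index` column and aggregate the `values` columns. The sketch below reproduces that pattern on invented data (the `passengers` DataFrame is an assumption, not the Titanic file). It passes the aggregation as a string name ("mean", "sum") instead of `np.mean` / `np.sum`; both forms are accepted, though recent pandas releases may emit a FutureWarning for the NumPy callables.

```python
import pandas as pd

# Invented sample data (assumption): two passengers per class.
passengers = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Survived": [1, 0, 1, 1, 0, 0],
    "Fare": [80.0, 60.0, 20.0, 30.0, 8.0, 7.0],
})

# Survival rate per class: the mean of a 0/1 column is the fraction of ones.
survival_rate = passengers.pivot_table(index="Pclass", values="Survived", aggfunc="mean")

# Several value columns can be aggregated in one call.
totals = passengers.pivot_table(index="Pclass", values=["Fare", "Survived"], aggfunc="sum")

print(survival_rate)
print(totals)
```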