Normalizing & Sorting DataFrame Column
Source: Internet | Editor: 程序博客网 | Time: 2024/05/29 18:59
Dataset
The goal of this exercise is to score foods so that high-protein, low-fat foods rate highest. The formula is:
Score=2×(Protein_(g))−0.75×(Lipid_Tot_(g))
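- Food nutrition table
food_info is a DataFrame object, and food_info.columns returns the DataFrame's column labels object; the code below converts it to a plain list with tolist().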
# Read in the data
import pandas as pd
food_info = pd.read_csv("food_info.csv")
cols = food_info.columns.tolist()
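To see the difference between the column labels object and the list, here is a minimal sketch. It assumes a tiny hand-made DataFrame in place of food_info.csv (the values are invented for illustration), so it runs without the data file.

```python
import pandas as pd

# Hypothetical stand-in for food_info.csv; the values are made up.
food_info = pd.DataFrame({
    "NDB_No": [1001, 1002, 1003],
    "Protein_(g)": [0.85, 24.9, 3.2],
    "Lipid_Tot_(g)": [81.11, 33.1, 1.0],
})

col_index = food_info.columns      # a pandas Index object
cols = food_info.columns.tolist()  # a plain Python list of the labels

print(type(col_index).__name__)
print(cols)
```

The list form is handy when you want to loop over or slice the column names with ordinary Python tools.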
Transforming A Column
- pandas can apply any arithmetic operation to numeric data; the operation is applied to every value in the column.
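# The first result divides by 1000, so it is named div_1000 (the original name div_100 did not match)
div_1000 = food_info["Iron_(mg)"] / 1000
add_100 = food_info["Iron_(mg)"] + 100
sub_100 = food_info["Iron_(mg)"] - 100
mult_2 = food_info["Iron_(mg)"]*2
# Unit conversions: milligrams to grams, and grams to milligrams
sodium_grams = food_info["Sodium_(mg)"] / 1000
sugar_milligrams = food_info["Sugar_Tot_(g)"] * 1000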
- Besides applying arithmetic to change a column's values, you can also perform operations between columns.
water_energy = food_info["Water_(g)"] * food_info["Energ_Kcal"]
grams_of_protein_per_gram_of_water = food_info["Protein_(g)"] / food_info["Water_(g)"]
milligrams_of_calcium_and_iron = food_info["Calcium_(mg)"] + food_info["Iron_(mg)"]
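A small sketch of what these operations return may help. It assumes a three-row DataFrame with invented values in place of the real food_info, and shows that both scalar and column-to-column arithmetic work row by row and leave the source columns untouched.

```python
import pandas as pd

# Hypothetical three-row sample; the values are made up.
food_info = pd.DataFrame({
    "Iron_(mg)": [2.0, 0.5, 10.0],
    "Calcium_(mg)": [24.0, 100.0, 6.0],
    "Sodium_(mg)": [500.0, 1200.0, 0.0],
})

# Scalar arithmetic: every value in the column is divided by 1000.
sodium_grams = food_info["Sodium_(mg)"] / 1000

# Column-to-column arithmetic: values are combined row by row.
calcium_and_iron = food_info["Calcium_(mg)"] + food_info["Iron_(mg)"]

print(sodium_grams.tolist())
print(calcium_and_iron.tolist())
# The source column is not modified by either operation.
print(food_info["Sodium_(mg)"].tolist())
```

Each result is a new Series of the same length as the DataFrame, which is why it can be stored in a variable or assigned back as a column later.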
Nutritional Index
Now use the formula above to compute the score of each food: Score = 2 × (Protein_(g)) − 0.75 × (Lipid_Tot_(g))
weighted_protein = food_info["Protein_(g)"] * 2
weighted_fat = -0.75 * food_info["Lipid_Tot_(g)"]
initial_rating = weighted_protein + weighted_fat
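To check the formula by hand, here is a sketch on three invented foods (the numbers are illustrative, not taken from the dataset). The same three steps as above are applied, and the arithmetic is spelled out in the comments.

```python
import pandas as pd

# Hypothetical foods; protein and fat values are made up.
food_info = pd.DataFrame({
    "Protein_(g)": [20.0, 1.0, 10.0],
    "Lipid_Tot_(g)": [4.0, 80.0, 10.0],
})

weighted_protein = food_info["Protein_(g)"] * 2
weighted_fat = -0.75 * food_info["Lipid_Tot_(g)"]
initial_rating = weighted_protein + weighted_fat

# Row 0: 2*20 - 0.75*4  = 37.0
# Row 1: 2*1  - 0.75*80 = -58.0
# Row 2: 2*10 - 0.75*10 = 12.5
print(initial_rating.tolist())
```

The high-protein, low-fat row gets the highest score and the fatty row goes negative, which is the behavior the formula is meant to produce.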
Normalizing Columns
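Each column measures a different attribute in a different unit, so the value ranges differ widely, and using the raw values directly in some calculations introduces bias. For example, "Vit_A_IU" has a large range (0 to 100000), so in a calculation its values carry far more weight than those of "Fiber_TD_(g)" (range: 0 to 79). The data therefore needs to be normalized.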
- The normalization used here divides every value in a column by that column's maximum value.
max_protein = food_info["Protein_(g)"].max()
normalized_protein = food_info["Protein_(g)"] / max_protein
normalized_fat = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
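The effect of dividing by the maximum can be sketched on the two columns mentioned above. The values here are invented to span the quoted ranges, and divide_by_max is a hypothetical helper name, not part of pandas. Note that the result lands in the range 0 to 1 only because the values are non-negative.

```python
import pandas as pd

# Made-up values chosen to span the ranges quoted in the text.
food_info = pd.DataFrame({
    "Vit_A_IU": [0.0, 50000.0, 100000.0],
    "Fiber_TD_(g)": [0.0, 39.5, 79.0],
})

def divide_by_max(column):
    """Hypothetical helper: scale a non-negative Series so its maximum becomes 1."""
    return column / column.max()

normalized_vit_a = divide_by_max(food_info["Vit_A_IU"])
normalized_fiber = divide_by_max(food_info["Fiber_TD_(g)"])

print(normalized_vit_a.tolist())
print(normalized_fiber.tolist())
```

After scaling, both columns share the same 0 to 1 range, so neither dominates a formula just because of its unit.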
Creating A New Column
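- So far each modified column (a Series) was assigned to a variable. It can also be added directly to the DataFrame, as follows (the DataFrame then has two more columns, and the two original columns are still there):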
normalized_protein = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
normalized_fat = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
# Assigning to a new label appends the Series as a new column
food_info["Normalized_Protein"] = normalized_protein
food_info["Normalized_Fat"] = normalized_fat
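A quick way to confirm that assignment appends a column rather than replacing one is to compare the column list before and after. This sketch assumes a two-row DataFrame with invented values.

```python
import pandas as pd

# Hypothetical two-row sample; the values are made up.
food_info = pd.DataFrame({
    "Protein_(g)": [10.0, 40.0],
    "Lipid_Tot_(g)": [5.0, 20.0],
})

before = food_info.columns.tolist()

# Assigning to a label that does not exist yet adds a new column at the end.
food_info["Normalized_Protein"] = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()

after = food_info.columns.tolist()

print(before)
print(after)
# The original column keeps its raw values.
print(food_info["Protein_(g)"].tolist())
```

Assigning to a label that already exists would overwrite that column instead, so a distinct name such as Normalized_Protein keeps the raw data available.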
Normalized Nutritional Index
The formula is now computed from the normalized data rather than the raw values:
food_info["Normalized_Protein"] = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
food_info["Normalized_Fat"] = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
food_info["Norm_Nutr_Index"] = 2*food_info["Normalized_Protein"] + (-0.75*food_info["Normalized_Fat"])
Sorting A DataFrame By A Column
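The original data is ordered by the NDB_No column, which uniquely identifies each row. A DataFrame can be sorted by the values of one of its columns with sort_values() (older pandas versions called this method sort(), which has since been removed). The sort is ascending by default and returns a new DataFrame, unless inplace=True is passed, in which case the DataFrame itself is reordered.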
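food_info["Normalized_Protein"] = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
food_info["Normalized_Fat"] = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
# Use the normalized fat column here, consistent with the normalized index defined earlier
food_info["Norm_Nutr_Index"] = 2*food_info["Normalized_Protein"] + (-0.75*food_info["Normalized_Fat"])
# DataFrame.sort() was removed from pandas; sort_values() is its replacement
food_info.sort_values("Norm_Nutr_Index", inplace=True, ascending=False)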
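The difference between getting back a new DataFrame and sorting in place is easy to miss, so here is a minimal sketch. It assumes a three-row DataFrame with invented index values and uses sort_values(), the method available in current pandas.

```python
import pandas as pd

# Made-up index values, for illustration only.
food_info = pd.DataFrame({
    "NDB_No": [1001, 1002, 1003],
    "Norm_Nutr_Index": [0.25, 1.5, -0.5],
})

# Returns a new, sorted DataFrame; food_info itself is left unchanged.
ranked = food_info.sort_values("Norm_Nutr_Index", ascending=False)

# With inplace=True the call returns None and reorders food_info directly.
result = food_info.sort_values("Norm_Nutr_Index", ascending=False, inplace=True)

print(ranked["NDB_No"].tolist())
print(ranked.index.tolist())
print(result)
```

The rows keep their original index labels after sorting, so the index is no longer in ascending order. If a fresh 0, 1, 2 index is wanted, reset_index(drop=True) can be called on the sorted result.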