pandas exercise one
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 18:38
Basic pandas usage: an introduction
# -*- coding: utf-8 -*-
import json
import pprint
from collections import defaultdict, Counter

import pandas as pd
import numpy as np
import pylab as pl
from pandas import DataFrame, Series  # DataFrame represents the data as a table


def get_counts(sequence):
    counts = defaultdict(int)  # every value is initialized to 0
    for x in sequence:
        counts[x] += 1
    return counts


def top_counts(count_dict, n=10):
    # value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    # # sort the list; the key here is the first element of each pair
    # value_key_pairs.sort(key=lambda x: x[0])
    # return value_key_pairs[-n:]
    counts = Counter(count_dict)
    return counts.most_common(n)  # use n rather than a hard-coded 10


if __name__ == '__main__':
    path = 'datas/usagov_bitly_data2012-03-16-1331923249.txt'
    # Each line of the file is one JSON record
    with open(path) as f:
        records = [json.loads(line) for line in f]
    # time_zones = [rec['tz'] for rec in records if 'tz' in rec]
    # count_dict = get_counts(time_zones)
    # tz_top10 = top_counts(count_dict)
    frame = DataFrame(records)
    # fillna replaces missing values (NA); unknown values (empty strings)
    # can be replaced through boolean array indexing
    clean_tz = frame['tz'].fillna('Missing')
    clean_tz[clean_tz == ''] = 'Unknown'
    tz_counts = clean_tz.value_counts()
    # tz_counts[:10].plot(kind='barh', rot=0)
    # First whitespace-separated token of each user agent string
    results = Series([x.split()[0] for x in frame.a.dropna()])
    results.value_counts()
    cframe = frame[frame.a.notnull()]  # drop records whose agent is missing
    # Label each row by whether its agent string contains 'Windows'
    operating_system = np.where(cframe['a'].str.contains('Windows'),
                                'Windows', 'Not Windows')
    by_tz_os = cframe.groupby(['tz', operating_system])
    # size() counts each group; unstack() reshapes the counts into a table
    agg_counts = by_tz_os.size().unstack().fillna(0)
    # Build an indirect index array from the row totals of agg_counts
    indexer = agg_counts.sum(axis=1).argsort()
    count_subset = agg_counts.take(indexer)[-10:]
    # count_subset.plot(kind='barh', stacked=True)
    print(count_subset)
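The script above needs the bit.ly data file to run. As a rough sketch of the same steps on a few rows, the example below uses a handful of made-up records (the `records` list is hypothetical, not taken from the real file) to show what `fillna`, boolean indexing, `value_counts`, and `groupby(...).size().unstack()` each produce. It sorts with a stable `np.argsort` on the row totals, a small substitution for `Series.argsort` so that ties come out in a predictable order.

```python
import numpy as np
import pandas as pd

# Hypothetical records standing in for the JSON lines of the data file.
records = [
    {"tz": "America/New_York", "a": "Mozilla/5.0 (Windows NT 6.1)"},
    {"tz": "America/New_York", "a": "Mozilla/5.0 (Macintosh)"},
    {"tz": "", "a": "Mozilla/5.0 (Windows NT 5.1)"},
    {"tz": "Europe/London", "a": "GoogleMaps/RochesterNY"},
    {"a": "Mozilla/5.0 (Windows NT 6.1)"},  # no 'tz' key, becomes NaN
    {"tz": "America/New_York"},             # no 'a' key, becomes NaN
]
frame = pd.DataFrame(records)

# Missing time zones (NaN) become 'Missing'; empty strings become 'Unknown'.
clean_tz = frame["tz"].fillna("Missing")
clean_tz[clean_tz == ""] = "Unknown"
tz_counts = clean_tz.value_counts()

# Keep only rows that have an agent string, then label each row by OS.
cframe = frame[frame["a"].notnull()]
operating_system = np.where(
    cframe["a"].str.contains("Windows"), "Windows", "Not Windows"
)

# Count rows per (tz, OS) pair and pivot the OS level into columns.
# Rows whose 'tz' is NaN are dropped by groupby's default behaviour.
agg_counts = cframe.groupby(["tz", operating_system]).size().unstack().fillna(0)

# Positions that would sort the row totals in ascending order.
totals = agg_counts.sum(axis=1).to_numpy()
indexer = np.argsort(totals, kind="stable")
count_subset = agg_counts.take(indexer)[-10:]

print(tz_counts)
print(count_subset)
```

Because the sort is ascending, the time zone with the largest total ends up in the last row of `count_subset`, which is why the original script slices with `[-10:]` to get the ten busiest zones.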