【kaggle】Celebrity Death
Dataset: celebrity_deaths_2016.csv
https://www.kaggle.com/hugodarwood/celebrity-deaths
Reading the dataset:
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

####################
# Is the number of celebrity deaths highest in 2016?
# Is there anything interesting in the number of deaths by month?
# Do most celebrities die young or old?
# What are the main causes of death?
# What are the main causes of death in each age category?
####################
death = pd.read_csv("../DataSet/celebrity_deaths_2016.csv")
print(death.head())

# Q1: Is the number of celebrity deaths highest in 2016?
death_by_year = death.groupby('death_year')['name'].count()
plt.figure()
death_by_year.plot(kind='bar')  # bar chart
plt.title('Number of deaths every year')
plt.show()

##########################################
# Q2: Is there anything interesting in the number of deaths by month?
death_by_month = death.groupby('death_month')['name'].count().sort_values()
plt.figure()
death_by_month.plot(kind='line')  # line chart
plt.title('Number of deaths every month')
plt.show()

##########################################
# Q3: Do most celebrities die young or old?
fig = plt.figure()
ax = fig.add_subplot(111)
ax.boxplot(death['age'])  # box plot
plt.show()

##########################################
# Q4: What are the main causes of death?
def group_deathcause(cause):
    # Merge free-text causes into broader groups
    cause = str(cause)
    if 'cancer' in cause:
        return 'cancer'
    elif 'heart' in cause or 'cardiac' in cause:
        return 'heart disease'
    return cause

death['cause_of_death'] = death['cause_of_death'].fillna('')
death['cause_of_death'] = death['cause_of_death'].apply(group_deathcause)
death_cause = death.groupby('cause_of_death')['name'].count().sort_values(ascending=False)
# Position 0 is skipped on the assumption that it is the empty (unknown) cause.
# .ix is no longer available in pandas, so positional .iloc is used instead.
comp = death_cause.iloc[1:20].copy()
y = death_cause.iloc[20:].sum()  # everything after the 19 causes kept above
comp['others'] = y
plt.figure()
plt.pie(comp, labels=comp.index, autopct='%1.1f%%', startangle=310)  # pie chart
plt.tight_layout()
plt.axis('equal')
plt.title('composition of known cause of death', y=1.08, fontweight='bold')
plt.show()

# -------------------
# Missing causes were already filled with '' above, so fillna('unknown')
# would have no effect here; relabel the empty string instead.
death['cause_of_death'] = death['cause_of_death'].replace('', 'unknown')
death_cause = death.groupby('cause_of_death')['name'].count().sort_values(ascending=False)
print(death_cause.head(20))

##########################################
# Q5: What are the main causes of death in each age category?
def age_categorizer(age):
    if age < 18:
        return "child"
    elif age < 30:
        return "young"
    elif age < 60:
        return "adult"
    return "old"

death["age_category"] = death["age"].apply(age_categorizer)
age_category_rep = death.groupby(["age_category", "cause_of_death"])["name"].count().sort_values(ascending=False)
f = plt.figure(figsize=(8, 15))
the_grid = GridSpec(4, 1)
for cat in [("child", 0, 0), ("young", 1, 0), ("adult", 2, 0), ("old", 3, 0)]:
    # As in Q4, position 0 is skipped on the assumption that it is 'unknown'
    x = age_category_rep[cat[0]].iloc[1:10].copy()
    y = age_category_rep[cat[0]].iloc[10:].sum()
    plt.subplot(the_grid[cat[1], cat[2]], aspect=1)
    x["others"] = y
    plt.pie(x, labels=x.index, autopct='%1.1f%%', startangle=10)
    plt.axis('equal')
    plt.title(cat[0], y=1.08, fontweight="bold")
    plt.tight_layout()
f.suptitle("Composition of known cause of death for every category", y=1.03)
plt.show()
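The Q4 and Q5 steps both keep the most frequent causes and fold the remainder into a single "others" slice. The sketch below isolates that idea so the slice boundaries are easy to check. The helper name top_n_with_others and the toy data are hypothetical and invented for illustration; they are not part of the original notebook or the Kaggle dataset.

```python
import pandas as pd


def top_n_with_others(counts, n):
    """Keep the n largest counts and fold the rest into an 'others' entry.

    Hypothetical helper: `counts` is a Series of counts indexed by label.
    """
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:n].copy()   # positions 0 .. n-1
    rest = ordered.iloc[n:].sum()   # everything from position n onward
    if rest > 0:
        top["others"] = rest
    return top


# Toy data, made up for this example only
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g"],
    "cause_of_death": ["cancer", "cancer", "cancer",
                       "heart disease", "heart disease",
                       "stroke", "fall"],
})
cause_counts = toy.groupby("cause_of_death")["name"].count()
summary = top_n_with_others(cause_counts, 2)
print(summary)
```

Using the same boundary n for both slices means no row is silently dropped between the kept causes and the "others" total, so the pie shares add up to the full count.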
output:
Q1: Is the number of celebrity deaths highest in 2016?
Q2: Is there anything interesting in the number of deaths by month?
Q3: Do most celebrities die young or old?
Q4: What are the main causes of death?
Q5: What are the main causes of death in each age category?
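For Q5, the age thresholds in age_categorizer (18, 30, 60) can also be expressed as bins. The following is a minimal sketch using pandas.cut as an alternative, not the notebook's own method; the sample ages are invented. One difference to keep in mind: a missing age stays missing with pandas.cut, whereas age_categorizer falls through to "old" because every comparison with NaN is false.

```python
import numpy as np
import pandas as pd

# Sample ages chosen to sit on and around the category boundaries
ages = pd.Series([5, 18, 29, 30, 59, 60, 90])

# right=False makes each bin [low, high), matching the "<" tests
# in age_categorizer: <18 child, <30 young, <60 adult, else old
bins = [-np.inf, 18, 30, 60, np.inf]
labels = ["child", "young", "adult", "old"]
age_category = pd.cut(ages, bins=bins, labels=labels, right=False)

print(age_category.tolist())
```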
https://www.kaggle.com/veereshelango/d/hugodarwood/celebrity-deaths/celebrity-death-analysis/notebook
by Veereshelango