class Intermediate Python for Data Science
Source: Internet | Posted under: java not-equal sign | Editor: 程序博客网 | Time: 2024/05/30 04:20
Creating a DataFrame from a dictionary
- Import pandas as pd.
- Use the pre-defined lists to create a dictionary called my_dict. There should be three key-value pairs:
  - key 'country' and value names.
  - key 'drives_right' and value dr.
  - key 'cars_per_cap' and value cpc.
- Use pd.DataFrame() to turn your dict into a DataFrame called cars.
- Print out cars and see how beautiful it is.
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
# Import pandas as pd
import pandas as pd
# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right':dr, 'cars_per_cap':cpc}
# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)
# Print cars
print(cars)
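A minimal sketch of what pd.DataFrame does with the dictionary: each key becomes a column name (in the dict's insertion order) and each list becomes that column's values, so all lists must have the same length. The index= argument of pd.DataFrame lets you attach row labels at construction time instead of assigning cars.index afterwards; the labels used here are the same ones as in the row-label step. The variable name labelled is only for illustration.

```python
import pandas as pd

# Same pre-defined lists as in the exercise
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

# Keys become column names; each list becomes one column
cars = pd.DataFrame(my_dict)
print(cars.columns.tolist())  # ['country', 'drives_right', 'cars_per_cap']
print(cars.shape)             # (7, 3): 7 rows, 3 columns

# Row labels can also be supplied up front via the index= parameter
labelled = pd.DataFrame(my_dict, index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'])
print(labelled.loc['JAP', 'cars_per_cap'])  # 588

# Lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({'a': [1, 2], 'b': [1, 2, 3]})
except ValueError as err:
    print('ValueError:', err)
```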
Changing the row labels (index)
# Definition of row_labels
row_labels = ['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index = row_labels
# Print cars again
print(cars)
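Assigning to cars.index replaces the default integer labels 0 to 6 position by position, so the new list must contain exactly one label per row. The sketch below rebuilds cars from the exercise lists so it runs on its own, shows that a label list of the wrong length raises a ValueError, and uses DataFrame.rename to change a single label without retyping the whole list. The 'JPN' label is hypothetical and only there for illustration.

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars = pd.DataFrame({'country': names, 'drives_right': dr, 'cars_per_cap': cpc})
print(cars.index.tolist())  # [0, 1, 2, 3, 4, 5, 6] (default integer labels)

row_labels = ['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG']
cars.index = row_labels  # one label per row, matched by position

# A label list of the wrong length is rejected
try:
    cars.index = ['US', 'AUS', 'JAP']
except ValueError as err:
    print('ValueError:', err)

# The failed assignment leaves the earlier labels in place
print(cars.index.tolist())

# rename() changes selected labels and returns a new DataFrame
# ('JPN' is a hypothetical label, used only for illustration)
renamed = cars.rename(index={'JAP': 'JPN'})
print(renamed.index.tolist())
```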
Reading data from a CSV file
# Import pandas as pd
import pandas as pd
# Import the cars.csv data: cars
cars = pd.read_csv('cars.csv')
# Print out cars
print(cars)
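pd.read_csv accepts a file path or any file-like object. The sketch below feeds it an in-memory io.StringIO string so it runs without cars.csv. The three-row CSV text is a made-up excerpt with no index column, not the real file. With no extra options, read_csv numbers the rows 0, 1, 2, ... and infers column types; the True/False text is parsed as booleans.

```python
import io
import pandas as pd

# Hypothetical CSV text (not the real cars.csv): a header row plus three records
csv_text = """country,drives_right,cars_per_cap
United States,True,809
Australia,False,731
Japan,False,588
"""

cars = pd.read_csv(io.StringIO(csv_text))
print(cars)
print(cars.dtypes)          # drives_right is parsed as bool, cars_per_cap as int64
print(cars.index.tolist())  # [0, 1, 2]
```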
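When reading the CSV file, pass index_col so that the index column is not read in as ordinary data, which would otherwise add an extra "Unnamed: 0" column: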
# Import pandas as pd
import pandas as pd
# Fix import by including index_col
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out cars
print(cars)
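The "Unnamed: 0" column suggests that cars.csv stores the row labels (US, AUS, ...) in its first column under an empty header. A plain read_csv treats that column as data and pandas names it "Unnamed: 0"; index_col=0 tells read_csv to use the first column as the row index instead. The sketch reproduces this with a hypothetical two-row CSV held in io.StringIO, assuming that empty-header layout.

```python
import io
import pandas as pd

# Hypothetical CSV text: the first column holds row labels and has an empty header
csv_text = """,country,cars_per_cap
US,United States,809
AUS,Australia,731
"""

# Without index_col the labels become an ordinary column named 'Unnamed: 0'
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())  # ['Unnamed: 0', 'country', 'cars_per_cap']

# With index_col=0 the first column becomes the row index
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars.columns.tolist())   # ['country', 'cars_per_cap']
print(cars.index.tolist())     # ['US', 'AUS']
```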
Selecting part of a DataFrame:
- Use single square brackets to print out the country column of cars as a Pandas Series.
- Use double square brackets to print out the country column of cars as a Pandas DataFrame.
- Use double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out country column as Pandas Series
print(cars['country'])
# Print out country column as Pandas DataFrame
print(cars[['country']])
# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
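The two bracket forms differ in the type of the result: a single column label returns a one-dimensional Series, while a list of labels (hence the double brackets, an outer pair for indexing and an inner pair for the list) returns a DataFrame, even when the list holds only one name. A quick check on a small hand-built stand-in for cars, using values from the exercise lists:

```python
import pandas as pd

# Small stand-in for cars (values taken from the exercise lists)
cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan'],
     'drives_right': [True, False, False],
     'cars_per_cap': [809, 731, 588]},
    index=['US', 'AUS', 'JAP'],
)

col_series = cars['country']               # single brackets: a Series
col_frame = cars[['country']]              # double brackets: a one-column DataFrame
pair = cars[['country', 'drives_right']]   # columns come back in the listed order

print(type(col_series).__name__)           # Series
print(type(col_frame).__name__)            # DataFrame
print(col_series.shape, col_frame.shape)   # (3,) (3, 1)
print(pair.columns.tolist())               # ['country', 'drives_right']
```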
Two ways to locate data in a DataFrame (loc and iloc):
- Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JAP, the index is 2. Make sure to print the resulting Series.
- Use loc or iloc to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels/indexes of these rows by inspecting cars in the IPython Shell. Make sure to print the resulting DataFrame.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out observation for Japan
print(cars.loc['JAP'])
print(cars.iloc[2])
# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])
print(cars.iloc[[1,6]])
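loc selects by row label and iloc by integer position, which is why 'JAP' and 2 pick out the same row above. One practical difference: a loc slice includes its end label, while an iloc slice stops before its end position, as ordinary Python slicing does. A sketch on a stand-in DataFrame built from the exercise lists:

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True],
     'cars_per_cap': [809, 731, 588, 18, 200, 70, 45]},
    index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'],
)

# Label 'JAP' and position 2 refer to the same row
print(cars.loc['JAP'].equals(cars.iloc[2]))  # True

# loc slice: the end label 'IN' is included
print(cars.loc['AUS':'IN'].index.tolist())  # ['AUS', 'JAP', 'IN']

# iloc slice: the end position 3 is excluded
print(cars.iloc[1:3].index.tolist())        # ['AUS', 'JAP']
```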
- Print out the drives_right value of the row corresponding to Morocco (its row label is MOR).
- Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns country and drives_right.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out drives_right value of Morocco
print(cars.loc['MOR','drives_right'])
# Print sub-DataFrame
print(cars.loc[['RU','MOR'], ['country', 'drives_right']])
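With two arguments, loc takes rows first and columns second. The same selections can be written with iloc using positions, and for a single value the at accessor is a label-based shortcut. The positions below assume the column order country, drives_right, cars_per_cap of this stand-in DataFrame; the real cars.csv may order its columns differently, so check cars.columns before translating labels into positions.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True],
     'cars_per_cap': [809, 731, 588, 18, 200, 70, 45]},
    index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'],
)

# Single value: row label first, then column label
by_loc = cars.loc['MOR', 'drives_right']
by_at = cars.at['MOR', 'drives_right']   # label-based accessor for one scalar
by_iloc = cars.iloc[5, 1]                # MOR is row 5, drives_right is column 1 here
print(by_loc, by_at, by_iloc)            # True True True

# Sub-DataFrame: a list of row labels and a list of column labels
sub = cars.loc[['RU', 'MOR'], ['country', 'drives_right']]
same = cars.iloc[[4, 5], [0, 1]]
print(sub.equals(same))                  # True
```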
Boolean operations on NumPy arrays:
Generate boolean arrays that answer the following questions:
- Which areas in my_house are greater than 18.5 or smaller than 10?
- Which areas are smaller than 11 in both my_house and your_house?
Make sure to wrap both commands in a print() call, so that you can inspect the output.
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))
# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))
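Python's and/or cannot be used here: they need a single True or False, and NumPy refuses to collapse a multi-element array into one value, raising a ValueError. np.logical_or and np.logical_and work element by element. The operators | and & do the same for boolean arrays, but each comparison must then be wrapped in parentheses because | and & bind more tightly than > and <.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Element-wise OR: greater than 18.5 or smaller than 10
big_or_small = np.logical_or(my_house > 18.5, my_house < 10)
print(big_or_small)  # [False  True False  True]

# Same result with the | operator; the parentheses are required
print(np.array_equal(big_or_small, (my_house > 18.5) | (my_house < 10)))  # True

# Element-wise AND across two arrays
both_small = np.logical_and(my_house < 11, your_house < 11)
print(both_small)  # [False False False  True]

# Python's `or` tries to turn a whole array into one bool and fails
try:
    my_house > 18.5 or my_house < 10
except ValueError as err:
    print('ValueError:', err)
```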
Filtering a DataFrame with a condition on one of its columns:
- Select the cars_per_cap column from cars as a Pandas Series and store it as cpc.
- Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars.
- Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac.
- Print out car_maniac to see if you got it right.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars.cars_per_cap
many_cars = cpc > 500
car_maniac = cars[many_cars]
# Print car_maniac
print(car_maniac)
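The three steps can also be condensed into one expression, and conditions on the same column can be combined with np.logical_and, for example to keep only countries with between 100 and 500 cars per capita. A sketch on a stand-in cars built from the exercise lists; the 100 to 500 range and the name medium are illustrative choices, not part of the exercise.

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True],
     'cars_per_cap': [809, 731, 588, 18, 200, 70, 45]},
    index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'],
)

# One-line version of the three steps: build the boolean Series inline
car_maniac = cars[cars['cars_per_cap'] > 500]
print(car_maniac.index.tolist())  # ['US', 'AUS', 'JAP']

# Combine two conditions element-wise before subsetting
cpc = cars['cars_per_cap']
medium = cars[np.logical_and(cpc > 100, cpc < 500)]
print(medium.index.tolist())      # ['RU']
```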
Iterating over NumPy arrays
- Import the numpy package under the local alias np.
- Write a for loop that iterates over all elements in np_height and prints out "x inches" for each element, where x is the value in the array.
- Write a for loop that visits every element of the np_baseball array and prints it out.
# Import numpy as np
import numpy as np
# np_height (1-D) and np_baseball (2-D) are assumed to be pre-loaded in the exercise workspace
# For loop over np_height (1-D)
for each in np_height:
    print(str(each) + " inches")
# For loop over np_baseball (2-D)
for each in np.nditer(np_baseball):
    print(each)
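np_height and np_baseball are not defined in this post (they appear to be pre-loaded in the exercise), so the sketch below uses small made-up arrays in their place. It shows why np.nditer is needed in the 2-D case: a plain for loop over a 2-D array yields whole rows, whereas np.nditer visits every individual element, in row-major order for a freshly created array.

```python
import numpy as np

# Made-up stand-ins for np_height (1-D) and np_baseball (2-D)
np_height = np.array([74, 72, 75])
np_baseball = np.array([[74, 180],
                        [72, 215]])

# 1-D: a plain for loop already yields individual values
for each in np_height:
    print(str(each) + " inches")

# 2-D: a plain for loop yields rows (1-D arrays) ...
rows = [row.tolist() for row in np_baseball]
print(rows)      # [[74, 180], [72, 215]]

# ... while np.nditer yields every element; int() unwraps each 0-d array
elements = [int(x) for x in np.nditer(np_baseball)]
print(elements)  # [74, 180, 72, 215]
```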