pandas module 1 - 0. Introduction

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 15:49

0. Overview

1. pandas consists of the following things:

A set of labeled array data structures, the primary of which are Series and DataFrame

Index objects enabling both simple axis indexing and multi-level / hierarchical axis indexing

An integrated group by engine for aggregating and transforming data sets

Date range generation (date_range) and custom date offsets enabling the implementation of customized frequencies

Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.

Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value)

Moving window statistics (rolling mean, rolling standard deviation, etc.)

Static and moving window linear and panel regression
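As a rough illustration of two items from this list (date range generation with a custom offset, and moving window statistics), here is a minimal sketch. The dates and values are made up for illustration only.

```python
import pandas as pd

# Six business days starting on Monday 2024-01-01 ("B" = business-day frequency).
days = pd.date_range("2024-01-01", periods=6, freq="B")

# A custom offset: every second business day.
every_other = pd.date_range("2024-01-01", periods=3, freq=pd.offsets.BDay(2))

# A 3-observation moving average over a small illustrative series.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=days)
rolling_mean = prices.rolling(window=3).mean()

print(every_other)
print(rolling_mean)
```

The first two entries of rolling_mean are NaN because a full window of three observations is not yet available.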

2. Data structures

Dimensions  Name       Description
1           Series     1D labeled, homogeneously-typed array (all values share one type)
2           DataFrame  General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns (columns may differ in type)
3           Panel      General 3D labeled, also size-mutable array

DataFrame is a container for Series, and Panel is a container for DataFrame objects. Note that Panel exists only in older pandas releases; it was deprecated and later removed.

We can insert and remove objects from these containers in a dictionary-like fashion.
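A small sketch of this dictionary-like behaviour on a DataFrame. The column names A, B and C are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

df["C"] = df["A"] + df["B"]   # insert a column, like assigning a dict key
del df["A"]                   # remove a column, like deleting a dict key
popped = df.pop("B")          # remove a column and get it back as a Series

print(df)
print(popped)
```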

With tabular data (DataFrame), it is more helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1, because this makes the code more readable.

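All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable (their size cannot always be changed). For example, the length of a Series cannot be changed, but columns can be inserted into a DataFrame.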

1. Object Creation

1. Creating a Series by passing a list of values, letting pandas create a default integer index:
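A minimal sketch of this step; the values are arbitrary.

```python
import numpy as np
import pandas as pd

# No index is given, so pandas assigns the default integer index 0..5.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
```

Because the list contains np.nan, the values are stored as floating point.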

2. Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:
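A minimal sketch of this step. A deterministic np.arange array stands in for whatever data is used in practice; the dates and column labels are illustrative.

```python
import numpy as np
import pandas as pd

# Six consecutive calendar days serve as the row index.
dates = pd.date_range("2013-01-01", periods=6)

# A 6x4 array becomes a frame with columns labeled A..D.
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))
print(df)
```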

3. Creating a DataFrame by passing a dict of objects:
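A sketch of this step, assuming a dict whose values are a mix of scalars, a Series, a numpy array and a Categorical. The keys and values are illustrative; scalars are broadcast to the length of the other entries.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 1.0,                                   # scalar, repeated on every row
        "B": pd.Timestamp("2013-01-02"),            # scalar timestamp
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
print(df2.dtypes)  # each column keeps its own dtype
```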

2. Viewing Data

1. See the top & bottom rows of the frame:
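For example, using an illustrative ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"A": range(10)})

top = df.head()       # first 5 rows by default
bottom = df.tail(3)   # last 3 rows

print(top)
print(bottom)
```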

2. Display the index, columns, and the underlying numpy data:
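A short sketch, assuming a reasonably recent pandas in which DataFrame.to_numpy() is available (older code used the .values attribute for the same purpose). The frame itself is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=pd.date_range("2013-01-01", periods=3),
    columns=["A", "B"],
)

print(df.index)      # the row labels
print(df.columns)    # the column labels
values = df.to_numpy()  # the underlying data as a numpy array
print(values)
```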

3. Operate on the data, for example summarize, transpose, or sort it:
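The original does not say which operations are meant here; the sketch below assumes the usual quick ones (summary statistics, transposing, and sorting) on an illustrative frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 2], "B": [9.0, 7.0, 8.0]})

summary = df.describe()                              # count, mean, std, min, quartiles, max
transposed = df.T                                    # swap rows and columns
by_columns = df.sort_index(axis=1, ascending=False)  # sort by column label
by_values = df.sort_values(by="A")                   # sort rows by the values in column A

print(summary)
print(by_values)
```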

3. Selection

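Selection relies on accessors such as .at, .iat, .loc, .iloc and .ix. Note that .ix is deprecated and has been removed from recent pandas releases; use .loc or .iloc instead.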

(1) Selecting
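A minimal sketch of basic selection with square brackets, on an illustrative frame: a column label returns a Series, while a slice selects rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.date_range("2013-01-01", periods=4),
    columns=list("ABC"),
)

col = df["A"]         # a single column, returned as a Series
first_rows = df[0:2]  # a slice selects rows (here the first two)

print(col)
print(first_rows)
```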

(2) Selecting by label
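A sketch of label-based selection with .loc and .at, using an illustrative date-indexed frame. Label slices include both endpoints.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=4)
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=dates, columns=list("ABC"))

row = df.loc[dates[0]]                                  # one row, as a Series
subset = df.loc["2013-01-02":"2013-01-03", ["A", "B"]]  # both endpoints are included
scalar = df.at[dates[0], "A"]                           # fast access to a single value

print(subset)
```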

(3) Selecting by position
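A sketch of position-based selection with .iloc and .iat on an illustrative frame. Positional slices follow the Python convention and exclude the end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("ABC"))

row = df.iloc[3]            # the fourth row, as a Series
block = df.iloc[1:3, 0:2]   # rows 1-2 and columns 0-1 (end positions excluded)
scalar = df.iat[1, 1]       # fast access to a single value by position

print(block)
```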

(4) Boolean indexing
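A sketch of filtering rows with a boolean condition, including isin() for membership tests. The column names and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [-1, 2, -3, 4], "E": ["one", "two", "three", "four"]})

positive = df[df["A"] > 0]                   # keep rows where A is positive
chosen = df[df["E"].isin(["one", "four"])]   # keep rows whose E is in the given list

print(positive)
print(chosen)
```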

(5) Setting values
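A sketch of a few common ways to assign values: by label, by position, as a new column, and through a boolean mask. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [4.0, 5.0, -6.0]})

df.at[0, "A"] = 0.0        # set a single value by label
df.iat[1, 1] = 50.0        # set a single value by position
df["C"] = [7.0, 8.0, 9.0]  # add a new column
df[df < 0] = 0.0           # replace every negative value using a boolean mask

print(df)
```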

4. Missing Data

(1) pandas primarily uses the value np.nan to represent missing data.
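One way to see this is to reindex onto a label that does not exist yet; the sketch below uses an illustrative Series.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0], index=["a", "b"])

# "c" has no value in s, so reindexing fills it with np.nan.
r = s.reindex(["a", "b", "c"])

print(r)
print(r.mean())  # the missing value is skipped by default
```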

(2) Drop any rows that have missing data
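A minimal sketch using dropna() on an illustrative frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# how="any" drops a row as soon as one of its values is missing.
cleaned = df.dropna(how="any")
print(cleaned)
```

dropna() returns a new frame and leaves df unchanged.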

(3) Filling missing data
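A minimal sketch using fillna(); the fill value 5.0 is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# Replace every missing value with a constant.
filled = df.fillna(value=5.0)
print(filled)
```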

(4) Get the boolean mask where values are NaN
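A minimal sketch using isna() on an illustrative frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# True where a value is missing, False elsewhere; same shape as df.
mask = df.isna()
print(mask)
```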

5. Operations

Operations in general exclude missing data.

(1) Stats (descriptive statistics)
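A sketch of descriptive statistics along each axis, showing that missing values are skipped by default. The data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

col_means = df.mean()                  # one mean per column; NaN in A is ignored
row_means = df.mean(axis=1)            # one mean per row
strict = df["A"].mean(skipna=False)    # NaN, because the missing value is not skipped

print(col_means)
print(row_means)
```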

(2) Apply: apply functions to the data.
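A sketch of DataFrame.apply with user-defined functions, column-wise by default and row-wise with axis=1. The lambdas and data are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 40]})

# The function receives each column as a Series.
spread = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function receives each row instead.
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)

print(spread)
print(row_sums)
```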

(3) Histogramming
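A minimal sketch using value_counts(), which counts how often each distinct value occurs. The values are illustrative.

```python
import pandas as pd

s = pd.Series([4, 2, 4, 1, 4, 2])

# Counts are sorted from most to least frequent.
counts = s.value_counts()
print(counts)
```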

(4) String Methods

Series is equipped with a set of string processing methods in the str attribute.
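For instance, a sketch with an illustrative Series of strings that contains one missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "Bb", np.nan, "CaBa"])

lowered = s.str.lower()   # element-wise lower-casing; the missing value stays missing
lengths = s.str.len()     # element-wise string length

print(lowered)
print(lengths)
```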

6. Merge

(1) Concat
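A sketch of pd.concat: an illustrative frame is split into pieces and then stacked back together row-wise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["A", "B"])

# Break the frame into three pieces, then glue them back together.
pieces = [df[:2], df[2:4], df[4:]]
rebuilt = pd.concat(pieces)

print(rebuilt)
```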

(2) Join: SQL-style merges.
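A sketch of an SQL-style join with pd.merge on a shared key column. The frames and column names (key, lval, rval) are illustrative.

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})

# Inner join on the "key" column (the default for merge).
merged = pd.merge(left, right, on="key")
print(merged)
```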

(3) Append rows
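Older tutorials use DataFrame.append for this step, but that method was removed in pandas 2.0. The sketch below therefore swaps in pd.concat to append a row; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Take the first row as a one-row frame and append it to the end.
new_row = df.iloc[[0]]
extended = pd.concat([df, new_row], ignore_index=True)

print(extended)
```

Passing ignore_index=True renumbers the rows so that the appended row does not keep its old label.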

7. Grouping

Grouping ("group by") refers to a process that involves one or more of the following steps:

Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
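The three steps above can be sketched with groupby() followed by an aggregation. The column names A and C and the values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1, 2, 3, 4]})

# Split rows by the value of A, sum column C within each group,
# and combine the per-group results into one Series indexed by group.
totals = df.groupby("A")["C"].sum()
print(totals)
```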