Python Data Analysis Study Notes, Part 2

Source: Internet | Published: relief algorithm python | Editor: 程序博客网 | Time: 2024/04/18 19:11

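Chapter 2: NumPy Arrays

Advantages of NumPy arrays

# Create an array (the sessions below assume NumPy's names were imported directly, e.g. with from numpy import *)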

In [16]: a=arange(5)

In [17]: a.dtype

Out[17]: dtype('int32')

In [18]: a

Out[18]: array([0, 1, 2, 3, 4])

# shape returns a tuple holding the length of each dimension

In [19]: a.shape

Out[19]: (5,)
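The session above uses bare names such as arange, and the int32 it reports depends on the platform and NumPy version (many setups report int64 instead). As a minimal sketch, assuming the conventional `import numpy as np` alias and an explicitly chosen dtype (both are choices made for this illustration, not part of the original notes), the same steps could look like this:

```python
import numpy as np

# Explicit dtype so the result does not depend on the platform default integer.
a = np.arange(5, dtype=np.int64)

print(a)        # [0 1 2 3 4]
print(a.dtype)  # int64
print(a.shape)  # (5,)  one dimension of length 5
print(a.ndim)   # 1
```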

Creating a multidimensional array

In [20]: m=array([arange(2),arange(2)])

In [21]: m
Out[21]:
array([[0, 1],
       [0, 1]])

In [22]: m.shape

Out[22]: (2, 2)

Selecting NumPy array elements

In [24]: a=array([[1,2],[3,4]])

In [25]: a
Out[25]:
array([[1, 2],
       [3, 4]])

In [26]: a[0,0]

Out[26]: 1

In [27]: a[0,1]

Out[27]: 2

In [28]: a[1,0]

Out[28]: 3

In [29]: a[1,1]

Out[29]: 4
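The indexing pattern above is a[row, column]. A small sketch, assuming the `np` alias, of how that compares with chained indexing and with selecting a whole row or column:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# a[i, j] addresses row i, column j in a single indexing operation.
print(a[1, 0])             # 3
# a[i][j] gives the same element, but first builds the intermediate row a[i].
print(a[1][0] == a[1, 0])  # True

# A colon selects a whole axis.
first_row = a[0, :]
second_col = a[:, 1]
print(first_row)           # [1 2]
print(second_col)          # [2 4]
```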

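NumPy numeric types

bool: Boolean (True or False)
int_: platform-dependent integer (normally int32 or int64)
int8: byte (-128 to 127)
int16: integer (-32768 to 32767)
int32: integer (-2**31 to 2**31 - 1)
int64: integer (-2**63 to 2**63 - 1)
uint8: unsigned integer (0 to 255)
uint16: unsigned integer (0 to 65535)
uint32: unsigned integer (0 to 2**32 - 1)
uint64: unsigned integer (0 to 2**64 - 1)
float16: half-precision float
float32: single-precision float
float64: double-precision float
complex64: complex number (two 32-bit floats)
complex128: complex number (two 64-bit floats)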

# Size of one element in bytes, and the array's data type

In [30]: a.dtype.itemsize

Out[30]: 4

In [31]: a.dtype

Out[31]: dtype('int32')
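The ranges and sizes in the type list can be checked from NumPy itself. The following is a rough sketch using np.iinfo and dtype.itemsize; the particular types shown are chosen only for illustration:

```python
import numpy as np

# Integer ranges as reported by NumPy.
for t in (np.int8, np.int16, np.int32, np.int64, np.uint8):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# itemsize is the number of bytes used by one element of that type.
sizes = {name: np.dtype(name).itemsize
         for name in ("int8", "int32", "float16", "float64", "complex128")}
print(sizes)
# {'int8': 1, 'int32': 4, 'float16': 2, 'float64': 8, 'complex128': 16}
```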

Note: in PyCharm, if the Python console starts IPython automatically at run time, this can be changed as follows:

File -> Settings -> Console -> uncheck "Use IPython if available"

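Character codes

i: integer
u: unsigned integer
f: single-precision float
d: double-precision float
b: Boolean (as a dtype kind; note that dtype('b') itself creates a signed byte, and '?' is the code that creates a Boolean dtype)
D: complex
S: string (bytes)
U: Unicode
V: void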

In [1]: arange(7,dtype='f')

Out[1]: array([ 0., 1., 2., 3., 4., 5., 6.], dtype=float32)

In [3]: arange(7,dtype='D')

Out[3]: array([ 0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j, 5.+0.j, 6.+0.j])

dtype constructors

# Python's built-in float type maps to float64

In [4]: dtype(float)

Out[4]: dtype('float64')

In [5]: dtype('f')

Out[5]: dtype('float32')

In [6]: dtype('d')

Out[6]: dtype('float64')

In [7]: dtype('f8')

Out[7]: dtype('float64')
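Several spellings can name the same data type. A short sketch, assuming the `np` alias, that compares the constructors shown above and also illustrates how the single-letter codes 'b' and '?' behave when passed to dtype:

```python
import numpy as np

# Different spellings of the same double- and single-precision types.
same_double = np.dtype(float) == np.dtype("float64") == np.dtype("d") == np.dtype("f8")
same_single = np.dtype("f") == np.dtype("float32") == np.dtype("f4")
print(same_double, same_single)   # True True

# As a constructor argument, 'b' means a signed byte; '?' means Boolean.
print(np.dtype("b"))              # int8
print(np.dtype("?"))              # bool
print(np.dtype("D"))              # complex128

x = np.arange(7, dtype="f")
print(x.dtype)                    # float32
```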

# List all registered type names and character codes

In [8]: sctypeDict.keys()
Out[8]:
[0, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 'unicode', 23,
 'cfloat', 'longfloat', 'Int32', 'Complex64', 'unicode_', 'complex',
 'timedelta64', 'uint16', 'c16', 'float32', 'int32', 'D', 'H', 'void',
 'unicode0', 'L', 'P', 'half', 'void0', 'd', 'h', 'l', 'p', 22,
 'Timedelta64', 'object0', 'b1', 'M8', 'String0', 'float16', 'ulonglong',
 'i1', 'uint32', '?', 'Void0', 'complex64', 'G', 'O', 'UInt8', 'S', 'byte',
 'UInt64', 'g', 'float64', 'ushort', 'float_', 'uint', 'object_', 'Float16',
 'complex_', 'Unicode0', 'uintp', 'intc', 'csingle', 'datetime64', 'float',
 'bool8', 'Bool', 'intp', 'uintc', 'bytes_', 'u8', 'u4', 'int_', 'cdouble',
 'u1', 'complex128', 'u2', 'f8', 'Datetime64', 'ubyte', 'm8', 'B', 'uint0',
 'F', 'bool_', 'uint8', 'c8', 'Int64', 'Int8', 'Complex32', 'V', 'int8',
 'uint64', 'b', 'f', 'double', 'UInt32', 'clongdouble', 'str', 'f2', 'f4',
 'int', 'longdouble', 'single', 'string', 'q', 'Int16', 'Float64',
 'longcomplex', 'UInt16', 'bool', 'Float32', 'string0', 'longlong', 'i8',
 'int16', 'str_', 'I', 'object', 'M', 'i4', 'singlecomplex', 'Q', 'string_',
 'U', 'a', 'short', 'e', 'i', 'clongfloat', 'm', 'Object0', 'int64', 'i2',
 'int0']

dtype attributes

# Get the character code that corresponds to the type

In [9]: t=dtype('float64')

In [10]: t.char

Out[10]: 'd'

# The type attribute is the type object of the array's elements

In [11]: t.type

Out[11]: numpy.float64

# Get the data type string: '<' is the byte order (little-endian), 'f' is the character code, and 8 is the number of bytes per element

In [12]: t.str

Out[12]: '<f8'
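The byte-order prefix in the type string can also be set explicitly. A brief sketch, assuming the `np` alias, of the attributes discussed above together with an explicitly big-endian and little-endian dtype:

```python
import numpy as np

t = np.dtype("float64")
print(t.char)      # d
print(t.type)      # <class 'numpy.float64'>
print(t.itemsize)  # 8

# An explicit byte-order prefix: '>' is big-endian, '<' is little-endian.
big = np.dtype(">f8")
little = np.dtype("<f8")
print(big.str)     # >f8
print(little.str)  # <f8
```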

Slicing and indexing one-dimensional arrays

In [13]: a=arange(9)

# Indices 3 through 6 (the end index 7 is excluded)

In [14]: a[3:7]

Out[14]: array([3, 4, 5, 6])

# Indices 0 through 6 with a step of 2

In [15]: a[:7:2]

Out[15]: array([0, 2, 4, 6])

# Reverse the array

In [16]: a[::-1]

Out[16]: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
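One property of these slices that is easy to miss is that a basic slice is a view onto the original data rather than a copy. A minimal sketch, assuming the `np` alias (the variable name s is only illustrative):

```python
import numpy as np

a = np.arange(9)

# Negative indices count from the end; a negative step walks backwards.
print(a[-1])      # 8
print(a[7:2:-2])  # [7 5 3]

# A basic slice shares memory with the original array.
s = a[3:7]
s[0] = 100        # writing through the slice changes a as well
print(a[3])       # 100
print(np.shares_memory(a, s))  # True
```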

Manipulating array shapes

Example code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2016/12/7 11:45
# @Author  : Retacn
# @Site    : Adjusting array shapes
# @File    : array_reshap.py
# @Software: PyCharm

__author__ = "retacn"
__copyright__ = "property of mankind."
__license__ = "CN"
__version__ = "0.0.1"
__maintainer__ = "retacn"
__email__ = "zhenhuayue@sina.com"
__status__ = "Development"

import numpy as np

print('In: b = arange(24).reshape(2,3,4)')
b = np.arange(24).reshape(2, 3, 4)

print('In: b')
print(b)
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]
#
#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]]

# Ravel: turn the multidimensional array into a one-dimensional one
# (returns a view when possible)
print('In: b.ravel()')
print(b.ravel())
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

# Flatten: same result as ravel(), but always returns a copy
print('In: b.flatten()')
print(b.flatten())
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

# Set the shape directly with a tuple
print('In: b.shape = (6,4)')
b.shape = (6, 4)
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]

# Transpose: rows become columns and columns become rows
print('In: b.transpose()')
print(b.transpose())
# [[ 0  4  8 12 16 20]
#  [ 1  5  9 13 17 21]
#  [ 2  6 10 14 18 22]
#  [ 3  7 11 15 19 23]]

# Resize: change the shape in place
print('In: b.resize((2,12))')
b.resize((2, 12))
print(b)
# [[ 0  1  2  3  4  5  6  7  8  9 10 11]
#  [12 13 14 15 16 17 18 19 20 21 22 23]]
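ravel() and flatten() print the same values, but they differ in whether the result shares memory with the source array. A small sketch of that difference, assuming the `np` alias and using np.shares_memory as the check (a choice made for this illustration):

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()     # a view here, because b is contiguous in memory
f = b.flatten()   # always a fresh copy

print(np.shares_memory(b, r))  # True
print(np.shares_memory(b, f))  # False

# reshape accepts -1 for one dimension and infers it from the total size.
c = b.reshape(6, -1)
print(c.shape)    # (6, 4)

# reshape returns a new array object and leaves b's shape alone, whereas
# assigning to .shape or calling resize() changes the array in place.
print(b.shape)    # (2, 3, 4)
```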

Stacking arrays

In [17]: a=arange(9).reshape(3,3)

In [18]: a
Out[18]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [19]: b=2*a

In [20]: b
Out[20]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

Horizontal stacking

In [21]: hstack((a,b))
Out[21]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

In [22]: concatenate((a,b),axis=1)
Out[22]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

Vertical stacking

In [23]: vstack((a,b))
Out[23]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [24]: concatenate((a,b),axis=0)
Out[24]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

Depth stacking

In [25]: dstack((a,b))
Out[25]:
array([[[ 0,  0],
        [ 1,  2],
        [ 2,  4]],

       [[ 3,  6],
        [ 4,  8],
        [ 5, 10]],

       [[ 6, 12],
        [ 7, 14],
        [ 8, 16]]])

Column stacking

# One-dimensional arrays

In [26]: oned=arange(2)

In [27]: oned
Out[27]: array([0, 1])

In [29]: twice_oned=2*oned

In [30]: twice_oned
Out[30]: array([0, 2])

In [31]: column_stack((oned,twice_oned))
Out[31]:
array([[0, 0],
       [1, 2]])

# Two-dimensional arrays (same result as hstack)

In [32]: column_stack((a,b))
Out[32]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

In [33]: column_stack((a,b))==hstack((a,b))
Out[33]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]], dtype=bool)

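Row stacking

# One-dimensional arrays (row_stack is an alias of vstack and is deprecated in NumPy 2.0)

In [34]: row_stack((oned,twice_oned))
Out[34]:
array([[0, 1],
       [0, 2]])

# Two-dimensional arrays

In [35]: row_stack((a,b))
Out[35]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])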
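The stacking functions differ mainly in which axis grows. As a rough summary sketch, assuming the `np` alias and using np.array_equal to get a single True/False answer instead of an element-wise comparison:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = 2 * a

print(np.hstack((a, b)).shape)  # (3, 6)     columns are appended
print(np.vstack((a, b)).shape)  # (6, 3)     rows are appended
print(np.dstack((a, b)).shape)  # (3, 3, 2)  a new third axis is filled

# hstack / vstack match concatenate along axis 1 / axis 0 for 2-D input.
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))  # True
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))  # True

# For 1-D inputs column_stack turns each array into a column,
# whereas hstack simply joins them end to end.
oned = np.arange(2)
print(np.column_stack((oned, 2 * oned)).shape)  # (2, 2)
print(np.hstack((oned, 2 * oned)).shape)        # (4,)
```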

Splitting NumPy arrays

Vertical splitting

In [39]: vsplit(a,3)
Out[39]: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])]

In [41]: split(a,3,axis=0)
Out[41]: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])]

Horizontal splitting

In [36]: a
Out[36]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [37]: hsplit(a,3)
Out[37]:
[array([[0],
        [3],
        [6]]), array([[1],
        [4],
        [7]]), array([[2],
        [5],
        [8]])]

In [38]: split(a,3,axis=1)
Out[38]:
[array([[0],
        [3],
        [6]]), array([[1],
        [4],
        [7]]), array([[2],
        [5],
        [8]])]

Depth-wise splitting

In [42]: c=arange(27).reshape(3,3,3)

In [43]: c
Out[43]:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [44]: dsplit(c,3)
Out[44]:
[array([[[ 0],
         [ 3],
         [ 6]],

        [[ 9],
         [12],
         [15]],

        [[18],
         [21],
         [24]]]), array([[[ 1],
         [ 4],
         [ 7]],

        [[10],
         [13],
         [16]],

        [[19],
         [22],
         [25]]]), array([[[ 2],
         [ 5],
         [ 8]],

        [[11],
         [14],
         [17]],

        [[20],
         [23],
         [26]]])]
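All the splits above divide the array into equal parts. A short sketch, assuming the `np` alias, of two variations that the notes do not cover: splitting at explicit positions, and np.array_split for a division that leaves a remainder:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# A list of indices splits at those positions: rows [0:1] and rows [1:3].
top, rest = np.vsplit(a, [1])
print(top.shape, rest.shape)     # (1, 3) (2, 3)

# split() needs an even division; array_split() tolerates a remainder.
parts = np.array_split(a, 2, axis=1)
print([p.shape for p in parts])  # [(3, 2), (3, 1)]

# Stacking the pieces back together restores the original array.
print(np.array_equal(np.hstack(parts), a))  # True
```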

NumPy array attributes

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2016/12/7 13:32
# @Author  : Retacn
# @Site    : Array attributes
# @File    : array_attribute.py
# @Software: PyCharm

__author__ = "retacn"
__copyright__ = "property of mankind."
__license__ = "CN"
__version__ = "0.0.1"
__maintainer__ = "retacn"
__email__ = "zhenhuayue@sina.com"
__status__ = "Development"

import numpy as np

b = np.arange(24).reshape(2, 12)
print('In: b')
print(b)
# [[ 0  1  2  3  4  5  6  7  8  9 10 11]
#  [12 13 14 15 16 17 18 19 20 21 22 23]]

# Number of dimensions
print('In: b.ndim')
print(b.ndim)
# 2

# Number of elements
print('In: b.size')
print(b.size)
# 24

# Bytes used by each element
print('In: b.itemsize')
print(b.itemsize)
# 4 (for a 32-bit integer dtype; 8 where the default integer is int64)

# Bytes needed to store the whole array
print('In: b.nbytes')
print(b.nbytes)
# 96 (192 with int64)
print('In: b.size*b.itemsize')
print(b.size * b.itemsize)
# 96 (192 with int64)

print('In: b.resize(6,4)')
b.resize(6, 4)
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]

# Same as the transpose() function
print('In: b.T')
print(b.T)
# [[ 0  4  8 12 16 20]
#  [ 1  5  9 13 17 21]
#  [ 2  6 10 14 18 22]
#  [ 3  7 11 15 19 23]]

# Create a complex array
print('In: b = array([1.j+1,2.j+3])')
b = np.array([1.j + 1, 2.j + 3])
print(b)
# [ 1.+1.j  3.+2.j]

# Real part of the array
print('In: b.real')
print(b.real)
# [ 1.  3.]

# Imaginary part of the array
print('In: b.imag')
print(b.imag)
# [ 1.  2.]

# If the array contains complex numbers, its dtype automatically becomes complex
print('In: b.dtype')
print(b.dtype)
# complex128
print('In: b.dtype.str')
print(b.dtype.str)
# <c16

print('In: b = arange(4).reshape(2,2)')
b = np.arange(4).reshape(2, 2)
print(b)
# [[0 1]
#  [2 3]]

# flat returns a numpy.flatiter
print('In: f = b.flat')
f = b.flat
print(f)
# <numpy.flatiter object at 0x029B8438>
print('In: for it in f: print(it)')
for it in f:
    print(it)
# 0
# 1
# 2
# 3

# Look up a single element
print('In: b.flat[2]')
print(b.flat[2])
# 2

# Look up several elements
print('In: b.flat[[1,3]]')
print(b.flat[[1, 3]])
# [1 3]
print('In: b')
print(b)
# [[0 1]
#  [2 3]]

# Assign through flat
print('In: b.flat[[1,3]] = 1')
b.flat[[1, 3]] = 1
print('In: b')
print(b)
# [[0 1]
#  [2 1]]
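The itemsize of 4 and nbytes of 96 in the script above correspond to a 32-bit integer dtype, which is not the default everywhere. A compact sketch, assuming the `np` alias and an explicitly requested int32 dtype (an assumption made here so the numbers are reproducible):

```python
import numpy as np

# An explicit dtype keeps the numbers the same on every platform.
b = np.arange(24, dtype=np.int32).reshape(2, 12)

print(b.ndim, b.size, b.itemsize, b.nbytes)  # 2 24 4 96
print(b.nbytes == b.size * b.itemsize)       # True

# .T is shorthand for transpose(); it returns a view, not a copy.
print(b.T.shape)                 # (12, 2)
print(np.shares_memory(b, b.T))  # True

# Assigning through .flat writes into the original array.
c = np.arange(4).reshape(2, 2)
c.flat[[1, 3]] = 1
print(c.tolist())                # [[0, 1], [2, 1]]
```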

Converting arrays

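import numpy as np

b = np.array([1.j + 1, 2.j + 3])
print(b)
# [ 1.+1.j  3.+2.j]

# Convert the NumPy array to a Python list (returns a new list; b is unchanged)
print(b.tolist())
# [(1+1j), (3+2j)]

# Convert the elements to a given type (returns a new array; b is unchanged).
# Converting to int discards the imaginary part and emits a ComplexWarning.
print(b.astype(int))
# [1 3]

print(b.astype('complex'))
# [ 1.+1.j  3.+2.j]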
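Both tolist() and astype() return new objects and leave the original array untouched, so their results have to be captured. A minimal sketch, assuming the `np` alias; the warning filter and the use of .real are choices made for this illustration:

```python
import warnings
import numpy as np

b = np.array([1.j + 1, 2.j + 3])

as_list = b.tolist()   # plain Python complex numbers
print(as_list)         # [(1+1j), (3+2j)]

# astype returns a new array; NumPy warns that the imaginary part is dropped.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_int = b.astype(int)
print(as_int.tolist())  # [1, 3]
print(b.dtype)          # complex128, b itself is unchanged

# Taking .real first makes the intent explicit and avoids the warning.
real_int = b.real.astype(int)
print(real_int.tolist())  # [1, 3]
```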

Creating array views and copies

# Note: scipy.misc.ascent() was removed in SciPy 1.12; newer versions provide scipy.datasets.ascent()
from scipy import misc
import matplotlib.pyplot as plt

ascent = misc.ascent()
# Create a copy of the array
acopy = ascent.copy()
# Create a view of the array
aview = ascent.view()

# Show the images
plt.subplot(221), plt.imshow(ascent)
plt.title('ascent'), plt.xticks([]), plt.yticks([])
plt.subplot(222), plt.imshow(acopy)
plt.title('acopy'), plt.xticks([]), plt.yticks([])
plt.subplot(223), plt.imshow(aview)
plt.title('aview'), plt.xticks([]), plt.yticks([])

# Set every value in the view to 0 through the flat iterator
aview.flat = 0
plt.subplot(224), plt.imshow(aview)
plt.title('aview1'), plt.xticks([]), plt.yticks([])
plt.show()
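The difference between a view and a copy can also be seen without an image. A small sketch, assuming the `np` alias and a tiny hand-made array standing in for the picture:

```python
import numpy as np

original = np.array([[0, 1, 2], [3, 4, 5]])
acopy = original.copy()   # independent data
aview = original.view()   # new array object, same underlying data

aview.flat = 0            # zero everything through the view

print(original.tolist())  # [[0, 0, 0], [0, 0, 0]]  changed via the view
print(acopy.tolist())     # [[0, 1, 2], [3, 4, 5]]  the copy is unaffected
print(np.shares_memory(original, aview))  # True
print(np.shares_memory(original, acopy))  # False
```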

Fancy indexing

from scipy import misc
from matplotlib import pyplot as plt

# Load the image
ascent = misc.ascent()
# print(ascent)

# Lengths of the x and y axes
xmax = ascent.shape[0]
ymax = ascent.shape[1]
# print(range(xmax))
# print(range(ymax))
# print(range(xmax - 1, -1, -1))
# print(ascent[range(xmax), range(ymax)])

# Set the values on one diagonal to 0
ascent[range(xmax), range(ymax)] = 0
# print(ascent[range(xmax), range(ymax)])

# Set the values on the other diagonal to 0
ascent[range(xmax - 1, -1, -1), range(ymax)] = 0

plt.imshow(ascent)
plt.show()
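Fancy indexing pairs the row indices with the column indices element by element. A sketch of the same diagonal trick on a 4x4 array, assuming the `np` alias; the small array img is a stand-in for the image and is not part of the original example:

```python
import numpy as np

img = np.arange(1, 17).reshape(4, 4)   # stand-in for the image
n = img.shape[0]
idx = np.arange(n)

# Pairs (0,0), (1,1), ... select the main diagonal.
img[idx, idx] = 0
# Pairs (3,0), (2,1), ... select the anti-diagonal.
img[idx[::-1], idx] = 0

print(img)
# [[ 0  2  3  0]
#  [ 5  0  0  8]
#  [ 9  0  0 12]
#  [ 0 14 15  0]]
```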

Indexing with lists of locations

from scipy import misc
from matplotlib import pyplot as plt
import numpy as np

# Load the image
ascent = misc.ascent()
# Size of the image
xmax = ascent.shape[0]
ymax = ascent.shape[1]


# Return the indices 0..size-1 in random order
def shuffle_indices(size):
    arr = np.arange(size)
    np.random.shuffle(arr)
    return arr


xindices = shuffle_indices(xmax)
print(xindices, len(xindices), xmax)
np.testing.assert_equal(len(xindices), xmax)
yindices = shuffle_indices(ymax)
np.testing.assert_equal(len(yindices), ymax)

# Show the shuffled image; what is actually shuffled is the position indices
plt.imshow(ascent[np.ix_(xindices, yindices)])
plt.show()
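np.ix_ combines every chosen row with every chosen column, unlike plain fancy indexing, which pairs them up. A brief sketch on a small array, assuming the `np` alias; the seeded generator from np.random.default_rng is an assumption made here to keep the example repeatable:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([2, 0])
cols = np.array([3, 1, 0])

# ix_ builds an open mesh: each selected row meets each selected column.
sub = a[np.ix_(rows, cols)]
print(sub)
# [[11  9  8]
#  [ 3  1  0]]

# A permutation of the indices only reorders the data; no value is lost.
rng = np.random.default_rng(0)
perm_r = rng.permutation(a.shape[0])
perm_c = rng.permutation(a.shape[1])
shuffled = a[np.ix_(perm_r, perm_c)]
print(sorted(shuffled.ravel().tolist()) == list(range(12)))  # True
```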

Indexing NumPy arrays with Booleans

from scipy import misc
from matplotlib import pyplot as plt
import numpy as np

ascent = misc.ascent()


# Boolean mask that is True where the index is divisible by 4
def get_indices(size):
    arr = np.arange(size)
    return arr % 4 == 0


# Zero the points on the diagonal whose index is divisible by 4
ascent1 = ascent.copy()
xindices = get_indices(ascent.shape[0])
yindices = get_indices(ascent.shape[1])
ascent1[xindices, yindices] = 0

# Zero the values that lie between 1/4 and 3/4 of the maximum value
ascent2 = ascent.copy()
ascent2[(ascent > ascent.max() / 4) & (ascent < 3 * ascent.max() / 4)] = 0

# Show the images
plt.subplot(131), plt.imshow(ascent)
plt.title('ascent'), plt.xticks([]), plt.yticks([])
plt.subplot(132), plt.imshow(ascent1)
plt.title('ascent1'), plt.xticks([]), plt.yticks([])
plt.subplot(133), plt.imshow(ascent2)
plt.title('ascent2'), plt.xticks([]), plt.yticks([])
plt.show()
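Both kinds of Boolean indexing used above can be tried on a small array. A sketch, assuming the `np` alias and a 4x4 array in place of the image (the names mid, b, c and sel are only illustrative):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# A Boolean mask with the array's own shape selects the matching elements.
mid = (a > a.max() / 4) & (a < 3 * a.max() / 4)
b = a.copy()
b[mid] = 0
print(b)
# [[ 0  1  2  3]
#  [ 0  0  0  0]
#  [ 0  0  0  0]
#  [12 13 14 15]]

# One Boolean vector per axis is paired element-wise, like fancy indexing.
sel = np.arange(4) % 2 == 0        # [True, False, True, False]
c = a.copy()
c[sel, sel] = -1                   # touches (0, 0) and (2, 2) only
print(c[0, 0], c[2, 2], c[0, 2])   # -1 -1 2
```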

NumPy array broadcasting

Reading a wave file with Python; example code:

import wave
from matplotlib import pyplot as plt
import numpy as np

# Open the file
f = wave.open(r"si2323.wav", 'rb')
# Read the format information
params = f.getparams()
nchannels, sampwidth, framerate, nframes = params[:4]
# Read the waveform data
str_data = f.readframes(nframes)
f.close()

# Convert the raw bytes to an array (assumes 16-bit samples in two channels);
# np.frombuffer replaces the deprecated np.fromstring
wave_data = np.frombuffer(str_data, dtype=np.short)
wave_data = wave_data.reshape(-1, 2).T

# Build the time axis from the channel length so that time and
# wave_data[0] have the same number of points when plotted
time = np.arange(wave_data.shape[1]) * (1.0 / framerate)

# Plot the waveforms
plt.subplot(211)
plt.plot(time, wave_data[0])
plt.subplot(212)
plt.plot(time, wave_data[1], c='r')
plt.xlabel('time')
plt.show()

Example code (the samples are scaled by a scalar, which NumPy broadcasts across the whole array):

from scipy.io import wavfile
from matplotlib import pyplot as plt
import urllib.request
import numpy as np

# response = urllib.request.urlopen('http://www.thesoundarchive.com/austinpowers/smashingbaby.wav')
# print(response.info())
# WAV_FILE = r'si2323.wav'
# filehandle = open(WAV_FILE, 'wb')
# filehandle.write(response.read())
# filehandle.close()

# Read the audio file
sample_rate, data = wavfile.read('si2323.wav')
print('Data type', data.dtype, 'Shape', data.shape)

# Plot the original sound
plt.subplot(211), plt.title('Original')
plt.plot(data)

# Scale the samples down; the scalar 0.2 is broadcast over every element
newdata = data * 0.2
newdata = newdata.astype(np.int16)
print('Data type', newdata.dtype, 'Shape', newdata.shape)

# Save the quieter wav file
wavfile.write('quiet.wav', sample_rate, newdata)

# Plot the saved sound
plt.subplot(212), plt.title('Quiet')
plt.plot(newdata)
plt.show()
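The multiplication data * 0.2 is where broadcasting happens. A self-contained sketch, assuming the `np` alias and a short synthetic 16-bit stereo signal in place of the wav file (the sample values and the per-channel gains are made up for illustration):

```python
import numpy as np

# A short synthetic 16-bit stereo signal with shape (frames, channels).
data = np.array([[1000, -1000], [2000, -2000], [3000, -3000]], dtype=np.int16)

# The scalar 0.2 is broadcast to every element; the result is float64.
quiet = data * 0.2
print(quiet.dtype)     # float64
quiet = quiet.astype(np.int16)
print(quiet.tolist())  # [[200, -200], [400, -400], [600, -600]]

# A (2,) array is broadcast across the rows: one gain per channel.
gains = np.array([1.0, 0.5])
scaled = data * gains
print(scaled.tolist())
# [[1000.0, -500.0], [2000.0, -1000.0], [3000.0, -1500.0]]
```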

0 0