python: Plotting with pandas (summary): plotting tools
Source: Internet  Published: network configuration code  Editor: 程序博客网  Time: 2024/05/29 17:12
Python for Data Analysis
Chapter 8: Plotting and Visualization
pandas plotting tools
>>> from pandas.plotting import scatter_matrix
>>> from pandas import Series, DataFrame
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
1, Scatter Matrix Plot
These functions can be imported from pandas.plotting and take a Series or DataFrame as an argument.
>>> df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
>>> scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')
>>> plt.show()
This produces a 4x4 grid of 16 subplots: the diagonal cells are density (KDE) plots and all other cells are scatter plots.
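As a small illustration of the grid described above, the sketch below checks the shape of what scatter_matrix returns. It uses the non-interactive Agg backend and diagonal="hist" (instead of "kde") so that it runs without a display and without SciPy; both choices, the seed, and the smaller sample size are assumptions made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 4)), columns=["a", "b", "c", "d"])

# diagonal="hist" avoids the SciPy dependency that diagonal="kde" needs
axes = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="hist")

# scatter_matrix returns a NumPy array holding one Axes per (row, column) pair
print(axes.shape)
```

Indexing the returned array, for example axes[0, 1], gives the Axes of a single cell, which is handy for adjusting labels on one subplot.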
2, Density Plot
You can create density plots using the Series.plot.kde() and DataFrame.plot.kde() methods.
np.random.randn(1000) draws 1000 samples from the standard normal distribution.
>>> ser = pd.Series(np.random.randn(1000))
>>> ser.plot.kde()
The result is a smooth curve that approximates the bell shape of the normal distribution.
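To make it clearer what a density plot estimates, here is a minimal NumPy-only sketch of a Gaussian kernel density estimate. The helper name gaussian_kde_sketch and the bandwidth rule (Scott's rule of thumb) are choices made for this illustration; it is not meant to reproduce what Series.plot.kde() does internally.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] is the scaled distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
grid = np.linspace(-6.0, 6.0, 601)

# Scott's rule of thumb for one-dimensional data: std * n**(-1/5)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1.0 / 5.0)
density = gaussian_kde_sketch(samples, grid, bandwidth)

# A density should integrate to roughly 1 (simple Riemann sum here)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Plotting density against grid gives a curve of the same kind as ser.plot.kde(); a smaller bandwidth makes it bumpier, a larger one makes it smoother.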
3, Andrews Curves
Andrews curves allow one to plot multivariate data as a large number of curves that are created using the attributes of samples as coefficients for Fourier series. By coloring these curves differently for each class it is possible to visualize data clustering. Curves belonging to samples of the same class will usually be closer together and form larger structures.
Use the andrews_curves() function to draw the plot.
>>> from pandas.plotting import andrews_curves
>>> df=DataFrame(np.random.rand(10,10), columns=range(1,11))
>>> df
1 2 3 4 5 6 7 \
0 0.657668 0.234840 0.187963 0.480384 0.676935 0.644506 0.849955
1 0.347819 0.278945 0.482548 0.856854 0.369824 0.921871 0.195208
2 0.481188 0.886892 0.269874 0.992266 0.663039 0.285274 0.222589
3 0.999133 0.932073 0.656683 0.607936 0.362180 0.756532 0.479407
4 0.918229 0.965718 0.243416 0.042666 0.932310 0.734750 0.142455
5 0.393881 0.821673 0.598786 0.715335 0.525187 0.763766 0.570982
6 0.998222 0.770152 0.803504 0.932111 0.629249 0.632741 0.230093
7 0.730399 0.127948 0.586990 0.890208 0.885532 0.821200 0.216378
8 0.823925 0.741674 0.690356 0.269986 0.530224 0.446307 0.265048
9 0.497035 0.830702 0.399065 0.242242 0.192078 0.622756 0.867983
8 9 10
0 0.428669 0.921396 0.865082
1 0.897575 0.000369 0.019511
2 0.004554 0.093646 0.152874
3 0.376975 0.512618 0.385439
4 0.314657 0.032770 0.406077
5 0.087637 0.525262 0.095010
6 0.841192 0.115266 0.358726
7 0.957213 0.709480 0.013137
8 0.483483 0.687900 0.431011
9 0.924797 0.119433 0.386189
>>> plt.figure()
>>> andrews_curves(df, 1)
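Column 1 of df is passed as the class column. Because every row has a different value in that column, each row is treated as its own class and is drawn as one separately colored curve built from the remaining columns.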
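The Fourier series behind an Andrews curve can be sketched directly. For a sample (x1, x2, x3, ...), the usual definition is f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... The helper name andrews_function below is hypothetical, and this is an illustration of that formula rather than a copy of the pandas implementation.

```python
import numpy as np

def andrews_function(row, t):
    """Evaluate the Andrews curve of one sample at the points t (hypothetical helper)."""
    row = np.asarray(row, dtype=float)
    t = np.asarray(t, dtype=float)
    # The first attribute is the constant term x1 / sqrt(2)
    result = np.full_like(t, row[0] / np.sqrt(2.0))
    # Remaining attributes alternate: sin(t), cos(t), sin(2t), cos(2t), ...
    for i, coeff in enumerate(row[1:]):
        harmonic = i // 2 + 1
        if i % 2 == 0:
            result = result + coeff * np.sin(harmonic * t)
        else:
            result = result + coeff * np.cos(harmonic * t)
    return result

t = np.linspace(-np.pi, np.pi, 200)
flat = andrews_function([np.sqrt(2.0), 0.0, 0.0], t)  # constant curve equal to 1
sine = andrews_function([0.0, 1.0, 0.0], t)           # plain sin(t)
```

Samples with similar attribute values get similar coefficients and therefore similar curves, which is why curves of one class tend to bundle together.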
4, Parallel Coordinates
Parallel coordinates is a plotting technique for plotting multivariate data. It allows one to see clusters in data and to estimate other statistics visually. Using parallel coordinates, points are represented as connected line segments. Each vertical line represents one attribute. One set of connected line segments represents one data point. Points that tend to cluster will appear closer together.
>>> from pandas.plotting import parallel_coordinates
>>> df=DataFrame(np.random.rand(10,10), columns=range(1,11))
>>> df
1 2 3 4 5 6 7 \
0 0.467659 0.978732 0.179538 0.685182 0.229915 0.882398 0.924433
1 0.863878 0.992446 0.732572 0.543559 0.164539 0.710433 0.220690
2 0.816937 0.866524 0.561880 0.136630 0.972659 0.352004 0.650383
3 0.351081 0.341353 0.004663 0.600008 0.880758 0.440976 0.111892
4 0.226553 0.014078 0.379845 0.598606 0.341625 0.675299 0.708234
5 0.170063 0.342096 0.813045 0.860868 0.905096 0.737247 0.652726
6 0.797142 0.777763 0.737259 0.100391 0.551292 0.739408 0.266556
7 0.130778 0.201388 0.896418 0.549645 0.587309 0.548748 0.009598
8 0.467129 0.298170 0.861704 0.217054 0.761984 0.110673 0.493671
9 0.778196 0.456548 0.171519 0.745076 0.905559 0.390150 0.727006
8 9 10
0 0.494924 0.612457 0.026332
1 0.430576 0.064443 0.970996
2 0.776737 0.251197 0.410517
3 0.763297 0.365974 0.889982
4 0.947055 0.200605 0.179035
5 0.435712 0.694421 0.101725
6 0.581694 0.719693 0.588572
7 0.998294 0.138834 0.059504
8 0.549928 0.096064 0.312498
9 0.854901 0.985777 0.691980
>>> plt.figure()
>>> parallel_coordinates(df, 1)
Column 1 of df is passed as the class column, so each row is drawn as one line that crosses the vertical axes for columns 2 to 10.
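Parallel coordinates are easier to read when the class column holds a few labels rather than random floats. The sketch below uses a tiny made-up DataFrame (the column names and values are invented for illustration) and the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up data: two classes with clearly different attribute levels
df = pd.DataFrame({
    "group": ["low", "low", "high", "high"],
    "x": [0.1, 0.2, 0.8, 0.9],
    "y": [0.2, 0.1, 0.9, 0.7],
    "z": [0.3, 0.2, 0.7, 0.8],
})

# "group" is the class column; x, y and z each become one vertical axis
ax = parallel_coordinates(df, "group")
ax.figure.savefig("parallel_coordinates.png")
```

In the saved figure the two "low" rows and the two "high" rows should form two visibly separate bundles of lines.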
5, Lag Plot
Lag plots are used to check if a data set or time series is random. Random data should not exhibit any structure in the lag plot. Non-random structure implies that the underlying data are not random.
>>> from pandas.plotting import lag_plot
>>> plt.figure()
>>> data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(np.linspace(-99 * np.pi, 99 * np.pi, num=1000)))
>>> lag_plot(data)
The x-axis of the plot is y(t) and the y-axis is y(t+1).
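The pairs shown in a lag plot can be built by hand, which also gives a number to go with the picture. In this sketch the helper name lag_pairs is hypothetical, and the correlation check is an addition for illustration, not something lag_plot computes.

```python
import numpy as np

def lag_pairs(values, lag=1):
    """Return (y(t), y(t + lag)) for lag >= 1 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return values[:-lag], values[lag:]

rng = np.random.default_rng(0)
noise = rng.random(1000)
wave = 0.1 * rng.random(1000) + 0.9 * np.sin(
    np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
)

# Correlation between y(t) and y(t + 1): near zero for random data,
# clearly positive for the structured sine series
noise_corr = np.corrcoef(*lag_pairs(noise))[0, 1]
wave_corr = np.corrcoef(*lag_pairs(wave))[0, 1]
print(noise_corr, wave_corr)
```

Scattering the two arrays returned by lag_pairs(wave) against each other should trace a ring-like shape similar to the lag_plot(data) output, while the noise series gives a structureless cloud.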
6, Autocorrelation Plot
Autocorrelation plots are often used for checking randomness in time series. This is done by computing autocorrelations for data values at varying time lags. If the time series is random, such autocorrelations should be near zero for any and all time-lag separations. If the time series is non-random, then one or more of the autocorrelations will be significantly non-zero. The horizontal lines displayed in the plot correspond to 95% and 99% confidence bands. The dashed line is the 99% confidence band.
>>> from pandas.plotting import autocorrelation_plot
>>> data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(np.linspace(-9 * np.pi, 9 * np.pi, num=1000)))
>>> autocorrelation_plot(data)
The x-axis of the resulting plot is labeled Lag and the y-axis is labeled Autocorrelation.
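The quantity on the y-axis can be sketched with NumPy. The version below uses the common sample autocorrelation r(h) = c(h) / c(0) and approximates the confidence bands as z / sqrt(n) with z = 1.96 (95%) and z = 2.576 (99%); the helper name autocorrelation is hypothetical, and treating these as the same formulas pandas uses is an assumption.

```python
import numpy as np

def autocorrelation(values, lag):
    """Sample autocorrelation r(lag) = c(lag) / c(0) (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    c0 = np.sum((values - mean) ** 2) / n
    ch = np.sum((values[: n - lag] - mean) * (values[lag:] - mean)) / n
    return ch / c0

rng = np.random.default_rng(0)
data = rng.random(1000)
n = len(data)

# Approximate confidence bands for a purely random series
band_95 = 1.96 / np.sqrt(n)
band_99 = 2.576 / np.sqrt(n)

r = np.array([autocorrelation(data, h) for h in range(1, 51)])
outside_99 = int(np.sum(np.abs(r) > band_99))
print(outside_99)  # for random data, very few lags should fall outside the band
```

For a series with a strong periodic component, many of the values in r would lie well outside both bands.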
7, Bootstrap Plot
Bootstrap plots are used to visually assess the uncertainty of a statistic, such as mean, median, midrange, etc. A random subset of a specified size is selected from a data set, the statistic in question is computed for this subset and the process is repeated a specified number of times. Resulting plots and histograms are what constitutes the bootstrap plot.
>>> from pandas.plotting import bootstrap_plot
>>> data = pd.Series(np.random.rand(1000))
>>> bootstrap_plot(data, size=50, samples=500, color='green')
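The resampling procedure described above can be sketched in a few lines of NumPy. The values size=50 and samples=500 mirror the call above; drawing each subset without replacement is an assumption of this sketch and may not match the internals of bootstrap_plot exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000)

size, samples = 50, 500
means = np.empty(samples)
medians = np.empty(samples)

for i in range(samples):
    # Pick a random subset of the data and compute the statistics on it
    subset = rng.choice(data, size=size, replace=False)
    means[i] = subset.mean()
    medians[i] = np.median(subset)

# The spread of the resampled statistic is a rough measure of its uncertainty
spread = means.std()
print(means.mean(), spread)
```

Histograms of means and medians correspond to the kind of panels a bootstrap plot shows; a narrower histogram indicates a more stable statistic.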
8, RadViz
RadViz is a way of visualizing multi-variate data. It is based on a simple spring tension minimization algorithm. Basically you set up a bunch of points in a plane. In our case they are equally spaced on a unit circle. Each point represents a single attribute. You then pretend that each sample in the data set is attached to each of these points by a spring, the stiffness of which is proportional to the numerical value of that attribute (they are normalized to unit interval). The point in the plane, where our sample settles to (where the forces acting on our sample are at an equilibrium) is where a dot representing our sample will be drawn. Depending on which class that sample belongs to, it will be colored differently.
>>> from pandas.plotting import radviz
>>> df = DataFrame(np.array([[2,4,6,79,23,190,552,1314,23457], [4,9,6,97,32,110,555,1210,4325]]).T, columns=['a','b'])
>>> radviz(df, 'a')
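The spring analogy has a simple closed form: with anchors on the unit circle and attribute values normalized to the unit interval, the equilibrium point is the weighted average of the anchors. The sketch below illustrates that idea; the helper name radviz_position is hypothetical, and it assumes the weights are already normalized and not all zero.

```python
import numpy as np

def radviz_position(weights):
    """Equilibrium point of one sample given normalized weights (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    m = len(weights)
    # One anchor per attribute, equally spaced on the unit circle
    angles = 2.0 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Spring equilibrium: anchor positions averaged with the weights
    return (anchors * weights[:, None]).sum(axis=0) / weights.sum()

balanced = radviz_position([0.5, 0.5, 0.5, 0.5])  # equal pull from all sides
pulled = radviz_position([1.0, 0.0, 0.0, 0.0])    # only the first attribute pulls
print(balanced, pulled)
```

A sample with equal values on all attributes settles at the center, while a sample dominated by one attribute sits at that attribute's anchor. This also shows why RadViz needs several attribute columns besides the class column to be informative.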