Using scatter plots for multivariate data: Python Data Science Cookbook

Source: Internet  Published: 数据分析师修炼  Editor: 程序博客网  Time: 2024/05/17 07:09

Everything below is taken from the worked examples in the textbook Python Data Science Cookbook (《python数据科学指南》).

In multivariate data analysis, we are interested in seeing whether there are any relationships between the columns that we are analyzing. In the two-column (two-variable) case, the best place to start is a standard scatter plot. There can be four types of relationships, as follows:

  1. No relationship
  2. Strong
  3. Simple
  4. Multivariate (not simple) relationship

Example:

We will use the Iris dataset. It’s a multivariate dataset introduced by Sir Ronald Fisher. Refer to https://archive.ics.uci.edu/ml/datasets/Iris for more information. The Iris dataset has 150 instances and four attributes/columns. The 150 instances are composed of 50 records from each of the three species of the Iris flower (setosa, virginica, and versicolor). The four attributes are the sepal length in cm, sepal width in cm, petal length in cm, and petal width in cm. Because every record carries a species label, the Iris dataset also serves as a great classification dataset.
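As a quick sanity check of the description above, here is a minimal sketch that loads the copy of the dataset bundled with scikit-learn and counts the records per species. The variable names (`iris`, `features`, `labels`, `class_counts`) are only illustrative and do not come from the book.

```python
from collections import Counter

from sklearn.datasets import load_iris

# Load the copy of the Iris dataset that ships with scikit-learn.
iris = load_iris()
features = iris["data"]    # one row per flower, one column per attribute
labels = iris["target"]    # integer class label for each row

# Number of instances and attributes.
print(features.shape)

# Names of the four attributes.
print(iris["feature_names"])

# Count how many records belong to each species.
class_counts = Counter(str(iris["target_names"][label]) for label in labels)
print(class_counts)
```

If the counts differ from 50 per species, the data being loaded is probably not the standard Iris dataset.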

```python
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
@author: snaildove
"""
# Load libraries
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import itertools

# 1. Load Iris dataset
data = load_iris()
x = data['data']
y = data['target']
feature_names = data['feature_names']

# We will proceed with demonstrating with a scatter plot:
# 2. Perform a simple scatter plot.
# Plot 6 graphs, combinations of our features, sepal length, sepal width,
# petal length and petal width.
plt.close('all')
plt.figure(1)
# We want a plot with 3 rows and 2 columns;
# the 3 and 2 in the variable below signify that.
subplot_start = 321
col_numbers = range(0, 4)
# Need it for labeling the graph
col_pairs = itertools.combinations(col_numbers, 2)
plt.subplots_adjust(wspace=0.5)

for col_pair in col_pairs:
    plt.subplot(subplot_start)
    plt.scatter(x[:, col_pair[0]], x[:, col_pair[1]], c=y)
    plt.xlabel(feature_names[col_pair[0]])
    plt.ylabel(feature_names[col_pair[1]])
    subplot_start += 1
plt.show()
```


As you can see, we have plotted every pairwise combination of our four columns, six plots in all. We also have the class labels represented using three different colors. Let’s look at the bottom right plot, petal length versus petal width. We see that different ranges of values belong to different class labels. Now, this gives us a great clue for classification; the petal width and length variables are good candidates if the problem at hand is classification.
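The visual impression that each class occupies its own range of petal values can be checked numerically. The following is a small sketch, not part of the book's recipe, that prints the minimum and maximum petal length and width for each class. The column indices 2 and 3 follow the attribute order described earlier, and the names `PETAL_LENGTH`, `PETAL_WIDTH`, and `petal_ranges` are made up for this illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()
x, y = iris["data"], iris["target"]

# Column positions of the two petal attributes (assumed from the attribute order).
PETAL_LENGTH, PETAL_WIDTH = 2, 3

# For each class, record the (min, max) range of both petal attributes.
petal_ranges = {}
for class_index, class_name in enumerate(iris["target_names"]):
    rows = x[y == class_index]  # keep only the records of this class
    petal_ranges[str(class_name)] = {
        "petal length": (rows[:, PETAL_LENGTH].min(), rows[:, PETAL_LENGTH].max()),
        "petal width": (rows[:, PETAL_WIDTH].min(), rows[:, PETAL_WIDTH].max()),
    }

for name, ranges in petal_ranges.items():
    print(name, ranges)
```

Ranges that do not overlap between two classes mean a single threshold on that attribute separates them; overlapping ranges mean some records cannot be told apart by that attribute alone.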

Note

For the Iris dataset, the petal width and length alone are enough to classify most of the records into their respective species.
These kinds of observations can be quickly made during the feature selection process with the help of bivariate scatter plots.
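To make the note concrete, here is a rough sketch of a rule-based classifier that uses only the two petal attributes. The thresholds (2.5 cm for petal length, 1.75 cm for petal width) are hand-picked assumptions of the kind one might read off the scatter plot; they are not given in the book, and this is an illustration rather than a recommended model.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
x, y = iris["data"], iris["target"]
petal_length = x[:, 2]
petal_width = x[:, 3]

# Hypothetical thresholds, chosen by eye for illustration only.
SETOSA_MAX_PETAL_LENGTH = 2.5
VERSICOLOR_MAX_PETAL_WIDTH = 1.75

# Class 0 = setosa, 1 = versicolor, 2 = virginica (order of iris["target_names"]).
predicted = np.where(
    petal_length < SETOSA_MAX_PETAL_LENGTH,
    0,
    np.where(petal_width < VERSICOLOR_MAX_PETAL_WIDTH, 1, 2),
)

# Fraction of records whose predicted class matches the true label.
accuracy = float((predicted == y).mean())
print("accuracy using petal features only:", accuracy)
```

Because the accuracy is measured on the same records the thresholds were eyeballed from, it only shows how well the two attributes separate this dataset, not how the rule would perform on new flowers.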
