机器学习— 获取数据,绘制图表

来源:互联网 发布:腾讯数据分析师面经 编辑:程序博客网 时间:2024/05/16 15:29

获取数据


用requests,请求网络数据,然后写入csv文件。

import requests#从远程获取数据url = "http://aima.cs.berkeley.edu/data/iris.csv"response = requests.get(url)#写入文件local_file = open("iris.csv","w")local_file.write(response.text)local_file.close()

从文件中读取数据


from numpy import genfromtxt,zeros#特征集data = genfromtxt("iris.csv",delimiter=",",usecols=(0,1,2,3))#分类标签labels = genfromtxt("iris.csv",delimiter=",",usecols=(4),dtype=str)

数据为csv格式,以“,”分隔,分别为花萼的长宽,花瓣的长宽。
这里为了便于分析,学习,就叫特征0,1,2,3,最后一列为分类标签,数据格式如下:


数据样式

查看样本数据的维度,标签的种类


#查看矩阵大小,标签种类print(data.shape)print(labels.shape)print(set(labels))

结果:


矩阵大小,标签种类

根据数据开始绘图


这里需要引入package

from pylab import plot,show,figure,subplot,hist,xlimimport matplotlib.pyplot as plt

二维散点图

  • 特征0,特征1
      plot(data[labels=="setosa",0],data[labels=="setosa",1],"bo")  plot(data[labels=="virginica",0],data[labels=="virginica",1],"ro")  plot(data[labels=="versicolor",0],data[labels=="versicolor",1],"go")  show()
    结果:

特征0,1


从上图看,好像并不能很好的区分种类。

  • 特征0,2
      plot(data[labels=="setosa",0],data[labels=="setosa",2],"bo")  plot(data[labels=="virginica",0],data[labels=="virginica",2],"ro")  plot(data[labels=="versicolor",0],data[labels=="versicolor",2],"go")  show()
    结果:

特征0,2


这个就比较明显了

  • 特征0,3
      plot(data[labels=="setosa",0],data[labels=="setosa",3],"bo")  plot(data[labels=="virginica",0],data[labels=="virginica",3],"ro")  plot(data[labels=="versicolor",0],data[labels=="versicolor",3],"go")  show()
    结果:

特征0,3


同理,可以用其他的特征量来生成二位散点图

三维散点图


  • 特征0,2,3
ax = plt.subplot(111,projection="3d")ax.scatter(data[labels=="setosa",0],data[labels=="setosa",2],data[labels=="setosa",3],c="b")ax.scatter(data[labels=="virginica",0],data[labels=="virginica",2],data[labels=="virginica",3],c="r")ax.scatter(data[labels=="versicolor",0],data[labels=="versicolor",2],data[labels=="versicolor",3],c="g")plt.show()

结果:


特征0,2,3

直方图


#特征0subplot(441)hist(data[labels=="setosa",0],color="b",alpha=0.7)xlim(xmin0,xmax0)subplot(445)hist(data[labels=="virginica",0],color="r",alpha=0.7)xlim(xmin0,xmax0)subplot(449)hist(data[labels=="versicolor",0],color="g",alpha=0.7)xlim(xmin0,xmax0)subplot(4,4,13)hist(data[:,0],color="y",alpha=0.7)xlim(xmin0,xmax0)#特征1xmin1 = min(data[:,1])xmax1 = max(data[:,1])subplot(442)hist(data[labels=="setosa",1],color="b",alpha=0.7)xlim(xmin1,xmax1)subplot(446)hist(data[labels=="virginica",1],color="r",alpha=0.7)xlim(xmin1,xmax1)subplot(4,4,10)hist(data[labels=="versicolor",1],color="g",alpha=0.7)xlim(xmin1,xmax1)subplot(4,4,14)hist(data[:,1],color="y",alpha=0.7)xlim(xmin1,xmax1)#特征2xmin2 = min(data[:,2])xmax2 = max(data[:,2])subplot(443)hist(data[labels=="setosa",2],color="b",alpha=0.7)xlim(xmin2,xmax2)subplot(447)hist(data[labels=="virginica",2],color="r",alpha=0.7)xlim(xmin2,xmax2)subplot(4,4,11)hist(data[labels=="versicolor",2],color="g",alpha=0.7)xlim(xmin2,xmax2)subplot(4,4,15)hist(data[:,2],color="y",alpha=0.7)xlim(xmin2,xmax2)#特征3xmin3 = min(data[:,3])xmax3 = max(data[:,3])subplot(444)hist(data[labels=="setosa",3],color="b",alpha=0.7)xlim(xmin3,xmax3)subplot(448)hist(data[labels=="virginica",3],color="r",alpha=0.7)xlim(xmin3,xmax3)subplot(4,4,12)hist(data[labels=="versicolor",3],color="g",alpha=0.7)xlim(xmin3,xmax3)subplot(4,4,16)hist(data[:,3],color="y",alpha=0.7)xlim(xmin3,xmax3)show()

这里代码为了便于理解,学习,就没进行封装


直方图
原创粉丝点击