Data Science CS109 Homework 1

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 06:33

1. Importing Python and its third-party libraries

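Installing Python and pip can fail in many ways. Before installing Python, be sure to install openssl, openssl-devel, libxml, and libxml-devel; only then can setuptools and pip be installed, in that order. pip is Python's package manager: it downloads and installs many third-party Python packages automatically, which is very convenient.
I tinkered with this for several days. In my experience, Python 3 is not yet mature: third-party packages were very hard to install and kept failing, so Python 2.7 is still the better choice for learning machine learning.
The data science instructors list the required third-party libraries in detail in the homework: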

# IPython is what you are using now to run the notebook
import IPython
print "IPython version:      %6.6s (need at least 1.0)" % IPython.__version__

# Numpy is a library for working with Arrays
import numpy as np
print "Numpy version:        %6.6s (need at least 1.7.1)" % np.__version__

# SciPy implements many different numerical algorithms
import scipy as sp
print "SciPy version:        %6.6s (need at least 0.12.0)" % sp.__version__

# Pandas makes working with data tables easier
import pandas as pd
print "Pandas version:       %6.6s (need at least 0.11.0)" % pd.__version__

# Module for plotting
import matplotlib
print "Matplotlib version:   %6.6s (need at least 1.2.1)" % matplotlib.__version__

# SciKit Learn implements several Machine Learning algorithms
import sklearn
print "Scikit-Learn version: %6.6s (need at least 0.13.1)" % sklearn.__version__

# Requests is a library for getting data from the Web
import requests
print "requests version:     %6.6s (need at least 1.2.3)" % requests.__version__

# Networkx is a library for working with networks
import networkx as nx
print "NetworkX version:     %6.6s (need at least 1.7)" % nx.__version__

# BeautifulSoup is a library to parse HTML and XML documents
import BeautifulSoup
print "BeautifulSoup version:%6.6s (need at least 3.2)" % BeautifulSoup.__version__

# MrJob is a library to run map reduce jobs on Amazon's computers
import mrjob
print "Mr Job version:       %6.6s (need at least 0.4)" % mrjob.__version__

# Pattern has lots of tools for working with data from the internet
import pattern
print "Pattern version:      %6.6s (need at least 2.6)" % pattern.__version__
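The homework cell only prints each version and leaves the comparison to the reader. As a rough sketch (not part of the course material), the check can be automated by comparing version numbers as integer tuples, which avoids mistakes such as reading "1.10" as older than "1.7". The helper names parse_version and check_module and the REQUIREMENTS table are made up for this illustration; the minimum versions are copied from the list above. The code runs on both Python 2.7 and Python 3.

```python
import importlib
import re


def parse_version(text):
    """Turn '1.7.1' or '0.13.1rc2' into a tuple of leading integers, e.g. (1, 7, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def check_module(name, minimum):
    """Return (status, installed_version); status is 'ok', 'too old' or 'missing'."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "missing", None
    # Assumption: the module exposes __version__; fall back to "0" if it does not.
    installed = getattr(module, "__version__", "0")
    status = "ok" if parse_version(installed) >= parse_version(minimum) else "too old"
    return status, installed


# Hypothetical table for this sketch: a subset of the homework's minimum versions.
REQUIREMENTS = {
    "numpy": "1.7.1",
    "scipy": "0.12.0",
    "pandas": "0.11.0",
    "matplotlib": "1.2.1",
    "sklearn": "0.13.1",
    "requests": "1.2.3",
    "networkx": "1.7",
}

for name, minimum in sorted(REQUIREMENTS.items()):
    status, installed = check_module(name, minimum)
    print("%-12s need >= %-7s found %-10s %s" % (name, minimum, installed, status))
```

A missing package is reported as "missing" instead of stopping the whole cell with an ImportError, which makes it easier to see everything that still needs installing in one pass.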

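(1) matplotlib caused many installation problems. It depends on freetype and libpng, and it also needs a C++ compiler: yum install gcc-c++
(2) pandas depends on Cython (yum install cython). The installation froze the machine, and I still don't know why.
(3) BeautifulSoup 4 is just bs4 (that is its package and import name)!
(4) pattern failed to install. The cause: the print statements in its setup.py use Python 2.x syntax while the interpreter was Python 3.x, so every print in setup.py has to be changed to print().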
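To illustrate the fix in item (4), here is a deliberately naive sketch that rewrites simple one-line print statements as print() calls. The names convert_print_line and convert_source are invented for this example. It assumes each print statement sits on one line with no trailing comment, no trailing comma, and no `print >>stream` redirection; anything outside that pattern is either left alone or converted incorrectly. For a real code base, the 2to3 tool bundled with older Python 3 releases is the more thorough option.

```python
import re

# A line that starts (after indentation) with the word `print`, followed by
# whitespace and then something that is not already an opening parenthesis.
_PRINT_STATEMENT = re.compile(r"^(\s*)print\s+(?!\()(.+?)\s*$")


def convert_print_line(line):
    """Rewrite `print expr` as `print(expr)`; return every other line unchanged."""
    match = _PRINT_STATEMENT.match(line)
    if match is None:
        return line
    indent, expression = match.groups()
    return "%sprint(%s)" % (indent, expression)


def convert_source(source):
    """Apply convert_print_line to each line of a source string."""
    return "\n".join(convert_print_line(line) for line in source.split("\n"))


example = 'name = "pattern"\nprint "installing %s" % name'
print(convert_source(example))
```

The converted lines are valid in both Python 2.7 and Python 3 for this simple single-argument case, so the edit does not break the script for the older interpreter.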

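The above is my hard-won summary. The root cause of most errors seems to be version mismatches. I'm not very familiar with Linux, so I could only feel my way forward slowly. Fortunately, I got every third-party package installed on my own computer. The lab machine still has other bizarre errors, so I'll keep at it. Now let's start with some simple code.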

2. Code example

# this line prepares IPython for working with matplotlib
%matplotlib inline

# this actually imports matplotlib
import matplotlib.pyplot as plt
# numpy was imported in the version check above; repeated so this cell runs on its own
import numpy as np

x = np.linspace(0, 10, 30)  # array of 30 points from 0 to 10
y = np.sin(x)
z = y + np.random.normal(size=30) * .2
plt.plot(x, y, 'ro-', label='A sine wave')
plt.plot(x, z, 'b-', label='Noisy sine')
plt.legend(loc='lower right')
plt.xlabel("X axis")
plt.ylabel("Y axis")

Problems:
(1) No module named _tkinter
Solution:
Install tcl-devel and tk-devel, then recompile Python.
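After rebuilding Python, a quick way to confirm that the Tk bindings were actually compiled in is to try importing them. This is only a small sketch; the function name tkinter_status is made up for illustration. The module is called Tkinter on Python 2 and tkinter on Python 3, and both wrap the compiled _tkinter extension named in the error message.

```python
def tkinter_status():
    """Return 'ok' if the Tk bindings import, otherwise a short reason string."""
    try:
        try:
            import tkinter  # module name on Python 3
        except ImportError:
            import Tkinter  # module name on Python 2
    except ImportError as error:
        return "missing: %s" % error
    return "ok"


print("tkinter: " + tkinter_status())
```

If this still reports a missing module after the rebuild, the Tcl/Tk development headers were probably not found at configure time.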
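(2) no display name and no $DISPLAY environment variable
Solution:
Add matplotlib.use('Agg') at the top of the file, before importing matplotlib.pyplot.
Alternatively, locate the matplotlibrc configuration file from the interpreter:
>>> import matplotlib
>>> matplotlib.matplotlib_fname()  # on Ubuntu the file location is '/etc/matplotlibrc'
Once you have found matplotlibrc, change backend from TkAgg to Agg.
With either approach, show() can no longer display the figure. I don't know the fundamental fix yet, but I'll keep looking.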
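One way to live with the Agg backend is to write the figure to an image file instead of calling show(). The sketch below is a workaround under stated assumptions, not a fix for the missing display. The function names choose_backend and save_sine_plot and the file name sine.png are invented for this example, and it assumes a Linux machine where an unset DISPLAY variable means no X server is available.

```python
import os


def choose_backend(environ=None):
    """Pick 'TkAgg' when an X display is configured, otherwise the file-only 'Agg'."""
    env = os.environ if environ is None else environ
    return "TkAgg" if env.get("DISPLAY") else "Agg"


def save_sine_plot(path, backend="Agg"):
    """Draw the sine wave from the code example and write it to an image file."""
    import matplotlib
    matplotlib.use(backend)  # select the backend before pyplot is imported
    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 10, 30)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), "ro-", label="A sine wave")
    ax.legend(loc="lower right")
    ax.set_xlabel("X axis")
    ax.set_ylabel("Y axis")
    fig.savefig(path)  # Agg renders to a file; no window is opened
    plt.close(fig)
    return path


print("backend for this machine: " + choose_backend())
# Example call (needs matplotlib and numpy installed):
# save_sine_plot("sine.png", backend=choose_backend())
```

The saved image can then be copied off the server and opened locally, which sidesteps the need for a display on the remote machine.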
(3)libpng16.so.16: cannot open shared object file: No such file or directory
[root@root python]
Solution:
