Getting started with pandas and seaborn (2): random points…

Source: Internet | Published by: 青岛软件评测中心 | Editor: 程序博客网 | Time: 2024/05/16 14:42

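A computer can generally only generate random numbers that are uniformly distributed over some range, for example white noise on the interval [0,1]. If you need any other probability distribution, you have to put this "white noise" through a series of transformations and projections.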
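As one concrete instance of such a transformation (not the one used in this post), here is a minimal sketch of inverse transform sampling: a uniform number u is pushed through the inverse CDF of the target distribution. The exponential distribution and the names `sample_exponential` and `rate` are chosen here purely for illustration.

```python
import math
import random


def sample_exponential(rate, n, seed=None):
    """Draw n exponential(rate) samples from uniform ones via the inverse CDF."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        # Inverse CDF of the exponential distribution: F^-1(u) = -ln(1 - u) / rate.
        # 1 - u lies in (0, 1], so the logarithm is always finite.
        samples.append(-math.log(1.0 - u) / rate)
    return samples


if __name__ == "__main__":
    values = sample_exponential(rate=2.0, n=100_000, seed=0)
    print(sum(values) / len(values))  # should be close to 1 / rate = 0.5
```

The Box-Muller method used in the rest of this post follows the same idea, but needs two uniform numbers per normal sample.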

We need a function that generates normally distributed numbers (a library function would also do); for now we use the Box-Muller method:

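import numpy as np

def boxmullersampling(mu=0, sigma=1, size=1):
    # u and v are independent uniform samples; 1 - u lies in (0, 1], so log(u) stays finite
    u = 1.0 - np.random.uniform(size=size)
    v = np.random.uniform(size=size)
    # Box-Muller transform: z follows the standard normal distribution
    z = np.sqrt(-2 * np.log(u)) * np.cos(2 * np.pi * v)
    # shift and scale; the result is an array of length size
    return mu + z * sigma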
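For reference, the transform that the function above implements can be written out as follows, with $U$ and $V$ independent and uniform on $(0,1)$. The second variate $Z_1$ belongs to the standard Box-Muller method but is not used by the code in this post.

```latex
\begin{aligned}
Z_0 &= \sqrt{-2\ln U}\,\cos(2\pi V), \\
Z_1 &= \sqrt{-2\ln U}\,\sin(2\pi V), \\
X   &= \mu + \sigma Z_0 \sim \mathcal{N}(\mu, \sigma^2).
\end{aligned}
```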

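Next comes a simple helper that builds a list of length size (a plain Python list, not a NumPy array) and fills it with the generated random numbers: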

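def getlist(size, sigma=1):
    resultList = [0 for x in range(0, size)]
    for i in range(size):
        # [0] unwraps the one-element array into a single number.
        # Note: sigma is passed positionally, so boxmullersampling receives it as mu.
        resultList[i] = boxmullersampling(sigma)[0]
    return resultList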
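One detail of the call `boxmullersampling(sigma)` is easy to miss: a single positional argument binds to the first parameter, which is `mu`, not `sigma`. The sketch below uses a hypothetical stand-in, `describe_call`, with the same signature to show where the value ends up; it is only an illustration and not part of the original script.

```python
def describe_call(mu=0, sigma=1, size=1):
    """Hypothetical stand-in with the same signature as boxmullersampling.

    It returns the parameter values it received instead of sampling.
    """
    return {"mu": mu, "sigma": sigma, "size": size}


if __name__ == "__main__":
    # Positional: 8 is bound to the first parameter, mu.
    print(describe_call(8))        # {'mu': 8, 'sigma': 1, 'size': 1}
    # Keyword: 8 is bound to sigma, which is what the name suggests.
    print(describe_call(sigma=8))  # {'mu': 0, 'sigma': 8, 'size': 1}
```

This is consistent with the sample values shown later in the post, where the numbers from `getlist(10, 8)` cluster around 8. If a standard deviation was intended instead, the keyword form is the one to use.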

Finally, we call a few seaborn functions to draw a kernel density estimate plot:

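import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

x = getlist(10, 1)
y = getlist(10, 8)
dataframes = pd.DataFrame({'x': x, 'y': y})
sns.jointplot(x="x", y="y", data=dataframes, kind="kde")
plt.show()  # sns.plt was removed from later seaborn releases; call matplotlib's pyplot directly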

Because so few points were sampled, the resulting plot looks rather "frightening".

While working on this I ran into a bug:

Traceback (most recent call last):
  File "fourth.py", line 177, in <module>
    sns.jointplot(x="x", y="y", data=dataframes, kind="kde")
  File "distributions.py", line 863, in jointplot
    grid.annotate(stat_func, **annot_kws)
  File "axisgrid.py", line 1808, in annotate
    annotation = template.format(stat=stat, val=val, p=p)
ValueError: Unknown format code 'g' for object of type 'str'

I had no idea how to debug this: an unknown format code 'g' for an object of type 'str'... What on earth is g? And does str mean string?
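To answer the question literally: `g` is the "general" number format code from Python's format specification mini-language, and `str` is indeed the string type. The message appears whenever a placeholder that expects a number is handed a string. The snippet below reproduces the same message in isolation; the template string is made up for illustration and is not claimed to be the one seaborn uses.

```python
def format_value(val):
    """Format val with the general number format code 'g' (two significant digits)."""
    template = "value = {val:.2g}"  # illustrative template, an assumption
    return template.format(val=val)


if __name__ == "__main__":
    print(format_value(0.123456))  # value = 0.12
    try:
        format_value("0.123456")   # a str where a number is expected
    except ValueError as err:
        print(err)  # Unknown format code 'g' for object of type 'str'
```

So the error says less about the letter g than about the data: somewhere a value that should have been a number arrived as a string.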

fourth.py is the file I wrote myself; the statement on the failing line is:

dataframes = pd.DataFrame({'x': x, 'y': y})
sns.jointplot(x="x", y="y", data=dataframes, kind="kde")

Where would a g come from?

Then I opened distributions.py, where the statement flagged as faulty is:

if stat_func is not None:
    grid.annotate(stat_func, **annot_kws)

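Where is the g? There is only stat_func and annot_kws here. Following the trail further leads to axisgrid.py, where the failing statement is the last line below (shown in bold in the original):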

if p is None:
    annotation = template.format(stat=stat, val=val)
else:
    annotation = template.format(stat=stat, val=val, p=p)

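... Still completely impossible to debug. Python is not as easy to pick up as advertised: all the convenient methods feel great at first, but once things get complicated it turns painful. It is like learning math: if you learn the "clever tricks" first without understanding the "general solution", a slightly altered problem easily leaves you stuck.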

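Back to the original problem. Something went wrong, but the error message pinpointed neither the offending variable nor the actual problem. The only way forward was "print debugging": print each variable produced along the way, do the same with the known-good reference solution, and compare the two. The result: the x and y produced by my own function turned out to be... arrays inside an array?! (In Python terms, a list whose elements are one-element arrays.)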

Correct dataframe:

          x         y
0  1.964398  5.726021
1  0.710647  8.656289
2  0.718874  8.391393
3  0.511813  7.836103
4  1.332608  7.830148
5  1.176651  8.083313
6  1.563887  9.849276
7  2.767148  7.428699
8  2.427337  7.429915
9  1.901562  6.614786

Wrong dataframe:

                  x                y
0   [1.09405811549]  [8.87066374625]
1   [1.12106817441]  [7.33795699011]
2  [0.620672589912]  [7.11607203089]
3   [1.74841276741]  [8.89951559785]
4   [1.35134847882]  [6.52366881093]
5  [0.835977385983]  [9.40627226732]
6  [0.981331289932]  [7.83305060896]
7   [1.79729857857]  [5.89704517571]
8   [1.02010129785]  [7.47180988343]
9  [0.963286962866]  [7.53969232683]

Whoa, the number in every cell is wrapped in square brackets! The type is wrong!
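The comparison done by eye above can also be made explicit by printing the type of each element instead of its value. This is a small sketch with made-up numbers; `element_kinds` is a hypothetical helper, not something provided by pandas or seaborn.

```python
import numpy as np


def element_kinds(values):
    """Return the set of type names found among the elements of values."""
    return {type(v).__name__ for v in values}


if __name__ == "__main__":
    good = [1.96, 0.71, 0.72]                   # plain floats
    bad = [np.array([1.09]), np.array([1.12])]  # one-element arrays
    print(element_kinds(good))  # {'float'}
    print(element_kinds(bad))   # {'ndarray'}
```

On the DataFrame itself, `dataframes.dtypes` gives a similar hint: a column of plain floats is reported as `float64`, while a column holding arrays is reported as `object`.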

Time to follow the trail and find the culprit:

def boxmullersampling(mu=0, sigma=1, size=1):
    u = 1.0 - np.random.uniform(size=size)  # in (0, 1], so log(u) stays finite
    v = np.random.uniform(size=size)
    z = np.sqrt(-2 * np.log(u)) * np.cos(2 * np.pi * v)
    return mu + z * sigma

def getlist(size, sigma=1):
    resultList = [0 for x in range(0, size)]
    for i in range(size):
        resultList[i] = boxmullersampling(sigma)  # stores the whole one-element array
    return resultList

-------------------- fancy divider --------------------

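So the boxmullersampling function was producing a one-element array... At first I could not see why a computation as simple as mu + z * sigma would come out as an array. The reason is that np.random.uniform(size=size) returns an array even when size is 1, and arithmetic on arrays is elementwise, so the result is an array too. Nothing in the function definition tells you this, which is what makes such unconstrained types so baffling. The fix: take the first element out of the array and store that number in the list, instead of storing the array itself. (And yes, the outer container is called a list, not an array.)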

def getlist(size, sigma=1):
    resultList = [0 for x in range(0, size)]
    for i in range(size):
        resultList[i] = boxmullersampling(sigma)[0]  # [0] takes the single number out of the array
    return resultList

Adding the [0] (shown in bold in the original) is all it takes.
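An alternative to indexing with `[0]` on every iteration is to ask for all samples in a single call and convert the result once. The sketch below is not the fix used in this post; it assumes a sampler equivalent to `boxmullersampling`, and the names `box_muller` and `get_samples` are introduced here for illustration.

```python
import numpy as np


def box_muller(mu=0.0, sigma=1.0, size=1):
    """Return an array of `size` normal samples via the Box-Muller transform."""
    # 1 - uniform() lies in (0, 1], which keeps log() away from zero.
    u = 1.0 - np.random.uniform(size=size)
    v = np.random.uniform(size=size)
    z = np.sqrt(-2.0 * np.log(u)) * np.cos(2.0 * np.pi * v)
    return mu + z * sigma  # elementwise arithmetic, so still an array


def get_samples(size, mu=0.0, sigma=1.0):
    """Return a flat list of `size` Python floats."""
    return box_muller(mu=mu, sigma=sigma, size=size).tolist()


if __name__ == "__main__":
    print(box_muller(size=1).shape)  # (1,): an array even for one sample
    samples = get_samples(10, mu=8.0)
    print(len(samples), type(samples[0]).__name__)  # 10 float
```

Passing `mu` and `sigma` by keyword also removes any doubt about which parameter a value is bound to.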

0 0