Dataquest Study Notes [7]

Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 01:09

Continuing with Step 5: Statistics And Linear Algebra / Probability And Statistics In Python: Intermediate

 Introduction to probability

Calculating Probabilities

>> Dataset: Bike Sharing Dataset, available here

Floor division uses //, e.g. 5 // 4 = 1
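Floor division rounds toward negative infinity, not toward zero. The difference only shows up with negative operands. Here is a quick check, using only built-in operators:

```python
# Floor division rounds down, toward negative infinity
print(5 // 4)         # 1
print(-5 // 4)        # -2 (truncation toward zero would give -1)
print(5 / 4)          # 1.25, true division always returns a float
# divmod returns the floor quotient and the matching remainder together
print(divmod(-5, 4))  # (-2, 3), because -2 * 4 + 3 == -5
```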

Factorial: math.factorial(N)
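Factorials are used here mainly to count combinations, "N choose k" = N! / (k! (N - k)!). Below is a small sketch. The helper name `n_choose_k` is hypothetical. It uses integer floor division so the result stays an exact int. `math.comb` (Python 3.8+) computes the same quantity directly.

```python
import math

def n_choose_k(n, k):
    # hypothetical helper: n! / (k! * (n - k)!) with exact integer arithmetic
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(math.factorial(5))   # 120
print(n_choose_k(10, 8))   # 45
print(math.comb(10, 8))    # 45, built in since Python 3.8
```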

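import math

p = .6  # probability of the event on any one day
q = .4  # probability it does not happen, 1 - p

def calc_prob(total, days):
    # probability of one particular sequence with `days` successes out of `total`
    per_pro = (p ** days) * (q ** (total - days))
    # number of such sequences: total! / (days! * (total - days)!)
    num = math.factorial(total) / math.factorial(days) / math.factorial(total - days)
    return per_pro * num

prob_8 = calc_prob(10, 8)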
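A hand-written binomial formula is easy to sanity-check against SciPy. This sketch rewrites `calc_prob` with `math.comb` (an assumption: Python 3.8+) and keeps p = .6. It then confirms that it matches `scipy.stats.binom.pmf` and that the probabilities for 0 to 10 days sum to 1.

```python
import math
from scipy.stats import binom

p, q = .6, .4

def calc_prob(total, days):
    # C(total, days) * p^days * q^(total - days)
    return math.comb(total, days) * (p ** days) * (q ** (total - days))

prob_8 = calc_prob(10, 8)
print(round(prob_8, 6))               # 0.120932
# SciPy's binomial probability mass function gives the same value
print(round(binom.pmf(8, 10, p), 6))  # 0.120932
# Every possible outcome, 0 through 10 days, together has probability 1
print(round(sum(calc_prob(10, d) for d in range(11)), 6))  # 1.0
```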
Probability distributions

import math

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom

# Each item in this list represents one k, starting from 0 and going up to and including 30.
outcome_counts = list(range(31))

# Hand-written binomial probability mass function
def calc_prob(N, k, p, q):
    prob = (p ** k) * (q ** (N - k))
    count = math.factorial(N) / math.factorial(k) / math.factorial(N - k)
    return prob * count

outcome_probs = []
for i in outcome_counts:
    outcome_probs.append(calc_prob(30, i, .39, .61))

# The same distribution computed with SciPy.
# scipy.linspace was only a deprecated alias of numpy.linspace, so use numpy directly.
# Create a range of numbers from 0 to 30, with 31 elements (each number has one entry).
outcome_counts = np.linspace(0, 30, 31)
outcome_probs = binom.pmf(outcome_counts, 30, 0.39)
plt.bar(outcome_counts, outcome_probs)
plt.show()

# Binomial mean: N*p; variance: N*p*q
# When N is large enough (both N*p and N*q reasonably big), the binomial is approximately normal.
# Cumulative distribution function (CDF): binom.cdf()
N, p = 30, 0.39
k = 16  # hypothetical example value of k
# The sum of all the probabilities to the left of k, including k.
left = binom.cdf(k, N, p)
# The sum of all probabilities to the right of k.
right = 1 - left
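The mean/variance formulas and the normal approximation can be checked numerically. This sketch uses `binom.mean`, `binom.var` and `binom.sf` (the survival function, 1 - CDF) from SciPy. It then compares the exact CDF with a normal approximation that includes a continuity correction. The value k = 16 is a hypothetical choice, and the continuity correction is a standard refinement that the original notes do not mention.

```python
import math
from scipy.stats import binom, norm

N, p = 30, 0.39
q = 1 - p
k = 16  # hypothetical example value

# Mean N*p and variance N*p*q, as reported by SciPy
mean, var = binom.mean(N, p), binom.var(N, p)
print(mean, N * p)
print(var, N * p * q)

# binom.sf(k, N, p) equals 1 - binom.cdf(k, N, p), i.e. P(X > k)
print(binom.sf(k, N, p), 1 - binom.cdf(k, N, p))

# Normal approximation with continuity correction: P(X <= k) ~ Phi((k + 0.5 - mean) / sd)
approx_left = norm.cdf((k + 0.5 - mean) / math.sqrt(var))
print(binom.cdf(k, N, p), approx_left)
```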
Significance Testing: the concepts of p-values and confidence intervals
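A p-value is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed. A 95% confidence interval is a range built by a procedure that covers the true value in 95% of repeated samples. The sketch below computes both for a single proportion. It uses hypothetical numbers (60 successes in 100 trials, null p0 = 0.5) and the normal approximation, which may not be the method the course itself uses.

```python
import math
from scipy.stats import norm

# Hypothetical data: 60 successes in 100 trials; null hypothesis p0 = 0.5
successes, n, p0 = 60, 100, 0.5
p_hat = successes / n

# z statistic under the null and the two-sided p-value
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))

# 95% confidence interval for the true proportion (normal approximation)
z_crit = norm.ppf(0.975)  # about 1.96
half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(z, p_value, ci)
```

Here z is 2.0 and the p-value is about 0.0455, so the result would count as significant at the 0.05 level. The interval is roughly (0.504, 0.696).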

Chi-squared tests

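numpy.random.random((a, b)) generates random floats in the half-open interval [0.0, 1.0) and returns an ndarray of shape (a, b). The shape is passed as a single tuple argument.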
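A short sketch of both the legacy call and NumPy's newer Generator API (`numpy.random.default_rng`, available since NumPy 1.17). The shape (3, 2) and the seed 0 are arbitrary choices.

```python
import numpy as np

a, b = 3, 2
# Legacy API: the shape goes in as one tuple
arr = np.random.random((a, b))
print(arr.shape)  # (3, 2)

# Generator API: seedable and recommended for new code
rng = np.random.default_rng(seed=0)
arr2 = rng.random((a, b))
print(arr2.shape)  # (3, 2)
```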

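import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import chisquare

# Build the chi-squared distribution by simulation: 1000 times, randomly split
# 32561 people 50/50 and measure the deviation from the expected 16280.5 per group
chi_squared_values = []
for i in range(1000):
    numbers = np.random.random(32561)
    # values below 0.5 become 0 (male), the rest become 1 (female)
    numbers = (numbers >= 0.5).astype(int)
    male = 32561 - np.sum(numbers)
    female = np.sum(numbers)
    male_diff = (male - 16280.5) ** 2 / 16280.5
    female_diff = (female - 16280.5) ** 2 / 16280.5
    chi_squared_values.append(male_diff + female_diff)
plt.hist(chi_squared_values)
plt.show()

# Compute a chi-squared statistic and p-value with SciPy
observed = np.array([5, 10, 15])
expected = np.array([7, 11, 12])
chisquare_value, pvalue = chisquare(observed, expected)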
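`chisquare` applies the formula chi² = Σ (observed − expected)² / expected and compares the result with a chi-squared distribution that has (number of categories − 1) degrees of freedom. The sketch below computes the same statistic and p-value by hand, using `scipy.stats.chi2.sf` for the upper tail, and checks that it agrees with SciPy.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([5, 10, 15])
expected = np.array([7, 11, 12])

# Sum over categories of (observed - expected)^2 / expected
stat = np.sum((observed - expected) ** 2 / expected)
# Degrees of freedom = number of categories - 1
dof = len(observed) - 1
# p-value: probability of a chi-squared value at least this large
p_manual = chi2.sf(stat, dof)

stat_scipy, p_scipy = chisquare(observed, expected)
print(stat, stat_scipy)  # both about 1.412
print(p_manual, p_scipy)
```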
Multi category chi-squared tests
pandas.crosstab builds a frequency table that counts how often each combination of values occurs across DataFrame columns.

import pandas

# assumes `income` is the income DataFrame already loaded in this lesson
table = pandas.crosstab(income["sex"], [income["race"]])
print(table)
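The income data is not reproduced here, so this sketch uses a tiny hypothetical DataFrame named `toy` to show what `crosstab` produces. With `margins=True`, pandas also adds row and column totals under the label "All".

```python
import pandas as pd

# Small hypothetical stand-in for the income data
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Male", "Female"],
    "race": ["White", "White", "Black", "White", "Black"],
})

# Rows are sex values, columns are race values, cells are counts
table = pd.crosstab(toy["sex"], toy["race"])
print(table)

# Same table with row and column totals
table_totals = pd.crosstab(toy["sex"], toy["race"], margins=True)
print(table_totals)
```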
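scipy.stats.chi2_contingency runs a chi-squared test of independence on a contingency table. It returns the chi-squared statistic, the p-value, the degrees of freedom and the table of expected frequencies.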

import pandas
from scipy.stats import chi2_contingency

table = pandas.crosstab(income['sex'], [income['race']])
# statistic, p-value, degrees of freedom, expected frequencies
chisq_value, pvalue, dof, expected = chi2_contingency(table)
pvalue_gender_race = pvalue
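The expected table and the degrees of freedom can be reproduced by hand. The expected count for each cell is row total × column total / grand total, and dof = (rows − 1) × (columns − 1). The counts below are hypothetical. For a 2×2 table (dof = 1), `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), so its statistic would not match the plain formula. The table here is 2×3, so it does match.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of counts
table = np.array([[20, 15, 25],
                  [30, 25, 35]])

chisq_value, pvalue, dof, expected = chi2_contingency(table)

# Expected count per cell = row total * column total / grand total
row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / table.sum()

# Plain chi-squared statistic from the expected table
stat_manual = np.sum((table - expected_manual) ** 2 / expected_manual)

print(dof)                                     # 2, i.e. (2 - 1) * (3 - 1)
print(np.allclose(expected, expected_manual))  # True
print(chisq_value, stat_manual)
```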
Guided Project: Winning Jeopardy

Code: here    Dataset: here

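list.remove(x) modifies the list in place and removes the first item equal to x. It returns None, and raises ValueError if x is not in the list.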
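A short demonstration of those three behaviours using only built-in lists: first-match removal, the `None` return value and the `ValueError`. The last lines show a list comprehension as one way to drop every occurrence.

```python
letters = ["a", "b", "a", "c"]
result = letters.remove("a")  # removes only the first "a"
print(letters)  # ['b', 'a', 'c']
print(result)   # None

# Removing a value that is not present raises ValueError
try:
    letters.remove("z")
except ValueError:
    print("'z' not in list")

# To drop every occurrence, build a new list instead
no_a = [x for x in letters if x != "a"]
print(no_a)     # ['b', 'c']
```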

Solving Systems of Equations with Matrices/vectors

import matplotlib.pyplot as plt
import numpy as np

# Row operations on the augmented matrix of
#   2x +  y = 25
#   3x + 2y = 40
matrix = np.asarray([
    [2, 1, 25],
    [3, 2, 40]
], dtype=np.float32)
matrix[0] *= 2
matrix[0] -= matrix[1]
matrix[1] -= (matrix[0] * 3)
matrix[1] /= 2
# matrix is now [[1, 0, 10], [0, 1, 5]], i.e. x = 10, y = 5

# Swap two rows (the matrix only has rows 0 and 1; index 2 would raise IndexError)
matrix[[0, 1]] = matrix[[1, 0]]

# Plot several vectors at once
# We're going to plot three vectors
# The first will start at origin 0,0, then go over 1 and up 2
# The second will start at point 1,2, then go over 3 and up 2
# The third will start at origin 0,0, then go over 4 and up 4
X = [0, 1, 0]
Y = [0, 2, 0]
U = [1, 3, 4]
V = [2, 2, 4]
plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)
plt.xlim([0, 6])
plt.ylim([0, 6])
plt.show()

# Matrix multiplication: numpy.dot(A, B), or equivalently A @ B
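In practice, NumPy can solve the same system directly with `numpy.linalg.solve`. Multiplying the coefficient matrix by the solution is then a quick way to confirm the answer. Here `np.dot` and the `@` operator give the same matrix-vector product.

```python
import numpy as np

# Coefficients and right-hand side of 2x + y = 25, 3x + 2y = 40
A = np.array([[2, 1],
              [3, 2]], dtype=float)
b = np.array([25, 40], dtype=float)

# x = 10, y = 5, up to floating-point rounding
solution = np.linalg.solve(A, b)
print(solution)

# Multiplying back should reproduce b
print(np.dot(A, solution))
print(A @ solution)
```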