Dataquest Study Notes [7]
Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 01:09
Continuing Step 5: Statistics And Linear Algebra / Probability And Statistics In Python: Intermediate
Introduction to probability
Calculating Probabilities
>> Dataset: Bike Sharing Dataset, available here
Floor division uses //, for example 5 // 4 = 1
Computing a factorial: math.factorial(N)
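Floor division and the factorial combine naturally into the binomial coefficient that the next snippet relies on. A minimal sketch, assuming Python 3.8 or later for math.comb; the helper name n_choose_k is illustrative and does not come from the course:

```python
import math

def n_choose_k(n, k):
    """Number of ways to choose k items out of n, built from factorials."""
    # Floor division keeps the result an exact integer instead of a float.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(5 // 4)              # floor division discards the remainder
print(n_choose_k(10, 8))   # ways to pick 8 days out of 10
print(math.comb(10, 8))    # built-in equivalent (Python 3.8+)
```

Using // rather than / avoids floating-point rounding when the factorials get large.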
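import math

p = .6
q = .4

def calc_prob(total, days):
    # Probability of one specific sequence with `days` successes
    per_pro = (p ** days) * (q ** (total - days))
    # Number of such sequences: total! / (days! * (total - days)!)
    num = math.factorial(total) / math.factorial(days) / math.factorial(total - days)
    return per_pro * num

prob_8 = calc_prob(10, 8)

Probability distributions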
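import math

# Each item in this list represents one k, starting from 0 and going up to and including 30.
outcome_counts = list(range(31))

# Hand-written binomial distribution
def calc_prob(N, k, p, q):
    prob = (p ** k) * (q ** (N - k))
    count = math.factorial(N) / math.factorial(k) / math.factorial(N - k)
    return prob * count

outcome_probs = []
for i in outcome_counts:
    outcome_probs.append(calc_prob(30, i, .39, .61))

# The same distribution computed with scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Create a range of numbers from 0 to 30, with 31 elements (each number has one entry).
# numpy.linspace is used because the scipy.linspace alias is deprecated.
outcome_counts = np.linspace(0, 30, 31)
outcome_probs = binom.pmf(outcome_counts, 30, 0.39)
plt.bar(outcome_counts, outcome_probs)
plt.show()

# A binomial distribution has mean Np and variance Npq.
# When the number of trials is large enough, the binomial distribution approximates a normal distribution.

# Cumulative distribution function: binom.cdf()
# Example values so the lines below run; k is arbitrary.
k, N, p = 10, 30, 0.39
# The sum of all the probabilities to the left of k, including k.
left = binom.cdf(k, N, p)
# The sum of all probabilities to the right of k.
right = 1 - left

Significance Testing: the concepts of p-value and confidence interval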
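The right-tail probability from binom.cdf is the simplest example of a p-value: the chance of a result at least as extreme as the one observed, assuming the null hypothesis holds. A minimal sketch, reusing N = 30 and p = 0.39 from the notes above; the observed count k = 16 is a hypothetical value chosen only for illustration:

```python
from scipy.stats import binom

N, p = 30, 0.39   # same distribution as in the notes above
k = 16            # hypothetical observed count, for illustration only

# One-sided p-value: P(X >= k) = 1 - P(X <= k - 1).
# binom.sf is the survival function, i.e. 1 - cdf.
p_value = binom.sf(k - 1, N, p)

mean = binom.mean(N, p)      # equals N * p
variance = binom.var(N, p)   # equals N * p * q

print(mean, variance, p_value)
```

Note the k - 1: 1 - binom.cdf(k, N, p) excludes k itself, while "at least k" has to include it.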
Chi-squared tests
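numpy.random.random((a, b)) generates random numbers in the range [0.0, 1.0) and returns an ndarray of shape (a, b). The shape is passed as a single tuple; numpy.random.rand(a, b) accepts the dimensions as separate arguments.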
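A minimal sketch of drawing such an array. The dimensions a and b are hypothetical; note that numpy.random.random expects the shape as one tuple, whereas numpy.random.rand takes the dimensions as separate arguments:

```python
import numpy as np

a, b = 3, 4  # hypothetical dimensions

samples = np.random.random((a, b))   # shape passed as a single tuple
same_shape = np.random.rand(a, b)    # dimensions passed as separate arguments

print(samples.shape)
print(samples.min() >= 0.0, samples.max() < 1.0)
```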
import numpy
import matplotlib.pyplot as plt

# Build a chi-squared sampling distribution by hand
chi_squared_values = []
for i in range(1000):
    numbers = numpy.random.random(32561,)
    # Map each draw to 0 (male) or 1 (female) with equal probability.
    numbers = numpy.where(numbers < 0.5, 0, 1)
    femal = numpy.sum(numbers)
    mal = 32561 - femal
    male_diff = (mal - 16280.5) ** 2 / 16280.5
    female_diff = (femal - 16280.5) ** 2 / 16280.5
    chi_squared_values.append(male_diff + female_diff)

plt.hist(chi_squared_values)
plt.show()

# Compute a chi-squared value with scipy
from scipy.stats import chisquare

observed = numpy.array([5, 10, 15])
expected = numpy.array([7, 11, 12])
chisquare_value, pvalue = chisquare(observed, expected)

Multi category chi-squared tests
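With more than two categories the statistic is still the sum of (observed - expected)^2 / expected over all categories. The sketch below recomputes the three-category chisquare example from the notes by hand and compares it with scipy; the names manual_stat and manual_p are illustrative, and the degrees of freedom are assumed to be the number of categories minus one, which is the default used by scipy.stats.chisquare:

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([5, 10, 15])
expected = np.array([7, 11, 12])

# Chi-squared statistic: sum of squared differences scaled by the expected counts.
manual_stat = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom: number of categories minus one.
dof = len(observed) - 1

# p-value: probability of a statistic at least this large under the null hypothesis.
manual_p = chi2.sf(manual_stat, dof)

scipy_stat, scipy_p = chisquare(observed, expected)
print(manual_stat, scipy_stat)
print(manual_p, scipy_p)
```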
pandas.crosstab computes a frequency table (cross-tabulation) of the category combinations found in DataFrame columns.
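import pandas

# income is the DataFrame loaded earlier in the course
table = pandas.crosstab(income["sex"], [income["race"]])
print(table)

scipy.stats.chi2_contingency returns the chi-squared statistic, the p-value, the degrees of freedom, and the table of expected frequencies.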
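To see what these two calls produce without the income dataset, here is a minimal sketch on a made-up table. The toy DataFrame and its category labels are hypothetical, and the counts are far too small for a meaningful test; the point is only the shape of the inputs and outputs:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: 5 "Male" rows and 5 "Female" rows across 3 categories.
toy = pd.DataFrame({
    "sex":  ["Male"] * 5 + ["Female"] * 5,
    "race": ["A", "A", "A", "B", "C", "A", "B", "B", "C", "C"],
})

# Frequency table: one row per sex, one column per race.
table = pd.crosstab(toy["sex"], toy["race"])

chisq_value, pvalue, dof, expected = chi2_contingency(table)

# Expected counts under independence: row total * column total / grand total.
row_totals = table.sum(axis=1).to_numpy()
col_totals = table.sum(axis=0).to_numpy()
manual_expected = np.outer(row_totals, col_totals) / table.to_numpy().sum()

print(table)
print(chisq_value, pvalue, dof)
print(expected)
```

The degrees of freedom are (rows - 1) * (columns - 1), which is 2 for this 2 x 3 table.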
from scipy.stats import chi2_contingency

table = pandas.crosstab(income['sex'], [income['race']])
chisq_value, pvalue, df, expected = chi2_contingency(table)
pvalue_gender_race = pvalue

Guided Project: Winning Jeopardy
Code: here. Dataset: here.
list.remove() modifies the list in place and removes the first matching item, but it has no return value (it returns None).
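A short sketch of that behavior, using a made-up list:

```python
items = ["a", "b", "a", "c"]

result = items.remove("a")   # removes only the first "a", in place

print(items)    # the second "a" is still there
print(result)   # remove() returns None

# Removing a value that is not present raises ValueError.
try:
    items.remove("z")
except ValueError:
    print("'z' is not in the list")
```

Because the return value is None, writing items = items.remove("a") would discard the list.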
Solving Systems of Equations with Matrices/vectors
# Matrix row operations
import numpy as np

matrix = np.asarray([
    [2, 1, 25],
    [3, 2, 40]
], dtype=np.float32)
matrix[0] *= 2
matrix[0] -= matrix[1]
matrix[1] -= (matrix[0] * 3)
matrix[1] /= 2

# Swapping two rows. The indices must exist: rows 0 and 2 need a matrix with
# at least three rows, so rows 0 and 1 are used for this two-row matrix.
matrix[[0, 1]] = matrix[[1, 0]]

# Plotting several vectors
import numpy as np
import matplotlib.pyplot as plt

# We're going to plot three vectors
# The first will start at origin 0,0, then go over 1 and up 2
# The second will start at origin 1,2, then go over 3 and up 2
# The third will start at origin 0,0, then go over 4 and up 4
X = [0, 1, 0]
Y = [0, 2, 0]
U = [1, 3, 4]
V = [2, 2, 4]
plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)
plt.xlim([0, 6])
plt.ylim([0, 6])
plt.show()

# Matrix multiplication: numpy.dot(A, B)
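The augmented matrix in the notes encodes the system 2x + y = 25 and 3x + 2y = 40. As a cross-check on the row operations, the sketch below solves the same system with numpy.linalg.solve (a swapped-in shortcut, not the row-reduction method from the course) and verifies the result with numpy.dot; the names A, b, and solution are illustrative:

```python
import numpy as np

# Coefficient matrix and right-hand side taken from the augmented matrix
# [[2, 1, 25], [3, 2, 40]].
A = np.array([[2.0, 1.0],
              [3.0, 2.0]])
b = np.array([25.0, 40.0])

# Solve A @ [x, y] = b directly.
solution = np.linalg.solve(A, b)

# Matrix-vector product: multiplying A by the solution should reproduce b.
check = np.dot(A, solution)

print(solution)
print(check)
```

The solution x = 10, y = 5 matches the last column of the matrix after the four row operations and before the row swap.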