Skewness

Translated from: http://everythingmaths.co.za/maths/grade-11/11-statistics/11-statistics-05.cnxmlplus
Additional references:
1. http://help.gooddata.com/doc/public/wh/WHAll/Default.htm?#MAQLRefGuide/NormalityTesting-SkewnessAndKurtosis.htm
2. http://www.amstat.org/publications/jse/v19n2/doane.pdf
3. http://datapigtechnologies.com/blog/index.php/methods-of-measuring-the-skewness-of-data/

We are now going to classify data sets into three categories that describe the shape of the data distribution: symmetric, left skewed, and right skewed. This classification can be applied to any data set, but here we will look only at distributions with one peak. Most of the data distributions that you have seen so far have only one peak, so the plots in this section should look familiar.

Symmetric distributions
A symmetric distribution is one where the left-hand and right-hand sides of the distribution are roughly equally balanced around the mean. The histogram below shows a typical symmetric distribution.

[Figure: histogram of a typical symmetric distribution]

For symmetric distributions, the mean is approximately equal to the median. The tails of the distribution are the parts to the left and to the right, away from the mean, where the counts in the histogram become smaller. For a symmetric distribution, the left and right tails are equally balanced, meaning that they have about the same length.
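As a small illustration of the paragraph above, the sketch below compares the mean with the median and measures both tails for a symmetric data set. The data values are hypothetical, and taking the distance from the mean to the smallest and largest values as the "tail length" is an assumption made only for this example.

```python
from statistics import mean, median

# Hypothetical data set, chosen to be symmetric around 5.
data = [1, 3, 4, 5, 5, 5, 6, 7, 9]

data_mean = mean(data)
data_median = median(data)

# Assumed measure of tail length: distance from the mean to each extreme.
left_tail = data_mean - min(data)
right_tail = max(data) - data_mean

print("mean:", data_mean)
print("median:", data_median)
print("left tail length:", left_tail)
print("right tail length:", right_tail)
```

For this data set the mean and the median coincide and both tails have the same length. Real data will rarely match exactly, which is why the text says "approximately".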

The figure below shows a box and whisker diagram for a typical symmetric data set.

[Figure: box and whisker diagram of a typical symmetric data set]

Another property of a symmetric distribution is that its median (second quartile) lies midway between its first and third quartiles. Note that the whiskers of the plot (the minimum and maximum) do not have to be equally far away from the median. In the next section on outliers, you will see that the minimum and maximum values do not necessarily match the rest of the data distribution well.
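The quartile property can be checked numerically. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`; quartile conventions differ between textbooks and software, so other methods may give slightly different values. The data set is hypothetical and was chosen so that the whiskers are unequal even though the box is symmetric.

```python
from statistics import quantiles

# Hypothetical data set: symmetric box, but a larger maximum than minimum gap.
data = [1, 3, 4, 5, 5, 5, 6, 7, 12]

# n=4 splits the data into quarters and returns the three quartiles.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

lower_box = q2 - q1             # distance from first quartile to median
upper_box = q3 - q2             # distance from median to third quartile
lower_whisker = q2 - min(data)  # distance from minimum to median
upper_whisker = max(data) - q2  # distance from median to maximum

print("quartiles:", q1, q2, q3)
print("box halves:", lower_box, upper_box)
print("whiskers:", lower_whisker, upper_whisker)
```

Here the median sits midway between the first and third quartiles, while the maximum lies further from the median than the minimum does.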

Skewed distributions

A distribution that is skewed right (also known as positively skewed) is shown below.

[Figure: histogram of a right skewed distribution]

Now the picture is no longer symmetric around the mean. For a right skewed distribution, the mean is typically greater than the median. Also notice that the tail of the distribution on the right-hand (positive) side is longer than on the left-hand side.
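One way to see why the mean ends up above the median is to add a few large values to a symmetric data set: they stretch the right-hand tail and pull the mean upwards, while the median barely moves. The sketch below uses hypothetical numbers to show this.

```python
from statistics import mean, median

# Hypothetical symmetric data set: mean and median are both 5.
symmetric = [1, 3, 4, 5, 5, 5, 6, 7, 9]

# Adding two large values creates a long right-hand tail.
right_skewed = symmetric + [20, 25]

print("symmetric:    mean", mean(symmetric), "median", median(symmetric))
print("right skewed: mean", mean(right_skewed), "median", median(right_skewed))
```

In this example the median stays at 5 while the mean rises above 8, which matches the pattern described for right skewed distributions.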

[Figure: box and whisker diagram of a right skewed data set]

From the box and whisker diagram we can also see that the median is closer to the first quartile than to the third quartile. The diagram also shows that the right-hand tail of the distribution is longer than the left.

A distribution that is skewed left has exactly the opposite characteristics of one that is skewed right:

  • the mean is typically less than the median;
  • the tail of the distribution is longer on the left-hand side than on
    the right-hand side; and
  • the median is closer to the third quartile than to the first
    quartile.
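The three signs of a left skewed distribution can be turned into simple checks. The following is a rough sketch, not a formal test of skewness: the function name `left_skew_checks` and the data set are hypothetical, tail length is assumed to be the distance from the median to the minimum or maximum, and the quartiles use the "inclusive" method of `statistics.quantiles`.

```python
from statistics import mean, median, quantiles


def left_skew_checks(data):
    """Return the three left-skew signs as booleans (illustrative only)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {
        # Sign 1: the mean is less than the median.
        "mean_below_median": mean(data) < median(data),
        # Sign 2 (assumed measure): the left tail is longer than the right.
        "left_tail_longer": (q2 - min(data)) > (max(data) - q2),
        # Sign 3: the median is closer to the third quartile than the first.
        "median_nearer_q3": (q3 - q2) < (q2 - q1),
    }


# Hypothetical data set with a long left-hand tail.
data = [-14, -8, -5, -4, -3, -3, -3, -2, -2, -1]
checks = left_skew_checks(data)

for name, passed in checks.items():
    print(name, passed)
```

All three checks hold for this data set. On real data the signs may disagree with one another, so they are best read together with a histogram or box and whisker diagram.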