How to Explain Machine Learning to a Layperson

Source: Internet | Published by: java工程师等级 | Editor: 程序博客网 | Time: 2024/05/01 16:56

Going Mango Shopping

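Suppose one day you set out to buy some mangoes. A vendor has laid out a whole cartful. You pick them one at a time, and the vendor charges you by the weight of what you picked (the typical arrangement in India). Obviously you want the sweetest, ripest mangoes, since the vendor charges by weight and not by quality. But how are you going to choose?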

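You remember your grandmother telling you that bright yellow mangoes are sweeter than dull yellow ones. So you have a simple rule: pick only the bright yellow mangoes. You check the color of each mango, pick out the bright yellow ones, pay, and leave. Happy ending?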

It is not that simple.


Life Is Complicated

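You get home and start tasting your mangoes. You find that some of them are not as sweet as you expected. You are worried. Clearly, your grandmother's wisdom was not enough. There is more to picking mangoes than looking at the color.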

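After a lot of thought (and after tasting many different kinds of mangoes), you find that the big, bright yellow mangoes are always sweet, while the small, bright yellow ones are sweet only half the time. For example, if you buy 100 bright yellow mangoes, 50 big and 50 small, all 50 big ones will be sweet, while on average only 25 of the 50 small ones will be.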

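You are very pleased with your findings, and you keep these rules firmly in mind the next time you go mango shopping. But when you return to the market, you find that your favorite mango vendor has left town. So you decide to buy from another vendor, whose mangoes come from a different growing region. Now you suddenly realize that the method you learned (big, bright yellow mangoes are the sweetest) no longer works. You have to learn all over again. You taste every kind of mango this vendor sells and find that here the small, dull yellow ones are in fact the sweetest.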

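Before long, a distant cousin from another city comes to visit. You decide to treat her to mangoes. But she says she does not care whether a mango is sweet; she wants the juiciest ones. So once again you run your experiments, tasting all kinds of mangoes, and find that the softer ones are juicier.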

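Later, you move to another country. There, mangoes taste completely different from the ones back home. You find that the green mangoes are actually tastier than the yellow ones.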

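Then you marry a woman who hates mangoes. She likes apples. You have to go apple shopping every day. All at once, the mango-picking experience you had accumulated is worthless. You have to learn, by the same method, how the physical properties of apples relate to their taste. And you do it, because you love her.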

Enter Computer Programs

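Now imagine that all this while you have been writing a computer program to help you choose mangoes (or apples). You would write rules like the following: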

if (color is bright yellow and size is big and bought from favorite vendor): mango is sweet
if (soft): mango is juicy

And so on.

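You would use these rules to choose mangoes. You could even send your younger brother to buy mangoes with this list of rules, confident that he would come back with mangoes you are happy with.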

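But every time your mango experiments turn up something new, you have to modify the list of rules by hand. You have to understand the intricate details of every factor that affects the quality of a mango.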

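As the problem grows more complicated, writing selection rules by hand that cover every type of mango becomes very difficult. Your research could earn you a PhD in Mango Science (if such a degree exists).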

But who has that kind of time?

Enter Machine Learning Algorithms

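Machine learning algorithms evolved from ordinary algorithms. They make your programs "smarter" by learning automatically from the data you provide.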

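You take a random sample of the mangoes at the market (the training data) and make a table recording the physical properties of each mango, such as color, size, shape, growing region, and vendor. These are called features.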

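You also record whether each mango is sweet, juicy, and ripe (the output variables). You feed this data to a machine learning algorithm (a classification or regression algorithm), and it learns a model of the relationship between a mango's physical properties and its quality.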

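The next time you go to the market, you just measure the characteristics of the mangoes on sale (the test data) and feed them to the machine learning algorithm. It uses the model it computed earlier to predict which mangoes are sweet, ripe, and/or juicy.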

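Internally, the algorithm may use rules much like the ones you wrote down by hand earlier (a decision tree, for example), or something more involved, but for the most part you do not need to worry about that.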

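There, you can now shop for mangoes with full confidence, without thinking about the details of how to choose them. What's more, you can make your algorithm get better over time (reinforcement learning): as it reads more training data it becomes more accurate, and it corrects itself after making a wrong prediction. But the best part is that you can use the same algorithm to train different models, such as models that predict the quality of apples, oranges, bananas, grapes, cherries, and watermelons, and keep all your loved ones happy :)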

And that is machine learning for you. Pretty cool, isn't it?

Machine learning: making your algorithms smarter, so that you can be lazy.

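Original article: Pararth Shah. Translation: 伯乐在线 - jiqihuman
Translation link: http://blog.jobbole.com/50338/
[Reposts must credit, and retain in the body text, the original link, the translation link, and the translator information.]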

How do you explain Machine Learning to non Computer Science people?

Mango Shopping

Suppose you go shopping for mangoes one day. The vendor has laid out a cart full of mangoes. You can handpick the mangoes, the vendor will weigh them, and you pay according to a fixed Rs per Kg rate (typical story in India).

Obviously, you want to pick the sweetest, most ripe mangoes for yourself (since you are paying by weight and not by quality). How do you choose the mangoes?

You remember your grandmother saying that bright yellow mangoes are sweeter than pale yellow ones. So you make a simple rule: pick only from the bright yellow mangoes. You check the color of the mangoes, pick the bright yellow ones, pay up, and return home. Happy ending?

Not quite.

Life is complicated

Suppose you go home and taste the mangoes. Some of them are not as sweet as you'd like. You are worried. Apparently, your grandmother's wisdom is insufficient. There is more to mangoes than just color.

After a lot of pondering (and tasting different types of mangoes), you conclude that the bigger, bright yellow mangoes are guaranteed to be sweet, while the smaller, bright yellow mangoes are sweet only half the time (i.e. if you buy 100 bright yellow mangoes, out of which 50 are big in size and 50 are small, then the 50 big mangoes will all be sweet, while out of the 50 small ones, on average only 25 mangoes will turn out to be sweet).

You are happy with your findings, and you keep them in mind the next time you go mango shopping. But next time at the market, you see that your favorite vendor has gone out of town. You decide to buy from a different vendor, who supplies mangoes grown in a different part of the country. Now, you realize that the rule which you had learnt (that big, bright yellow mangoes are the sweetest) is no longer applicable. You have to learn from scratch. You taste a mango of each kind from this vendor, and realize that the small, pale yellow ones are in fact the sweetest of all.

Now, a distant cousin visits you from another city. You decide to treat her to mangoes. But she mentions that she doesn't care about the sweetness of a mango; she only wants the juiciest ones. Once again, you run your experiments, tasting all kinds of mangoes, and realize that the softer ones are juicier.

Now, you move to a different part of the world. Here, mangoes taste surprisingly different from those in your home country. You realize that the green mangoes are in fact tastier than the yellow ones.

You marry someone who hates mangoes. She loves apples instead. You go apple shopping. Now, all your accumulated knowledge about mangoes is worthless. You have to learn everything about the correlation between the physical characteristics and the taste of apples, by the same method of experimentation. You do it, because you love her.

Enter computer programs

Now, imagine that all this while, you were writing a computer program to help you choose your mangoes (or apples). You would write rules of the following kind:

if (color is bright yellow and size is big and sold by favorite vendor): mango is sweet.
if (soft): mango is juicy.
etc.
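
For the curious, here is roughly what such a rule list could look like as real code. This is only a sketch in Python: the `Mango` record, its field names, and string values such as "favorite" are made up for illustration and do not come from any library.

```python
from dataclasses import dataclass


@dataclass
class Mango:
    # All fields are hypothetical, chosen to mirror the story.
    color: str    # e.g. "bright yellow", "pale yellow", "green"
    size: str     # "big" or "small"
    vendor: str   # "favorite" or any other label
    soft: bool


def is_sweet(mango: Mango) -> bool:
    """Hand-written rule: big, bright yellow mangoes from the favorite vendor."""
    return (
        mango.color == "bright yellow"
        and mango.size == "big"
        and mango.vendor == "favorite"
    )


def is_juicy(mango: Mango) -> bool:
    """Hand-written rule: soft mangoes are juicy."""
    return mango.soft
```

Every new observation (a new vendor, a new country, a cousin with different tastes) means editing these functions by hand.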

You would use these rules to choose the mangoes. You could even send your younger brother with this list of rules to buy the mangoes, and you would be assured that he will pick only the mangoes of your choice.

But every time you make a new observation from your experiments, you have to manually modify the list of rules. You have to understand the intricate details of all the factors affecting the quality of mangoes. If the problem gets complicated enough, it can get really difficult to make accurate rules by hand that cover all possible types of mangoes. Your research could earn you a PhD in Mango Science (if there is one).

But not everyone has that kind of time.

Enter Machine Learning algorithms

ML algorithms are an evolution over normal algorithms. They make your programs "smarter", by allowing them to automatically learn from the data you provide.

You take a randomly selected sample of mangoes from the market (training data), make a table of all the physical characteristics of each mango, like color, size, shape, grown in which part of the country, sold by which vendor, etc. (features), along with the sweetness, juiciness, and ripeness of that mango (output variables). You feed this data to the machine learning algorithm (classification/regression), and it learns a model of the correlation between an average mango's physical characteristics and its quality.

Next time you go to the market, you measure the characteristics of the mangoes on sale (test data), and feed them to the ML algorithm. It will use the model computed earlier to predict which mangoes are sweet, ripe and/or juicy. The algorithm may internally use rules similar to the rules you manually wrote earlier (e.g., a decision tree), or it may use something more involved, but to a large extent you don't need to worry about that.
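
To make the "table in, model out" idea concrete, here is a minimal sketch of a decision tree learner in plain Python. It is a toy under stated assumptions, not how any particular library does it: the training table is invented, features are treated as categories, splits are chosen by Gini impurity, and the names `build_tree` and `predict` are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 0.0 when every label in the list agrees."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Recursively learn a decision tree over categorical features."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)  # leaf: just a predicted label

    def split_cost(feature):
        # Weighted impurity of the groups produced by splitting on `feature`.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(features, key=split_cost)
    rest = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in subset], [y for _, y in subset], rest
        )
    # `default` is used when a mango has a feature value never seen in training.
    return {"feature": best, "branches": branches, "default": majority(labels)}


def predict(tree, row):
    """Walk from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Invented training data: (color, size) -> tasted result.
TRAIN = [
    ("bright yellow", "big", "sweet"),
    ("bright yellow", "big", "sweet"),
    ("bright yellow", "small", "sweet"),
    ("bright yellow", "small", "not sweet"),
    ("bright yellow", "small", "not sweet"),
    ("pale yellow", "big", "not sweet"),
    ("pale yellow", "small", "not sweet"),
]
rows = [{"color": c, "size": s} for c, s, _ in TRAIN]
labels = [y for _, _, y in TRAIN]
tree = build_tree(rows, labels, ["color", "size"])

print(predict(tree, {"color": "bright yellow", "size": "big"}))
```

Nobody wrote an "if bright yellow and big" rule here. The learner found that color is the most informative first question for this table and that size matters only among the bright yellow mangoes. Hand it a different vendor's table and it will grow a different tree.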

Voila, you can now shop for mangoes with great confidence, without worrying about the details of how to choose the best mangoes. And what's more, you can make your algorithm improve over time (reinforcement learning), so that it improves its accuracy as it reads more training data and corrects itself when it makes a wrong prediction. But the best part is, you can use the same algorithm to train different models, one each for predicting the quality of apples, oranges, bananas, grapes, cherries and watermelons, and keep all your loved ones happy :)
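
Both ideas in this paragraph, getting better with more data and reusing one algorithm for many fruits, can be sketched with a deliberately simple learner. The `CountingModel` class below is hypothetical: it only counts which label it has seen most often for an exact combination of features. Updating counts one example at a time like this is nearer to what is usually called online or incremental learning than to reinforcement learning in the strict sense, but it shows the flavor.

```python
from collections import Counter, defaultdict


class CountingModel:
    """Predicts the label seen most often for an exact feature combination."""

    def __init__(self, default):
        self.default = default               # answer when nothing is known yet
        self.counts = defaultdict(Counter)   # feature combination -> label counts

    def learn(self, features, label):
        """Fold one more tasted fruit into the model."""
        self.counts[tuple(sorted(features.items()))][label] += 1

    def predict(self, features):
        seen = self.counts.get(tuple(sorted(features.items())))
        return seen.most_common(1)[0][0] if seen else self.default


# One model for mangoes, improving as more fruit is tasted.
mango_model = CountingModel(default="unknown")
small_bright = {"color": "bright yellow", "size": "small"}

first_guess = mango_model.predict(small_bright)    # nothing learned yet
mango_model.learn(small_bright, "not sweet")
second_guess = mango_model.predict(small_bright)   # follows the single example
mango_model.learn(small_bright, "sweet")
mango_model.learn(small_bright, "sweet")
third_guess = mango_model.predict(small_bright)    # revised by newer evidence

# The same class, trained on different data, becomes an apple model.
apple_model = CountingModel(default="unknown")
apple_model.learn({"color": "red", "firmness": "firm"}, "crisp")
apple_guess = apple_model.predict({"color": "red", "firmness": "firm"})
```

The code for mangoes and apples is identical; only the data differs.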

And that is Machine Learning for you. Tell me if it isn't cool.

Machine Learning: Making your algorithms smart, so that you don't need to be. ;)