Coursera Machine Learning: Week 1 Study Notes

Source: Internet. Editor: 程序博客网. Time: 2024/04/29 17:38


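I recently started Andrew Ng's Machine Learning course on Coursera, and I am using this blog to record and organize what I learn, so that I can reinforce it as I go and review it later. The course is really good, and I recommend it to anyone interested. Most of the content in this post comes from the course videos, lecture notes, and slides.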

Week 1 (Introduction: Welcome)

1. What is machine learning

Definition: there are two definitions of machine learning.

Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
The second, Tom Mitchell's, is the one most commonly used today: a program learns if its performance on task T, as measured by P, improves with experience E.

The course illustrates this with a game of checkers.
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.
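
To make the roles of E, T, and P concrete, here is a minimal Python sketch. It is a toy stand-in, not a checkers program and not something from the course: the hidden rule (`true_label`), the 1-nearest-neighbor learner, and the data points are all assumptions chosen so that the result is easy to check by hand. T is labeling a number, E is a growing list of labeled examples, and P is accuracy on a fixed test grid.

```python
# Toy illustration of Mitchell's definition (all names and data are hypothetical).

def true_label(x):
    """Hidden rule the learner is trying to pick up: 1 if x >= 0.3, else 0."""
    return 1 if x >= 0.3 else 0

def predict(train, x):
    """Task T: label x by copying the label of the nearest training example."""
    _, nearest_y = min(train, key=lambda example: abs(example[0] - x))
    return nearest_y

def accuracy(train, test_xs):
    """Performance measure P: fraction of test points labeled correctly."""
    correct = sum(predict(train, x) == true_label(x) for x in test_xs)
    return correct / len(test_xs)

# Experience E: labeled examples, revealed to the learner one at a time.
experience = [(x, true_label(x)) for x in (0.0, 1.0, 0.4, 0.2)]

# Fixed test grid: 0.05, 0.15, ..., 0.95.
test_xs = [(i + 0.5) / 10 for i in range(10)]

# Measure P after seeing the first 1, 2, 3, and 4 examples.
scores = [accuracy(experience[:n], test_xs) for n in range(1, len(experience) + 1)]
print(scores)
```

On this particular data the accuracy rises with each added example. That is not guaranteed in general; an unlucky example can temporarily make a learner worse, which is why P is usually measured over many trials.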

In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.
Others: Reinforcement learning, recommender systems.

2. Supervised and unsupervised learning

2.1 Supervised learning

Definition: In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
In other words, supervised learning starts from a data set for which the correct output is already known, i.e. the "right answers" are given.

Supervised learning problems are categorized into “regression” and “classification” problems.

In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.
In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

In short, supervised learning problems fall into two main classes: regression, which predicts a continuous output, and classification, which predicts a discrete one.

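Two examples:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem. The same example can be turned into a classification problem by asking instead whether a house sells for more or less than the asking price, which sorts the houses into two discrete categories.
(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture.
(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.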
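
The house-price example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sizes and prices are made-up numbers (deliberately exactly linear so the fit is easy to verify), and `fit_line`, `predict_price`, and `more_or_less` are hypothetical helper names, not anything from the course. The regression part fits a straight line by ordinary least squares; the classification part turns the same prediction into one of two discrete labels.

```python
# Made-up training data: house size (square meters) -> price (thousands).
sizes = [50, 80, 100, 120, 150]
prices = [110, 170, 210, 250, 310]

def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    """Regression: the output is a continuous value."""
    return slope * size + intercept

def more_or_less(size, asking_price):
    """Classification: the output is one of two discrete categories."""
    return "more" if predict_price(size) > asking_price else "less"

print(predict_price(90))       # continuous output
print(more_or_less(90, 180))   # discrete output
print(more_or_less(90, 200))   # discrete output
```

The same input (house size) feeds both functions; what makes one a regression problem and the other a classification problem is only the type of output being predicted.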


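2.2 Unsupervised learning

Definition: Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.
In unsupervised learning we do not necessarily know the effect of the variables. Structure is derived by clustering the data based on relationships among the variables, and there is no feedback based on prediction results.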
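
As a rough illustration of clustering without labels, the sketch below runs a simple one-dimensional k-means loop in Python. The notes above do not name a specific algorithm, so k-means is my choice for the example; the data points, the function name `kmeans_1d`, and the choice to start the two centers at the smallest and largest values are all assumptions made to keep the sketch short and deterministic.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Group 1-D points around the given starting centers (simple k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its old center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centers stopped moving, so the grouping is stable
        centers = new_centers
    return centers, clusters

# Unlabeled, made-up data: no "right answers" are supplied anywhere.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(data, centers=[min(data), max(data)])
print(centers)
print(clusters)
```

Note that the code never sees a label or a score for its guesses. The two groups emerge only from how close the values are to one another, which is the sense in which structure is derived from the data itself.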

Example: