Coursera Machine Learning (Andrew Ng) Notes (1)


I saw many recommendations for Andrew Ng's machine learning course, so I registered on Coursera and started studying it. January 15, 2016

1. Introduction

1. Machine learning definitions

Arthur Samuel (1959): Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell (1998, CMU): Well-posed learning problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

2. The two main types of machine learning

(1) Supervised learning

(2) Unsupervised learning

There are also others, such as reinforcement learning (e.g., checkers playing) and recommender systems.

3. Supervised learning

(1) Refers to the fact that we give the algorithm a data set in which the "right answers" are given.

(2) Mainly divided into regression problems (where the output is continuous) and classification problems (where the output is discrete); regression can be linear or nonlinear.

(3) SVM (support vector machine)

4. Unsupervised learning

(1) No labels; the algorithm is not told the "right answers".

(2) Clustering problems
E.g., Google News automatically clusters news stories into groups about the same topic.

(3) Applications: Genes; organizing large computing clusters; social network analysis; market segmentation; astronomical data analysis

(4) Cocktail party problem
Separate two overlapping sounds, input1 and input2, that come from different sources.
SVD: singular value decomposition
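
The lecture demonstrates this separation with an unsupervised algorithm expressed in a single line of Octave built around svd. As a minimal NumPy sketch of what the decomposition itself computes (the 2 × n matrix of "mixed recordings" here is just random, hypothetical data):

```python
import numpy as np

# Hypothetical 2 x n matrix: two microphones, each row one mixed recording.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 1000))

# SVD factors X into U * diag(s) * Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The factorization reconstructs X up to floating-point error.
assert np.allclose(X, U @ np.diag(s) @ Vt)
print("singular values:", s)
```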

2. Linear regression with one variable


1. Model Representation

training set
m = number of training examples

$(x, y)$: a single training example
$(x^{(i)}, y^{(i)})$: the $i$-th training example
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

2. Cost Function

  1. Fit the best line to the training examples.
  2. Linear regression hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
  3. Idea: substitute $x$ into $h_\theta(x)$ to get the estimated $y$, and therefore minimize $J(\theta_0, \theta_1)$.

cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
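
A minimal Python sketch of this cost function; the toy data below are made up for illustration:

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) for one-variable linear regression."""
    m = len(y)
    predictions = theta0 + theta1 * x        # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Hypothetical toy data: y roughly follows 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(compute_cost(x, y, 1.0, 2.0))   # small J near the true parameters
print(compute_cost(x, y, 0.0, 0.0))   # much larger J far away from them
```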

3. Gradient Descent

  1. Gradient descent is one method for solving the linear regression problem: keep repeating the following update until $J$ converges to its minimum:
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
    At every step, $\theta_0$ and $\theta_1$ are updated simultaneously.
  2. Point to watch: choose an appropriately sized learning rate $\alpha$.
  3. The same method applies to cost functions with more than two parameters; it generalizes to
    repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
    }
    Note that the updates are again simultaneous.
  4. Tricks (see the sketch after this list)
    (1) Feature scaling
    (2) Mean normalization
    (3) How to judge whether gradient descent is working?
    Plots of $J$ against the number of iterations can be helpful.
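
A minimal Python/NumPy sketch of the update above, combined with mean normalization as feature scaling; everything here (the housing-style numbers, $\alpha$, the iteration count) is hypothetical illustration data, not from the course:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix whose first column is all ones.
    Returns the learned theta and the cost history for plotting.
    """
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(num_iters):
        error = X @ theta - y                       # h_theta(x^(i)) - y^(i)
        theta = theta - alpha * (X.T @ error) / m   # simultaneous update of all theta_j
        history.append(np.sum(error ** 2) / (2 * m))
    return theta, history

# Mean normalization / feature scaling: (x - mean) / std, per feature.
raw = np.array([[2104.0, 3], [1600.0, 3], [2400.0, 4], [1416.0, 2]])
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

X = np.column_stack([np.ones(len(scaled)), scaled])  # prepend intercept column
y = np.array([400.0, 330.0, 369.0, 232.0])

theta, history = gradient_descent(X, y)
print(theta)
print(history[0], history[-1])  # J should decrease steadily for a good alpha
```

Plotting `history` against the iteration number is exactly the diagnostic plot mentioned in trick (3).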

4. Linear regression and normal equations

$\theta \in \mathbb{R}^{n+1}$, $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Take the partial derivative with respect to each $\theta_j$:
set $\frac{\partial}{\partial \theta_j} J(\theta) = 0$ (for every $j$)
solve for $\theta_0, \theta_1, \ldots, \theta_n$

This works out to $\theta = (X^T X)^{-1} X^T y$
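
A minimal NumPy sketch of the closed-form solution; the design matrix and targets are hypothetical toy values, and `np.linalg.solve` is used instead of forming the inverse explicitly:

```python
import numpy as np

# Normal equation: theta = (X^T X)^{-1} X^T y, computed in closed form.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])          # intercept column + one feature
y = np.array([3.1, 4.9, 7.2, 8.8])

# Solving the linear system is numerically preferable to inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [theta_0, theta_1] fitting y ~ theta_0 + theta_1 x
```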

5. Normal equation和gradient descent 的对比

When there are many features, e.g., $n \approx 10^6$, gradient descent is the better choice.
When $n$ is smaller, gradient descent still needs a well-chosen $\alpha$ and many steps to converge, so it does not work as well as the normal equation.
