Andrew Ng Class

x^(i): denotes the input variables (e.g. living area), also called features;
y^(i): denotes the output or target variable that we are trying to predict (e.g. house price);
(x^(i), y^(i)): a pair of these is called a training example;
{(x^(i), y^(i)); i = 1, 2, ..., m}: is called the training set;
h(x): our goal is, given a training set, to learn a function h so that h(x) is a good predictor of the corresponding value of y; this function is called the hypothesis, and the hypothesis's job is to map the input to the output;
hθ(x): the price that my hypothesis predicts a house with features x costs (x is a vector);
x1^(i): the living area of the i-th house in the training set;
x2^(i): its number of bedrooms;
x0 = 1: we suppose, by convention (the intercept term);
θi: the parameters (also called weights) parameterizing the space of linear functions;
θ: a vector;
x: a vector;
n: the number of input variables (features).
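To make the notation concrete, here is a minimal sketch (assuming Python with NumPy; the housing numbers are only illustrative) of how a training set with m examples and n features might be stored:

    import numpy as np

    # Training set for the housing example (illustrative numbers):
    # each row is one example x^(i); the columns are the features
    # x1 = living area, x2 = number of bedrooms. Here m = 4, n = 2.
    X = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 3.0],
                  [1416.0, 2.0]])
    y = np.array([400.0, 330.0, 369.0, 232.0])  # targets y^(i): house prices

    m, n = X.shape                              # m training examples, n input variables

    # Suppose x0 = 1 (the intercept term), so each x^(i) becomes an (n+1)-vector.
    X = np.hstack([np.ones((m, 1)), X])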

Regression:
when the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem.
Classification:
when y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict whether a dwelling is a house or an apartment, say), we call it a classification problem.

Now, given a training set, how do we pick, or learn, the parameters θ?
One reasonable method seems to be to make h(x) close to y.
In order to design a learning algorithm, the first thing we have to decide is how we want to represent the hypothesis.
We are going to use a linear representation for the hypothesis: hθ(x) = θ0 + θ1 x1 + θ2 x2 = θ^T x (with x0 = 1).
How do we choose the parameters θ so that our hypothesis hθ will make accurate predictions about all the houses?
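A minimal sketch of that linear hypothesis (assuming NumPy; the parameter and feature values below are made up):

    import numpy as np

    def h(theta, x):
        """Linear hypothesis h_theta(x) = theta^T x (x[0] is the intercept term x0 = 1)."""
        return theta @ x

    theta = np.array([50.0, 0.15, 20.0])   # made-up parameter values [theta0, theta1, theta2]
    x = np.array([1.0, 2104.0, 3.0])       # x0 = 1, living area, number of bedrooms
    print(h(theta, x))                     # predicted price for this house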
A search algorithm:
1. The basic idea: we'll start with some value of θ, my parameter vector θ;
2. keep changing θ to reduce J(θ) a little bit, until we hopefully end up at the minimum with respect to J(θ) (see the cost-function sketch after this list).
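The J(θ) in step 2 is the cost function. Assuming it is the usual least-squares cost, J(θ) = (1/2) Σ_i (hθ(x^(i)) − y^(i))², a minimal sketch is:

    import numpy as np

    def J(theta, X, y):
        """Least-squares cost: J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
        residuals = X @ theta - y        # h_theta(x^(i)) - y^(i) for every training example
        return 0.5 * residuals @ residuals

    # X: m x (n+1) design matrix with x0 = 1 in the first column; y: m-vector of prices.
    X = np.array([[1.0, 2104.0, 3.0],
                  [1.0, 1600.0, 3.0]])
    y = np.array([400.0, 330.0])
    print(J(np.zeros(3), X, y))          # cost at the initial theta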
Gradient Descent
1. One property of gradient descent is that it can sometimes depend on where you initialize your parameters (a sketch follows below).
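A minimal batch gradient descent sketch for the least-squares cost above (the learning rate alpha and iteration count are made-up values; for this convex cost the minimum does not depend on the initialization, but for other cost functions it can):

    import numpy as np

    def gradient_descent(X, y, alpha=1e-8, iters=1000):
        """Batch gradient descent on the least-squares cost J(theta).

        Repeats theta_j := theta_j - alpha * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        for all j simultaneously. alpha and iters are made-up values."""
        theta = np.zeros(X.shape[1])            # where we initialize the parameters
        for _ in range(iters):
            gradient = X.T @ (X @ theta - y)    # partial derivatives of J(theta)
            theta = theta - alpha * gradient    # take a small step downhill
        return theta

    # Design matrix with x0 = 1 in the first column; y holds the prices.
    X = np.array([[1.0, 2104.0, 3.0],
                  [1.0, 1600.0, 3.0],
                  [1.0, 2400.0, 3.0]])
    y = np.array([400.0, 330.0, 369.0])
    print(gradient_descent(X, y))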
