时间序列 R 10 其他进阶预测方法 Advanced forecasting methods

来源：互联网发布：程序员客栈怎么样编辑：程序博客网时间：2024/05/21 17:04

1 Dynamic regression models 动态回归模型

前面的内容中要么只考虑了时间，要么只考虑了其他自变量的影响，这一节将考虑各个变量和时间的综合影响。

1.1 regression models+ ARIMA models

首先我们简单的将回归和Arima组合，做一个简单的动态回归模型。
其组合的方法和实质就是将回归模型中的误差项变为时间序列的ARIMA，也可以理解为下式：

y t = 回 归 + A R I M A + e t

当ARIMA为（1，1，1）时可以写成下式

y t = β 0 + β 1 x 1, t + \dots + β k x k, t + n t, (1 - ϕ 1 B) (1 - B) n t = (1 + θ 1 B) e t,

另外
如果先将原始的数据（包括y和x）都进行差分，直到稳定，这时就可以看做回归+ARMA了。（为了保持y和x的关系不变，他们要差分相同的阶数）

1.1.1 模型估计

在将两种模型结合之后，参数依然可以用最小二乘法或者最大似然估计来计算建立模型（注意计算的是et不是nt）

1.1.2 手动建模步骤

一般的手动建模的方法为：
1. 数据预处理，包括log转换等
2. 检查数据稳定性，不稳定的需要差分值稳定。首先假设模型为regression+AR(2),如果数据是季节性模型则先假设为regression+ARIMA(2,0,0)(1,0,0)m。
3. 利用上面的假设计算出regression的参数后，计算nt,然后根据nt，估计一个合适的ARMA模型
4. 利用新估计的ARMA，重复计算regression模型
5. 检查et是否是白噪声
可以利用AICc来评价模型，选择得分最低的一个。

1.1.3 使用R

可以用Arima函数来计算如下栗子：（ xreg这个参数是加入regression）

 (fit2 <- Arima(usconsumption[,1], xreg=usconsumption[,2],              order=c(1,0,2)))Series: usconsumption[, 1] ARIMA(1,0,2) with non-zero mean Coefficients:         ar1      ma1     ma2  intercept  usconsumption[, 2]      0.6516  -0.5440  0.2187     0.5750              0.2420s.e.  0.1468   0.1576  0.0790     0.0951              0.0513sigma^2 estimated as 0.3396:  log likelihood=-144.27AIC=300.54   AICc=301.08   BIC=319.14

也可使用auto.arima（）

fit <- auto.arima(usconsumption[,1], xreg=usconsumption[,2])

1.1.4 随机趋势和确定性趋势

确定性趋势就是在没有差分的时候回归中有斜率，模型如下：

y t = β 0 + β 1 t + n t,

其中nt是ARMA模型
随机趋势是在一阶差分之后有趋势

y t = y t - 1 + β 1 + n' t .

其中nt是ARMA
这里类似于有漂移的random walk
http://blog.csdn.net/bea_tree/article/details/51223501#t8

栗子，计算旅游人数：
这里写图片描述
有确定性趋势：

> auto.arima(austa,d=0,xreg=1:length(austa))ARIMA(2,0,0) with non-zero mean Coefficients:         ar1      ar2  intercept  1:length(austa)      1.0371  -0.3379     0.4173           0.1715s.e.  0.1675   0.1797     0.1866           0.0102sigma^2 estimated as 0.02486:  log likelihood=12.7

结果如下：

y t n t e t = 0.42 + 0.17 t + n t = 1.04 n t - 1 - 0.34 n t - 2 + e t \sim NID (0, 0.025) .

随机趋势：

> auto.arima(austa,d=1)Series: austa ARIMA(0,1,0) with drift         Coefficients:      drift      0.154s.e.  0.033sigma^2 estimated as 0.0324:  log likelihood=9.07AIC=-14.14   AICc=-13.69   BIC=-11.34

结果如下：

y t n t e t = y 0 + 0.15 t + n t = n t - 1 + e t \sim NID (0, 0.032) .

预测结果：

fit1 <- Arima(austa, order=c(0,1,0), include.drift=TRUE)fit2 <- Arima(austa, order=c(2,0,0), include.drift=TRUE)par(mfrow=c(2,1))plot(forecast(fit2), main="Forecasts from linear trend + AR(2) error",  ylim=c(1,8))plot(forecast(fit1), ylim=c(1,8))

这里写图片描述
可以看出两者的预测区间不同
这说明确定性趋势的趋势不随时间改变.而随机趋势的趋势是根据历史数据来估计的, 随机趋势的预测区间较大，如果预测长时间的数据，用随机趋势比较好。

1.2 滞后模型

1.2.1 模型

有些模型中自变量并不是马上就作用于因变量，比如可能收入提高之后不会立即买东西而是在收入提高之后的一个月之后买东西，因此可以在原本的动态规划基础上加入之后因子，下式是只有一个因变量的时候的表达式

y t = β 0 + γ 0 x t + γ 1 x t - 1 + \dots + γ k x t - k + n t,

这里

nt是Arima模型
之后阶数k的模型可以同计算Arima的p,q的时候一起用AIC计算。

1.2.2 栗子

探究广告和报价之间的关系
这里写图片描述
计算如下

plot(insurance, main="Insurance advertising and quotations", xlab="Year")# Lagged predictors. Test 0, 1, 2 or 3 lags.Advert <- cbind(insurance[,2], c(NA,insurance[1:39,2]), c(NA,NA,insurance[1:38,2]), c(NA,NA,NA,insurance[1:37,2]))colnames(Advert) <- paste("AdLag",0:3,sep="")# Choose optimal lag length for advertising based on AIC# Restrict data so models use same fitting periodfit1 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1], d=0)fit2 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:2], d=0)fit3 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:3], d=0)fit4 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:4], d=0)# Best model fitted to all data (based on AICc)# Refit using all datafit <- auto.arima(insurance[,1], xreg=Advert[,1:2], d=0)>  fitARIMA(3,0,0) with non-zero mean Coefficients:        ar1     ar2    ar3  intercept  AdLag0  AdLag1      1.412  -0.932  0.359      2.039   1.256   0.162s.e.  0.170   0.255  0.159      0.993   0.067   0.059sigma^2 estimated as 0.189:  log likelihood=-23.89AIC=61.78   AICc=65.28   BIC=73.6

y t = β 0 + γ 0 x t + γ 1 x t - 1 + n t, n t = ϕ 1 n t - 1 + ϕ 2 n t - 2 + ϕ 3 n t - 3 + e t

预测：

fc8 <- forecast(fit, xreg=cbind(rep(8,20),c(Advert[40,1],rep(8,19))), h=20)plot(fc8, main="Forecast quotes with advertising set to 8", ylab="Quotes")

这里写图片描述

2 向量自回归 Vector autoregressions

之前都只考虑了自变量对因变量的单边影响关系，这里我们要讨论下因变量与自变量间相互影响的关系，也就是这节中的 Vector autoregressions，VAR

2.1 模型

在AR中我们是探究的一个量与他之前1……k lag的关系

y t = c + ϕ 1 y t - 1 + ϕ 2 y t - 2 + \dots + ϕ p y t - p + e t,

现在是考虑多个变量间的关系，下式是考虑了两个y1和y2，1 阶lag（称为VAR（1））

y 1, t y 2, t = c 1 + ϕ 11, 1 y 1, t - 1 + ϕ 12, 1 y 2, t - 1 + e 1, t = c 2 + ϕ 21, 1 y 1, t - 1 + ϕ 22, 1 y 2, t - 1 + e 2, t

这里讲两个变量的公式联立，建立了彼此之间的联系。

在解方程的时候，上式同AR一样必须是稳定的，如果是不稳定的需要先进行差分稳定之后再解方程。
假设有k个变量（上式中有2个），p阶lag，（上式只有1lag）每一个方程有有1+k*p个系数，总共有k（1+kp）个系数。所以lag越小越容易计算。
对于p的选取有多中方法：
AIC, HQ, SC and FPE
其中SC又叫做BIC，可以产生较小的p
而AIC倾向于产生较大的p
VAR适用场景
1. 预测的变量相互影响，不要明确的解释各参数的含义;
2. 测试变量是否对预测有用（这是Granger 因果测试的基础）
3. 当变量突然变化时，分析它的响应作用
4. 误差方差分解（forecast error variance decomposition，where the proportion of the forecast variance of one variable is attributed to the effect of other variables.）
另外，还有一种情况本书没有做详细介绍：经济变量的本身是非平稳序列，但是，它们的线性组合却有可能是平稳序列。这时候需要建立VEC模型（vector error correction model），这里不做介绍

2.2 R 使用

有vars工具箱
下列代码中可以看出，AIC FPE指标推荐5 lag ；hq、sc推荐1 lag

使用 Portmanteau test测试误差的不相关，文中说var(1)var(2)都没有通过测试，var(3)通过所以使用var（3）

library(vars)> VARselect(usconsumption, lag.max=8, type="const")$selectionAIC(n)  HQ(n)  SC(n) FPE(n)      5      1      1      5 > var <- VAR(usconsumption, p=3, type="const")> serial.test(var, lags.pt=10, type="PT.asymptotic")Portmanteau Test (asymptotic)data:  Residuals of VAR object var Chi-squared = 33.3837, df = 28, p-value = 0.2219> summary(var)VAR Estimation Results:========================= Endogenous variables: consumption, income Deterministic variables: const Sample size: 161 Estimation results for equation consumption: ============================================                Estimate Std. Error t value Pr(>|t|)    consumption.l1  0.22280    0.08580   2.597 0.010326 *  income.l1       0.04037    0.06230   0.648 0.518003    consumption.l2  0.20142    0.09000   2.238 0.026650 *  income.l2      -0.09830    0.06411  -1.533 0.127267    consumption.l3  0.23512    0.08824   2.665 0.008530 ** income.l3      -0.02416    0.06139  -0.394 0.694427    const           0.31972    0.09119   3.506 0.000596 ***Estimation results for equation income: =======================================                Estimate Std. Error t value Pr(>|t|)    consumption.l1  0.48705    0.11637   4.186 4.77e-05 ***income.l1      -0.24881    0.08450  -2.945 0.003736 ** consumption.l2  0.03222    0.12206   0.264 0.792135    income.l2      -0.11112    0.08695  -1.278 0.203170    consumption.l3  0.40297    0.11967   3.367 0.000959 ***income.l3      -0.09150    0.08326  -1.099 0.273484    const           0.36280    0.12368   2.933 0.003865 ** ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation matrix of residuals:            consumption incomeconsumption      1.0000 0.3639income           0.3639 1.0000----------------------------------------------------------------------fcst <- forecast(var)plot(fcst, xlab="Year")

这里写图片描述

1.3 神经网络 Neural network models

关于啥是神经网络看这里：
http://blog.csdn.net/bea_tree/article/details/51174511
这节要用的就是使用神经网络与ARIMA结合
这里以一层前馈神经网路为例：
NNAR(p,k)代表有p阶lag，k个神经元的神经网络，
NNAR(p,0)相当于ARIMA(p,0,0)
对于季节性数据：
形式为NNAR(p,P,K)m 以为输入的是(yt−1,yt−2,…,yt−p,yt−m,yt−2m,yt−Pm)并且有k个隐藏神经单元。
NNAR(p,P,0)m 与ARIMA(p,0,0)（P,0,0）m 相同
函数为nnetar()。若p and P未指定, 会被自动选择. 对于非季节性数据会根据AIC计算的线性AR（p）设置默认值，季节性数据P=1，p 根据线性设置. 如果k未指定则 k=(p+P+1)/2（选择其最近的整数）.

4 Forecasting hierarchical or grouped time series

有些物品是可以分类的，比如自行车可以分为山地车，公路车等等,有时候我们需要各个类别的预测值，本文将利用这种层次结构进行预测
下图是两层时间序列
这里写图片描述
含义为total分了两类a b a由分了三类 abc b由分了两类 a b
定义总的系列（series）数为：

n = 1 + n 1 + \dots + n K

他一共有8个系列（1+2+5）
,
用矩阵表达为
这里写图片描述

简记为

y t = S y K, t,

其中S是求和的意思。
我们要预测时total这一series，方法是用以前学过的方法来预测各个series的数据，然后用他们来估计最后的预测值。

4.1 自底向上

用最底层的来预测各个层的值，以图9.12为例
首先预测 AA AB AC的值相加得到A的值
预测ＢＡ　ＢＢ相加得到Ｂ的值
然后A+B得到total的值
用矩阵的形式表达为

y ~ h = S y^K, h .

其优点是不会漏掉数据，但是底层可能有较多的噪声，很难精确预测各个上层的数值。

4.2 自上至下

先预测最上层的，然后根据比例还计算最下层的，然后利用上面的矩阵公式来计算各个层的，比例计算方法

Average historical proportions

p j = 1 T \sum t = 1 T y j , t y t

yj,t代表底层的第j个节点，

pj代表它所占有的比例

Proportions of the historical averages

p j = \sum t = 1 T y j , t T / \sum t = 1 T y t T

Forecasted proportions

以上的比例都不能表示出随着比例随着时间的变化关系
这里利用预测值的比例来进行计算。比例由下式得到

p j = \prod ℓ = 0 K - 1 y ^ ( ℓ ) j , h S ^ ( ℓ + 1 ) j , h

以图9.12 解释之
记
这里写图片描述

先从左边

y ~ A, h = (y ^ A , h S ^ ( 1 ) A , h) y ~ h = (y ^ ( 1 ) AA , h S ^ ( 2 ) AA , h) y ~ h

然后

y ~ AA, h = (y ^ AA , h S ^ ( 1 ) AA , h) y ~ A, h = (y ^ AA , h S ^ ( 1 ) AA , h) (y ^ ( 1 ) AA , h S ^ ( 2 ) AA , h) y ~ h .

得到

p 1 = (y ^ AA , h S ^ ( 1 ) AA , h) (y ^ ( 1 ) AA , h S ^ ( 2 ) AA , h)

4.3 Middle-out approach

从中间选节预测节点，该点以上level（层）的用自下至上，该层以下的用自上至下

4.4 The optimal combination approach

y^h = S β h + ε h

4.5 R The hts package

hts函数需要输入最底层的数据和结构信息
以9.12的结构为例：

# bts is a time series matrix containing the bottom level series# The first three series belong to one group, and the last two series# belong to a different groupy <- hts(bts, nodes=list(2, c(3,2)))

栗子

这里写图片描述
这是澳大利亚的旅游业的层次结构，中间层是州的名字。
使用hts，方法默认为The optimal combination approach

library(hts)y <- hts(vn, nodes=list(4,c(2,2,2,2)))allf <- forecast(y, h=8)plot(allf)

各个系列的预测结果如下：
这里写图片描述

0 0