SAS Timeseries Modelling
来源:互联网 发布:手机怎样申请淘宝账号 编辑:程序博客网 时间:2024/06/07 15:45
/*************************************************************************************************************//* SB RAD FORECAST MACRO *//* Updated on 04/30/2014 *//* Input Data: sb_rad_input & sb_rad_lob_input(tabs from input Excel) *//* Output Data: sb_rad_forecasting_output.xlsx *//* *//* Purpose: The project adapts exponential smoothing methods to replace current one dimensional forecasting *//* approach(phasing method) for DELL SB business, so that the model built could be more robust and *//* accurate. For simplicity, a macro was created to build the time series model automatically. */ /* *//* Modeling Process: *//* - Step1: 1) Define library where the raw data files are stored *//* 2) Import raw data from Excel to SAS, and create the date variable "month_id" *//* 3) Simple QC */ /* - Step2: Define macro %opt_model & %esm_model *//* - Step3: Run *//* %esm_model(inputdata= dt.sb_rad_input, date= month_id, weeks= dt.weeks_in_month_lookup, *//* segment1= forecast_segment1, segment2= forecast_segment2, *//* predictors= &preds, finalout= sb_rad_forecast_output) *//* and *//* %esm_model(inputdata= dt.sb_rad_lob_input, date= month_id, weeks= dt.weeks_in_month_lookup, *//* segment1= forecast_segment1, segment2= forecast_segment2, *//* predictors= &preds, finalout= sb_rad_lob_forecast_output) *//* *//* where "inputdata" is dataset for model building, *//* "date" is a date variable, you don't need to change it in most cases, *//* "weeks" is dataset for appending variable "weeks_in_month", *//* "segment1" & "segmengt2" are variables for spilting dataset, *//* "predictors" are 5 target variables for forecasting (assigned to macro &pred), *//* "finalout" is the dataset we need finally. *//* - Step4: Export final result sb_rad_forecast_output & sb_rad_lob_forecast_output to *//* sb_rad_forecasting_output.xlsx. *//* *//* Notes: */ /* - 1) The data must contain a valid date variable "month_id" that gives the date of each observation. *//* And the date(month_id) must be continuous, or no model will be built and a warning message will *//* will be presented in SAS log. */ /* - 2) The variables segment1 and segment2 should be assigned, if more than one of them is not assigned, *//* the program will terminate. *//* - 3) Date variable: fiscal_year, fiscal_quarter(value 1-4), fiscal_month (value 1-12) *//* - 4) There must be 5 predictors: sites, sites_purchasing, site_lapsing,revenue and margin.The position/spelling of *//* the predictors don't affect the model building. *//* - 5) We spilt data to several sub-datasets using segment1 & segment2. If sub-dataset's records are *//* less than 12, no model will be built for current data. Forecasts for past months will be blank. *//* No predictions for the upcoming 12 months. At the same time, a warning message will be presented *//* in SAS log. */ /* - 6) In general, the macro uses 2 trasformations(none & log) and 7 smoothing methods(simple, double, *//* linear, seasonal, addseasonal, winters and addwinters) to build models. But if negative value *//* exists, we use non-transformed data. And only 6 of 7 methods are applied to build model, because *//* winters method cannot be used for negative data. *//* - 7) For forecasting records, if number of months in a quarter is not equal to 3, then all records in *//* this quarter will be deleted. Only records with completed quarter will be kept. *//* - 8) The tab names should be edited manually after exporting the result to *.xlsx file, because SAS *//* cannot specify the proper tab names we need(extra '_' will be created in place of space). */ /* *//*************************************************************************************************************/options mprint;%let path = L:\dell_qmgisi\SMB NA Analytics Shanghai\Regular update (SB RAD Forecast)\2014-06\data;libname excel "&path.\RAD Forecast Attainment Output (SB Only) - 20140619.xlsx"; /* original excel data */*libname weeks "&path1.\Weeks in Month Lookup Table.xlsx"; /* Weeks in month lookup table */libname dt "&path."; %let outpath = &path.\SB RAD Forecasting Output - SB.xlsx ; /* This should be the same file as excel*//***********************************************************************************//* No need to update the following unless there is change in variable or tab names *//***********************************************************************************/%let rad_tab = RAD_Forecasting_Output;/*%let rad_lob_tab = RAD_LOB_Forecasting_Output;*/%let preds = sites sites_purchasing sites_lapsing revenue margin; /* list of predictors in input data */%let yr = fiscal_year;%let qtr = fiscal_quarter;%let mth = fiscal_month;%let wk = weeks_in_month;/****************************************************************************//* Please make sure the following variables names are not changed *//* forecast_segment1, forecast_segment2 *//****************************************************************************//****************************************************************************//* Do not change the following!!!! *//****************************************************************************//********************************************************************//* Step1: Import Raw Data *//********************************************************************//* Prepare data */data dt.sb_rad_forecast_input;set excel."RAD Forecasting Input$"n; /* "RAD Forecasting Input" is the tab name in excel */month_id = mdy(&mth., 01, &yr.);site_new = input(compress(sites), 8.);sites_lapsing_new = input(compress(sites_lapsing), 8.);drop sites sites_lapsing; rename site_new = sites sites_lapsing_new = sites_lapsing;format month_id date9.;run;* 108;/* Weeks in month lookup *//*data dt.weeks_in_month_lookup;*//*set weeks."Sheet1$"n;*//*run;*/* 241;/* QC */proc contents data = dt.sb_rad_forecast_input;run;proc print data= dt.sb_rad_forecast_input;run;proc sql;select forecast_segment1, forecast_segment2, count(*) as cntfrom dt.sb_rad_forecast_inputgroup by 1, 2order by 1, 2, 3;quit;/********************************************************************//* Step2: %OPT_MODEL & %ESM_MODEL macros *//********************************************************************//* OPT_MODEL MACRO *//* -- 1) Build the optimized model based on the max adjusted R^2 for each dataset & target variable */%macro opt_model(data=, target=, ind_var=, pred=);%let methods = Simple Double Linear Addseasonal Seasonal Addwinters Winters ; %let trans = None Log; %local i j;title2 "Target Variable: &target.";proc sort data = &data. out = data_sorted;by &yr. &qtr. &mth.;run;/* Processing current data and assign intv(interval) & leads(lead) according to the target variable */%if %eval(%qupcase(&target.) EQ SITES) %then %do;%let intv = qtr; %let leads = 4;data data_ready;set data_sorted;by &yr. &qtr. &mth.;if (first.&yr. or first.&qtr.) and first.&mth.;run;%end;%else %if %eval(%qupcase(&target.) NE SITES) %then%do; %let intv = month;%let leads = 12;data data_ready;set data_sorted;run;%end;/* Get minimum value, and check nagative value in the following step, no model will be built if all values are missing */proc sql;select min(&target.) format 20. into : min_value from data_ready;quit;%if %eval(&min_value. NE .) %then %do;/* Select different trans & methods for nagative/positive dataset */%if %eval(&min_value. GT 0) %then %do; %let i_max = 7; %let j_max = 2; %end;%else %do; %let i_max = 6; %let j_max = 1; %end;/* Build models using methods & trnas */%do i = 1 %to &i_max.;%do j = 1 %to &j_max.;proc esm data = data_ready lead = &leads. outstat = stat_&i._&j.; /* output data stat_*_*(includes adjusted R^2) */id &ind_var. interval = &intv.;forecast &target. / model = %scan(&methods., &i.) transform = %scan(&trans., &j.);run;data stat_&i._&j.;informat trans $10. method $20.;format trans $10. method $20.;set stat_&i._&j.;trans = "%scan(&trans., &j.)" ;method = "%scan(&methods., &i.)" ;run;%end;%end;/* Combine all model's statistical result */data models_temp1;set %do i = 1 %to &i_max.;%do j = 1 %to &j_max.;stat_&i._&j.%end;%end;;run;/* Choose opt result based on max adjusted R^2 */proc sql;create table models_temp2 asselect * from (select *, max(adjrsq) as max_adjrsq from models_temp1 ) awhere adjrsq = max_adjrsq;quit;/* Get the opt trans, method, adjusted R^2 based on max adjusted r-square */proc sql;select trans into : opt_trans from models_temp2;select method into : opt_method from models_temp2;select adjrsq into : opt_adjrsq from models_temp2;quit;/* Build optimized model according to the opt trans & method we choose in the previous step */proc esm data = data_ready lead = &leads. outfor = pred_temp ;id &ind_var. interval = &intv.; forecast &target. / model = &opt_method. transform = &opt_trans.;run;/* Keep the variables we need */data &pred.;informat method_&target. $20.;set pred_temp( rename = (predict = forecasted_&target. ));/*where actual =.;*/ adj_r_square_&target. = &opt_adjrsq.; /* adjusted R^2 */method_&target.= compress("&opt_trans.")||" "||compress("&opt_method."); /* trans & method */keep &ind_var. forecasted_&target. adj_r_square_&target. method_&target.;run;%end;%else %do; data &pred.; informat method_&target. $20.;&ind_var. = . ;forecasted_&target. = . ;adj_r_square_&target. = . ;method_&target. = "";run; %end;%mend opt_model;/* ESM_MODEL MACRO *//* -- 1) Divide the inputdata into several sub-datasets using forecasting segments: RAD & Channels. *//* -- 2) Build 4 optimized models for each sub-dataset respectively based on the max adjusted r-square. */%macro esm_model(inputdata=, date=, weeks=, segment1=, segment2=, predictors=, finalout=);/* Check whether segment1 & segment2 are assigned */%if %length(&segment1.) = 0 or %length(&segment2.) = 0 %then %do;%put ERROR: segment1 or segment2 is not available ; %abort;%end; %else %do;/* Scan four target variables */%let predictor_1 = %scan(&predictors., 1);%let predictor_2 = %scan(&predictors., 2);%let predictor_3 = %scan(&predictors., 3);%let predictor_4 = %scan(&predictors., 4);%let predictor_5 = %scan(&predictors., 5);/* Sort the inputdata before building models */proc sort data = &inputdata. out = inputdata_sorted;by &segment1. &segment2. &yr. &qtr. &mth.;run;/* Divide into several sub-datasets using segment1 & segment2 */proc sql;select distinct quote(compress(%nrbquote(&segment1.)||%nrbquote(&segment2.))) into : div_seg separated by "," from inputdata_sorted;select count(distinct %nrbquote(&segment1.)||%nrbquote(&segment2.)) into : div_num from inputdata_sorted;quit; data %do i = 1 %to &div_num.; dataset_&i. %end; ;set inputdata_sorted;select(compress(&segment1.||&segment2.));%do i = 1 %to &div_num.;when(%scan(%nrbquote(&div_seg.), &i., ',')) output dataset_&i.;%end;otherwise;end;run;/* Build 4 optimized models for each sub-dataset & target variable respectively */%do m = 1 %to &div_num.;title1 "Current Dataset";proc sql;select distinct &segment1. into : curr_segment1 from dataset_&m.;select distinct &segment2. into : curr_segment2 from dataset_&m.;select count(*) format 8. into : curr_record_&m. from dataset_&m.;quit;/* Check whether month_id(date) is continuous */data _null_;set dataset_&m. end = eof;retain diff_sum 0;diff_year = &yr. - lag(&yr.);diff_month = &mth.- lag(&mth.);diff = diff_year*12 + diff_month;if _n_ = 1 then diff = 0;diff_sum = diff_sum + diff;if eof then call symputx("diff_curr_&m.", diff_sum);run; /* Limit the records to 12 and date is continuous */%if %eval(&&curr_record_&m. GE 12) and %eval(&&diff_curr_&m. EQ &&curr_record_&m. - 1) %then%do;title1 "Dataset with %nrbquote(&curr_segment1.) and %nrbquote(&curr_segment2.)";%opt_model(data= dataset_&m., target= &predictor_1., ind_var= &date., pred= pred_&predictor_1._&m.);%opt_model(data= dataset_&m., target= &predictor_2., ind_var= &date., pred= pred_&predictor_2._&m.);%opt_model(data= dataset_&m., target= &predictor_3., ind_var= &date., pred= pred_&predictor_3._&m.);%opt_model(data= dataset_&m., target= &predictor_4., ind_var= &date., pred= pred_&predictor_4._&m.);%opt_model(data= dataset_&m., target= &predictor_5., ind_var= &date., pred= pred_&predictor_5._&m.); /* Merge 4 model's predicted value(sites, sites_purchasing, revenue and margin) by month_id */data finalout_temp1_&m.;merge dataset_&m. pred_&predictor_1._&m. pred_&predictor_2._&m. pred_&predictor_3._&m. pred_&predictor_4._&m. pred_&predictor_5._&m.;by &date.; /* add current segment1 & segment2 */&segment1. = "%nrbquote(&curr_segment1.)";&segment2. = "%nrbquote(&curr_segment2.)";if &date. = . then delete; /* delete missing row */run; /* Get variable weeks_in_month */proc sql;create table finalout_temp2_&m. asselect year(a.&date.) as &yr.,qtr(a.&date.) as &qtr.,month(a.&date.) as &mth.,a.&date.,b.&wk.,&segment1.,&segment2.,&predictor_1.,&predictor_2.,&predictor_3.,&predictor_4.,&predictor_5.,forecasted_&predictor_1.,forecasted_&predictor_2.,forecasted_&predictor_3.,forecasted_&predictor_4.,forecasted_&predictor_5.,method_&predictor_1.,method_&predictor_2.,method_&predictor_3.,method_&predictor_4.,method_&predictor_5.,adj_r_square_&predictor_1.,adj_r_square_&predictor_2.,adj_r_square_&predictor_3.,adj_r_square_&predictor_4.,adj_r_square_&predictor_5.from finalout_temp1_&m. aleft join &weeks. bon year(a.&date.) = b.&yr. and qtr(a.&date.) = b.&qtr. and month(a.&date.) = b.&mth. ;quit;/* Keep completed forecasting quarter */data _null_;set finalout_temp2_&m. end = eof;if eof then call symputx("last_month_forecast", &mth.);run;%let last_obs = %eval(&&curr_record_&m. + 12 - %sysfunc(mod(&last_month_forecast., 3))); /* the last obs for current forecasting result *//* Final result for each dataset */data finalout_&m.;set finalout_temp2_&m.(obs = &last_obs.);run;%end; /* If records less than 12/date is not continuous, a warning message is presented in SAS log */%else %if %eval(&&curr_record_&m. LT 12) %then %put Warning: current dataset_&m.(&&curr_record_&m. records)less than 12 records; %else %if %eval(&&diff_curr_&m. NE &&curr_record_&m. -1) %then %put Warning: current dataset_&m., month_id is not continuous;%end;/* All sub-datasets result */data finalout_temp1;set %do i = 1 %to &div_num.;%if %eval(&&curr_record_&i. GE 12) and %eval(&&diff_curr_&i. EQ &&curr_record_&i. - 1 ) %then finalout_&i.;%else dataset_&i.;%end;;run;/* Aggregate values in quarter */proc sort data= finalout_temp1 out= finalout_temp2;by &segment1. &segment2. &yr. &qtr. &mth.;run;/* Data finalout_agg */data finalout_temp3 ;set finalout_temp2;by &segment1. &segment2. &yr. &qtr. &mth. ;retain %do i = 1 %to 5; %scan(&predictors., &i.)_agg forecasted_%scan(&predictors., &i.)_agg %end;;/* Initialize aggregated variables(actual & forecasted) */if first.&yr. or first.&qtr. then do;%do i = 1 %to 5;%scan(&predictors., &i.)_agg = 0; forecasted_%scan(&predictors., &i.)_agg = 0;%end;end; /* Accumulate values month by month within each quarter */%do n =1 %to 5;forecasted_%scan(&predictors., &n.)_agg = forecasted_%scan(&predictors., &n.)_agg + forecasted_%scan(&predictors., &n.);%scan(&predictors., &n.)_agg = %scan(&predictors., &n.)_agg + %scan(&predictors., &n.);%end;/* Process missing values : method_ & adj_r_square_ */array arrayNA1[5] %do i = 1 %to 5; method_%scan(&predictors., &i.) %end; ;array arrayNA2[5] %do i = 1 %to 5; adj_r_square_%scan(&predictors., &i.) %end; ;do m = 1 to 5;if arrayNA1[m] = " " then arrayNA1[m] = "NONE";if arrayNA2[m] = . then arrayNA2[m] = 0;end;/*drop &date. m forecasted_&predictor_1. forecasted_&predictor_2. forecasted_&predictor_3. forecasted_&predictor_4.; */run;/* Format process */data finalout_temp4;set finalout_temp3;/* sites and sites_purchasing */%do i = 1 %to 3;forecasted_%scan(&predictors., &i.)_agg = round(forecasted_%scan(&predictors., &i.)_agg);%scan(&predictors., &i.)_agg = round(%scan(&predictors., &i.)_agg);%end;/* revenue and margin */%do i = 4 %to 5;forecasted_%scan(&predictors., &i.)_agg = input(put(forecasted_%scan(&predictors., &i.)_agg, d11.9), d11.9);%scan(&predictors., &i.)_agg = input(put(%scan(&predictors., &i.)_agg , d11.9), d11.9);%end;run;/* Final result */proc sql;create table &finalout. asselect &yr.,&qtr.,&mth.,&wk.,&segment1.,&segment2.,&predictor_1.,&predictor_2.,&predictor_3.,&predictor_4.,&predictor_5.,forecasted_&predictor_1._agg as forecasted_&predictor_1.,forecasted_&predictor_2._agg as forecasted_&predictor_2.,forecasted_&predictor_3._agg as forecasted_&predictor_3.,forecasted_&predictor_4._agg as forecasted_&predictor_4.,forecasted_&predictor_5._agg as forecasted_&predictor_5.,&predictor_1._agg as actual_&predictor_1.,&predictor_2._agg as actual_&predictor_2.,&predictor_3._agg as actual_&predictor_3.,&predictor_4._agg as actual_&predictor_4.,&predictor_5._agg as actual_&predictor_5.,method_&predictor_1.,method_&predictor_2.,method_&predictor_3.,method_&predictor_4.,method_&predictor_5.,adj_r_square_&predictor_1.,adj_r_square_&predictor_2.,adj_r_square_&predictor_3.,adj_r_square_&predictor_4.,adj_r_square_&predictor_5.from finalout_temp4;quit;%end;%mend esm_model;/********************************************************************//* Step3: Run the example *//********************************************************************//* Input data : dt.sb_rad_input *//* Output data: sb_rad_forecast_output *//*proc printto log = "&path.\sb_rad_log_file.log" new; run;*/%esm_model(inputdata= dt.sb_rad_forecast_input , date= month_id, weeks= dt.weeks_in_month_lookup, segment1= forecast_segment1, segment2= forecast_segment2, predictors= &preds., finalout= sb_rad_forecast_output);/*proc printto; run;*/proc print data=sb_rad_forecast_output;run;/********************************************************************//* Step4: Export the result *//********************************************************************//* Export sb_rad_forecast_output to sb_rad_forecasting_output.xlsx */proc export data= sb_rad_forecast_output(drop = method_: adj_r_:) outfile= "&outpath." dbms=excelreplace; sheet="&rad_tab"; run;
0 0
- SAS Timeseries Modelling
- sas
- sas
- sAS
- sas
- SAS
- SAS
- sas
- SAS
- JStockChart -- Getting Started(Timeseries)
- Spark-TimeSeries使用方法
- TICC- TimeSeries Analyze
- Sketch based modelling papers
- Use Case Modelling
- Dimensional Modelling - Junk Dimension
- Miscellaneous: projection, modelling, rendering
- HANA Modelling Privileges
- Systems Modelling: Theory and Practice
- 【leetcode】【31】Next Permutation
- 简易csv解析
- tomcat架构分析(valve机制)
- 状态压缩动态规划入门
- Android Material Design之CoordinatorLayout+AppBarLayout实现上滑隐藏ToolBar
- SAS Timeseries Modelling
- js遍历Object所有属性
- 微信扫描二维码登录网站---href设置二维码大小
- DataTbale分页显示
- 如何降低人员流失造成对项目进度的影响
- 【Linux】linux常用基本命令
- Android基础:自定义带图片的Toast
- Android的数据存储(Preference、SQLite、SDCard、File、ContentProvider)【未完,待编辑】
- android 调用系统相机拍照 获取原图