SAS Timeseries Modelling

来源:互联网 发布:手机怎样申请淘宝账号 编辑:程序博客网 时间:2024/06/07 15:45
/*************************************************************************************************************//*                                         SB RAD FORECAST MACRO                                             *//*  Updated on 04/30/2014                                                                                    *//*  Input Data: sb_rad_input & sb_rad_lob_input(tabs from input Excel)                                       *//*  Output Data: sb_rad_forecasting_output.xlsx                                                              *//*                                                                                                           *//*  Purpose: The project adapts exponential smoothing methods to replace current one dimensional forecasting *//*           approach(phasing method) for DELL SB business, so that the model built could be more robust and *//*           accurate. For simplicity, a macro was created to build the time series model automatically.     */   /*                                                                                                           *//*  Modeling Process:                                                                                        *//*   - Step1: 1) Define library where the raw data files are stored                                          *//*            2) Import raw data from Excel to SAS, and create the date variable "month_id"                  *//*            3) Simple QC                                                                                   */               /*   - Step2: Define macro %opt_model & %esm_model                                                           *//*   - Step3: Run                                                                                            *//*            %esm_model(inputdata= dt.sb_rad_input, date= month_id, weeks= dt.weeks_in_month_lookup,        *//*                       segment1= forecast_segment1, segment2= forecast_segment2,                           *//*                       predictors= &preds, finalout= sb_rad_forecast_output)                               *//*       and                                                                                                 *//*            %esm_model(inputdata= dt.sb_rad_lob_input, date= month_id, weeks= dt.weeks_in_month_lookup,    *//*                       segment1= forecast_segment1, segment2= forecast_segment2,                           *//*                       predictors= &preds, finalout= sb_rad_lob_forecast_output)                           *//*                                                                                                           *//*       where "inputdata" is dataset for model building,                                                    *//*             "date" is a date variable, you don't need to change it in most cases,                         *//*             "weeks" is dataset for appending variable "weeks_in_month",                                   *//*             "segment1" & "segmengt2" are variables for spilting dataset,                                  *//*             "predictors" are 5 target variables for forecasting (assigned to macro &pred),                *//*             "finalout" is the dataset we need finally.                                                    *//*   - Step4: Export final result sb_rad_forecast_output & sb_rad_lob_forecast_output to                     *//*            sb_rad_forecasting_output.xlsx.                                                                *//*                                                                                                           *//*  Notes:                                                                                                   */ /*    - 1) The data must contain a valid date variable "month_id" that gives the date of each observation.   *//*         And the date(month_id) must be continuous, or no model will be built and a warning message will   *//*         will be presented in SAS log.                                                                     */ /*    - 2) The variables segment1 and segment2 should be assigned, if more than one of them is not assigned, *//*         the program will terminate.                                                                       *//*    - 3) Date variable: fiscal_year, fiscal_quarter(value 1-4), fiscal_month (value 1-12)                  *//*    - 4) There must be 5 predictors: sites, sites_purchasing, site_lapsing,revenue and margin.The position/spelling of  *//*         the predictors don't affect the model building.                                                 *//*    - 5) We spilt data to several sub-datasets using segment1 & segment2. If sub-dataset's records are    *//*         less than 12, no model will be built for current data. Forecasts for past months will be blank.   *//*         No predictions for the upcoming 12 months. At the same time, a warning message will be presented  *//*         in SAS log.                                                                                       */                                                                                                          /*    - 6) In general, the macro uses 2 trasformations(none & log) and 7 smoothing methods(simple, double,   *//*         linear, seasonal, addseasonal, winters and addwinters) to build models. But if negative value     *//*         exists, we use non-transformed data. And only 6 of 7 methods are applied to build model, because  *//*         winters method cannot be used for negative data.                                                  *//*    - 7) For forecasting records, if number of months in a quarter is not equal to 3, then all records in  *//*         this quarter will be deleted. Only records with completed quarter will be kept.                   *//*    - 8) The tab names should be edited manually after exporting the result to *.xlsx file, because SAS    *//*         cannot specify the proper tab names we need(extra '_' will be created in place of space).         */                                                                              /*                                                                                                           *//*************************************************************************************************************/options mprint;%let path = L:\dell_qmgisi\SMB NA Analytics Shanghai\Regular update (SB RAD Forecast)\2014-06\data;libname excel "&path.\RAD Forecast Attainment Output (SB Only) - 20140619.xlsx"; /* original excel data */*libname weeks "&path1.\Weeks in Month Lookup Table.xlsx";       /* Weeks in month lookup table */libname dt    "&path."; %let outpath = &path.\SB RAD Forecasting Output - SB.xlsx ;  /* This should be the same file as excel*//***********************************************************************************//* No need to update the following unless there is change in variable or tab names *//***********************************************************************************/%let rad_tab     = RAD_Forecasting_Output;/*%let rad_lob_tab = RAD_LOB_Forecasting_Output;*/%let preds = sites sites_purchasing sites_lapsing revenue margin;  /* list of predictors in input data */%let yr  = fiscal_year;%let qtr = fiscal_quarter;%let mth = fiscal_month;%let wk  = weeks_in_month;/****************************************************************************//*    Please make sure the following variables names are not changed        *//*    forecast_segment1, forecast_segment2                                  *//****************************************************************************//****************************************************************************//*                  Do not change the following!!!!                         *//****************************************************************************//********************************************************************//*                 Step1: Import Raw Data                           *//********************************************************************//* Prepare data */data dt.sb_rad_forecast_input;set excel."RAD Forecasting Input$"n;               /* "RAD Forecasting Input" is the tab name in excel */month_id = mdy(&mth., 01, &yr.);site_new = input(compress(sites), 8.);sites_lapsing_new = input(compress(sites_lapsing), 8.);drop sites sites_lapsing; rename site_new = sites sites_lapsing_new = sites_lapsing;format month_id date9.;run;* 108;/* Weeks in month lookup *//*data dt.weeks_in_month_lookup;*//*set weeks."Sheet1$"n;*//*run;*/* 241;/* QC */proc contents data = dt.sb_rad_forecast_input;run;proc print data= dt.sb_rad_forecast_input;run;proc sql;select forecast_segment1, forecast_segment2, count(*) as cntfrom dt.sb_rad_forecast_inputgroup by 1, 2order by 1, 2, 3;quit;/********************************************************************//*               Step2: %OPT_MODEL & %ESM_MODEL macros              *//********************************************************************//* OPT_MODEL MACRO *//* -- 1) Build the optimized model based on the max adjusted R^2 for each dataset & target variable */%macro opt_model(data=, target=, ind_var=, pred=);%let methods = Simple Double Linear Addseasonal Seasonal Addwinters Winters ;  %let trans   = None Log; %local i j;title2 "Target Variable: &target.";proc sort data = &data. out = data_sorted;by &yr. &qtr. &mth.;run;/* Processing current data and assign intv(interval) & leads(lead) according to the target variable */%if %eval(%qupcase(&target.) EQ SITES) %then %do;%let intv  = qtr; %let leads = 4;data data_ready;set data_sorted;by &yr. &qtr. &mth.;if (first.&yr. or first.&qtr.) and first.&mth.;run;%end;%else %if %eval(%qupcase(&target.) NE SITES) %then%do; %let intv  = month;%let leads = 12;data data_ready;set data_sorted;run;%end;/* Get minimum value, and check nagative value in the following step, no model will be built if all values are missing */proc sql;select min(&target.) format 20. into : min_value from data_ready;quit;%if %eval(&min_value. NE .) %then %do;/* Select different trans & methods for nagative/positive dataset */%if %eval(&min_value. GT 0) %then %do; %let i_max = 7;  %let j_max = 2; %end;%else %do; %let i_max = 6;  %let j_max = 1; %end;/* Build models using methods & trnas */%do i = 1 %to &i_max.;%do j = 1 %to &j_max.;proc esm data = data_ready lead = &leads. outstat = stat_&i._&j.;      /* output data stat_*_*(includes adjusted R^2) */id &ind_var. interval = &intv.;forecast &target. / model = %scan(&methods., &i.) transform = %scan(&trans., &j.);run;data stat_&i._&j.;informat trans $10. method $20.;format   trans $10. method $20.;set stat_&i._&j.;trans  = "%scan(&trans., &j.)" ;method = "%scan(&methods., &i.)" ;run;%end;%end;/* Combine all model's statistical result */data models_temp1;set %do i = 1 %to &i_max.;%do j = 1 %to &j_max.;stat_&i._&j.%end;%end;;run;/* Choose opt result based on max adjusted R^2 */proc sql;create table models_temp2 asselect * from (select *, max(adjrsq) as max_adjrsq from models_temp1 ) awhere adjrsq = max_adjrsq;quit;/* Get the opt trans, method, adjusted R^2 based on max adjusted r-square */proc sql;select trans  into : opt_trans  from models_temp2;select method into : opt_method from models_temp2;select adjrsq into : opt_adjrsq from models_temp2;quit;/* Build optimized model according to the opt trans & method we choose in the previous step */proc esm data = data_ready  lead = &leads.  outfor = pred_temp ;id &ind_var. interval = &intv.; forecast &target. / model = &opt_method. transform = &opt_trans.;run;/* Keep the variables we need */data &pred.;informat method_&target. $20.;set pred_temp( rename = (predict = forecasted_&target. ));/*where actual =.;*/     adj_r_square_&target. = &opt_adjrsq.;                                       /* adjusted R^2   */method_&target.= compress("&opt_trans.")||" "||compress("&opt_method.");    /* trans & method */keep &ind_var.  forecasted_&target. adj_r_square_&target. method_&target.;run;%end;%else %do; data &pred.; informat method_&target. $20.;&ind_var. = . ;forecasted_&target.   = . ;adj_r_square_&target. = . ;method_&target.       = "";run; %end;%mend opt_model;/* ESM_MODEL MACRO *//* -- 1) Divide the inputdata into several sub-datasets using forecasting segments: RAD & Channels.     *//* -- 2) Build 4 optimized models for each sub-dataset respectively based on the max adjusted r-square. */%macro esm_model(inputdata=, date=, weeks=, segment1=, segment2=, predictors=, finalout=);/* Check whether segment1 & segment2 are assigned */%if %length(&segment1.) = 0 or %length(&segment2.) = 0 %then %do;%put ERROR: segment1  or segment2 is not available ; %abort;%end; %else %do;/* Scan four target variables */%let predictor_1 = %scan(&predictors., 1);%let predictor_2 = %scan(&predictors., 2);%let predictor_3 = %scan(&predictors., 3);%let predictor_4 = %scan(&predictors., 4);%let predictor_5 = %scan(&predictors., 5);/* Sort the inputdata before building models */proc sort data = &inputdata. out = inputdata_sorted;by &segment1. &segment2. &yr. &qtr. &mth.;run;/* Divide into several sub-datasets using segment1 & segment2 */proc sql;select distinct quote(compress(%nrbquote(&segment1.)||%nrbquote(&segment2.)))  into : div_seg separated by "," from inputdata_sorted;select count(distinct %nrbquote(&segment1.)||%nrbquote(&segment2.))            into : div_num                  from inputdata_sorted;quit; data %do i = 1 %to &div_num.; dataset_&i. %end; ;set inputdata_sorted;select(compress(&segment1.||&segment2.));%do i = 1 %to &div_num.;when(%scan(%nrbquote(&div_seg.), &i., ',')) output dataset_&i.;%end;otherwise;end;run;/* Build 4 optimized models for each sub-dataset & target variable respectively */%do m = 1 %to &div_num.;title1 "Current Dataset";proc sql;select distinct &segment1.   into : curr_segment1     from dataset_&m.;select distinct &segment2.   into : curr_segment2     from dataset_&m.;select count(*)  format 8.   into : curr_record_&m.   from dataset_&m.;quit;/* Check whether month_id(date) is continuous */data _null_;set dataset_&m. end = eof;retain diff_sum 0;diff_year  = &yr. - lag(&yr.);diff_month = &mth.- lag(&mth.);diff       = diff_year*12 + diff_month;if _n_ = 1 then diff = 0;diff_sum = diff_sum + diff;if eof then call symputx("diff_curr_&m.", diff_sum);run;    /* Limit the records to 12 and date is continuous */%if %eval(&&curr_record_&m. GE 12) and %eval(&&diff_curr_&m. EQ &&curr_record_&m. - 1) %then%do;title1 "Dataset with %nrbquote(&curr_segment1.) and %nrbquote(&curr_segment2.)";%opt_model(data= dataset_&m., target= &predictor_1., ind_var= &date., pred= pred_&predictor_1._&m.);%opt_model(data= dataset_&m., target= &predictor_2., ind_var= &date., pred= pred_&predictor_2._&m.);%opt_model(data= dataset_&m., target= &predictor_3., ind_var= &date., pred= pred_&predictor_3._&m.);%opt_model(data= dataset_&m., target= &predictor_4., ind_var= &date., pred= pred_&predictor_4._&m.);%opt_model(data= dataset_&m., target= &predictor_5., ind_var= &date., pred= pred_&predictor_5._&m.);    /* Merge 4 model's predicted value(sites, sites_purchasing, revenue and margin) by month_id */data finalout_temp1_&m.;merge dataset_&m. pred_&predictor_1._&m. pred_&predictor_2._&m. pred_&predictor_3._&m. pred_&predictor_4._&m. pred_&predictor_5._&m.;by &date.;        /* add current segment1 & segment2 */&segment1. = "%nrbquote(&curr_segment1.)";&segment2. = "%nrbquote(&curr_segment2.)";if &date. = . then delete;   /* delete missing row */run;    /* Get variable weeks_in_month */proc sql;create table finalout_temp2_&m. asselect year(a.&date.)  as &yr.,qtr(a.&date.)   as &qtr.,month(a.&date.) as &mth.,a.&date.,b.&wk.,&segment1.,&segment2.,&predictor_1.,&predictor_2.,&predictor_3.,&predictor_4.,&predictor_5.,forecasted_&predictor_1.,forecasted_&predictor_2.,forecasted_&predictor_3.,forecasted_&predictor_4.,forecasted_&predictor_5.,method_&predictor_1.,method_&predictor_2.,method_&predictor_3.,method_&predictor_4.,method_&predictor_5.,adj_r_square_&predictor_1.,adj_r_square_&predictor_2.,adj_r_square_&predictor_3.,adj_r_square_&predictor_4.,adj_r_square_&predictor_5.from finalout_temp1_&m. aleft join  &weeks.      bon year(a.&date.) = b.&yr. and qtr(a.&date.) = b.&qtr. and month(a.&date.) = b.&mth. ;quit;/* Keep completed forecasting quarter */data _null_;set finalout_temp2_&m. end = eof;if eof then call symputx("last_month_forecast", &mth.);run;%let last_obs = %eval(&&curr_record_&m. + 12 - %sysfunc(mod(&last_month_forecast., 3)));    /* the last obs for current forecasting result *//* Final result for each dataset */data finalout_&m.;set finalout_temp2_&m.(obs = &last_obs.);run;%end;    /* If records less than 12/date is not continuous, a warning message is presented in SAS log */%else %if %eval(&&curr_record_&m. LT 12)                 %then %put Warning: current dataset_&m.(&&curr_record_&m. records)less than 12 records; %else %if %eval(&&diff_curr_&m. NE &&curr_record_&m. -1) %then %put Warning: current dataset_&m., month_id is not continuous;%end;/* All sub-datasets result */data finalout_temp1;set %do i = 1 %to &div_num.;%if %eval(&&curr_record_&i. GE 12) and %eval(&&diff_curr_&i. EQ &&curr_record_&i. - 1 ) %then finalout_&i.;%else dataset_&i.;%end;;run;/* Aggregate values in quarter */proc sort data= finalout_temp1 out= finalout_temp2;by &segment1. &segment2. &yr. &qtr. &mth.;run;/* Data finalout_agg */data finalout_temp3 ;set finalout_temp2;by &segment1. &segment2. &yr. &qtr. &mth. ;retain %do i = 1 %to 5; %scan(&predictors., &i.)_agg  forecasted_%scan(&predictors., &i.)_agg    %end;;/* Initialize aggregated variables(actual & forecasted) */if first.&yr. or first.&qtr. then do;%do i = 1 %to 5;%scan(&predictors., &i.)_agg = 0; forecasted_%scan(&predictors., &i.)_agg = 0;%end;end;    /* Accumulate values month by month within each quarter */%do n =1 %to 5;forecasted_%scan(&predictors., &n.)_agg  = forecasted_%scan(&predictors., &n.)_agg + forecasted_%scan(&predictors., &n.);%scan(&predictors., &n.)_agg             = %scan(&predictors., &n.)_agg + %scan(&predictors., &n.);%end;/* Process missing values : method_ & adj_r_square_ */array arrayNA1[5] %do i = 1 %to 5; method_%scan(&predictors., &i.)       %end; ;array arrayNA2[5] %do i = 1 %to 5; adj_r_square_%scan(&predictors., &i.) %end; ;do m = 1 to 5;if arrayNA1[m] = " " then arrayNA1[m] = "NONE";if arrayNA2[m] = .   then arrayNA2[m] = 0;end;/*drop &date. m  forecasted_&predictor_1. forecasted_&predictor_2. forecasted_&predictor_3. forecasted_&predictor_4.; */run;/* Format process */data finalout_temp4;set finalout_temp3;/* sites and sites_purchasing */%do i = 1 %to 3;forecasted_%scan(&predictors., &i.)_agg = round(forecasted_%scan(&predictors., &i.)_agg);%scan(&predictors., &i.)_agg            = round(%scan(&predictors., &i.)_agg);%end;/* revenue and margin */%do i = 4 %to 5;forecasted_%scan(&predictors., &i.)_agg = input(put(forecasted_%scan(&predictors., &i.)_agg, d11.9), d11.9);%scan(&predictors., &i.)_agg            = input(put(%scan(&predictors., &i.)_agg , d11.9), d11.9);%end;run;/* Final result */proc sql;create table &finalout. asselect &yr.,&qtr.,&mth.,&wk.,&segment1.,&segment2.,&predictor_1.,&predictor_2.,&predictor_3.,&predictor_4.,&predictor_5.,forecasted_&predictor_1._agg as forecasted_&predictor_1.,forecasted_&predictor_2._agg as forecasted_&predictor_2.,forecasted_&predictor_3._agg as forecasted_&predictor_3.,forecasted_&predictor_4._agg as forecasted_&predictor_4.,forecasted_&predictor_5._agg as forecasted_&predictor_5.,&predictor_1._agg as actual_&predictor_1.,&predictor_2._agg as actual_&predictor_2.,&predictor_3._agg as actual_&predictor_3.,&predictor_4._agg as actual_&predictor_4.,&predictor_5._agg as actual_&predictor_5.,method_&predictor_1.,method_&predictor_2.,method_&predictor_3.,method_&predictor_4.,method_&predictor_5.,adj_r_square_&predictor_1.,adj_r_square_&predictor_2.,adj_r_square_&predictor_3.,adj_r_square_&predictor_4.,adj_r_square_&predictor_5.from finalout_temp4;quit;%end;%mend esm_model;/********************************************************************//*                      Step3: Run the example                      *//********************************************************************//* Input data : dt.sb_rad_input         *//* Output data: sb_rad_forecast_output  *//*proc printto log   = "&path.\sb_rad_log_file.log" new; run;*/%esm_model(inputdata= dt.sb_rad_forecast_input , date= month_id, weeks= dt.weeks_in_month_lookup,            segment1= forecast_segment1, segment2= forecast_segment2, predictors= &preds., finalout= sb_rad_forecast_output);/*proc printto; run;*/proc print data=sb_rad_forecast_output;run;/********************************************************************//*                     Step4: Export the result                     *//********************************************************************//* Export sb_rad_forecast_output to sb_rad_forecasting_output.xlsx  */proc export data= sb_rad_forecast_output(drop = method_: adj_r_:)            outfile= "&outpath."             dbms=excelreplace;     sheet="&rad_tab"; run;

0 0
原创粉丝点击