SAS DM Data Preparation Reading Notes (Table of Contents)


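Starting today I plan to write up my reading notes on SAS DM, at a pace of two to three posts a day, aiming to finish in about three months, before the short term begins. Today I tinkered with a piece of code that sums macro variables and realized how much there still is to learn. Writing up these notes should help me consolidate my data analysis and processing skills systematically. The notes focus mainly on analyzing code and assume the reader already has basic knowledge of SAS Base, macros, and SQL.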

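Book studied: Data Preparation for Data Mining Using SAS

The e-book is easy to find online. Baidu Netdisk and Sina iAsk both have rich collections of electronic resources, and most commonly used textbooks can be found there. Thanks also to my classmate Wang in New Zealand, who provided the code and data for this book.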

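Overall assessment of the book:

I have probably downloaded several gigabytes of SAS books onto my hard drive. Most of them cover statistical analysis and are largely alike. This book is the most systematic and most valuable one I have read that focuses on data preparation before analysis. Some less rigorous researchers tend to neglect this work, because many of their results are cooked up just to get papers published, purely for their own amusement. Industry is different: once an analysis result is biased, it leads to wrong decisions, and the cost is real money and hard-won reputation, so industry takes pre-analysis data preparation very seriously. In short, this work matters both for researchers and for industry practitioners.

The book makes heavy use of Base SAS, macros, and SQL, which closely matches the tools most SAS analysts use day to day. Anyone who can read this code and apply it fluently can fairly be said to have mastered Base SAS. For users who are not yet comfortable with Base SAS, macros, and SQL, it will be somewhat difficult. They can start with another book, Data Preparation for Analytics Using SAS (SAS Publishing), whose content is relatively simpler, with much less coding and far less use of macros. It is also a practice-oriented book. Having mastered both books, one should have the fundamental data-handling skills that run through the whole process of statistical analysis and data mining.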

Table of contents

1. Introduction

* setting the context of data mining

2. Tasks and Data Flow

* describes what data mining can do and where data preparation fits in

3. Review of Data Mining Modeling Techniques

* an overview of data mining techniques

4. SAS Macros: A Quick Start

* just in case you haven't worked with SAS macros

5. Data Acquisition and Integration

* where you get your data from and how it's pulled together

6. Integrity Checks

* how to make sure the data is correct and even what "correct" means

7. Exploratory Data Analysis

* get to know your data

8. Sampling and Partition

* dealing with large data sets as well as getting ready to validate the models you build

9. Data Transformations

* rarely is your source data in the form most effective for modeling - this chapter describes what can be done to produce the most effective models

10. Binning and Reduction of Cardinality

* make your variables less complex and, oftentimes, more presentable and understandable

11. Treatment of Missing Values

* you will have missing values in your data - here are several approaches for dealing with them

12. Predictive Power and Variable Reduction I

* introduces the concept of identifying usefulness of input variables and reducing the required number of variables

13. Analysis of Nominal and Ordinal Variables

* how to evaluate relationships with discrete variables

14. Analysis of Continuous Variables

* how to evaluate relationships with continuous variables

15. Principal Component Analysis

* how to use PCA for variable reduction during data preparation

16. Factor Analysis

* how to use Factor Analysis for variable reduction during data preparation

17. Predictive Power and Variable Reduction II

* defines methods of simplifying and reducing input variables with respect to the target variable

18. Putting It All Together

* a case study showing the application of all these techniques for data preparation in a realistic example

Appendix. Listing of SAS Macros