Data Analysis

Source: Internet  Editor: 程序博客网  Time: 2024/05/01 21:12

 Introduction to Advanced Data Analysis

   Given a dataset to analyze, there are basically two approaches that can be applied to the task. For relatively small datasets, the statistical approach is a good option: it provides a number of summary statistics and visualization methods that give you an overview of the data. Visualization is an important technique because it gives you a direct intuition for the data. For very large datasets, however, satisfactory graphs are generally hard to produce, and valuable information is hard to extract. Data mining is defined as the process of generating actionable information and interesting patterns from large and complex datasets. It provides enhanced techniques from machine learning and statistics to handle datasets with large volumes and complex formats.

 

Statistical Methods

There are many plotting methods: some are for univariate analysis, some for bivariate analysis, and some for multivariate analysis.

Points of analysis: the data should be visualized and summarized in a descriptive way that gives perspective on it. An ideal method should
1) contain quantitative information; 2) not lose information; 3) be intuitive.

  • dot plot
  • jittering plot (univariate)
  • mosaic plot (multivariate)
  • contingency table (bivariate for binary category)
  • multiplot  (scatter-plot matrix and co-plot)
  • false color plot (multivariate)
  • stacked plot (built from dot plots; a solution for composition problems)

  •  kernel density estimate

  • Linear regression

  • histogram

  • cumulative distribution function

  • correlation
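
Three of the numerical methods in the list above (the cumulative distribution function, the kernel density estimate, and correlation) can be sketched in a few lines of plain Python. This is only a minimal illustration, not a replacement for a statistics library: the sample data, the function names, the Gaussian kernel, and the bandwidth of 5.0 are all assumptions chosen for the example.

```python
import math
from statistics import mean

def ecdf(values):
    """Empirical CDF: sorted values and the step height i/n at each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
    return density

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical sample data, made up for this example.
heights = [150.0, 160.0, 165.0, 170.0, 180.0]
weights = [50.0, 56.0, 61.0, 66.0, 75.0]

xs, fractions = ecdf(heights)                    # points of the CDF step curve
density = gaussian_kde(heights, bandwidth=5.0)   # bandwidth is an assumed value
r = pearson(heights, weights)                    # close to 1 for this data

print(list(zip(xs, fractions)))
print(round(density(165.0), 4), round(r, 3))
```

The bandwidth controls how smooth the density curve is: a small value follows every data point, a large value blurs clusters together, so in practice it is worth trying several values.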

A statistical summary is generally required:

  • The size of the dataset

  • The max, min values in an attribute

  • The mode (the most frequent value) of a categorical attribute

  • The distribution of a numerical attribute: is it symmetric or asymmetric?

  • The spread of the data

  • Any clusters

  • Any outliers
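
Most of the summary items above can be computed with the Python standard library. The sketch below is one possible way to do it, under stated assumptions: the data is made up, the helper names are hypothetical, spread is measured by the standard deviation and the interquartile range (IQR), and outliers are flagged with the common 1.5 × IQR rule, which is a convention rather than the only choice. Cluster detection is not covered here.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

def summarize_numeric(values):
    """Size, range, centre, spread and outliers of a numerical attribute."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "size": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "iqr": iqr,
        # Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
        "outliers": [v for v in values if v < low or v > high],
    }

def mode_of(categories):
    """Most frequent label of a categorical attribute and its count."""
    return Counter(categories).most_common(1)[0]

# Hypothetical data for illustration.
ages = [21, 22, 22, 23, 24, 25, 26, 27, 95]
colours = ["red", "blue", "red", "green", "red", "blue"]

summary = summarize_numeric(ages)
colour_mode = mode_of(colours)

# A mean well above the median hints at a right-skewed (asymmetric) distribution.
print(summary)
print(colour_mode)
```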

 

Data Mining

 Data mining mainly comprises four tasks: 1) classification; 2) clustering; 3) association rule mining; 4) anomaly detection.
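
To make one of these tasks concrete, here is a minimal clustering sketch. The text does not name any particular algorithm, so k-means (Lloyd's algorithm) on one-dimensional data is used purely as an assumed example; the data, the starting centers, and the function name are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)

    def nearest(v):
        # Index of the center closest to v.
        return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v)].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]

    labels = [nearest(v) for v in values]   # final cluster index per value
    return centers, labels

# Hypothetical data with two obvious groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centers, labels = kmeans_1d(values, centers=[0.0, 5.0])

print(centers)
print(labels)
```

The result depends on the starting centers and on the number of clusters chosen, so real use usually involves several runs with different initializations.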

 
