Categorical Data
Source: Internet | Published under: android自带数据库 (Android built-in database) | Editor: 程序博客网 | Time: 2024/06/05 09:50
This is an introduction to pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type that corresponds to categorical variables in statistics: a variable that can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time, or ratings on a Likert scale. In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs. ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, …) are not possible.
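A minimal sketch of these points, assuming pandas is installed. The Likert labels are hypothetical example data; `pd.CategoricalDtype(..., ordered=True)` declares the categories and their order, comparisons then follow that order, and a numerical reduction such as `sum` raises a `TypeError` (the exact message differs between pandas versions).

```python
import pandas as pd

# Hypothetical Likert-scale responses: a small, fixed set of possible values
likert = pd.CategoricalDtype(
    categories=["disagree", "agree", "strongly agree"],
    ordered=True,  # the categories carry a logical order
)
responses = pd.Series(["agree", "strongly agree", "disagree", "agree"], dtype=likert)

print(responses.cat.categories.tolist())  # ['disagree', 'agree', 'strongly agree']

# Comparisons follow the category order, not alphabetical order
above_disagree = (responses > "disagree").tolist()
print(above_disagree)  # [True, True, False, True]

# Numerical reductions such as sum are not defined for categorical data
sum_error = None
try:
    responses.sum()
except TypeError as exc:
    sum_error = exc
print(type(sum_error).__name__)  # TypeError
```

Alphabetically, "agree" sorts before "disagree", so the `True` for "agree" comes only from the declared category order.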
All values of categorical data are either in the categories or np.nan. Order is defined by the order of the categories, not by the lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes, each of which points to a value in the categories array.
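To make the internal layout concrete, here is a small sketch (assuming pandas and NumPy are installed; the values are made up) that inspects the `.categories` and `.codes` of a `pd.Categorical`. A value outside the declared categories, like `"d"` here, is stored as missing, and missing values get the code -1.

```python
import numpy as np
import pandas as pd

# "d" is not one of the declared categories, so it is stored as missing
cat = pd.Categorical(["a", "c", "a", np.nan, "d"], categories=["a", "b", "c"])

print(list(cat.categories))  # ['a', 'b', 'c']
print(cat.codes.tolist())    # [0, 2, 0, -1, -1]  (-1 marks a missing value)

# Each non-negative code is a position in the categories array
decoded = [cat.categories[c] if c != -1 else None for c in cat.codes]
print(decoded)               # ['a', 'c', 'a', None, None]
```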
The categorical data type is useful in the following cases:
- A string variable consisting of only a few different values. Converting such a string variable to a categorical variable saves memory.
- The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max use the logical order instead of the lexical order.
- As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
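The first two use cases can be checked with a rough sketch like the following, assuming a recent pandas. The color and number-word data are illustrative only, and the exact byte counts from `memory_usage(deep=True)` depend on the pandas version and platform, so only the relative size is meaningful.

```python
import pandas as pd

# 1) Memory: many repetitions of a few distinct strings
colors = pd.Series(["red", "green", "blue"] * 100_000)
as_object = colors.memory_usage(deep=True)
as_category = colors.astype("category").memory_usage(deep=True)
print(as_object, as_category)  # the categorical version is much smaller

# 2) Logical vs. lexical order
words = pd.Series(["two", "one", "three", "one"])
lexical = words.sort_values().tolist()
print(lexical)  # ['one', 'one', 'three', 'two']

ordered = words.astype(
    pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
)
logical = ordered.sort_values().tolist()
print(logical)                      # ['one', 'one', 'two', 'three']
print(ordered.min(), ordered.max())  # one three
```

Declaring `ordered=True` is what makes `min`/`max` valid here; on an unordered categorical, pandas raises a `TypeError` for these reductions.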
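Summary: the categorical data type covers variables such as “gender”, “blood type”, or “class”, which can only take a fixed set of values. Categorical data can have ordered levels, but it cannot be used in numerical calculations.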