Learning R---dummy
Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 21:53
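Intro
The dummy package creates dummy variables quickly and efficiently from the factor and character variables in a data frame. I came across it by chance while searching for dummy variables and one-hot encoding. Python still feels like the better fit for this kind of work: you depend on one library and you are done. In R every task lives in its own package, and if a package stops being maintained there may well be plenty of pitfalls.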
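To make the idea concrete, here is a minimal base-R sketch of one-hot encoding a single vector by hand, with no package at all. The helper name `one_hot` is hypothetical, and this is only an illustration of the technique, not how the dummy package works internally: each distinct value becomes one 0/1 column, and every row has exactly one 1.

```r
# One-hot encode a single categorical vector by hand (base R only).
# one_hot() is a hypothetical helper used only for illustration.
one_hot <- function(x) {
  # Use factor levels if available, otherwise the sorted distinct values
  lv <- if (is.factor(x)) levels(x) else sort(unique(x))
  # outer() compares every element of x with every level -> logical matrix;
  # multiplying by 1L turns TRUE/FALSE into integer 1/0
  m <- outer(as.character(x), lv, FUN = "==") * 1L
  colnames(m) <- lv
  m
}

one_hot(c("b", "a", "c", "b"))
```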
Function
categories
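Purpose: extract the values of the categorical variables; this is the preprocessing step before dummy variables are generated.
categories picks out every factor and character variable in a data frame and ignores numeric variables. Its output is what the dummy function uses as preprocessing.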
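As a rough base-R illustration of which columns are in scope, the sketch below selects the factor and character columns of a data frame and lists their distinct values. It assumes the same `traindata` as the examples plus a hypothetical numeric column `num`, added only to show that numeric columns are skipped; it is not the package's own code.

```r
# Sketch: pick out the columns categories() is described as working on,
# i.e. factor and character columns; numeric columns are ignored.
traindata <- data.frame(
  var1 = as.factor(c("a", "b", "b", "c")),
  var2 = as.factor(c(1, 1, 2, 3)),
  var3 = c("val1", "val2", "val3", "val3"),
  num  = c(1.5, 2.0, 3.5, 4.0),  # hypothetical numeric column, should be skipped
  stringsAsFactors = FALSE
)

# TRUE for factor or character columns, FALSE otherwise
is_categorical <- vapply(traindata,
                         function(col) is.factor(col) || is.character(col),
                         logical(1))
names(traindata)[is_categorical]

# Distinct values of each categorical column
lapply(traindata[is_categorical], function(col) sort(unique(as.character(col))))
```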
Arguments
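x: a data frame
p: which values to keep, ranked by frequency. Either "all" (every value of each categorical variable), a single integer p (the p most frequent values of every categorical variable), or a vector (one such count per categorical variable)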
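The "top p by frequency" rule for `p` can be sketched in base R as follows. `top_p_values` is a hypothetical helper, not part of the dummy package, and how it breaks ties between equally frequent values is an arbitrary choice that may differ from the package. A vector `p` is handled by applying one count per column, mirroring `p = c(2, 1, 3)` in the examples.

```r
# Sketch of the "top p values by frequency" idea behind the p argument.
# top_p_values() is hypothetical; tie-breaking here is arbitrary.
top_p_values <- function(col, p = "all") {
  # Count each value and order from most to least frequent
  counts <- sort(table(as.character(col)), decreasing = TRUE)
  if (identical(p, "all")) return(names(counts))
  names(counts)[seq_len(min(p, length(counts)))]
}

var3 <- c("val1", "val2", "val3", "val3")
top_p_values(var3)          # every distinct value
top_p_values(var3, p = 1)   # only the most frequent value

# A vector p gives one count per categorical column
cols <- list(var1 = c("a", "b", "b", "c"),
             var2 = c("1", "1", "2", "3"),
             var3 = var3)
top_by_column <- mapply(top_p_values, cols, c(2, 1, 3), SIMPLIFY = FALSE)
top_by_column
```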
Examples
```r
library(dummy)

traindata <- data.frame(var1 = as.factor(c("a", "b", "b", "c")),
                        var2 = as.factor(c(1, 1, 2, 3)),
                        var3 = c("val1", "val2", "val3", "val3"),
                        stringsAsFactors = FALSE)
newdata <- data.frame(var1 = as.factor(c("a", "b", "b", "c", "d", "d")),
                      var2 = as.factor(c(1, 1, 2, 3, 4, 5)),
                      var3 = c("val1", "val2", "val3", "val3", "val4", "val4"),
                      stringsAsFactors = FALSE)

categories(x = traindata, p = "all")
categories(x = traindata, p = 2)
categories(x = traindata, p = c(2, 1, 3))
```
dummy
Arguments
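dummy(x, p = "all", object = NULL, int = FALSE, verbose = FALSE)
x: a data frame
p: only used when object is NULL; same meaning as the p argument of categories
object: an object returned by categories
int: TRUE makes the dummy variables numeric (integer); otherwise they are factors
verbose: whether to print progress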
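Roughly speaking, `object` supplies a fixed list of category values and `int` chooses the column type. The base-R sketch below imitates that under stated assumptions: `make_dummies`, the `cats` list and the `var_value` column naming are hypothetical and only illustrate the idea; they are not the dummy package's implementation or its exact output format.

```r
# Sketch of what object/int amount to: one 0/1 column per stored category.
# make_dummies(), cats and the "var_value" names are hypothetical.
make_dummies <- function(x, cats, int = FALSE) {
  out <- list()
  for (v in names(cats)) {
    for (val in cats[[v]]) {
      # 1 where the column equals this category value, 0 elsewhere
      hit <- as.integer(as.character(x[[v]]) == val)
      out[[paste(v, val, sep = "_")]] <-
        if (int) hit else factor(hit, levels = c(0L, 1L))
    }
  }
  as.data.frame(out)
}

traindata <- data.frame(var1 = as.factor(c("a", "b", "b", "c")),
                        var3 = c("val1", "val2", "val3", "val3"),
                        stringsAsFactors = FALSE)
# Plays the role of the categories() result passed as object
cats <- list(var1 = c("a", "b", "c"), var3 = c("val1", "val2", "val3"))

str(make_dummies(traindata, cats, int = TRUE))   # integer 0/1 columns
str(make_dummies(traindata, cats, int = FALSE))  # factor columns, levels "0"/"1"
```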
Examples
```r
library(dummy)

traindata <- data.frame(var1 = as.factor(c("a", "b", "b", "c")),
                        var2 = as.factor(c(1, 1, 2, 3)),
                        var3 = c("val1", "val2", "val3", "val3"),
                        stringsAsFactors = FALSE)
newdata <- data.frame(var1 = as.factor(c("a", "b", "b", "c", "d", "d")),
                      var2 = as.factor(c(1, 1, 2, 3, 4, 5)),
                      var3 = c("val1", "val2", "val3", "val3", "val4", "val4"),
                      stringsAsFactors = FALSE)

# Create dummies of the training set
(dummies_train <- dummy(x = traindata))

# Create dummies of the new set
(dummies_new <- dummy(x = newdata))

# How many new dummy variables should not have been created?
sum(!colnames(dummies_new) %in% colnames(dummies_train))

# Create dummies of the new set using the categories found in the training set
(dummies_new <- dummy(x = newdata, object = categories(traindata, p = "all")))

# How many new dummy variables should not have been created?
sum(!colnames(dummies_new) %in% colnames(dummies_train))

# Create dummies of the training set,
# using the top 2 categories of all variables found in the training data
dummy(x = traindata, p = 2)

# Create dummies of the training set,
# using respectively the top 2, 3 and 1 categories of the three
# variables found in the training data
dummy(x = traindata, p = c(2, 3, 1))

# Create all dummies of the training data
dummy(x = traindata)
```
Others
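In practice, should the training and test sets be combined before creating dummy variables? Then again, if a category never appears in the training set, the model cannot have learned anything about it, so it is of little use on the test set anyway. My understanding is that what really happens is that every unseen category gets lumped into the last category of the training set.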
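One way to see what "unseen category" means mechanically is the base-R sketch below, which encodes new data using only the training categories. It is an illustration of one common treatment (unseen values match no training column), not a statement of what the dummy package does; checking that would mean inspecting the output of `dummy(x = newdata, object = categories(traindata))` directly.

```r
# Base-R illustration (not the dummy package): encode new data using only
# the categories seen in training. Unseen values ("d") match no level.
train_var1 <- c("a", "b", "b", "c")
new_var1   <- c("a", "b", "b", "c", "d", "d")

train_levels <- sort(unique(train_var1))
f_new <- factor(new_var1, levels = train_levels)
f_new                      # the two "d" values become NA

# One 0/1 column per training category; rows holding an unseen value are all 0
enc <- outer(new_var1, train_levels, FUN = "==") * 1L
colnames(enc) <- train_levels
enc
rowSums(enc)               # 0 for the rows holding "d"
```

Whether such all-zero rows are acceptable depends on the model: with a full set of k columns they form a distinct pattern, while with k - 1 columns they coincide with the reference category.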
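As for dummy variables versus one-hot encoding, I still need to read up on them; I had never really thought about these topics before, and there is clearly a lot I am still missing.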
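A quick way to see the difference in base R is `model.matrix`. With an intercept and the default treatment contrasts, a factor with k levels becomes k - 1 dummy columns (the first level is the reference absorbed into the intercept); dropping the intercept with `- 1` gives all k columns, i.e. one-hot encoding for a single factor. This sketch uses only base R, not the dummy package.

```r
df <- data.frame(var1 = factor(c("a", "b", "b", "c")))

# Dummy (treatment) coding with an intercept: k - 1 = 2 columns for var1,
# level "a" is the reference level absorbed into the intercept
dummy_coded <- model.matrix(~ var1, data = df)
dummy_coded

# One-hot encoding: drop the intercept to get all k = 3 columns
one_hot_coded <- model.matrix(~ var1 - 1, data = df)
one_hot_coded
```

With several factors, `- 1` only gives full coding to the first one; the others still drop a reference level.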