The relationship between CountVectorizer, TfidfTransformer, and TfidfVectorizer
Source: Internet  Published: 香橙派 ubuntu  Editor: 程序博客网  Time: 2024/05/19 16:07
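# Count term frequencies (test_x is a list of raw text documents, defined elsewhere)
from sklearn.feature_extraction.text import CountVectorizer
ct = CountVectorizer(stop_words='english')
print(ct.fit_transform(test_x).todense())
Output: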
[[1 1 1 1 1 1 2 1 1 1 1]
[1 0 0 1 1 0 1 1 0 1 0]
[0 0 0 0 0 0 0 0 0 1 0]]
# vocabulary_ maps each term to its column index in the count matrix
print(ct.vocabulary_)
Output:
{u'story': 10, u'good': 6, u'escapades': 4, u'amounts': 1, u'series': 9, u'gander': 5, u'goose': 7, u'adage': 0, u'occasionally': 8, u'demonstrating': 3, u'amuses': 2}
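The post never shows test_x itself. The sketch below uses a stand-in corpus, which is an assumption: three phrases chosen because they contain exactly the eleven non-stop-word terms listed in the vocabulary above. If the assumption holds, the printed counts should line up with the matrix above. The helper name `terms` is introduced here only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumed stand-in for test_x; the original post does not give it.
test_x = [
    "A series of escapades demonstrating the adage that what is good for "
    "the goose is also good for the gander , some of which occasionally "
    "amuses but none of which amounts to much of a story .",
    "A series of escapades demonstrating the adage that what is good for "
    "the goose",
    "A series",
]

ct = CountVectorizer(stop_words='english')
counts = ct.fit_transform(test_x)  # sparse matrix, one row per document

# Column j of the matrix counts the term whose vocabulary_ value is j,
# so sorting the terms by that value recovers the column order.
terms = sorted(ct.vocabulary_, key=ct.vocabulary_.get)
for row in counts.toarray():
    # Show only the terms that actually occur in each document.
    print({term: int(n) for term, n in zip(terms, row) if n})
```

Reading the rows as term-to-count mappings makes it easier to see that English stop words and single-character tokens such as "A" never reach the matrix.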
# Build TF-IDF vectors from the term counts
from sklearn.feature_extraction.text import TfidfTransformer
transformer = TfidfTransformer()
print(transformer.fit_transform(ct.fit_transform(test_x)))
Output:
(0, 9) 0.18699352422
(0, 4) 0.240788208802
(0, 3) 0.240788208802
(0, 0) 0.240788208802
(0, 6) 0.481576417605
(0, 7) 0.240788208802
(0, 5) 0.316607558316
(0, 8) 0.316607558316
(0, 2) 0.316607558316
(0, 1) 0.316607558316
(0, 10) 0.316607558316
(1, 9) 0.328078311076
(1, 4) 0.422460559532
(1, 3) 0.422460559532
(1, 0) 0.422460559532
(1, 6) 0.422460559532
(1, 7) 0.422460559532
(2, 9) 1.0
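The weights above can be checked by hand. The sketch below uses only the standard library and assumes the TfidfTransformer defaults (smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row). The names `N_DOCS`, `DOC_FREQ`, and `tfidf_row` are made up for this illustration, and the document frequencies are read off the columns of the count matrix shown earlier.

```python
import math

N_DOCS = 3
# Number of documents containing each term (non-zero entries per column).
DOC_FREQ = {"adage": 2, "amounts": 1, "amuses": 1, "demonstrating": 2,
            "escapades": 2, "gander": 1, "good": 2, "goose": 2,
            "occasionally": 1, "series": 3, "story": 1}

def tfidf_row(term_counts):
    """TF-IDF weights for one document: tf * smoothed idf, then L2-normalized."""
    raw = {t: tf * (math.log((1 + N_DOCS) / (1 + DOC_FREQ[t])) + 1)
           for t, tf in term_counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

# Row 0 contains every term once, except "good", which appears twice.
row0 = tfidf_row({t: (2 if t == "good" else 1) for t in DOC_FREQ})
# Row 1 contains six terms once each; row 2 contains only "series".
row1 = tfidf_row({t: 1 for t in ("adage", "demonstrating", "escapades",
                                 "good", "goose", "series")})
row2 = tfidf_row({"series": 1})

print(round(row0["series"], 6), round(row1["series"], 6), row2["series"])
```

"series" occurs in all three documents, so its idf is the minimum value of 1. That is why it has the lowest weight in rows 0 and 1, and why it becomes exactly 1.0 after normalization in row 2, where it is the only term.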
TfidfVectorizer
Convert a collection of raw documents to a matrix of TF-IDF features.
In other words, it takes raw document strings and returns a TF-IDF feature matrix directly, with no separate counting step.
Equivalent to CountVectorizer followed by TfidfTransformer.
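That is, it gives the same result as running CountVectorizer and then TfidfTransformer on its output.
The TfidfVectorizer class wraps the CountVectorizer and TfidfTransformer classes together in a single class.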
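A quick way to confirm the equivalence is to run both routes on the same input and compare the matrices. The sketch below uses a small toy corpus (`docs`, hypothetical and not from the original post) and default settings apart from stop_words.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)

# Hypothetical toy corpus, used only for this comparison.
docs = ["good goose", "good gander", "goose story"]

# Two-step route: count terms first, then reweight the counts with TF-IDF.
two_step = TfidfTransformer().fit_transform(
    CountVectorizer(stop_words='english').fit_transform(docs))

# One-step route: TfidfVectorizer does both internally.
one_step = TfidfVectorizer(stop_words='english').fit_transform(docs)

print(np.allclose(two_step.toarray(), one_step.toarray()))
```

The two-step form is mainly useful when the raw counts are needed for something else as well. Otherwise TfidfVectorizer is the shorter choice.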