sklearn CountVectorizer / TfidfVectorizer / TfidfTransformer explained
Source: Internet | Published by: 程序员过关 | Editor: 程序博客网 | Time: 2024/06/05 19:05
- sklearn CountVectorizer explained
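    from sklearn.feature_extraction.text import CountVectorizer

    texts = ["dog cat fish", "dog cat cat", "fish bird", "bird"]
    cv = CountVectorizer()
    cv_fit = cv.fit_transform(texts)  # sparse document-term count matrix

    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    print(cv.get_feature_names_out().tolist())  # vocabulary, sorted alphabetically
    print(cv_fit.toarray())                     # dense view of the counts
    print(cv_fit)                               # sparse view: (row, column) count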
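- fit_transform returns a sparse matrix. The output below shows the vocabulary, the dense array from toarray(), and the sparse entries as (row, column) count.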
    ['bird', 'cat', 'dog', 'fish']
    [[0 1 1 1]
     [0 2 1 0]
     [1 0 0 1]
     [1 0 0 0]]
      (0, 3)    1
      (0, 1)    1
      (0, 2)    1
      (1, 1)    2
      (1, 2)    1
      (2, 0)    1
      (2, 3)    1
      (3, 0)    1
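To make the counting step concrete, here is a minimal pure-Python sketch of what CountVectorizer does with its default settings: lowercase the text, extract tokens of two or more word characters, sort the vocabulary, and count. The helper name `count_vectorize` is made up for illustration and is not part of sklearn; the real class handles many more options.

```python
import re
from collections import Counter

# Default token pattern of CountVectorizer: tokens of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(texts):
    """Hypothetical helper: return (sorted vocabulary, dense count rows)."""
    tokenized = [TOKEN_PATTERN.findall(text.lower()) for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


texts = ["dog cat fish", "dog cat cat", "fish bird", "bird"]
vocabulary, rows = count_vectorize(texts)
print(vocabulary)
for row in rows:
    print(row)

# A sparse matrix stores only the non-zero cells, as (row, column, count).
nonzero = [(i, j, c) for i, row in enumerate(rows) for j, c in enumerate(row) if c]
print(nonzero)
```

The dense rows should match the toarray() output above, and the non-zero list corresponds to the (row, column) count entries printed for the sparse matrix.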
- sklearn TfidfTransformer explained
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer

    texts = ["dog cat fish", "dog cat cat", "dog fish", "dog pig pig 中国"]
    cv = CountVectorizer()
    cv_fit = cv.fit_transform(texts)            # raw term counts

    transformer = TfidfTransformer()
    tfidf = transformer.fit_transform(cv_fit)   # counts -> tf-idf weights
    tfidf.toarray()                             # displayed in an interactive session
    array([[ 0.64043405,  0.42389674,  0.64043405,  0.        ,  0.        ],
           [ 0.94936136,  0.31418628,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.55193942,  0.83388421,  0.        ,  0.        ],
           [ 0.        ,  0.22726773,  0.        ,  0.8710221 ,  0.43551105]])
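The numbers above can be reproduced by hand. The sketch below assumes the TfidfTransformer defaults (smooth_idf=True, norm='l2', sublinear_tf=False): the idf of a term is ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df is the number of documents containing the term, and each row is then scaled to unit Euclidean length. The count matrix is typed in manually, with columns ordered cat, dog, fish, pig, 中国.

```python
import math

# Count matrix for the four texts above; columns: cat, dog, fish, pig, 中国.
counts = [
    [1, 1, 1, 0, 0],  # "dog cat fish"
    [2, 1, 0, 0, 0],  # "dog cat cat"
    [0, 1, 1, 0, 0],  # "dog fish"
    [0, 1, 0, 2, 1],  # "dog pig pig 中国"
]
n_docs = len(counts)
n_terms = len(counts[0])

# Document frequency: number of documents that contain each term.
df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]

# Smoothed idf, assuming smooth_idf=True.
idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]

tfidf = []
for row in counts:
    weighted = [tf * w for tf, w in zip(row, idf)]
    norm = math.sqrt(sum(v * v for v in weighted))  # L2 norm of the row
    tfidf.append([v / norm for v in weighted])

for row in tfidf:
    print([round(v, 8) for v in row])
```

Because "dog" appears in every document, its idf is exactly 1, which is why it gets the smallest weight in each row where other terms are present.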
- sklearn TfidfVectorizer explained
- TfidfVectorizer does the same job as the following four lines of code, that is, CountVectorizer followed by TfidfTransformer:
    cv = CountVectorizer()
    cv_fit = cv.fit_transform(texts)
    transformer = TfidfTransformer()
    tfidf = transformer.fit_transform(cv_fit)
- Here is the same computation with TfidfVectorizer:
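    from sklearn.feature_extraction.text import TfidfVectorizer

    # Same texts as in the TfidfTransformer example
    texts = ["dog cat fish", "dog cat cat", "dog fish", "dog pig pig 中国"]
    tv = TfidfVectorizer(max_features=100, ngram_range=(1, 1), stop_words='english')
    X_description = tv.fit_transform(texts)  # counting and tf-idf weighting in one step
    print(X_description.toarray())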
    [[ 0.64043405  0.42389674  0.64043405  0.          0.        ]
     [ 0.94936136  0.31418628  0.          0.          0.        ]
     [ 0.          0.55193942  0.83388421  0.          0.        ]
     [ 0.          0.22726773  0.          0.8710221   0.43551105]]
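- The output is exactly the same as the TfidfTransformer result above. None of the words in these texts is an English stop word, and max_features=100 is larger than the vocabulary, so neither option changes anything here.
- ngram_range=(1, 1) can be changed, for example to (2, 3), which extracts both 2-grams and 3-grams. Use (2, 2) for 2-grams only.
- The only built-in string value for stop_words is "english". For other languages, pass your own list of stop words.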