NLTK vs Sklearn vs Gensim
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 03:34
Use cases for NLTK, Sklearn, and Gensim
Quoting answers from Quora:
Yuval Feinstein's answer:
Generally,
- NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
- Sklearn is used primarily for machine learning (classification, clustering, etc.)
- Gensim is used primarily for topic modeling and document similarity.
Roland Bischof's answer:
- NLTK specializes in gathering and classifying unstructured text. If you need, e.g., a POS tagger, lemmatizer, or dependency analyzer, you'll find them there, and sometimes nowhere else. It offers quite a broad range of tools developed mainly in academic research. But it is often not very well optimized: using NLTK libraries frequently means accepting a large performance loss. If you do text gathering or preprocessing, it's fine to begin with, until you find faster alternatives.
- SKLEARN is much more an analysis tool than a gathering tool. It's very well documented, well optimized, and covers a broad range of statistical methods.
- GENSIM is a very well optimized, but also highly specialized, library for jobs in the periphery of “WORD2DOC”. That is, it offers an easy, surprisingly effective, and fast AI approach to unstructured text. If you are interested in production, you might also have a look at TensorFlow, which offers a mathematically generalized, yet highly performant, model.
Although they overlap considerably, I personally prefer using NLTK for preprocessing, GENSIM as a kind of base platform, and SKLEARN for third-step processing.