Gensim-Similarity Queries
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 01:14
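Introduction
The example below shows how to run a similarity query in gensim: given a query document, rank the documents of a corpus by how similar they are to it.
The method comes from the paper "Indexing by Latent Semantic Analysis"; the example comes from the official gensim website.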
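As a rough reminder of what the cited paper does, the sketch below writes out the core of latent semantic analysis in the paper's notation. The symbols ($X$, $T_k$, $S_k$, $D_k$, $X_q$, $D_q$) follow the paper rather than gensim, and gensim may differ in details such as whether projected vectors are scaled by the singular values, so treat this as orientation rather than a description of the library's internals.

```latex
% Term-document matrix X (terms x documents), truncated SVD of rank k:
X \approx X_k = T_k \, S_k \, D_k^{\top}

% A query is treated as a pseudo-document: its term vector X_q is
% folded into the same k-dimensional space:
D_q = X_q^{\top} \, T_k \, S_k^{-1}

% Documents are then ranked by the cosine between the query vector
% and each document vector d_i in that space:
\operatorname{sim}(D_q, d_i) = \frac{D_q \cdot d_i}{\lVert D_q \rVert \, \lVert d_i \rVert}
```

In the example in this post, `num_topics=2` plays the role of $k = 2$.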
Code
from gensim import corpora, models, similarities


def GenDictandCorpus():
    documents = ["Human machine interface for lab abc computer applications",
                 "A survey of user opinion of computer system response time",
                 "The EPS user interface management system",
                 "System and human system engineering testing of EPS",
                 "Relation of user perceived response time to error measurement",
                 "The generation of random binary unordered trees",
                 "The intersection graph of paths in trees",
                 "Graph minors IV Widths of trees and well quasi ordering",
                 "Graph minors A survey"]
    texts = [[word for word in document.lower().split()] for document in documents]
    # Dictionary: maps each distinct token to an integer id
    dictionary = corpora.Dictionary(texts)
    # Corpus: each document stored as a list of (token id, count) pairs
    corpus = [dictionary.doc2bow(text) for text in texts]
    # print(dictionary)
    # print(corpus)
    return dictionary, corpus


def SimQuery(doc):
    dictionary, corpus = GenDictandCorpus()
    lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
    vec_bow = dictionary.doc2bow(doc.lower().split())
    vec_lsi = lsi[vec_bow]  # convert the query to LSI space
    # To prepare for similarity queries, index every document that later
    # queries will be compared against. Here these are the same 9 documents
    # used to train LSI, converted to the 2-D LSA space.
    # transform corpus to LSI space and index it
    index_corpus = similarities.MatrixSimilarity(lsi[corpus])
    # Saving and loading the index
    # index_corpus.save('/tmp/deerwester.index')
    # index_corpus = similarities.MatrixSimilarity.load('/tmp/deerwester.index')
    # Run the similarity query against the corpus
    sims = index_corpus[vec_lsi]
    # print(list(enumerate(sims)))
    # Sort by similarity in descending order
    sims_sorted = sorted(enumerate(sims), key=lambda item: -item[1])
    print(sims_sorted)


SimQuery("Human computer interaction")

Result:
[(0, 0.9768815), (2, 0.96618712), (4, 0.93288612), (3, 0.89150834), (1, 0.87645805), (5, 0.032106727), (8, -0.002741307), (6, -0.07901895), (7, -0.2151109)]
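To make the scores in the result easier to read: `MatrixSimilarity` is documented as computing cosine similarity, which is why the values fall between -1 and 1. The sketch below reproduces that ranking step in plain Python, without gensim. The helper names (`cosine_similarity`, `rank_by_similarity`) and the 2-D vectors are hypothetical and chosen only for illustration; they are not the LSI vectors from the example above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, similarity) pairs, most similar first."""
    sims = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(enumerate(sims), key=lambda item: -item[1])


# Hypothetical 2-D "topic space" vectors, made up for this sketch.
toy_docs = [(1.0, 0.1), (0.9, 0.5), (0.0, 1.0), (-0.2, 1.0)]
toy_query = (1.0, 0.0)

ranking = rank_by_similarity(toy_query, toy_docs)
print(ranking)
```

Documents whose vectors point in nearly the same direction as the query score close to 1, orthogonal ones score near 0, and ones pointing away score below 0. This matches the pattern in the result above, where the first five documents score high and the last four score near or below zero.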
Reference: http://radimrehurek.com/gensim/tut3.html