Computing Document Similarity in NLP with doc2vec

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 01:06
import gensim

# Path where the trained model will be saved
outp1 = r'D:\python_noweightpathway\TIA\docmodel'
# Corpus file: one document per line, tokens separated by whitespace
file = open(r'D:\python_noweightpathway\TIA\TIAxmmc.txt', encoding='utf-8')

# fileghdjid = open(u'D:\python_noweightpathway\TIA\TIA.txt', encoding='utf-8')
# ghdjids = []
# for ghdjid in fileghdjid:
#     ghdjids.append(ghdjid)
# i = 0
# for line in file:
#     LabeledSentence(words=line.split(), labels=['SENT_%s' % ghdjids[i]])
#     i = i + 1

# TaggedLineDocument tags each line with its line number (0, 1, 2, ...)
documents = gensim.models.doc2vec.TaggedLineDocument(file)
# Note: in gensim 4.0+ the `size` parameter is named `vector_size`
model = gensim.models.Doc2Vec(documents, size=100, window=8, min_count=100, workers=8)
model.save(outp1)
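For reference, `TaggedLineDocument` treats every line of the input as one document, splits it on whitespace, and uses the zero-based line number as that document's tag. The snippet below is a minimal standard-library sketch of that behaviour, not gensim's own code; `TaggedDoc`, `tagged_lines`, and the sample text are hypothetical names and data used only for illustration.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for gensim's TaggedDocument (illustration only)
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def tagged_lines(stream):
    """Yield one tagged document per line: whitespace tokens plus the line number as tag."""
    for line_no, line in enumerate(stream):
        yield TaggedDoc(words=line.split(), tags=[line_no])


# Hypothetical sample corpus; real text must already be tokenized with spaces
sample = io.StringIO("first document text\nsecond document has more text\n")
docs = list(tagged_lines(sample))
for doc in docs:
    print(doc.tags, doc.words)
```

Because the tags are plain line numbers, a trained document vector is later looked up by the integer index of its line in the corpus file.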

Loading the model

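import gensim

# Load the model saved in the training step
model = gensim.models.Doc2Vec.load(r"D:\python_noweightpathway\TIA\docmodel")
# Documents most similar to the document tagged 4 (the fifth line of the corpus)
print(model.docvecs.most_similar(4))
# Cosine similarity between the documents tagged 2 and 12
print(model.docvecs.similarity(2, 12))
# Note: in gensim 4.0+, `model.docvecs` is available as `model.dv`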
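The score returned by `similarity(2, 12)` is the cosine similarity between the two document vectors. As a rough sketch of that calculation, the following standard-library example computes it by hand; `cosine_similarity` and the three short vectors are hypothetical and merely stand in for the 100-dimensional vectors the model would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors in place of real document vectors
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]   # same direction as vec_a
vec_c = [0.0, 0.0, 3.0]   # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(sim_ab, sim_ac)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, so a higher value indicates more similar documents.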