Code for reading GloVe files
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 09:22
import hashlib

import gensim


# Prepend one header line to the original file so that gensim can read it.
def prepend_slow(infile, outfile, line):
    """Slower way to prepend the line by re-creating the input file."""
    with open(infile, 'r', encoding='utf-8') as fin:
        with open(outfile, 'w', encoding='utf-8') as fout:
            fout.write(line + "\n")
            for row in fin:
                fout.write(row)


def checksum(filename):
    """
    Compute the MD5 checksum of a file, to verify that it matches the GloVe
    file used to pre-compute the number of lines.
    """
    BLOCKSIZE = 65536
    hasher = hashlib.md5()
    with open(filename, 'rb') as afile:
        buf = afile.read(BLOCKSIZE)
        while len(buf) > 0:
            hasher.update(buf)
            buf = afile.read(BLOCKSIZE)
    return hasher.hexdigest()


# Pre-computed values for the GloVe files.
pretrain_num_lines = {"glove.840B.300d.txt": 2196017}


def check_num_lines_in_glove(filename, check_checksum=False):
    # check_checksum is not used yet; the count comes from the table above.
    return pretrain_num_lines[filename]


# Input: GloVe model file.
# More models can be downloaded from http://nlp.stanford.edu/projects/glove/
glove_file = "glove.840B.300d.txt"
_, tokens, dimensions, _ = glove_file.split('.')
num_lines = check_num_lines_in_glove(glove_file)
dims = int(dimensions[:-1])  # "300d" -> 300

# Output: gensim model in text format.
gensim_file = 'glove_model.txt'
gensim_first_line = "{} {}".format(num_lines, dims)

# Prepend the header line.
prepend_slow(glove_file, gensim_file, gensim_first_line)

# Load the model.
model = gensim.models.KeyedVectors.load_word2vec_format(gensim_file)
# Prevent recalculation of the normed vectors. Note: syn0 and syn0norm are
# attribute names from older gensim releases.
model.syn0norm = model.syn0
model.word_vec('computer')  # obtain the word vector
print(model.most_similar(positive=['australia'], topn=10))
print(model.similarity('woman', 'man'))
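The code above takes the line count from a hard-coded table, so it only works for files listed there. As a rough sketch of an alternative, the header could be derived by scanning the file once. The helper name `glove_header` is hypothetical (it is not part of gensim or the original post), and it assumes the token on the first row contains no embedded spaces.

```python
import os
import tempfile


def glove_header(path, encoding="utf-8"):
    """Return the 'num_lines dims' header for a GloVe text file (hypothetical helper)."""
    num_lines = 0
    dims = None
    with open(path, "r", encoding=encoding) as fin:
        for row in fin:
            if dims is None:
                # Assumption: a row is one token followed by its vector
                # components, separated by single spaces, and the first
                # token contains no spaces itself.
                dims = len(row.rstrip("\n").split(" ")) - 1
            num_lines += 1
    if dims is None:
        raise ValueError("empty GloVe file: {}".format(path))
    return "{} {}".format(num_lines, dims)


if __name__ == "__main__":
    # Tiny made-up file with 3 tokens and 4 dimensions, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "tiny_glove.txt")
        with open(sample, "w", encoding="utf-8") as fout:
            fout.write("the 0.1 0.2 0.3 0.4\n")
            fout.write("cat 0.5 0.6 0.7 0.8\n")
            fout.write("sat 0.9 1.0 1.1 1.2\n")
        print(glove_header(sample))  # prints "3 4"
```

The returned string could be passed to `prepend_slow` in place of `gensim_first_line`. The trade-off is an extra full pass over the file, which is slow for a file the size of glove.840B.300d.txt.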