short text conversation: neural network


A Neural Conversational Model, Google, 2015

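The paper mainly uses the seq2seq architecture (shown in the figure below). It was tested on a closed-domain IT dataset and an open-domain dataset, and the outputs were evaluated by human judges, with good results.
[Figure: seq2seq architecture]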

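The seq2seq approach cannot carry on a multi-turn dialogue, i.e. it cannot guarantee consistency across turns; this is a limitation of the model. However, the main open problems in STC (short-text conversation) today include: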

IT

Typical conversation (thread) length: 400 words
training set: 30M tokens
validation set: 3M tokens
Data cleaning: removing common names, numbers, and full URLs (data source: an IT helpdesk troubleshooting chat service).

Training setup:
a single-layer LSTM with 1024 memory cells
stochastic gradient descent (SGD) with gradient clipping
vocabulary size: 20k
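
The "SGD with gradient clipping" setup above can be sketched as follows. This is a minimal illustration, assuming clipping by global L2 norm; the learning rate and the clipping threshold are hypothetical values, since these notes do not record the ones used in the paper.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so that their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)          # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

def sgd_step(params, grads, lr=0.1, max_norm=5.0):
    """One plain SGD update using the clipped gradients.
    lr and max_norm are illustrative defaults, not values from the paper."""
    clipped = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]
```

Clipping only changes the length of the gradient vector, not its direction, which keeps a single exploding gradient from wrecking the LSTM weights.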

open domain

training set: 62M sentences (923M tokens)
validation set: 26M sentences (395M tokens)
Sentences in the test set do not appear in the training set.
Data cleaning: removing XML tags and obvious non-conversational text (data source: the OpenSubtitles dataset (Tiedemann, 2009); the raw data is in XML).

Training setup:
a two-layer LSTM with 4096 memory cells per layer
AdaGrad with gradient clipping
vocabulary size: 100k
??? To speed up the softmax, we project the memory cells to 2048 linear units before feeding the information to the classifier.
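
One plausible reading of the quoted sentence is that a 2048-unit linear layer sits between the 4096-cell LSTM output and the 100k-way softmax, which shrinks the output weight matrices. The sketch below only counts weights under that assumption (biases ignored); the function name is made up for illustration.

```python
def softmax_weights(hidden, vocab, projection=None):
    """Number of weights between the LSTM output and the vocabulary logits."""
    if projection is None:
        return hidden * vocab                      # direct hidden -> vocab
    return hidden * projection + projection * vocab  # hidden -> proj -> vocab

direct = softmax_weights(4096, 100_000)                      # 409,600,000
projected = softmax_weights(4096, 100_000, projection=2048)  # 213,188,608
ratio = projected / direct                                   # about 0.52
```

Under these numbers the projection roughly halves the weights, and therefore the multiply-adds, needed per predicted token.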

Neural Responding Machine for Short-Text Conversation, Huawei, 2015

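Overview:
The paper mainly uses three encoding schemes: NRM-global (a global context vector), NRM-local (attention over the encoder states), and NRM-hybrid (a combination of the two). Models trained with each scheme are compared against two baselines: Li Hang's retrieval-based method and a statistical machine translation (SMT) model. The retrieval-based method and NRM-global perform about equally; NRM-hybrid performs best.
NRM-hyb is initialized with the parameters of separately trained NRM-glo and NRM-loc models, and is then trained further.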
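
The three schemes differ in the context vector handed to the decoder. A minimal sketch, assuming dot-product attention scores (a simplification chosen here for brevity, not necessarily the scoring function of the paper) and plain Python lists for the hidden states; all function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def global_context(enc_states):
    """NRM-glo style: the last encoder state summarizes the whole post."""
    return list(enc_states[-1])

def local_context(enc_states, dec_state):
    """NRM-loc style: attention-weighted sum of all encoder states."""
    weights = softmax([dot(h, dec_state) for h in enc_states])
    dim = len(enc_states[0])
    return [sum(w * h[i] for w, h in zip(weights, enc_states))
            for i in range(dim)]

def hybrid_context(glo_states, loc_states, dec_state):
    """NRM-hyb style: concatenate the global and the local context."""
    return global_context(glo_states) + local_context(loc_states, dec_state)
```

Because the hybrid context is a concatenation of the two parts, each part can start from an encoder that was trained on its own, which matches the initialization step described above.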

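The data is a Weibo dataset of about 4.4 million post-response pairs. Cleaning yields the training set below (the cleaning procedure is worth consulting):
training pairs: 4,435,959 // segmented with the authors' own word segmenter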
test posts: 110

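Differences between posts and responses:
1). The posts contain 125,237 distinct words in total, whereas the responses contain 679,958;
2). Two separate vocabularies of 40,000 words each were built, covering 97.8% of the words in posts and 96.2% of the words in responses.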
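
Coverage figures like the ones above can be computed as follows. A minimal sketch, assuming coverage is measured over token occurrences (word usage) rather than distinct word types, and that the vocabulary simply keeps the most frequent words; the helper names are hypothetical.

```python
from collections import Counter

def build_vocab(tokenized_sentences, size):
    """Keep the `size` most frequent tokens; also return the raw counts."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {tok for tok, _ in counts.most_common(size)}
    return vocab, counts

def coverage(vocab, counts):
    """Fraction of token occurrences that fall inside the vocabulary."""
    total = sum(counts.values())
    covered = sum(c for tok, c in counts.items() if tok in vocab)
    return covered / total

# Toy usage: posts and responses would each get their own vocabulary.
posts = [["a", "b", "a"], ["a", "c"]]
post_vocab, post_counts = build_vocab(posts, size=2)
post_coverage = coverage(post_vocab, post_counts)
```

Running the same two functions separately on the response side gives the second vocabulary and its coverage.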

encoder/decoder hidden state size: 1000
word-embedding size for post & response: 620
model parameter initialization: uniformly at random in [-0.1, 0.1]
training: mini-batch SGD on a GPU, about 2 weeks


A Diversity-Promoting Objective Function for Neural Conversation Models, Stanford & Microsoft Research, 2016

An interesting paper.