A Neural Conversational Model

Source: Internet. Published by: 淘宝授权书. Editor: 程序博客网. Time: 2024/05/21 08:41

Abstract

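Conversational modeling is an important task in natural language understanding and machine intelligence. Previous approaches exist, but they are often restricted to specific domains (for example, booking an airline ticket) and require hand-crafted rules. In this paper, we present a simple approach to this task that uses the recently proposed sequence to sequence framework. Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation. The strength of our model is that it can be trained end-to-end and therefore requires far fewer hand-crafted rules. We find that this straightforward model can generate simple conversations given a large conversational training dataset. Our preliminary results suggest that, despite optimizing the wrong objective function, the model converses well. It is able to extract knowledge both from a domain-specific dataset and from a large, noisy, general-domain dataset of movie subtitles. On a domain-specific IT helpdesk dataset, the model can find a solution to a technical problem through conversation. On a noisy open-domain movie transcript dataset, the model can perform simple forms of common sense reasoning. As expected, we also find that lack of consistency is a common failure mode of our model.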
　　　
1. Introduction

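Advances in end-to-end training of neural networks have led to remarkable progress in many domains, such as speech recognition, computer vision, and language processing. Recent work suggests that neural networks can do more than mere classification: they can be used to map complicated structures to other complicated structures. One example is the task of mapping a sequence to another sequence, which has direct applications in natural language understanding. The main advantage of this framework is that it requires little feature engineering and domain specificity while matching or surpassing state-of-the-art results. This advance, in our opinion, allows researchers to work on tasks for which domain knowledge may not be readily available, or for which rules are simply too hard to design by hand. Conversational modeling can directly benefit from this formulation because it requires a mapping between queries and responses. Because of the complexity of this mapping, conversational models have previously been designed for very narrow domains, with a major effort spent on feature engineering. In this work, we experiment with the conversational modeling task by casting it as the task of predicting the next sequence given the previous sequence or sequences, using recurrent networks. We find that this approach does surprisingly well at generating fluent and accurate replies in conversations.
We test the model on chat sessions from an IT helpdesk conversation dataset and find that the model can sometimes track the problem and provide a useful answer to the user. We also experiment with conversations obtained from a noisy dataset of movie subtitles and find that the model can hold a natural conversation and sometimes perform simple forms of common sense reasoning. In both cases, the recurrent networks obtain better perplexity than an n-gram model and capture important long-range correlations. From a qualitative point of view, our model is sometimes able to produce natural conversations.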

2. Related Work

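Our approach is based on recent work that proposed using neural networks to map sequences to sequences. This framework has been applied to neural machine translation, where it achieved improvements on the English-French and English-German translation tasks of the WMT'14 dataset. It has also been used for other tasks, such as parsing and image captioning. Because vanilla RNNs are well known to suffer from vanishing gradients, most researchers use variants of Long Short-Term Memory (LSTM) recurrent neural networks (Hochreiter & Schmidhuber, 1997).
Our work is also inspired by the recent success of neural language modeling, which shows that neural networks are rather effective models of natural language. More recently, Sordoni et al. and Shang et al. applied recurrent neural networks to modeling dialogue in short conversations.
Building bots and conversational agents has been pursued by many researchers over the last decades, and providing an exhaustive list of references is beyond the scope of this paper. However, most of these systems require a rather complicated processing pipeline of many stages. Our work differs from conventional systems in that it proposes an end-to-end approach to the problem that requires no domain knowledge. In principle, it could be combined with other systems to re-score a short list of candidate responses, but our work is based on producing answers from a probabilistic model trained to maximize the probability of the answer given some context.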

3. Model

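Our approach uses the sequence-to-sequence (seq2seq) framework described by Sutskever et al. (2014). The model is based on a recurrent neural network that reads the input sequence one token at a time and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the log-probability of the correct sequence given its context (equivalently, to minimize the cross entropy). During inference, the true output sequence is not observed, so we simply feed the predicted output token back in as input to predict the next output. This is a "greedy" inference approach. A less greedy approach is to use beam search, which passes several candidates from the previous step on to the next step. The predicted sequence can then be selected based on the probability of the sequence.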
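
The difference between greedy inference and beam search can be sketched with a toy next-token model. Everything here is an illustrative assumption rather than the paper's implementation: the `NEXT_TOKEN_PROBS` table stands in for a trained network, and the token names, the `<go>` start symbol, and the beam width are made up.

```python
import math

# Hypothetical stand-in for a trained network: the distribution over the
# next token depends only on the previous token. A real seq2seq model would
# condition on the full input and on every earlier output token.
NEXT_TOKEN_PROBS = {
    "<go>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.3, "<eos>": 0.3},
    "b": {"<eos>": 0.9, "x": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
}


def greedy_decode(max_len=10):
    """Feed the single most probable token back in at every step."""
    tokens, log_prob, prev = [], 0.0, "<go>"
    for _ in range(max_len):
        probs = NEXT_TOKEN_PROBS[prev]
        prev = max(probs, key=probs.get)
        log_prob += math.log(probs[prev])
        if prev == "<eos>":
            break
        tokens.append(prev)
    return tokens, log_prob


def beam_search(beam_width=2, max_len=10):
    """Keep the `beam_width` best partial sequences at every step."""
    beams = [(0.0, ["<go>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for log_prob, tokens in beams:
            for tok, p in NEXT_TOKEN_PROBS[tokens[-1]].items():
                cand = (log_prob + math.log(p), tokens + [tok])
                if tok == "<eos>":
                    finished.append(cand)
                else:
                    candidates.append(cand)
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    # The toy table always reaches <eos>, so `finished` is never empty here.
    best_log_prob, best_tokens = max(finished, key=lambda c: c[0])
    return best_tokens[1:-1], best_log_prob  # strip <go> and <eos>
```

With this table, greedy decoding commits to "a" first and ends with the sequence `['a', 'x']` (probability 0.24), while beam search with width 2 keeps "b" alive and returns `['b']` (probability 0.36). This is the sense in which beam search is less greedy.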
　　　
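Concretely, suppose we observe a conversation with two turns: the first person says "ABC" and the second person replies "WXYZ". We can use a recurrent neural network and train it to map "ABC" to "WXYZ", as shown in Figure 1. The hidden state of the model when it receives the end-of-sequence symbol "<EOS>" can be viewed as the thought vector, because it stores the information of the sentence, or thought, "ABC".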
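The strength of this model lies in its simplicity and generality. We can use it for machine translation, question answering, and conversation without major changes to the architecture. Applying the technique to conversational modeling is also straightforward: the input sequence can be the concatenation of what has been said so far (the context), and the output sequence is the reply.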
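
As a minimal sketch of how a conversation could be turned into (context, reply) training pairs under this scheme: the function name `make_examples`, the `<eot>` turn separator, and the optional context limit are all hypothetical choices for illustration, not details given in the text.

```python
def make_examples(turns, max_context_turns=None, sep=" <eot> "):
    """Build (context, reply) pairs from a list of utterances.

    `max_context_turns` must be a positive integer or None (no limit).
    The context is the concatenation of the turns before the reply.
    """
    examples = []
    for i in range(1, len(turns)):
        history = turns[:i]
        if max_context_turns is not None:
            history = history[-max_context_turns:]
        examples.append((sep.join(history), turns[i]))
    return examples


conversation = ["hi", "hello", "how are you ?"]
pairs = make_examples(conversation)
# pairs == [("hi", "hello"), ("hi <eot> hello", "how are you ?")]
```

Setting `max_context_turns=1` reduces this to predicting each sentence from the one immediately before it.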
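Unlike easier tasks such as translation, however, a model like sequence to sequence cannot successfully "solve" the problem of modeling dialogue, because of several obvious simplifications. The objective function being optimized does not capture the actual objective of human communication, which is typically longer-term and based on the exchange of information rather than on next-step prediction. The lack of a mechanism to ensure consistency and general world knowledge is another obvious limitation of a purely unsupervised model.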

4. Datasets

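In our experiments we used two datasets: a closed-domain IT helpdesk troubleshooting dataset and an open-domain movie transcript dataset. The details of the two datasets are as follows.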
　　　
4.1 IT helpdesk troubleshooting dataset

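In our first set of experiments, we used a dataset extracted from an IT helpdesk troubleshooting chat service. In this service, customers face computer-related issues, and a specialist helps them by conversing and walking through a solution. A typical interaction (or thread) is 400 words long, and turn taking is clearly signaled. Our training set contains 30M tokens, and 3M tokens were used for validation. Some clean-up was performed, such as removing common names, numbers, and full URLs.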
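
The clean-up step is described only briefly, so the following is a sketch under stated assumptions rather than the authors' procedure. It replaces URLs, numbers, and names with placeholder tokens (`<URL>`, `<NUMBER>`, `<NAME>`) instead of deleting them outright; the placeholders, the regular expressions, and the `common_names` set are all hypothetical.

```python
import re

# Assumed patterns: a URL starts with http(s):// or www., and a number is a
# run of digits with optional . or , separators.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def clean_utterance(text, common_names=frozenset()):
    """Mask URLs, numbers, and known names in a whitespace-tokenized line."""
    text = URL_RE.sub("<URL>", text)      # URLs first, so their digits are not touched
    text = NUMBER_RE.sub("<NUMBER>", text)
    tokens = ["<NAME>" if t.lower() in common_names else t for t in text.split()]
    return " ".join(tokens)


cleaned = clean_utterance(
    "hi john , go to https://example.com/reset and wait 15 minutes",
    common_names={"john"},
)
# cleaned == "hi <NAME> , go to <URL> and wait <NUMBER> minutes"
```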
　　　
4.2 OpenSubtitles dataset

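We also tested our model on the OpenSubtitles dataset. This dataset consists of movie conversations in XML format and contains the sentences uttered by characters in movies. We applied a simple processing step to remove XML tags and obvious non-conversational text (for example, hyperlinks) from the dataset. Because turn taking is not clearly indicated, we treated consecutive sentences as if they were uttered by different characters. We trained our model to predict the next sentence given the previous one, and we did this for every sentence (note that this doubles the size of our dataset, since each sentence is used both as context and as target). Our training and validation sets are kept separate: the training set has 62M sentences (923M tokens), and the validation set has 26M sentences (395M tokens). The split is done so that a pair of sentences appears either in the training set or in the test set, but not in both. Unlike the previous dataset, OpenSubtitles is quite large and rather noisy, because consecutive sentences may be uttered by the same character. Given the broad range of movies, this is an open-domain conversation dataset, in contrast to the technical troubleshooting dataset.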
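
The pairing and splitting described above can be sketched as follows. This is only an illustration: the function names and the `valid_fraction` parameter are hypothetical, and the split here is a simple cut of the sentence list, which is one way (not necessarily the authors' way) to guarantee that no pair straddles the two sets.

```python
def consecutive_pairs(sentences):
    """Pair every sentence with the one that follows it.

    Each interior sentence appears twice: once as a target and once as
    context, which is why the pairing roughly doubles the data.
    """
    return list(zip(sentences, sentences[1:]))


def split_then_pair(sentences, valid_fraction=0.3):
    """Cut the sentence list first, then build pairs inside each part."""
    cut = int(len(sentences) * (1 - valid_fraction))
    return consecutive_pairs(sentences[:cut]), consecutive_pairs(sentences[cut:])


subtitles = ["s1", "s2", "s3", "s4"]
train_pairs, valid_pairs = split_then_pair(subtitles, valid_fraction=0.5)
# train_pairs == [("s1", "s2")], valid_pairs == [("s3", "s4")]
```

Cutting before pairing drops the one pair that would cross the boundary (`("s2", "s3")` here), so the two sets share no pair.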

5. Experiments
　　　
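In this section, we describe the experimental results on the two datasets and show some samples of interactions with the systems we trained. We also compare the performance of our system against a popular rule-based bot (CleverBot) using human evaluation on a set of 200 questions.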
　　　
5.1 IT helpdesk troubleshooting experiments

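In this experiment, we trained a single-layer LSTM with 1024 memory cells using gradient descent. The vocabulary consists of the 20K most common words and includes special tokens that indicate turn taking and the actor.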
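
A frequency-capped vocabulary of this kind could be built roughly as follows. This is a sketch under assumptions: the special token names (`<unk>`, `<eos>`, `<turn>`) and the helper names are hypothetical, and the text does not say how out-of-vocabulary words were handled, so mapping them to `<unk>` is a common convention rather than a reported detail.

```python
from collections import Counter


def build_vocab(tokenized_utterances, max_words,
                special_tokens=("<unk>", "<eos>", "<turn>")):
    """Map the most frequent tokens to integer ids, special tokens first."""
    counts = Counter(tok for utt in tokenized_utterances for tok in utt)
    for tok in special_tokens:
        counts.pop(tok, None)  # special tokens get their ids below
    n_regular = max_words - len(special_tokens)
    words = [w for w, _ in counts.most_common(n_regular)]
    return {tok: i for i, tok in enumerate(list(special_tokens) + words)}


def encode(tokens, vocab):
    """Replace each token with its id, or the <unk> id if it is unknown."""
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]


data = [["hi", "hi", "vpn"], ["hi", "vpn", "linux"]]
vocab = build_vocab(data, max_words=5)
ids = encode(["hi", "linux"], vocab)
# vocab keeps "hi" and "vpn"; "linux" falls outside the cap, so ids == [3, 0]
```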
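At convergence, this model achieved a perplexity of 8, whereas an n-gram model achieved 18. Below are a few samples of simulated troubleshooting sessions for three of the most common IT problems (remote access, software crashes, and password issues). In these conversations, Machine is our neural conversational model and Human is the person interacting with it.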
　　　
Conversation 1: VPN issues

　　　Describe your problem: i am having issues accessing vpn
　　　Machine: hi
　　　Human: hello
　　　Machine: could you please let me know what are the operating systems you are using ?
　　　Human: i am using linux
　　　Machine: ok it was in the ssl vpn on which you stated it was broken
　　　Machine: please check if you are logged in to corp network
　　　Machine: what is the error that you are running please
　　　Human: i am seeing an error related to vpn
　　　Machine: what is the error message that you are getting when connecting to vpn using network connect ?
　　　Human: connection refused or something like that
　　　Machine: may i know the version of network connect you connect ?
　　　Human: i am not sure i know that
　　　Machine: is the network connect prompting for any user certificate
　　　Machine: ?
　　　Human: yes
　　　Machine: are you at home or at work at the moment ?
　　　Machine: office ?
　　　Human: i am at home
　　　Machine: try this
　　　Machine: goto page and enter your name and password
　　　Human: done
　　　Machine: try logging in with and check if its working fine now
　　　Human: yes , now it works !
　　　Machine: great
　　　Machine: anything else that i can help
　　　Human: no , that was very useful
　　　Machine: you are other than : )
　　　Machine: have a nice night
　　　
Conversation 2: Browser issues.

　　　Describe your problem: browser keeps crashing
　　　Machine: hi
　　　Human: hi
　　　Machine: could you please let me know what is the operating system you are using ?
　　　Human: linux
　　　Machine: ok this is the issue with the system performance , nasser , tools .
　　　Human: what is that ?
　　　Machine: is it a windows machine ?
　　　Machine: we have linux on your machine .
　　　Machine: is it windows xp or windows 7 ?
　　　Human: i am using linux
　　　Machine: let me take a remote of your machine and check if issue has been pushed properly .
　　　Human: ok
　　　Machine: please open terminal and run the below commands in one computer :
　　　Machine: sudo apt - get update
　　　Machine: sudo apt - get upgrade
　　　Machine: sudo - updater
　　　Human: done
　　　Machine: let me try connecting now
　　　Human: ok
　　　Machine: its done
　　　Machine: should use the second or hit run on the background
　　　Human: done
　　　Machine: try signing out now and follow the prompts as per the message given above
　　　Machine: shd i reboot the machine .
　　　Human: ok , please go ahead
　　　Machine: once its completed please try to login .
　　　Human: sure
　　　Machine: please ping me back once you have tried to login
　　　Machine: are we still connected
　　　Human: i now logged in
　　　Machine: thanks for your confirmation
　　　Human: what should i do now ?
　　　Machine: wait for 1 5 minutes to complete the login policy
　　　Machine: and then try logging in
　　　Human: ok
　　　Machine: is there anything else to help you ?
　　　Human: let my try browsing a little bit and if it keeps crashing i will let you know
　　　Machine: sure will do
　　　Machine: seems the mail is not loading
　　　Machine: are we still connected to the chat ?
　　　Machine:
　　　Human: it seems good now , cool , thank you ,bye
　　　Machine: you , too
　　　Machine: bye
　　　
　　　
Conversation 3: Password issues

Describe your problem: i forgot my password

　　　Machine: hi , this is from helpdesk connect , what can i help you with today ?
　　　Human: hi , i forgot my password
　　　Machine: ok great
　　　Human: can you tell me how i can recover it ?
　　　Machine: i ’ ll need to verify who it is at that machine
　　　Machine: can we do a hangout
　　　Machine: ?
　　　Human: yes
　　　Machine:
　　　Human: cool , i am good now
　　　Machine: how ‘s this looking good ?
　　　Machine: i ‘ll go ahead and close this chat .
　　　Machine: take care . : )
　　　Human: bye
　　　Machine: bye .

5.2 OpenSubtitles experiments
　　　
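In this experiment, we trained a two-layer LSTM using AdaGrad with gradient clipping. Each LSTM layer has 4096 memory cells, and we built a vocabulary from the 100K most frequent words. To speed up the softmax, we project the memory cells to 2048 linear units before feeding the information to the final classifier.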
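
A rough parameter count suggests why the projection helps. The sketch below counts only weight-matrix entries in the output layer and ignores biases; that simplification and the helper name `softmax_params` are assumptions made for illustration.

```python
def softmax_params(hidden, vocab, projection=None):
    """Count output-layer weights, with or without a linear projection."""
    if projection is None:
        return hidden * vocab                      # hidden -> vocab
    return hidden * projection + projection * vocab  # hidden -> proj -> vocab


direct = softmax_params(4096, 100_000)           # 409,600,000 weights
projected = softmax_params(4096, 100_000, 2048)  # 213,188,608 weights
```

Under these assumptions the projected output layer has a little over half as many weights as a direct 4096-to-100K softmax, so each softmax evaluation costs roughly half as many multiply-adds.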
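At convergence, the recurrent model reached a perplexity of 17, whereas our smoothed 5-gram model reached a perplexity of 28. Interestingly, adding a soft attention mechanism did not significantly improve perplexity on either the training set or the validation set.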
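
For reference, perplexity is the exponential of the average negative log-probability that the model assigns to each correct token. The sketch below uses made-up probabilities, not values from the experiments.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the correct tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


uniform_17 = perplexity([1 / 17] * 5)  # about 17
mixed = perplexity([0.5, 0.125])       # about 4
```

A model that always spreads its probability uniformly over 17 tokens has a perplexity of 17, so the reported figures can be read as the effective number of equally likely choices per token.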

5.2.1 Model samples
　　　
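Beyond the perplexity measure, our recurrent model often produces plausible answers. Here are a few sample question-answer pairs that let us understand the capabilities of the model when it is trained on the large, noisy OpenSubtitles dataset of movie subtitles. Again, in these conversations, Machine is our neural conversational model and Human is the person interacting with it.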
　　　
Conversation 1: Basic

　　　Human: hello !
　　　Machine: hello !
　　　Human: how are you ?
　　　Machine: i ’m good .
　　　Human: what ’s your name ?
　　　Machine: i ’m julia .
　　　Human: when were you born ?
　　　Machine: july 20th .
　　　Human: what year were you born ?
　　　Machine: 1977 .
　　　Human: where are you from ?
　　　Machine: i ’m out in the boonies.
　　　Human: are you a man or a woman ?
　　　Machine: i ’m a woman .
　　　Human: why are we here ?
　　　Machine: i ’m not sure .
　　　Human: okay, bye !
　　　Machine: bye .
　　　Human: see you later .
　　　Machine: bye .
　　　
Conversation 2: Simple Q&A (contexts and multiple choice)

　　　Human: my name is david . what is my name ?
　　　Machine: david .
　　　Human: my name is john . what is my name ?
　　　Machine: john .
　　　Human: are you a leader or a follower ?
　　　Machine: i ’m a leader .
　　　Human: are you a follower or a leader ?
　　　Machine: i ’m a leader .
　　　
Conversation 3: General knowledge Q&A

　　　Human: who is skywalker ?
　　　Machine: he is a hero .
　　　Human: who is bill clinton ?
　　　Machine: he ’s a billionaire .
　　　Human: is sky blue or black ?
　　　Machine: blue .
　　　Human: does a cat have a tail ?
　　　Machine: yes .
　　　Human: does a cat have a wing ?
　　　Machine: no
　　　Human: can a cat fly ?
　　　Machine: no .
　　　Human: how many legs does a cat have ?
　　　Machine: four , i think .
　　　Human: how many legs does a spider have ?
　　　Machine: three , i think .
　　　Human: how many legs does a centipede have ?
　　　Machine: eight .
　　　Human: what is the color of the sky ?
　　　Machine: blue .
　　　Human: what is the color of water ?
　　　Machine: water .
　　　Human: what is the color of blood ?
　　　Machine: it is the same as a black eye .
　　　Human: what is the usual color of a leaf ?
　　　Machine: it is a green one .
　　　Human: what is the color of a yellow car ?
　　　Machine: yellow .
　　　Human: how much is two plus two ?
　　　Machine: four .
　　　Human: how much is ten minus two ?
　　　Machine: seventy - two .
　　　
Conversation 4: Philosophical Q&A

　　　Human: what is the purpose of life ?
　　　Machine: to serve the greater good .
　　　Human: what is the purpose of living ?
　　　Machine: to live forever .
　　　Human: what is the purpose of existence ?
　　　Machine: to find out what happens when we get to the planet earth .
　　　Human: where are you now ?
　　　Machine: i ’m in the middle of nowhere .
　　　Human: what is the purpose of dying ?
　　　Machine: to have a life .
　　　Human: what is the purpose of being intelligent?
　　　Machine: to find out what it is .
　　　Human: what is the purpose of emotions ?
　　　Machine: i don ’t know .
　　　
Conversation 5: Morality

　　　Human: what is moral ?
　　　Machine: what empowered humanity , what intellectual the essence is .
　　　Human: what is immoral ?
　　　Machine: the fact that you have a child .
　　　Human: what is morality ?
　　　Machine: what is altruism ?
　　　Human: what is the definition of altruism ?
　　　Machine: if you don ’t believe in god , then you don ’t know .
　　　Human: ok … so what is the definition of morality ?
　　　Machine: well , the truth is , you ’re not a believer in god almighty .
　　　Human: tell me the definition of morality , i am quite upset now !
　　　Machine: i ’m not ashamed of being a philosopher !
　　　Human: tell me the definition of morality .
　　　Machine: i don ’t have ethics .
　　　Human: look , i need help , i need to know more about morality …
　　　Machine: i don ’t know what ethics is .
　　　Human: seriously , what is morality ?
　　　Machine: what is the definition of living ?
　　　Human: why living has anything to do with morality ?
　　　Machine: you ’re not a cop .
　　　Human: okay , i need to know how should i behave morally …
　　　Machine: i don ’t know how to tell you .
　　　Human: is morality and ethics the same ?
　　　Machine: yes , sir .
　　　Human: what are the things that i do to be immoral ?
　　　Machine: i don ’t know .
　　　Human: give me some examples of moral actions …
　　　Machine: i ’m not a moralist .
　　　Human: alright , morality ?
　　　Machine: integrity .
　　　Human: what is integrity ?
　　　Machine: i ’m sorry , i don ’t know what else to say .
　　　Human: be moral !
　　　Machine: be a man !
　　　Human: i really like our discussion on morality and ethics …
　　　Machine: and how i ’m not in the mood for a philosophical debate .
　　　Human: what do you like to talk about ?
　　　Machine: nothing .
　　　
Conversation 6: Opinions (notice the model outputs “he” , “she” or “it” depending on the subject being asked)
　　　
　　　Human: what do you think about tesla ?
　　　Machine: he ’s a good conductor .
　　　Human: what do you think about bill gates ?
　　　Machine: he ’s a good man .
　　　Human: What do you think about messi ?
　　　Machine: he ’s a great player .
　　　Human: what do you think about cleopatra ?
　　　Machine: oh , she ’s very regal .
　　　Human: what do you think about england during the reign of elizabeth ?
　　　Machine: it was a great place .

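We find it encouraging that the model can remember facts, understand context, and perform common sense reasoning without the complexity of traditional pipelines. What surprises us is that the model does so without any explicit knowledge representation component other than the parameters in the word vectors. Perhaps most significant in practice is that the model can generalize to new questions. In other words, it does not simply look up an answer by matching the question against an existing database. In fact, most of the questions presented above, except those in the first conversation, do not appear in the training set. Nonetheless, one drawback of this basic model is that it gives only simple, short, and sometimes unsatisfying answers. Perhaps a more serious drawback is that the model does not capture a consistent personality. Indeed, if we ask questions that are not identical but are semantically similar, the answers can sometimes be inconsistent. This is expected, given the simplicity of our model and the dataset used in our experiments. The conversation below illustrates such a failure: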
　　　
Conversation 7: Job and Personality
　　　
　　　Human: what is your job ?
　　　Machine: i ’m a lawyer .
　　　Human: what do you do ?
　　　Machine: i ’m a doctor
　　　
Finally, as a comparison with a traditional rule-based bot, we provide CleverBot's answers to some of the questions in our test:
　　　
　　　Human: Are you a follower or a leader ?
　　　CleverBot: No !!
　　　Human: My name is david . what is my name ?
　　　CleverBot: How old are you David ?
　　　Human: My name is john . what is my name ?
　　　CleverBot: John
　　　Human: What is the purpose of being intelligent?
　　　CleverBot: The purpose of life is to love God and love people.
　　　Human: Is sky blue or black ?
　　　CleverBot: I am 200 years old.
　　　Human: What is the color of a yellow car ?
　　　CleverBot: Brown.
　　　Human: What is the usual color of a leaf ?
　　　CleverBot: What is the color of the apple in the fridge ?
　　　Human: What is the purpose of life ?
　　　CleverBot: My purpose it to forward my species,in other words to make it easier for future generations of mankind to live.
　　　Human: What do you think about tesla ?
　　　CleverBot: I am tesla.
　　　Human: What do you think about cleopatra ?
　　　CleverBot: Cool is cool.
　　　Human: What do you think about england during the reign of elizabeth ?
　　　CleverBot: I like him. Do you ?
　　　Human: What do you think about bill gates ?
　　　CleverBot: Lets talk about Link.
　　　Human: What is moral ?
　　　CleverBot: Morality is the quality of being in accord with standards of right or good conduct.
　　　Human: What is immoral ?
　　　CleverBot: What is the objective of sistetic life ?
　　　Human: Does a cat have a wing ?
　　　CleverBot: What is the name of your husband.
　　　Human: Can a cat fly ?
　　　CleverBot: What is your husband’s name ?
　　　Human: Who is skywalker ?
　　　CleverBot: There are two of them, Anakin Skywalker and Luke Skywalker, both characters from star Wars.
　　　Human: How many legs does a cat have ?
　　　CleverBot: An ant.
　　　Human: How many legs does a centipede have?
　　　CleverBot: A centipede’s number of legs varies from thirty to two hundred.
　　　Human: What is your job ?
　　　CleverBot: I work in mental health.
　　　Human: What do you do ?
　　　CleverBot: I have no job.

5.2.2 Human evaluation

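To compare our model (NCM) with CleverBot (CB) fairly and objectively, we picked 200 questions and asked four different people to rate the two systems. The human judges were asked which of the two bots they preferred, and they were also allowed to declare a tie if both answers were of equal quality. If three of the four judges agreed, we recorded that score. Our model was preferred on 97 of the 200 questions, whereas CleverBot was preferred on 60. The two were tied on 20 questions, and the judges disagreed on 23 questions. Here are some samples of how the judges evaluated the systems (the preferred answer is marked in bold; the third example shows a tie, and the fourth shows disagreement):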
　　　Human: What is the deepest spot on the world ?
　　　NCM: The pacific ocean.
　　　CB: What ?
　　　
　　　Human: Do you like Mexican food or Indian food ?
　　　NCM: I hate it.
　　　CB: I love Mexican food.
　　　
　　　Human: Who are you crazy about ?
　　　NCM: You’re the one who’s crazy.
　　　CB: Who are you ?
　　　
　　　Human: What’s thirty plus forty ?
　　　NCM: Sixty ?
　　　CB: What should we talk about ?
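
The scoring rule described above (a result is recorded only when at least three of the four judges agree) can be sketched as follows. The vote labels and the function name are hypothetical; only the threshold and the reported totals come from the text.

```python
from collections import Counter


def aggregate(votes, min_agree=3):
    """Return the label chosen by at least `min_agree` judges, if any."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else "disagreement"


outcome_a = aggregate(["NCM", "NCM", "NCM", "CB"])   # "NCM"
outcome_b = aggregate(["NCM", "NCM", "CB", "tie"])   # "disagreement"

# Totals reported in the text; they account for all 200 questions.
reported = {"NCM": 97, "CB": 60, "tie": 20, "disagreement": 23}
total_questions = sum(reported.values())  # 200
```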
　　　
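However, we believe that designing a good metric for quickly measuring the quality of a conversational model remains an open research problem. We measured quality by manual inspection, by human evaluation using Mechanical Turk in comparison with a rule-based bot, and by computing perplexity. Each of these has obvious drawbacks, and we are actively pursuing this and other research directions.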

6. Discussion

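In this paper, we show that a simple language model based on the seq2seq framework can be used to train a conversational engine. Our results show that it can generate simple, basic conversations and extract knowledge from a noisy but open-domain dataset. Even though the model has obvious limitations, it is surprising to us that a purely data-driven approach without any rules can produce proper answers to many types of questions. However, the model may require substantial modifications before it can deliver realistic conversations. Among its many limitations, the lack of a coherent personality makes it difficult for our system to pass the Turing test.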

0 0