RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
Source: Internet | Editor: 程序博客网 | Time: 2024/06/08 06:38
- Introduction
- Methodology
- Referenced Metric
- Unreferenced Metric
- Hybrid Evaluation
- Experiments
RUBER stands for Referenced metric and Unreferenced metric Blended Evaluation Routine.
Introduction
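As usual, the paper opens by criticizing BLEU, METEOR, ROUGE and similar metrics. The method in Lowe et al.'s paper (towards xxx) requires a large amount of human annotation, so it is neither flexible nor extensible.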
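RUBER combines two scorers:
* Embedding-based scorer (referenced metric): measures the similarity between the generated reply and the groundtruth reply.
* Neural network-based scorer (unreferenced metric): measures the relatedness between the generated reply and its query. The network is trained with negative sampling, so no human annotation is needed.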
Methodology
Referenced Metric
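The reply is represented by vector pooling over its word embeddings $w_1, \ldots, w_n$. Max pooling keeps, for each dimension, the largest value:

$$v_{\max}[i] = \max\{w_1[i], w_2[i], \ldots, w_n[i]\}$$

where $[\cdot]$ indexes a dimension of a vector.
Min pooling $v_{\min}$ is defined analogously, and the two are concatenated: $v = [v_{\max}; v_{\min}]$.
The similarity between the groundtruth reply $r$ and the generated reply $\hat{r}$ is measured by the cosine of their pooled vectors:

$$s_R(r, \hat{r}) = \cos(v_r, v_{\hat{r}}) = \frac{v_r^\top v_{\hat{r}}}{\lVert v_r \rVert \cdot \lVert v_{\hat{r}} \rVert}$$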
This pooling approach captures the information carried by uncommon words well, and it is more robust than vector extrema.
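The referenced metric can be sketched in a few lines of plain Python. This assumes each sentence is already a list of word-embedding vectors (the paper uses pretrained embeddings; the 2-d vectors below are made-up toy values), and the function names are hypothetical, not taken from any RUBER code release.

```python
import math
from typing import List

Vector = List[float]


def pool_embeddings(word_vectors: List[Vector]) -> Vector:
    """Return [v_max; v_min]: dimension-wise max and min pooling, concatenated."""
    dims = len(word_vectors[0])
    v_max = [max(w[i] for w in word_vectors) for i in range(dims)]
    v_min = [min(w[i] for w in word_vectors) for i in range(dims)]
    return v_max + v_min  # length is 2 * dims


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def referenced_score(reply: List[Vector], groundtruth: List[Vector]) -> float:
    """Referenced metric: cosine of the pooled reply and groundtruth vectors."""
    return cosine(pool_embeddings(reply), pool_embeddings(groundtruth))


# Toy 2-d "word embeddings" (hypothetical values, not real pretrained vectors).
groundtruth = [[0.1, 0.9], [0.8, -0.2], [0.4, 0.3]]
candidate = [[0.2, 0.7], [0.7, -0.1]]
print(round(referenced_score(candidate, groundtruth), 3))
```

Because both pooled vectors have a fixed size of twice the embedding dimension, replies of different lengths can be compared directly.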
Unreferenced Metric
Measures the relatedness between a reply $r$ and its query $q$.
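A bidirectional RNN encodes each sentence, and the last hidden states of the two directions are concatenated to form the sentence embedding ($q$ for the query, $r$ for the reply).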
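A minimal sketch of the bidirectional encoder, assuming a plain tanh RNN cell without biases for brevity (the cell type used in the paper may differ). The weights here are random and untrained; the class and function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_last_state(inputs: List[Vector], w_x: Matrix, w_h: Matrix) -> Vector:
    """Vanilla tanh RNN (no bias, for brevity); returns the final hidden state."""
    h = [0.0] * len(w_h)
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
    return h


class BiRNNEncoder:
    """Hypothetical encoder: forward and backward RNNs, last states concatenated."""

    def __init__(self, input_dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Separate (untrained, randomly initialized) weights for each direction.
        self.forward_weights: Tuple[Matrix, Matrix] = (
            random_matrix(hidden, input_dim, rng),
            random_matrix(hidden, hidden, rng),
        )
        self.backward_weights: Tuple[Matrix, Matrix] = (
            random_matrix(hidden, input_dim, rng),
            random_matrix(hidden, hidden, rng),
        )

    def encode(self, word_vectors: List[Vector]) -> Vector:
        fwd = rnn_last_state(word_vectors, *self.forward_weights)
        bwd = rnn_last_state(word_vectors[::-1], *self.backward_weights)
        return fwd + bwd  # sentence embedding of size 2 * hidden


encoder = BiRNNEncoder(input_dim=2, hidden=3)
query_embedding = encoder.encode([[0.1, 0.9], [0.8, -0.2], [0.4, 0.3]])
print(len(query_embedding))
```

The backward pass simply runs a second RNN over the reversed word sequence, so its last state summarizes the sentence starting from the first word.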
A quadratic feature is also introduced: $\text{quad} = q^\top M r$, where $M$ is a parameter matrix.
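A multi-layer perceptron (MLP) takes the query embedding, the quadratic feature and the reply embedding as input and outputs the unreferenced score $s_U(q, r)$, with a sigmoid output so the score lies in (0, 1).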
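The scoring head can be sketched as follows, assuming the MLP input is the concatenation $[q; q^\top M r; r]$ and using a single tanh hidden layer (the layer count and sizes are assumptions). Weights are random, so the scores are meaningless until trained; `UnreferencedScorer` is a hypothetical name.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class UnreferencedScorer:
    """Sketch of s_U(q, r) = MLP([q; q^T M r; r]) with a sigmoid output."""

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.M = init(dim, dim)               # parameter matrix of the quadratic feature
        self.W1 = init(hidden, 2 * dim + 1)   # hidden layer over [q; quad; r]
        self.b1 = [0.0] * hidden
        self.w2 = init(1, hidden)[0]          # output layer producing one scalar
        self.b2 = 0.0

    def quadratic(self, q: Vector, r: Vector) -> float:
        """quad = q^T M r."""
        return sum(qi * mri for qi, mri in zip(q, matvec(self.M, r)))

    def score(self, q: Vector, r: Vector) -> float:
        features = q + [self.quadratic(q, r)] + r
        h = [math.tanh(z + b) for z, b in zip(matvec(self.W1, features), self.b1)]
        return sigmoid(sum(w * x for w, x in zip(self.w2, h)) + self.b2)


scorer = UnreferencedScorer(dim=4, hidden=5)
q_vec = [0.2, -0.1, 0.5, 0.3]   # e.g. a BiRNN query embedding (toy values)
r_vec = [0.4, 0.0, -0.3, 0.1]   # e.g. a BiRNN reply embedding (toy values)
print(scorer.score(q_vec, r_vec))
```

The quadratic term gives the MLP a direct query-reply interaction signal that plain concatenation of $q$ and $r$ would otherwise have to learn.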
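The network is trained with negative sampling, which avoids the need for human-annotated data. Given a groundtruth query-reply pair $(q, r)$, another reply $r^-$ is randomly sampled from the training data as a negative example, and the network is trained so that the positive pair outscores the negative one by a margin $\Delta$:

$$J = \max\{0,\ \Delta - s_U(q, r) + s_U(q, r^-)\}$$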
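A sketch of the negative-sampling objective, assuming one uniformly sampled negative per pair and an arbitrary margin of 0.5. It only computes the loss; a real implementation would backpropagate $J$ through the scorer. `overlap_scorer` is a hypothetical stand-in for the neural scorer $s_U$.

```python
import random
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def sample_negative(replies: List[str], positive_index: int, rng: random.Random) -> str:
    """Uniformly pick a training reply whose index differs from the groundtruth's."""
    j = rng.randrange(len(replies) - 1)  # draw from the other len-1 positions
    if j >= positive_index:
        j += 1                            # skip over the groundtruth reply
    return replies[j]


def hinge_loss(s_pos: float, s_neg: float, margin: float) -> float:
    """J = max(0, margin - s_U(q, r) + s_U(q, r^-))."""
    return max(0.0, margin - s_pos + s_neg)


def mean_epoch_loss(pairs: List[Tuple[str, str]], scorer: Scorer,
                    margin: float = 0.5, seed: int = 0) -> float:
    """Average hinge loss over one pass, with one random negative per pair."""
    rng = random.Random(seed)
    replies = [reply for _, reply in pairs]
    total = 0.0
    for i, (query, reply) in enumerate(pairs):
        negative = sample_negative(replies, i, rng)
        total += hinge_loss(scorer(query, reply), scorer(query, negative), margin)
    return total / len(pairs)


def overlap_scorer(query: str, reply: str) -> float:
    """Hypothetical toy scorer: Jaccard overlap of whitespace tokens."""
    q, r = set(query.split()), set(reply.split())
    return len(q & r) / max(len(q | r), 1)


pairs = [
    ("how is the weather", "the weather is sunny"),
    ("what did you eat", "I eat noodles"),
    ("where do you live", "I live in Beijing"),
]
print(mean_epoch_loss(pairs, overlap_scorer))
```

The loss is zero once the groundtruth reply beats the sampled one by at least the margin, so training effort focuses on pairs the scorer still confuses.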
Hybrid Evaluation
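The scores of the two metrics are first normalized to a common range ([0, 1]) and then combined with a simple heuristic: min, max, geometric mean or arithmetic mean.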
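A sketch of the blending step, assuming min-max normalization computed over a batch of responses being evaluated. The raw scores in the usage line are made-up numbers, and the default strategy is an arbitrary choice here.

```python
import math
from typing import List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; if all are equal, map them to 0.5 (arbitrary)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend(s_ref: float, s_unref: float, strategy: str = "geometric") -> float:
    """Combine one normalized referenced score and one unreferenced score."""
    if strategy == "min":
        return min(s_ref, s_unref)
    if strategy == "max":
        return max(s_ref, s_unref)
    if strategy == "geometric":
        return math.sqrt(s_ref * s_unref)
    if strategy == "arithmetic":
        return (s_ref + s_unref) / 2
    raise ValueError(f"unknown strategy: {strategy}")


def ruber_scores(referenced: List[float], unreferenced: List[float],
                 strategy: str = "geometric") -> List[float]:
    """Normalize each metric over the batch, then blend per response."""
    ref_n = min_max_normalize(referenced)
    unref_n = min_max_normalize(unreferenced)
    return [blend(a, b, strategy) for a, b in zip(ref_n, unref_n)]


print(ruber_scores([0.62, 0.91, 0.75], [0.20, 0.85, 0.40], strategy="arithmetic"))
```

Normalizing first matters because cosine similarity and a sigmoid output live on different scales; without it, one metric would dominate the mean.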
Experiments
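The data is Chinese text crawled from the Douban forum. Two kinds of dialog systems were evaluated: a feature-based retrieval-and-reranking system and a seq2seq generation model.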