#Paper Reading# Neural Extractive Summarization with Side Information


Paper title: Neural Extractive Summarization with Side Information
Paper link: https://arxiv.org/abs/1704.04530
Published in: arXiv (preprint), 2017

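Summary of the paper:
This paper adds side information (title, image captions) to single-document extractive summarization. Using a hierarchical document encoder and an attention-based extractor, it reports a clear improvement in both informativeness and fluency over the same approach without side information.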

1. The paper focuses on single-document, extractive summarization;

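2. Sentences are extracted as follows: going through the sentences in order, each one gets a binary classification that decides whether it is added to the summary;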
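To make the sequence-labeling view concrete, here is a minimal sketch. The probabilities are hypothetical model outputs, and the 0.5 threshold and the function name `extract_summary` are my own assumptions, not something taken from the paper.

```python
from typing import List


def extract_summary(sentences: List[str], probs: List[float],
                    threshold: float = 0.5) -> List[str]:
    """Label each sentence 0/1 in document order and keep the 1s."""
    labels = [1 if p >= threshold else 0 for p in probs]
    return [s for s, y in zip(sentences, labels) if y == 1]


doc = [
    "A storm hit the coast.",
    "Officials urged calm.",
    "Weather is often discussed.",
    "Thousands lost power.",
]
scores = [0.91, 0.35, 0.08, 0.77]  # hypothetical per-sentence probabilities
summary = extract_summary(doc, scores)
print(summary)
```

Because the labels are assigned sentence by sentence, the extracted summary keeps the original document order.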

3. Model architecture diagram, with 3 components:
① CNN sentence encoder;
② RNN document encoder;
③ RNN (attention-based) sentence extractor
[Image: model architecture]

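4. CNN sentence encoder (single-layer CNN)
① Train word2vec on the words of the training set to obtain a 200-dimensional vector for each word;
② Apply a single convolution + pooling layer to obtain the sentence embedding;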
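The convolution + pooling step can be sketched in plain Python as below. This is only an illustration under my own assumptions: 2-dimensional toy vectors stand in for the 200-dimensional word2vec vectors, the filters are hand-picked rather than learned, and ReLU followed by max-over-time pooling is assumed as the nonlinearity and pooling.

```python
from typing import List

Vector = List[float]


def conv_max_pool(words: List[Vector], kernel: List[Vector],
                  bias: float = 0.0) -> float:
    """Slide one filter over consecutive word vectors, apply ReLU,
    then take the maximum over all positions (max-over-time pooling)."""
    width = len(kernel)
    feats = []
    for start in range(len(words) - width + 1):
        window = words[start:start + width]
        act = bias + sum(w * k
                         for wv, kv in zip(window, kernel)
                         for w, k in zip(wv, kv))
        feats.append(max(0.0, act))
    # A sentence shorter than the filter yields no windows; fall back to 0.
    return max(feats, default=0.0)


def encode_sentence(words: List[Vector],
                    kernels: List[List[Vector]]) -> Vector:
    """One pooled feature per filter gives the sentence embedding."""
    return [conv_max_pool(words, k) for k in kernels]


# Toy 3-word sentence with 2-dimensional word vectors (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],   # width-2 filter
    [[-1.0, 0.0], [0.0, 1.0]],  # another width-2 filter
]
embedding = encode_sentence(sentence, filters)
print(embedding)  # [2.0, 1.0]
```

The embedding size equals the number of filters, so sentences of any length map to a fixed-size vector.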

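5. RNN document encoder
① An RNN with LSTM cells is used;
② The document's sentences are fed in reverse order, so that the contribution of the first few sentences is not lost;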
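The effect of reversing the input can be seen even with a toy recurrent cell. Note that the sketch below does not use an LSTM: an elementwise tanh cell with scalar weights `w_h` and `w_x` (both hypothetical) stands in for it, purely to show that the sentence fed last leaves the strongest trace in the final state.

```python
import math
from typing import List

Vector = List[float]


def rnn_step(h: Vector, x: Vector, w_h: float, w_x: float) -> Vector:
    """Toy recurrent cell: h' = tanh(w_h * h + w_x * x), elementwise."""
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x)]


def encode_document(sent_vecs: List[Vector], reverse: bool = True,
                    w_h: float = 0.5, w_x: float = 1.0) -> Vector:
    """Run the cell over sentence vectors and return the final state.
    With reverse=True the first sentence is consumed last."""
    order = reversed(sent_vecs) if reverse else sent_vecs
    h = [0.0] * len(sent_vecs[0])
    for x in order:
        h = rnn_step(h, x, w_h, w_x)
    return h


# Only the first sentence carries signal in this toy document.
doc = [[1.0], [0.0], [0.0]]
rev = encode_document(doc, reverse=True)
fwd = encode_document(doc, reverse=False)
print(rev, fwd)
```

With reversed input the final state still reflects the first sentence almost fully, while with forward input that signal has decayed over the two later steps.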

6. RNN (attention-based) sentence extractor
① Input: sentence + side information;
② Output: a 0/1 label for each sentence, indicating whether it is extracted into the summary;
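A rough sketch of how attention over side information could feed the per-sentence decision is given below. The dot-product scoring, the concatenation of the sentence state with the attended context, and the logistic output layer are all my assumptions for illustration; the paper's exact formulation may differ, and the vectors and weights are made up.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, side: List[Vector]) -> Vector:
    """Weight each side-information vector by its similarity to the query."""
    weights = softmax([dot(query, s) for s in side])
    dim = len(side[0])
    return [sum(w * s[i] for w, s in zip(weights, side)) for i in range(dim)]


def extract_prob(sent_state: Vector, side: List[Vector],
                 w: Vector, b: float = 0.0) -> float:
    """Probability that the sentence is extracted (label 1)."""
    context = attend(sent_state, side)
    features = sent_state + context  # list concatenation
    return 1.0 / (1.0 + math.exp(-(dot(w, features) + b)))


side_info = [[1.0, 0.0], [0.0, 1.0]]  # e.g. title and caption embeddings
state = [2.0, 0.0]                    # hypothetical sentence state
weights = [0.5, 0.5, 1.0, 1.0]        # hypothetical output weights
p = extract_prob(state, side_info, weights)
label = 1 if p >= 0.5 else 0
print(p, label)
```

Here the sentence state is closer to the first side-information vector, so the context is dominated by it, and the resulting probability is thresholded into the 0/1 label described above.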

Experiments
7. Dataset: the CNN dataset (CNN articles), with a 90K-document training set, 1,220 validation documents, and 1,093 test documents;

8. Side information includes the title, image captions, and the first sentence;

9. Baselines:
① LEAD-3: simply takes the first 3 sentences as the summary;
② POINTER-NET: uses no side information, but does use an attention mechanism;
③ SEQ2SEQ: a simple sequential encoder-decoder model which does not use any side information;
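Of the baselines above, LEAD-3 is simple enough to write out in full. A minimal sketch follows; the regex-based sentence splitter is a naive assumption of mine (real pipelines typically use a proper tokenizer), and the function names are hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def lead_n(text: str, n: int = 3) -> List[str]:
    """LEAD-n baseline: the first n sentences are the summary."""
    return split_sentences(text)[:n]


article = "One. Two! Three? Four."
print(lead_n(article))  # ['One.', 'Two!', 'Three?']
```

Despite its simplicity, a lead baseline is hard to beat on news data, because news articles tend to put the key facts first.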

10. Evaluation metric: ROUGE
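For readers unfamiliar with ROUGE, the core of ROUGE-N is clipped n-gram overlap between a candidate summary and a reference. The sketch below is a simplified version under my own assumptions (lowercasing and whitespace tokenization, no stemming, a single reference); the official toolkit does more, so the numbers are not directly comparable to published scores.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N with clipped n-gram counts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


cand_text = "the cat sat on the mat"
ref_text = "the cat lay on the mat"
r1 = rouge_n(cand_text, ref_text, n=1)  # 5 of 6 unigrams match
r2 = rouge_n(cand_text, ref_text, n=2)  # 3 of 5 bigrams match
print(r1, r2)
```

ROUGE-1 and ROUGE-2 correspond to `n=1` and `n=2`; ROUGE-L, which is based on the longest common subsequence, is not covered by this sketch.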

11. Choice of side information: title + caption turns out to give the best results
[Image: results for different choices of side information]

12. Comparison across summary length cutoffs
[Image: results at different summary lengths]

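13. Human evaluation results (20 documents were selected from the test set and 5 people were asked to evaluate them)
[Image: human evaluation results]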

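All of the above is my personal understanding. My expertise is limited, so if you spot any errors or omissions, please point them out. Thanks!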

0 0