Reader-Aware Multi-Document Summarization: An Enhanced Model and The First Dataset

Piji Li, Lidong Bing, Wai Lam
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong; AI Lab, Tencent Inc., Shenzhen, China
EMNLP 2017

Background

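RA-MDS (Reader-Aware Multi-Document Summarization)
Jointly considers news documents and reader comments when generating summaries
Challenges of RA-MDS
How to perform salience estimation when news and comments are considered together; the model must be sufficiently sensitive to comments that disagree with one another; comment content is noisy and disorganized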

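VAE (Variational Auto-Encoders)

Variational auto-encoders; the underlying mathematics is fairly involved.
As long as the input and output dimensions meet the requirements, the decoder can take any architecture: MLP, CNN, or RNN.
Because the input data are normalized to the interval [0, 1], the decoder output should lie in the same range. This can be achieved by adding a sigmoid activation to the decoder's last layer.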
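To make the sigmoid output layer concrete, here is a minimal sketch of a single-layer decoder in plain Python. The function names `sigmoid` and `decode`, the single-layer shape, and the example weights are assumptions made for illustration; they are not taken from the paper.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function; the result lies in (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def decode(z, weights, bias):
    """Hypothetical single-layer decoder: x' = sigmoid(W z + b).

    weights is a list of rows (one row per output dimension),
    bias has one entry per output dimension.
    """
    return [
        sigmoid(sum(w * zi for w, zi in zip(row, z)) + b)
        for row, b in zip(weights, bias)
    ]

# Example: a 2-dimensional latent vector decoded to 3 output dimensions.
x_rec = decode([0.3, -1.2], [[1.0, 0.5], [-2.0, 0.1], [0.0, 0.0]], [0.0, 0.1, 0.0])
```

Whatever the decoder body looks like (MLP, CNN, RNN), only the final activation is responsible for keeping every output inside the same range as the normalized input.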

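VAE: sampling

Reparameterization
Sample ϵ from N(0,1), then set z=σ⋅ϵ+μ
The reparameterization trick
Since z∼N(μ,σ²), we should sample from N(μ,σ²), but that sampling operation is not differentiable with respect to μ and σ, so ordinary gradient descent (GD) with error backpropagation cannot be used.
With reparameterization, we first sample ϵ from N(0,1) and then compute z=σ⋅ϵ+μ. This way z∼N(μ,σ²) still holds, and going from the encoder output to z involves only linear operations (to the network, ϵ is just a constant), so GD can be used for optimization as usual.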
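The sampling step can be sketched in a few lines of plain Python. The function name `reparameterize` and the use of per-dimension lists for μ and σ are assumptions for illustration; a real model would do the same thing on tensors inside an autodiff framework.

```python
import random

def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one eps per dimension.

    The randomness lives entirely in eps, so z is a linear function of
    mu and sigma: dz/dmu = 1 and dz/dsigma = eps.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

rng = random.Random(0)  # fixed seed so the example is repeatable
z, eps = reparameterize([1.0, -2.0], [0.5, 2.0], rng)
```

Returning `eps` alongside `z` makes the point of the trick visible: once ϵ is drawn, the derivative of z with respect to σ is just that ϵ value, and the derivative with respect to μ is 1.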

Cost function

For x∈[0,1], cross entropy is used to measure the difference between x and its reconstruction x′.
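As a rough illustration, the sketch below computes the element-wise cross entropy between x and x′, plus the KL term that a standard VAE objective adds for a diagonal Gaussian posterior. The formula from the original figure is not available, so this is the textbook form rather than necessarily the exact expression used in the post; the function names and the clipping constant `eps` are assumptions.

```python
import math

def cross_entropy(x, x_rec, eps=1e-12):
    """Sum over dimensions of -[x log x' + (1 - x) log(1 - x')]."""
    total = 0.0
    for xi, ri in zip(x, x_rec):
        ri = min(max(ri, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(ri) + (1.0 - xi) * math.log(1.0 - ri)
    return total

def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over dimensions; sigma must be > 0."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s) for m, s in zip(mu, sigma))

# Standard VAE cost: reconstruction term plus KL regularizer.
loss = cross_entropy([0.0, 1.0, 0.5], [0.1, 0.8, 0.5]) + kl_to_standard_normal([0.2, -0.1], [0.9, 1.1])
```

The cross-entropy term is smallest when x′ matches x, and the KL term is zero exactly when μ=0 and σ=1.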

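Blog summarization
Explores the effect of comments or social contexts in single-document summarization
A sparse coding based framework for RA-MDS
Only uses the bag-of-words representation, so it cannot capture the complex relationship between documents and comments
1. Blog summarization also takes comments into account, but it targets single-document summarization.
2. The sparse-coding-based RA-MDS framework jointly considers news documents and reader comments through an unsupervised data reconstruction strategy. Because it represents text only with bag-of-words, it cannot capture the complex relationships between news and comments.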

Base model

A sentence salience estimation framework
Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization (Piji Li, Zihao Wang, Wai Lam, Zhaochun Ren, and Lidong Bing. 2017)

A sentence salience estimation framework

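Sentence salience is estimated during decoding through a data reconstruction method.
x is the sentence vector and z is the corresponding latent semantic vector. Sz denotes the latent aspect vectors.
Sh and Sx are the hidden vectors and the output aspect term vectors.
Mh and Mx are two memories used to refine Sh and Sx through a neural alignment mechanism.
A is the reconstruction coefficient matrix, which contains the sentence salience information.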

Purpose

Jointly consider news documents and reader comments to improve the MDS performance

Model

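The model builds on the sentence salience estimation framework described earlier.
Both news and comments are fed into the VAEs, which gives the model the ability to capture information from news and comments jointly.
Because comments contain a lot of noise, a weighted combination mechanism with comment weight estimation is designed.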

The news sentence salience estimation is conducted by an unsupervised data reconstruction framework.

Alignment mechanism
To recall the lost detailed information from the input sentence.
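Sz is a set of m latent aspect vectors used to reconstruct all the latent semantic vectors, with m far smaller than n.
During VAE decoding, Sz can be mapped to Sh, which then produces m new aspect term vectors.
The alignment mechanism aligns each decoder hidden state with every encoder hidden state.
This recalls detailed information lost from the input sentences. During alignment, the weight p is also applied to the hidden states of the comments.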
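The alignment step described above can be sketched as a simple attention computation. This is only an illustration under stated assumptions: the dot-product scoring, the softmax normalization, and the way the weight p scales each comment hidden state are choices made here, and the names `align` and `weights` are hypothetical; the post does not give the exact formulas.

```python
import math

def softmax(scores):
    """Normalize scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def align(decoder_state, encoder_states, weights=None):
    """Align one decoder hidden state with every encoder hidden state.

    weights holds one scalar per encoder state: 1.0 for news sentences,
    the comment weight p for comment sentences (assumed usage).
    Returns the attention distribution and the weighted context vector.
    """
    if weights is None:
        weights = [1.0] * len(encoder_states)
    attn = softmax([dot(decoder_state, h) for h in encoder_states])
    context = [0.0] * len(decoder_state)
    for a, w, h in zip(attn, weights, encoder_states):
        for k in range(len(context)):
            context[k] += a * w * h[k]
    return attn, context

# One news hidden state (weight 1.0) and one comment hidden state (weight p = 0.3).
attn, context = align([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], weights=[1.0, 0.3])
```

With this arrangement a comment judged noisy (small p) still receives attention, but contributes proportionally less to the context vector that refines the decoder state.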

Optimization objective


Calculating p

The most important variable
(1) For all the news sentences Xd and all the comment sentences Xc, calculate the relation matrix.
(2) Add an average pooling layer to get a coefficient value for each comment sentence.
(3) Apply a sigmoid function to bring the coefficient values into the range (0,1).
The basic idea behind computing p is that the more similar a comment sentence is to the news content, the less noise it contains.
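The three steps can be sketched as follows. The post does not say how the relation matrix is computed, so the dot product between sentence vectors is an assumption here, as are the function name `comment_weights` and the toy vectors.

```python
import math

def comment_weights(news, comments):
    """Estimate one weight in (0, 1) per comment sentence.

    news and comments are lists of equal-length sentence vectors.
    """
    # (1) Relation matrix: one row per news sentence, one column per comment.
    relation = [[sum(a * b for a, b in zip(d, c)) for c in comments] for d in news]
    # (2) Average pooling over the news sentences gives one value per comment.
    pooled = [sum(row[j] for row in relation) / len(news) for j in range(len(comments))]
    # (3) Sigmoid squashes each pooled value into (0, 1).
    return [1.0 / (1.0 + math.exp(-r)) for r in pooled]

news = [[1.0, 0.0], [1.0, 0.0]]
comments = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
p = comment_weights(news, comments)
```

In this toy example the first comment points in the same direction as the news sentences and gets the largest weight, the all-zero comment gets exactly 0.5, and the opposing comment gets the smallest weight.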

Vector space

The comment weight will be different in different semantic vector space.
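Merge the weights with a parameter lambda (λ)
Different vector spaces represent the same sentence differently.
This paper uses two vector spaces: the latent semantic space obtained by the VAEs and the original bag-of-words vector space.
pz and px are computed from the latent semantic space and the term vector space, respectively.
p in effect controls the proportion contributed by the comment sentences.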
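One plausible way to merge the two weight vectors is a convex combination controlled by λ. The merging formula was shown only as an image in the original post, so the form p = λ·pz + (1 − λ)·px used below is an assumption, and `merge_weights` is a hypothetical name.

```python
def merge_weights(pz, px, lam):
    """Combine latent-space weights pz and term-space weights px.

    Assumed form: p = lam * pz + (1 - lam) * px, with lam in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(pz) != len(px):
        raise ValueError("pz and px must have one entry per comment sentence")
    return [lam * a + (1.0 - lam) * b for a, b in zip(pz, px)]

# Two comment sentences, equal trust in both vector spaces.
p = merge_weights([0.2, 0.9], [0.8, 0.7], lam=0.5)
```

Under this assumption, λ=1 uses only the latent semantic space and λ=0 uses only the bag-of-words space; because both inputs lie in (0,1), the merged weight does too.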

Summary Construction

Based on the parsed constituency tree for each input sentence, we extract the noun-phrases (NPs) and verb-phrases (VPs)
The objective function is formulated as an integer linear programming (ILP) problem.