Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning
Cody Rioux, Sadid A. Hasan, Yllias Chali
Abstract
- Achieve the largest coverage of the documents' content.
- Concentrate distributed information into the hidden units layer by layer.
- The whole deep architecture is fine-tuned by minimizing the information loss of reconstruction validation.
- Based on the concentrated information, dynamic programming (DP) is used to seek the most informative set of sentences as the summary.
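The notes do not spell out the DP selection step; one common formulation is a 0/1 knapsack over sentences under a summary-length budget, maximizing a total informativeness score. A minimal sketch under that assumption (the sentence scores here are illustrative inputs, not the paper's):

```python
def dp_select(sentences, scores, budget):
    """0/1 knapsack: pick the sentence subset with maximal total score
    whose total word count fits within the length budget."""
    lengths = [len(s.split()) for s in sentences]
    # best[b] = (best achievable score, chosen sentence indices) at budget b
    best = [(0.0, [])] * (budget + 1)
    for i in range(len(sentences)):
        # iterate budgets downward so each sentence is used at most once
        for b in range(budget, lengths[i] - 1, -1):
            cand = best[b - lengths[i]][0] + scores[i]
            if cand > best[b][0]:
                best[b] = (cand, best[b - lengths[i]][1] + [i])
    return [sentences[i] for i in best[budget][1]]
```

With a 5-word budget and scores 3/2/1 for "a b c", "d e", "f", the first two sentences are selected.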
Related work
- We explore the use of SARSA, a derivative of TD(λ) that models the action space in addition to the state space modelled by TD(λ). Furthermore, we explore an algorithm based not on temporal-difference methods but on policy-iteration techniques.
- REAPER (Relatedness-focused Extractive Automatic summary Preparation Exploiting Reinforcement learning)
Motivation
TD(λ) is relatively old as far as reinforcement learning (RL) algorithms are concerned, and the optimal ILP did not outperform ASRL using the same reward function.
Reinforcement learning therefore still has substantial room for improvement.
Query-focused summarization has received wide attention.
The effect of sentence compression is not investigated further.
Model
- TD(λ)
Temporal-difference (TD) learning is a prediction-based machine learning method, used mainly for reinforcement learning problems, and is described as "a combination of Monte Carlo ideas and dynamic programming (DP) ideas." TD resembles Monte Carlo methods in that it learns by sampling the environment according to some policy, and it is related to dynamic programming techniques in that it approximates its current estimate from previously learned estimates (known as bootstrapping). TD learning algorithms are also related to the temporal-difference model of animal learning. (summarized from the Wikipedia entry on temporal-difference learning)
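The TD(λ) prediction update described above can be sketched in tabular form with accumulating eligibility traces; the step size, discount, and trace decay below are illustrative defaults, not values from the paper:

```python
def td_lambda_update(V, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """One pass of tabular TD(lambda) value prediction over an episode
    given as a list of (state, reward, next_state) transitions."""
    e = {s: 0.0 for s in V}                    # eligibility traces
    for s, r, s_next in episode:
        delta = r + gamma * V[s_next] - V[s]   # TD error (bootstrapped)
        e[s] += 1.0                            # accumulate trace for visited state
        for state in V:
            V[state] += alpha * delta * e[state]
            e[state] *= gamma * lam            # decay all traces
    return V
```

With λ = 0 this reduces to one-step TD(0); with λ = 1 it approaches a Monte Carlo update, matching the "combination" description above.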
- Approximate Policy Iteration
Approximate policy iteration (API) follows a different paradigm: it iteratively improves the policy of a Markov decision process until the policy converges.
- SARSA
When updating, Q-learning backs up the best next action (the one with the maximal Q value), whereas SARSA backs up the Q value of the action it actually takes under the current policy; in both cases the estimate for the previous step is updated, which is how learning occurs.
Comparison of on-policy SARSA and off-policy Q-learning.
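The contrast comes down to which Q value is bootstrapped in the update; a minimal sketch of the two update rules (the states, actions, and step sizes are illustrative):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy: bootstrap from the action actually taken next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Off-policy: bootstrap from the greedy (max-Q) next action,
    # regardless of which action the behavior policy takes.
    best = max(Q[s_next, b] for b in actions)
    Q[s, a] += alpha * (r + gamma * best - Q[s, a])
```

Starting from the same Q table, the two rules diverge exactly when the behavior policy's next action is not the greedy one.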
Experiment
- Feature Space: depends on the presence of top bigrams rather than tf·idf words.
- Reward Function: based on the n-gram co-occurrence score metric and the longest-common-subsequence recall metric.
- Immediate Rewards
- Query-Focused Rewards
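The two ROUGE-style quantities named above can be sketched as follows; the whitespace tokenization is an assumption for illustration, not the paper's exact reward computation:

```python
def bigrams(tokens):
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def bigram_cooccurrence_recall(summary, reference):
    """Fraction of reference bigrams that also occur in the summary
    (a ROUGE-2-recall-style n-gram co-occurrence score)."""
    ref = bigrams(reference.split())
    summ = set(bigrams(summary.split()))
    return sum(b in summ for b in ref) / len(ref)

def lcs_recall(summary, reference):
    """Longest-common-subsequence length over reference length
    (a ROUGE-L-recall-style score)."""
    x, y = summary.split(), reference.split()
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            dp[i + 1][j + 1] = dp[i][j] + 1 if xi == yj else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(x)][len(y)] / len(y)
```

A reward used during training could then be a weighted combination of these two scores against the reference summaries.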