Paper notes: Hierarchical Question-Image Co-Attention for Visual Question Answering
Source: Internet · Editor: 程序博客网 · Time: 2024/05/16 09:28
Hierarchical Question-Image Co-Attention for Visual Question Answering
Jiasen Lu∗, Jianwei Yang∗, Dhruv Batra∗†, Devi Parikh∗† ∗Virginia Tech, †Georgia Institute of Technology {jiasenlu, jw2yang, dbatra, parikh}@vt.edu
Abstract
A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling “where to look” or visual attention, it is equally important to model “what words to listen to” or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolution neural networks (CNN). Our model improves the state-of-the-art on the VQA dataset from 60.3% to 60.5%, and from 61.6% to 63.3% on the COCO-QA dataset. By using ResNet, the performance is further improved to 62.1% for VQA and 65.4% for COCO-QA.
arXiv:1606.00061v5 [cs.CV] 19 Jan 2017
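Attention models for VQA generally produce spatial maps that highlight the image regions relevant to the question (text).
This paper argues that, besides visual attention ("where to look"), question attention ("which words to listen to") is equally important.
It proposes a co-attention model for VQA that jointly reasons about image attention and question attention.
In addition, the model reasons about the question hierarchically through a 1-D CNN (and, via co-attention, about the image as well).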
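Co-attention: the question (text) and the image are naturally symmetric, in that the question representation can guide image attention and, conversely, the image representation can guide question attention.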
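Question hierarchy: the paper co-attends to the question (text) and the image at three levels, namely word, phrase, and sentence (question).
Word level: represented by word embeddings.
Phrase level: a 1-D CNN captures the information in unigrams, bigrams, and trigrams. Specifically, the word-level representations are convolved with filters of different widths, and the n-gram responses are pooled into a single phrase-level representation.
Question level: an RNN encodes the entire question (text).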
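At each level of the question representation in this architecture, the paper builds co-attention maps connecting the question (text) and the image; these are then combined recursively to predict the final distribution over answers.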
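The phrase-level step can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function name `phrase_features`, the `filters` layout, the `tanh` nonlinearity, the zero-padding at the end of the sentence, and the choice of max-pooling across the three n-gram sizes are all assumptions made for illustration, since the notes only say that the n-gram responses are pooled.

```python
import math

def phrase_features(words, filters):
    """Hypothetical helper: pool unigram/bigram/trigram responses per word position.

    words:   list of T word vectors, each of length d
    filters: dict {s: matrix with d rows of length s*d} for window size s
    Returns a list of T phrase vectors, each of length d.
    """
    T, d = len(words), len(words[0])
    zero = [0.0] * d
    phrases = []
    for t in range(T):
        responses = []
        for s, W in sorted(filters.items()):
            # Concatenate the s-word window starting at t (assumed zero-padding past the end).
            window = []
            for k in range(s):
                window.extend(words[t + k] if t + k < T else zero)
            # One filter response per output dimension, squashed with tanh.
            responses.append([math.tanh(sum(w * x for w, x in zip(row, window)))
                              for row in W])
        # Max-pool across the n-gram sizes, dimension by dimension (assumed pooling).
        phrases.append([max(vals) for vals in zip(*responses)])
    return phrases

# Toy usage: three 2-dimensional "word embeddings" and hand-made filters.
words = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
filters = {
    1: identity,                       # unigram filter: looks at the word itself
    2: [r + r for r in identity],      # bigram filter: sums two adjacent words
    3: [r + r + r for r in identity],  # trigram filter: sums three adjacent words
}
phrases = phrase_features(words, filters)
print(phrases)
```

Each word position ends up with one vector of the same size as the word embedding, so the phrase-level sequence has the same length as the word-level one and can be attended over in the same way.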
Two kinds of co-attention: parallel and alternating.
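The difference between the two variants can be sketched as follows. This is a deliberately simplified illustration, not the paper's formulation: all learned weight matrices are omitted, affinities are plain dot products, and the function names (`parallel_coattention`, `alternating_coattention`, `attend`) are made up for this example. In the parallel sketch, both attentions are read off a single question-image affinity matrix; in the alternating sketch, the question is summarized first, the image is attended using that summary, and the question is then re-attended using the attended image.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, X):
    return [sum(w * x[i] for w, x in zip(weights, X)) for i in range(len(X[0]))]

def parallel_coattention(Q, V):
    """Simplified parallel variant: both attentions come from one affinity matrix."""
    C = [[dot(q, v) for v in V] for q in Q]            # C[t][n]: word t vs. region n
    a_v = softmax([max(C[t][n] for t in range(len(Q))) for n in range(len(V))])
    a_q = softmax([max(row) for row in C])
    return a_q, a_v, weighted_sum(a_q, Q), weighted_sum(a_v, V)

def attend(X, g):
    """Attend over the rows of X using guidance vector g (dot-product scores)."""
    weights = softmax([dot(x, g) for x in X])
    return weights, weighted_sum(weights, X)

def alternating_coattention(Q, V):
    """Simplified alternating variant: question -> image -> question."""
    _, s = attend(Q, [0.0] * len(Q[0]))   # no guidance yet, so weights are uniform
    a_v, v_hat = attend(V, s)             # image attention guided by the question summary
    a_q, q_hat = attend(Q, v_hat)         # question attention guided by the attended image
    return a_q, a_v, q_hat, v_hat

# Toy usage: 2 question words and 3 image regions, all 2-dimensional.
Q = [[1.0, 0.0], [0.0, 1.0]]
V = [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(parallel_coattention(Q, V))
print(alternating_coattention(Q, V))
```

In both toy runs the first image region and the first question word should receive the largest weights, because they are the only pair with a non-zero affinity.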