Semantic Compositional Networks for Visual Captioning
来源:互联网 发布:centos ftp客户端 编辑:程序博客网 时间:2024/05/17 02:17
Semantic Compositional Networks for Visual Captioning
(Submitted on 23 Nov 2016 (v1), last revised 28 Mar 2017 (this version, v2))
A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
Submission history
From: Zhe Gan [view email][v1] Wed, 23 Nov 2016 21:22:22 GMT (1235kb,D)
[v2] Tue, 28 Mar 2017 18:33:51 GMT (2051kb,D)
阅读全文
0 0
- Semantic Compositional Networks for Visual Captioning
- 论文笔记 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
- DenseCap:Fully Convolutional Localization Networks for Dense Captioning
- Fully Convolutional Networks for Semantic Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- Convolutional Networks for Image Semantic Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- Convolutional Networks for Image Semantic Segmentation
- 论文《Fully Convolutional Networks for Semantic Segmentation》
- Video captioning with recurrent networks based on frame- and video-level features and visual content
- 论文笔记之---DenseCap:Fully Convolutional Localization Networks for Dense Captioning
- 论文阅读笔记:Fully Convolutional Networks for Semantic Segmentation
- 论文笔记《Fully Convolutional Networks for Semantic Segmentation》
- 论文笔记《Fully Convolutional Networks for Semantic Segmentation》
- 论文阅读笔记:Fully Convolutional Networks for Semantic Segmentation
- 抢数游戏及其演绎
- eigenface
- [网络流24题]骑士共存问题 二分图/最大点权独立集
- C语言volatile关键字
- Git常用命令汇总
- Semantic Compositional Networks for Visual Captioning
- 实现select change 点击重复内容无法触发问题
- XML中输入特殊符号
- 解决:win10下VS2010中文输入法输入英文出现BUG
- 如何删除一条数据
- 理论与问题的关系
- sql注入攻击
- 大数据常识(一)
- 设计模式_原型模式