Weakly Supervised Dense Video Captioning
https://arxiv.org/abs/1704.01502
(Submitted on 5 Apr 2017)
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences. The proposed method is trained without explicit annotation of fine-grained sentence to video region-sequence correspondence, but is only based on weak video-level sentence annotations. It differs from existing video captioning systems in three technical aspects. First, we propose lexical fully convolutional neural networks (Lexical-FCN) with weakly supervised multi-instance multi-label learning to weakly link video regions with lexical labels. Second, we introduce a novel submodular maximization scheme to generate multiple informative and diverse region-sequences based on the Lexical-FCN outputs. A winner-takes-all scheme is adopted to weakly associate sentences to region-sequences in the training phase. Third, a sequence-to-sequence learning based language model is trained with the weakly supervised information obtained through the association process. We show that the proposed method can not only produce informative and diverse dense captions, but also outperform state-of-the-art single video captioning methods by a large margin.
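The abstract's second step, selecting several informative and diverse region-sequences by submodular maximization, can be illustrated with a generic greedy routine. The sketch below is not the paper's objective, which the abstract does not spell out. It assumes each candidate region-sequence is summarized by a vector of lexical-label confidences (standing in for Lexical-FCN outputs) and uses a simple coverage function as an example of a monotone submodular objective. The names `coverage`, `greedy_select`, and `candidates` are hypothetical.

```python
from typing import List, Sequence


def coverage(selected: Sequence[int], scores: Sequence[Sequence[float]]) -> float:
    """Assumed objective: for each lexical label, take the best confidence
    offered by any selected candidate, then sum over labels.
    This function is monotone and submodular."""
    if not selected:
        return 0.0
    n_labels = len(scores[0])
    return sum(max(scores[i][w] for i in selected) for w in range(n_labels))


def greedy_select(scores: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k candidates, each time adding the one with the largest
    marginal gain in coverage. Stops early when no candidate adds anything."""
    selected: List[int] = []
    remaining = set(range(len(scores)))
    for _ in range(min(k, len(scores))):
        base = coverage(selected, scores)
        best, best_gain = None, 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = coverage(selected + [i], scores) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # every remaining candidate is redundant
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: rows are candidate region-sequences, columns are lexical labels.
# Candidates 0 and 1 cover the same labels; candidate 2 covers a different one.
candidates = [
    [0.9, 0.5, 0.0],
    [0.8, 0.4, 0.0],
    [0.0, 0.1, 0.7],
]
print(greedy_select(candidates, 2))  # prints [0, 2]
```

On the toy data the greedy step skips candidate 1, which scores well on its own but adds nothing once candidate 0 is chosen, and picks candidate 2 instead. This is the informative-plus-diverse behavior the abstract describes, shown here only under the assumed coverage objective.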