Text Detection in Natural Images: A Collection of Dataset Links
Source: Internet | Published by: 淘宝卖纸箱 | Editor: 程序博客网 | Date: 2024/05/28 14:56
http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
Chars74K dataset
In this dataset, symbols used in both English and Kannada are available.
In the English language, Latin script (excluding accents) and Hindu-Arabic numerals are used. For simplicity we call this the "English" character set. Our dataset consists of:
- 62 classes (0-9, A-Z, a-z)
- 7705 characters obtained from natural images
- 3410 hand drawn characters using a tablet PC
- 62992 synthesised characters from computer fonts
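The class count follows directly from the character set: 10 digits + 26 upper-case + 26 lower-case letters = 62. Below is a minimal Python sketch of that label list. The helper `label_of` and the folder naming `Sample001` to `Sample062` that it decodes are hypothetical, chosen for illustration and not taken from the dataset page.

```python
import string

# Chars74K "English" classes in the order listed: 0-9, A-Z, a-z.
CLASSES = list(string.digits + string.ascii_uppercase + string.ascii_lowercase)


def label_of(sample_folder: str) -> str:
    """Map a folder name such as 'Sample011' to its character.

    ASSUMPTION (hypothetical naming): folders are numbered from 1
    in the same order as CLASSES.
    """
    prefix = "Sample"
    if not sample_folder.startswith(prefix):
        raise ValueError(f"unexpected folder name: {sample_folder!r}")
    index = int(sample_folder[len(prefix):]) - 1
    if not 0 <= index < len(CLASSES):
        raise ValueError(f"index out of range: {sample_folder!r}")
    return CLASSES[index]


if __name__ == "__main__":
    print(len(CLASSES))  # 62
    print(label_of("Sample001"), label_of("Sample011"), label_of("Sample062"))  # 0 A z
```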
http://openresearch.baidu.com/activitybulletin/618.jhtml
Robust Reading Datasets
These datasets were collected and tagged by the ICDAR 2003 Robust Reading Dataset Collection Team: Shirley Wong, Simon Lucas, Alex Panaretos, Luis Sosa Velazquez, Robert Young and Anthony Tang.
The datasets are organized into Sample, Trial and Competition datasets.
Sample datasets are provided to give you a quick impression of the data, and also to allow functional testing of your software. That is, you can run tests on the sample data to check that your software works with the data, but the results won't mean much.
Trial datasets serve two purposes. Use them to get results for your ICDAR 2003 papers. For this purpose, they are partitioned into two sets: TrialTrain and TrialTest. Use TrialTrain to train or tune your algorithms, then quote results on TrialTest. For the competitions, you should train/tune your system on the entire Trial set.
Competition datasets will be used to measure the performance of your algorithms for the competitions. These will be kept private until the ICDAR 2003 conference, when they will be made public.
Robust Reading and Text Locating
Each dataset is provided as a zip file containing a set of JPEG scene images and three XML tag files: locations.xml, words.xml and segmentation.xml.
locations.xml is for the Text Locating problem, and contains the path to each image and the set of rectangles for each image.
words.xml is for the Robust Reading competition; it tags each image with the bounding rectangle of each word in the image, together with the text in each rectangle.
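A minimal Python sketch for loading these tag files into per-image lists of rectangles, using the standard-library `xml.etree.ElementTree`. The element and attribute names (`image`, `imageName`, `taggedRectangle` with `x`/`y`/`width`/`height`, and a `tag` child holding the word text) are an assumption about the ICDAR 2003 layout, not something stated above, so check them against the real files. The same function should read locations.xml, where `text` simply stays `None`.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Box:
    x: float
    y: float
    width: float
    height: float
    text: Optional[str] = None  # None when a rectangle has no <tag> (e.g. locations.xml)


def parse_tag_file(source) -> Dict[str, List[Box]]:
    """Read an ICDAR 2003 tag file into {imageName: [Box, ...]}.

    ASSUMPTION: element/attribute names (image, imageName,
    taggedRectangle with x/y/width/height, tag) are assumed, not
    confirmed by the dataset description; verify against the XML.
    `source` is a path or a file-like object.
    """
    root = ET.parse(source).getroot()
    result: Dict[str, List[Box]] = {}
    for image in root.iter("image"):
        name = image.findtext("imageName")
        result[name] = [
            Box(
                x=float(rect.get("x")),
                y=float(rect.get("y")),
                width=float(rect.get("width")),
                height=float(rect.get("height")),
                text=rect.findtext("tag"),
            )
            for rect in image.iter("taggedRectangle")
        ]
    return result


# Made-up snippet in the assumed layout, for trying the parser.
SAMPLE = (
    "<tagset><image><imageName>scene/a.jpg</imageName><taggedRectangles>"
    '<taggedRectangle x="10" y="20" width="30" height="40"><tag>EXIT</tag>'
    "</taggedRectangle></taggedRectangles></image></tagset>"
)

if __name__ == "__main__":
    for name, boxes in parse_tag_file(io.StringIO(SAMPLE)).items():
        print(name, boxes)
```

With a real download, pass the path instead, e.g. `parse_tag_file("words.xml")`.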
segmentation.xml is like words.xml, except that each word is also given its segmentation points, in case this information is useful to your algorithm (e.g. it may be used to speed up EM).
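One way segmentation points could be used is to cut a word's bounding rectangle into per-character slices. This Python sketch assumes each point is a horizontal offset from the left edge of the word box; that convention and the helper name `split_word_box` are hypothetical and should be checked against segmentation.xml.

```python
from typing import List, Sequence, Tuple


def split_word_box(x: float, width: float, seg_offsets: Sequence[float]) -> List[Tuple[float, float]]:
    """Cut a word rectangle into per-character (left, right) intervals.

    ASSUMPTION: each segmentation point is an x offset from the left
    edge of the word box. Points on or outside the box edges are ignored.
    """
    cuts = sorted(o for o in seg_offsets if 0 < o < width)
    edges = [0] + cuts + [width]
    return [(x + a, x + b) for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    # A 60-pixel-wide word starting at x=100 with two cut points -> three characters.
    print(split_word_box(100, 60, [20, 40]))  # [(100, 120), (120, 140), (140, 160)]
```

The vertical extent of each slice is simply the word box's `y` and `height`, so only the horizontal cuts need computing.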
Sample (20 images, 7.3 MB)
TrialTrain (258 images, 43.3 MB)
TrialTest (251 images, 69.6 MB)
ICDAR 2005/2007/2009 did not release new datasets for Robust Reading and Text Locating; they still used the datasets released for ICDAR 2003.
Links:
http://algoval.essex.ac.uk:8080/icdar2005/index.jsp?page=intro.html
http://www.informatik.uni-trier.de/~ley/db/conf/icdar/icdar2007.html
http://www.cvc.uab.es/icdar2009/
ICDAR 2011 link: a new dataset was released, but the page would not open. ICDAR 2013, however, provides an updated version of the ICDAR 2011 dataset.
http://robustreading.opendfki.de/wiki/SceneText#TrainingDataset
Training Dataset
Training data is now available at the following links:
Training Data Text Localization http://www.dfki.uni-kl.de/~shahab/robustreading/train-textloc.zip
Training Data Word Recognition http://www.dfki.uni-kl.de/~shahab/robustreading/train-wordrec.zip
Test Dataset
Test data is now available at the following links:
Test Data Text Localization http://www.dfki.uni-kl.de/~shahab/robustreading/test-textloc.zip
Test Data Word Recognition http://www.dfki.uni-kl.de/~shahab/robustreading/test-wordrec.zip
Test Data Text Localization + Ground Truth http://www.dfki.uni-kl.de/~shahab/robustreading/test-textloc-gt.zip
Test Data Word Recognition + Ground Truth http://www.dfki.uni-kl.de/~shahab/robustreading/test-wordrec-gt.zip
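Unlike the ICDAR 2003 XML tag files, the ICDAR 2011 localization ground truth is assumed here to be plain text with one word per line, laid out as `xmin, ymin, xmax, ymax, "transcription"` (commas or spaces as separators). That layout is an assumption to verify against the unzipped files; the Python sketch below is deliberately tolerant and skips lines it does not recognise. The demo lines are made up.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMED layout: xmin, ymin, xmax, ymax, "transcription"
# with commas and/or spaces as separators.
LINE_RE = re.compile(r'^\s*(\d+)[,\s]+(\d+)[,\s]+(\d+)[,\s]+(\d+)[,\s]+"(.*)"\s*$')


def parse_gt_lines(lines: Iterable[str]) -> List[Tuple[int, int, int, int, str]]:
    """Parse localization ground-truth lines into (xmin, ymin, xmax, ymax, text)."""
    boxes = []
    for line in lines:
        m = LINE_RE.match(line.lstrip("\ufeff"))  # drop a possible byte-order mark
        if m is None:
            continue  # skip blank or unrecognised lines
        xmin, ymin, xmax, ymax = (int(g) for g in m.groups()[:4])
        boxes.append((xmin, ymin, xmax, ymax, m.group(5)))
    return boxes


if __name__ == "__main__":
    demo = ['10,20,110,60,"EXIT"', "", '5 5 50 25 "OPEN"']
    print(parse_gt_lines(demo))
```

For a real file, something like `with open(path, encoding="utf-8") as f: parse_gt_lines(f)` should work, since the function accepts any iterable of lines.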