OCR with abbyyR
Source: Internet  Editor: 程序博客网  Time: 2024/06/04 19:39
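Install the package
install.packages("abbyyR")
## Or install the development version:
## devtools::install_github('soodoku/abbyyR')
library(abbyyR)

# Clear the workspace first, so that the settings below are not removed
rm(list = ls())
# Set the project directory
project.dir <- "E:/Temp/"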
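Scrape the images
library(rvest)
library(dplyr)
library(curl)  # provides curl_download()

url <- "http://www.c-gec.cn/a/zuixinhuodong/2017/0803/2531.html"
# Collect the src attribute of every image in the article body
imageUrl <- read_html(url) %>%
  html_nodes(xpath = "//div[@id='entrybody']/div/img/@src") %>%
  html_text()
# Save each image under raw/ in the working directory (the folder must already exist)
for (item in imageUrl) {
  curl_download(item, destfile = paste0(getwd(), "/raw/", basename(item)))
}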
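Create the app
First, create an application at http://ocrsdk.com/ to obtain an application ID and password.
# Register the application ID and password for this session
setapp(c("ROcrApp1", "63WTSkZa8OZu2fQGqNh*****"))
# Check that the credentials are accepted
getAppInfo()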
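To keep the password out of the script, the credentials can be read from environment variables before they are passed to setapp(). The following is a minimal sketch; the helper read_abbyy_credentials() and the variable names ABBYY_APP_ID and ABBYY_APP_PASSWORD are hypothetical and are not defined by abbyyR.

```r
# Hypothetical helper: the environment variable names are an assumption
# made for this example, not something abbyyR defines.
read_abbyy_credentials <- function(id_var = "ABBYY_APP_ID",
                                   pass_var = "ABBYY_APP_PASSWORD") {
  creds <- c(Sys.getenv(id_var), Sys.getenv(pass_var))
  # Sys.getenv() returns "" for unset variables, so fail early in that case
  if (any(!nzchar(creds))) {
    stop("Set ", id_var, " and ", pass_var, " before running the script")
  }
  creds
}

# Usage (requires abbyyR and valid credentials):
# setapp(read_abbyy_credentials())
```

The variables can be set once in the user's .Renviron file, so the script itself never contains the password.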
Clear the app's task space
all_tasks <- listTasks()
# seq_len() skips the loop when there are no tasks to delete
for (i in seq_len(nrow(all_tasks))) deleteTask(as.character(all_tasks$id[i]))
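One detail matters when the application has no tasks yet: for a zero-row data frame, 1:nrow(x) counts down from 1 to 0 instead of producing an empty sequence, whereas seq_len(nrow(x)) is empty. The sketch below illustrates this with a stand-in data frame; that listTasks() returns a zero-row data frame for an empty application is an assumption.

```r
# Stand-in for an empty listTasks() result (assumed shape: a data frame with an id column)
no_tasks <- data.frame(id = character(0))

1:nrow(no_tasks)         # counts down: 1 0
seq_len(nrow(no_tasks))  # empty: integer(0)

deleted <- character(0)
for (i in seq_len(nrow(no_tasks))) {
  # Never reached, because the sequence is empty
  deleted <- c(deleted, no_tasks$id[i])
}
```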
Submit tasks and track progress
library(progress)

# Full paths of every file under the project directory
filename <- paste0(project.dir, dir(project.dir, recursive = TRUE))
pb <- progress_bar$new(format = " uploading [:bar] :percent\n", total = length(filename), clear = FALSE, width = 60)
# Maps each file name to the task id returned for it
tracker <- data.frame(filename = NA, taskid = NA)

# Loop
j <- 1
for (file in filename) {
  print(file)
  tracker[j, ] <- c(basename(file), as.character(abbyyR::submitImage(file_path = file)$id))
  j <- j + 1
  # Progress bar
  pb$tick()
  Sys.sleep(1 / 100)
}
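The tracker table can also be built in one step rather than row by row. The sketch below is one possible way to do that; build_tracker() and fake_submit() are hypothetical helpers, and fake_submit() merely stands in for abbyyR::submitImage() so that the example runs without credentials.

```r
# Hypothetical helper: submits every file and records the returned task id
build_tracker <- function(files, submit) {
  ids <- vapply(files, function(f) as.character(submit(f)$id),
                character(1), USE.NAMES = FALSE)
  data.frame(filename = basename(files), taskid = ids, stringsAsFactors = FALSE)
}

# Stand-in for abbyyR::submitImage(); it only fabricates an id from the file name
fake_submit <- function(file_path) list(id = paste0("task-", basename(file_path)))

tracker_demo <- build_tracker(c("E:/Temp/raw/a.png", "E:/Temp/raw/b.png"), fake_submit)

# With the real service (requires abbyyR and valid credentials):
# tracker <- build_tracker(filename, function(f) abbyyR::submitImage(file_path = f))
```

Passing the submit function as an argument keeps the bookkeeping testable without calling the OCR service.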
Run OCR
# Recognize Simplified Chinese and export each result as an Excel workbook
for (i in seq_len(nrow(tracker))) {
  processDocument(tracker$taskid[i], language = "ChinesePRC", profile = "documentConversion", exportFormat = "xlsx")
}
Check task status
# Poll every 2 seconds until there is a finished task for every submitted file
while (TRUE) {
  i <- nrow(listFinishedTasks())
  if (i == length(filename)) {
    print("All Done!")
    break
  }
  Sys.sleep(2)
}
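A polling loop of this kind never ends if a task is lost or the finished count never reaches the number of files. The following sketch adds a timeout; wait_until() is a hypothetical helper, and the default timeout and interval are arbitrary values chosen for illustration.

```r
# Hypothetical helper: calls is_done() repeatedly until it returns TRUE
# or the timeout (in seconds) expires. Returns TRUE on success, FALSE on timeout.
wait_until <- function(is_done, timeout = 600, interval = 2) {
  deadline <- Sys.time() + timeout
  repeat {
    if (is_done()) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# With the real service (requires abbyyR and valid credentials):
# done <- wait_until(function() nrow(listFinishedTasks()) == length(filename))
# if (!done) warning("Timed out waiting for OCR tasks")
```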
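Download the results
library(curl)

# Keep only the tasks that completed successfully
finishedlist <- listFinishedTasks() %>%
  mutate(status = as.character(status)) %>%
  filter(status == "Completed")
# Match each finished task back to its source file
results <- merge(tracker, finishedlist, by.x = "taskid", by.y = "id")

setwd(project.dir)
for (i in seq_len(nrow(results))) {
  print(i)
  # Save as res/<image name>.xlsx (the res/ folder must already exist)
  curl_download(as.character(results$resultUrl[i]),
                destfile = paste0(getwd(), "/res/", sub("\\.png$", "", results$filename[i]), ".xlsx"))
}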
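When deriving the output name from the image name, note that sub() treats its pattern as a regular expression, so an unescaped "." matches any character. The sketch below shows the pitfall and one alternative that also handles extensions other than .png; result_path() is a hypothetical helper.

```r
# An unescaped dot matches any character, so the wrong part can be removed
sub(".png", "", "apng_scan.png")  # "_scan.png"

# Hypothetical helper: builds the output path for one image
result_path <- function(out_dir, image_name, ext = "xlsx") {
  # file_path_sans_ext() drops only the final extension
  stem <- tools::file_path_sans_ext(basename(image_name))
  file.path(out_dir, paste0(stem, ".", ext))
}

result_path("res", "page.01.png")  # "res/page.01.xlsx"
```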
OCR results
Original image
Recognition result