Recognizing tables with OpenCV 3.0.0

2024-03-06 23:48
Tags: table, recognition, opencv3.0

This article shows how to recognize tables in an image with OpenCV 3.0.0, as a reference for developers working on similar problems.

Reposted from: http://answers.opencv.org/question/63847/how-to-extract-tables-from-an-image/

As others have proposed, finding the horizontal and vertical lines seems to be a good way to go. Below is such a solution. If you have any questions, feel free to ask, though I have added comments throughout the code, so it should not be hard to follow.
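To see why looking for long lines separates table rulings from text, here is a minimal sketch in plain C++, without OpenCV, of an erosion followed by a dilation on a single row of pixels. The names `Row`, `slide` and `keepLongRuns` are made up for this illustration, the window length `k` is assumed to be odd, and skipping out-of-range pixels at the borders is a simplifying assumption rather than a statement about any library's behaviour.

```cpp
#include <algorithm>
#include <vector>

using Row = std::vector<int>;  // 1 = foreground (line or text pixel), 0 = background

// Slide a window of k pixels, centred on each position, and combine the pixels
// it covers with min (erosion) or max (dilation). Positions outside the row
// are skipped (an assumption of this sketch). k is assumed to be odd.
static Row slide(const Row& src, int k, bool takeMin)
{
    const int n = static_cast<int>(src.size());
    const int r = k / 2;
    Row dst(src.size());
    for (int x = 0; x < n; ++x) {
        int v = takeMin ? 1 : 0;
        for (int i = std::max(0, x - r); i <= std::min(n - 1, x + r); ++i)
            v = takeMin ? std::min(v, src[i]) : std::max(v, src[i]);
        dst[x] = v;
    }
    return dst;
}

// Erosion followed by dilation: runs shorter than k vanish during the erosion,
// and the dilation grows the surviving runs back to their original length.
Row keepLongRuns(const Row& row, int k)
{
    return slide(slide(row, k, true), k, false);
}
```

With `k = 3`, a run of two set pixels away from the border (for example a stroke of text) disappears, while a run of five (a piece of a table line) comes back at its full length. Applying the same idea along columns instead of rows picks out the vertical lines.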

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    // Load source image
    string filename = "table.jpg";
    Mat src = imread(filename);

    // Check that the image was loaded; stop if it was not
    if (!src.data)
    {
        cerr << "Problem loading image!!!" << endl;
        return -1;
    }

//    // Show source image
//    imshow("src", src);

    // Resize for practical reasons
    Mat rsz;
    Size size(800, 900);
    resize(src, rsz, size);

    imshow("rsz", rsz);

    // Transform source image to gray if it is not
    Mat gray;

    if (rsz.channels() == 3)
    {
        cvtColor(rsz, gray, COLOR_BGR2GRAY);
    }
    else
    {
        gray = rsz;
    }

    // Show gray image
    imshow("gray", gray);

    // Apply adaptiveThreshold to the bitwise_not of gray, notice the ~ symbol
    Mat bw;
    adaptiveThreshold(~gray, bw, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 15, -2);

    // Show binary image
    imshow("binary", bw);

    // Create the images that will be used to extract the horizontal and vertical lines
    Mat horizontal = bw.clone();
    Mat vertical = bw.clone();

    int scale = 15; // play with this variable in order to increase/decrease the amount of lines to be detected

    // Specify size on horizontal axis
    int horizontalsize = horizontal.cols / scale;

    // Create structure element for extracting horizontal lines through morphology operations
    Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize, 1));

    // Apply morphology operations
    erode(horizontal, horizontal, horizontalStructure, Point(-1, -1));
    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1));
//    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1)); // expand horizontal lines

    // Show extracted horizontal lines
    imshow("horizontal", horizontal);

    // Specify size on vertical axis
    int verticalsize = vertical.rows / scale;

    // Create structure element for extracting vertical lines through morphology operations
    Mat verticalStructure = getStructuringElement(MORPH_RECT, Size(1, verticalsize));

    // Apply morphology operations
    erode(vertical, vertical, verticalStructure, Point(-1, -1));
    dilate(vertical, vertical, verticalStructure, Point(-1, -1));
//    dilate(vertical, vertical, verticalStructure, Point(-1, -1)); // expand vertical lines

    // Show extracted vertical lines
    imshow("vertical", vertical);

    // Create a mask which includes the tables
    Mat mask = horizontal + vertical;
    imshow("mask", mask);

    // Find the joints between the lines of the tables. We will use this information to
    // discriminate tables from pictures (a table will contain more than 4 joints, while
    // a picture will contain only 4, i.e. at its corners)
    Mat joints;
    bitwise_and(horizontal, vertical, joints);
    imshow("joints", joints);

    // Find external contours from the mask, which most probably will belong to tables or to images
    vector<Vec4i> hierarchy;
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, hierarchy, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE, Point(0, 0));

    vector<vector<Point> > contours_poly( contours.size() );
    vector<Rect> boundRect( contours.size() );
    vector<Mat> rois;

    for (size_t i = 0; i < contours.size(); i++)
    {
        // find the area of each contour
        double area = contourArea(contours[i]);

        // filter out individual lines of blobs that might exist and that do not represent a table
        if (area < 100) // value is randomly chosen, you will need to find that by yourself with trial and error procedure
            continue;

        approxPolyDP( Mat(contours[i]), contours_poly[i], 3, true );
        boundRect[i] = boundingRect( Mat(contours_poly[i]) );

        // find the number of joints that each table has
        Mat roi = joints(boundRect[i]);

        vector<vector<Point> > joints_contours;
        findContours(roi, joints_contours, RETR_CCOMP, CHAIN_APPROX_SIMPLE);

        // if there are 4 or fewer joints then most likely it is not a table
        if (joints_contours.size() <= 4)
            continue;

        rois.push_back(rsz(boundRect[i]).clone());

//        drawContours( rsz, contours, i, Scalar(0, 0, 255), FILLED, 8, vector<Vec4i>(), 0, Point() );
        rectangle( rsz, boundRect[i].tl(), boundRect[i].br(), Scalar(0, 255, 0), 1, 8, 0 );
    }

    for (size_t i = 0; i < rois.size(); ++i)
    {
        /* Now you can do whatever post process you want
         * with the data within the rectangles/tables. */
        imshow("roi", rois[i]);
        waitKey();
    }

    imshow("contours", rsz);

    waitKey();
    return 0;
}

Of course, you will need to try it yourself and apply any modifications that your dataset requires. Enjoy ;-).
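The joint test used in the program (more than 4 intersections suggests a table, exactly 4 suggests a plain rectangle such as a picture frame) can also be tried without OpenCV. The following is a minimal sketch under stated assumptions: `Grid`, `countJoints` and `looksLikeTable` are illustrative names, both masks are assumed to have the same size, and counting 8-connected blobs with a flood fill stands in for the `findContours` call in the program.

```cpp
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = foreground, 0 = background

// Count the 8-connected groups of pixels that are set in both masks.
// Assumes both grids are rectangular and have identical dimensions.
int countJoints(const Grid& horizontal, const Grid& vertical)
{
    const int rows = static_cast<int>(horizontal.size());
    const int cols = rows ? static_cast<int>(horizontal[0].size()) : 0;

    // Pixel-wise AND: a joint pixel lies on a horizontal and on a vertical line.
    Grid joints(rows, std::vector<int>(cols, 0));
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            joints[y][x] = horizontal[y][x] & vertical[y][x];

    int count = 0;
    std::vector<std::pair<int, int> > stack;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            if (!joints[y][x]) continue;
            ++count;                       // a joint that has not been visited yet
            joints[y][x] = 0;
            stack.push_back(std::make_pair(y, x));
            while (!stack.empty()) {       // erase the rest of this joint
                const std::pair<int, int> cur = stack.back();
                stack.pop_back();
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        const int ny = cur.first + dy;
                        const int nx = cur.second + dx;
                        if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
                        if (!joints[ny][nx]) continue;
                        joints[ny][nx] = 0;
                        stack.push_back(std::make_pair(ny, nx));
                    }
                }
            }
        }
    }
    return count;
}

// Same rule as the program: 4 or fewer joints is most likely not a table.
bool looksLikeTable(int jointCount)
{
    return jointCount > 4;
}
```

A bare rectangular frame gives four joints, one per corner, and is rejected, while adding one inner horizontal and one inner vertical line raises the count to nine and the region is accepted.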

