OCR Text Recognition

Posted by cooolr on 2022-07-13

easyocr

A pure-Python project. Recognizing one image takes about 3 seconds.

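# Install (shell)
pip install easyocr

# Usage (Python)
import easyocr
reader = easyocr.Reader(['ch_sim', 'en'])  # load the Simplified Chinese and English models
result = reader.readtext('chinese.jpg')    # list of (bounding box, text, confidence) entries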
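
Each entry returned by readtext holds a bounding box, the recognized text, and a confidence score. A minimal sketch of post-processing that list follows. The helper name keep_confident_text, the 0.5 threshold, and the sample data are illustrative assumptions, not part of easyocr.

```python
from typing import List, Sequence, Tuple

# One easyocr detection: (bounding box as four [x, y] corners, text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def keep_confident_text(detections: List[Detection], min_confidence: float = 0.5) -> List[str]:
    """Return the recognized strings whose confidence is at least min_confidence."""
    return [text for _bbox, text, confidence in detections if confidence >= min_confidence]


# Made-up sample in the same shape as reader.readtext() output, so the helper
# can be tried without loading a model.
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "你好", 0.93),
    ([[10, 60], [200, 60], [200, 90], [10, 90]], "hello", 0.41),
]
confident = keep_confident_text(sample)
print(confident)
```

With a real image, pass reader.readtext('chinese.jpg') in place of sample.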

paddleocr

Implemented in C++ under the hood. Recognizes one image in under 0.5 seconds.

# Install PaddleOCR and the PaddlePaddle framework (shell).
# The PyPI package named `paddle` is an unrelated project, so only paddlepaddle is needed.
pip install paddleocr
pip install paddlepaddle

# Check the libstdc++.so version. If CXXABI_1.3.11 is not listed, build and install GCC from source.
# Reference: https://blog.csdn.net/EI__Nino/article/details/100086157
strings /usr/lib64/libstdc++.so.6|grep CXXABI
yum install gmp-devel mpfr-devel libmpc-devel -y
wget ftp://ftp.gnu.org:21/gnu/gcc/gcc-9.2.0/gcc-9.2.0.tar.gz
tar zxvf gcc-9.2.0.tar.gz
cd gcc-9.2.0
./configure --disable-multilib --enable-languages=c,c++ --prefix=$HOME/local
make -j10
make install
# Check the libstdc++.so version of the newly built GCC
strings $HOME/local/lib64/libstdc++.so.6|grep CXXABI
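
The strings and grep check can also be done from Python, which may be convenient inside a setup script. This is a rough sketch, assuming the version tags appear in the library as plain ASCII such as CXXABI_1.3.11. The function names and the sample bytes are hypothetical.

```python
import re
from pathlib import Path
from typing import Set

# Matches version tags such as CXXABI_1.3 or CXXABI_1.3.11 inside raw bytes.
_CXXABI_TAG = re.compile(rb"CXXABI_(\d+(?:\.\d+)*)")


def cxxabi_versions(blob: bytes) -> Set[str]:
    """Collect every CXXABI version string found in the given bytes."""
    return {match.group(1).decode("ascii") for match in _CXXABI_TAG.finditer(blob)}


def library_has_cxxabi(path: str, wanted: str = "1.3.11") -> bool:
    """Read a shared library from disk and report whether it carries the wanted tag."""
    return wanted in cxxabi_versions(Path(path).read_bytes())


# Made-up bytes standing in for a libstdc++.so.6 that is too old.
old_library = b"\x00CXXABI_1.3\x00CXXABI_1.3.9\x00CXXABI_TM_1\x00"
found = cxxabi_versions(old_library)
print(sorted(found), "1.3.11" in found)
```

On a real system, library_has_cxxabi('/usr/lib64/libstdc++.so.6') would answer the same question as the shell check. Even when the newly built library carries the tag, the dynamic loader still has to find it, for example through LD_LIBRARY_PATH; that step is not covered in these notes.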
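# Usage (Python)
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='ch')  # enable the text-angle classifier, Chinese model
result = ocr.ocr('chinese.jpg', cls=True)       # detect, classify angle, then recognize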
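
The value returned by ocr.ocr is nested. A minimal sketch of flattening it follows, assuming the layout used by PaddleOCR 2.6 and later 2.x releases, where the outer list has one entry per image and each line is [box, (text, score)]. Other releases may use a different layout, and the helper name and sample data are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def extract_lines(result: List[Optional[List[Any]]]) -> List[Tuple[str, float]]:
    """Flatten the nested OCR result into (text, score) pairs."""
    lines: List[Tuple[str, float]] = []
    for page in result:
        # A page with no detected text is assumed to be reported as None or empty.
        if not page:
            continue
        for _box, (text, score) in page:
            lines.append((text, float(score)))
    return lines


# Made-up sample in the assumed layout: one image containing two text lines.
sample_result = [
    [
        [[[10.0, 10.0], [120.0, 10.0], [120.0, 40.0], [10.0, 40.0]], ("你好", 0.98)],
        [[[10.0, 60.0], [200.0, 60.0], [200.0, 90.0], [10.0, 90.0]], ("世界", 0.95)],
    ]
]
lines = extract_lines(sample_result)
for text, score in lines:
    print(f"{score:.2f} {text}")
```

With a real image, pass the result of ocr.ocr('chinese.jpg', cls=True) in place of sample_result.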