PDF to TXT

Published by cooolr on 2022-07-13

Approach:

  1. Convert the PDF to images (one PNG per page)

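     # sudo apt-get install poppler-utils   # pdf2image calls Poppler's pdftoppm/pdfinfo
     # pip3 install pdf2image

     import os

     from pdf2image import convert_from_path
     from pdf2image.exceptions import (
         PDFInfoNotInstalledError,
         PDFPageCountError,
         PDFSyntaxError
     )

     pdf_path = "D:/codespace/2022-07/1.pdf"
     out_dir = "input_images"
     os.makedirs(out_dir, exist_ok=True)  # image.save() fails if the folder is missing

     try:
         # Returns one PIL image per page; on Windows, pass poppler_path=... if Poppler is not on PATH
         images = convert_from_path(pdf_path)
     except (PDFInfoNotInstalledError, PDFPageCountError, PDFSyntaxError) as e:
         raise SystemExit(f"PDF conversion failed: {e}")

     for i, image in enumerate(images):
         fname = os.path.join(out_dir, f"image{i}.png")
         image.save(fname, "PNG")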
    
  2. Run OCR on the images to extract the text
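The post does not show code for this step. Below is a minimal sketch, assuming the Tesseract OCR engine and its `pytesseract` wrapper (neither is named in the original; install with `sudo apt-get install tesseract-ocr` and `pip3 install pytesseract`). It reads the `imageN.png` files saved in step 1 in page order and writes all the text into one .txt file. The helper names `page_index`, `tesseract_ocr` and `images_to_txt` are hypothetical. The OCR function is passed in as a parameter so another engine can be swapped in.

```python
import os
import re


def page_index(fname):
    """Return the page number in a name like 'image12.png'; -1 if there is none."""
    match = re.search(r"(\d+)", fname)
    return int(match.group(1)) if match else -1


def tesseract_ocr(path, lang="eng"):
    """Assumed default engine: Tesseract via pytesseract."""
    import pytesseract  # imported lazily so other engines work without it installed
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def images_to_txt(image_dir, txt_path, ocr=tesseract_ocr):
    """OCR every PNG in image_dir in page order and write the text to txt_path."""
    names = [n for n in os.listdir(image_dir) if n.lower().endswith(".png")]
    # Sort numerically: a plain string sort would put image10.png before image2.png
    names.sort(key=page_index)
    pages = [ocr(os.path.join(image_dir, n)) for n in names]
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(pages))
    return pages
```

Usage: `images_to_txt("input_images", "1.txt")`. For Chinese PDFs, Tesseract's `chi_sim` language data must be installed, and the call becomes `images_to_txt("input_images", "1.txt", ocr=lambda p: tesseract_ocr(p, lang="chi_sim"))`.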