OCR PDF on Windows

This turns out to be fairly easy.  
  1. Install Tesseract
  2. Convert your PDF to TIFF: either print it (Print > Fax, check the "print to file" option and save as TIFF),
    OR install Ghostscript and convert using the command line (I installed the 64-bit version):
    `gswin64c -r300x300 -o out.tif -sDEVICE=tiffg4 in.pdf`
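  3. Run OCR on the TIFF: `tesseract input.tif out -l deu pdf`. Here `input.tif` is the TIFF from step 2. Skip the `-l deu` bit if the document is not in German; Tesseract then defaults to English. `out` is the base name of the output file; the extension (here `.pdf`) is appended automatically.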

Done. The resulting PDF is searchable, and you can even get Google to translate the output.
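
If you do this often, the two command-line steps can be wrapped in a small script. The following is only a rough sketch, not something I have set up as part of the steps above: it assumes `gswin64c` and `tesseract` are both on your PATH, and the function names (`ghostscript_cmd`, `tesseract_cmd`, `ocr_pdf`) and the `_ocr` suffix for the result are made up for illustration.

```python
import subprocess
from pathlib import Path


def ghostscript_cmd(pdf, tif, dpi=300):
    """Build the Ghostscript command from step 2 (PDF -> TIFF at the given DPI)."""
    return ["gswin64c", f"-r{dpi}x{dpi}", "-o", str(tif), "-sDEVICE=tiffg4", str(pdf)]


def tesseract_cmd(tif, out_base, lang=None):
    """Build the Tesseract command from step 3.

    out_base is the output name without extension; Tesseract appends ".pdf".
    lang is optional (e.g. "deu"); without it Tesseract uses its default language.
    """
    cmd = ["tesseract", str(tif), str(out_base)]
    if lang:
        cmd += ["-l", lang]
    cmd.append("pdf")
    return cmd


def ocr_pdf(pdf, lang=None):
    """Run both steps on one PDF and return the path of the OCRed PDF.

    Assumes both tools are installed and on PATH (an assumption, not checked here).
    """
    pdf = Path(pdf)
    tif = pdf.with_suffix(".tif")                    # intermediate TIFF next to the input
    out_base = pdf.with_name(pdf.stem + "_ocr")      # hypothetical naming choice
    subprocess.run(ghostscript_cmd(pdf, tif), check=True)
    subprocess.run(tesseract_cmd(tif, out_base, lang), check=True)
    return out_base.with_name(out_base.name + ".pdf")
```

Calling `ocr_pdf("in.pdf", lang="deu")` would then run the Ghostscript and Tesseract commands in order and, if both succeed, return the path `in_ocr.pdf`. The commands are built as argument lists rather than one string so that file names with spaces do not need quoting.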
