Scantools is a library and a matching set of command line applications for graphics manipulation, written with a view towards the handling of scanned documents and generation of high-quality PDF files.

At present, the library can convert image files to PDF/A. Files in JBIG2, JPEG and JPEG2000 format are included in the PDF directly; other files are compressed losslessly. HOCR files, which are produced by optical character recognition programs such as ‘tesseract’, can be used to make the PDF file searchable. The resulting files comply with the ISO PDF/A standard for long-term archiving of digital documents and offer compression rates comparable to those of the DjVu file format.
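
To illustrate why HOCR data is enough to make a PDF searchable, here is a minimal Python sketch that reads recognized words and their page positions from an HOCR fragment. It is not taken from Scantools and does not use its API. The `ocrx_word` class and the `bbox x0 y0 x1 y1` entry in the `title` attribute follow the HOCR format; the names `HocrWordParser`, `parse_hocr_words` and `SAMPLE` are hypothetical and exist only for this example.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs from HOCR word elements.

    Simplification for this sketch: word spans are assumed not to
    contain nested <span> elements.
    """

    BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

    def __init__(self):
        super().__init__()
        self.words = []    # list of (text, (x0, y0, x1, y1))
        self._bbox = None  # bounding box of the word being read, if any
        self._text = []    # text pieces collected for that word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            match = self.BBOX.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr):
    """Return the recognized words with their pixel bounding boxes."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written HOCR fragment, made up for this example.
SAMPLE = (
    "<span class='ocr_line' title='bbox 10 20 300 60'>"
    "<span class='ocrx_word' title='bbox 10 20 120 60; x_wconf 96'>Hello</span> "
    "<span class='ocrx_word' title='bbox 140 20 300 60; x_wconf 93'>world</span>"
    "</span>"
)

words = parse_hocr_words(SAMPLE)
```

Each word arrives with the rectangle it occupies on the scanned page, so a text layer can be positioned underneath the image without touching the image data itself.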

Command line utilities

There are currently three command line utilities.

  • image2pdf converts images to a PDF/A-compliant PDF file.
  • hocr2any converts HOCR files to text, or renders them as raster graphics or PDF files.
  • ocrPDF adds a text layer to a graphics-only PDF file, without re-encoding graphics data or otherwise modifying file content.

Author

The Scantools library was written by Stefan Kebekus at the University of Freiburg, Germany.

License

The program is open source and available for free. It is licensed under the GNU General Public License v3 or any later version. If you use images created with this program in a publication, we would be glad if you could include a reference to this site.

  • An encoder for the JBIG2 file format is found here.