image2pdf

The program image2pdf converts JBIG2 and other images to PDF, without re-encoding the image data.

Output format

The output files of image2pdf comply with the ISO PDF/A standard for long-term archiving of digital documents.

Metadata, which is mandatory in PDF/A files, is stored both as embedded XMP and as a PDF info dictionary. The PDF info dictionary, which is optional in the PDF/A standard, might be removed in future versions of this program. For now, info dictionaries are included because many current PDF handling programs do not interpret XMP data correctly.

Frequently asked questions

Isn’t there a similar program ‘pdf.py’ by Google, which is included in the jbig2enc package?

There is. The main differences between ‘pdf.py’ and image2pdf are the following.

  • The program ‘image2pdf’ produces PDF/A compliant output files.

  • The output of ‘image2pdf’ is, as far as I know, correct. PDF handling tools such as ‘qpdf’ often report issues in files created by ‘pdf.py’, even if those files render without problems in PDF viewing software.

  • JBIG2 files created by Google’s ‘jbig2enc’ often contain issues: there are segments whose ‘retention bits’ are set wrongly. As a result, PDF files created from these files do not pass PDF/A compliance tests. The program ‘image2pdf’ corrects these issues before creating the PDF file.

When I add JPG2000 files to a PDF, they get converted to other bitmap formats and the size increases dramatically

Sadly, there are many known issues with JPG2000. One issue is that there are two closely related file formats that are commonly referred to as “JPG2000”, namely “JP2” and “JPX”. According to the PDF/A standard, only JPX files can be embedded into PDFs. So, if you ask scantools to insert a “JP2” file into your PDF, the image needs to be converted (losslessly) to another file format, hence the increase in size. To fix the issue, convert your “JP2” file into “JPX”. If you do not care too much about compliance with the PDF/A standard, it might help to simply rename your file to “something.jpx”.
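If you are unsure which of the two formats a given file uses, the container header usually tells you. The following Python sketch is not part of scantools and does not show how image2pdf itself decides; the function name jpeg2000_brand is made up for illustration. It relies only on the JPEG 2000 container layout, in which a fixed 12-byte signature box is followed by a File Type (“ftyp”) box whose major brand is “jp2 ” for JP2 files and “jpx ” for JPX files.

```python
import struct
from typing import Optional

# Fixed 12-byte JPEG 2000 signature box found at the start of JP2/JPX files.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def jpeg2000_brand(data: bytes) -> Optional[str]:
    """Return the major brand from the File Type box, or None.

    Hypothetical helper for illustration. 'data' must hold at least the
    first 24 bytes of the file. None is returned for anything that does
    not look like a JP2-family container (for example a raw codestream).
    """
    if len(data) < 24 or not data.startswith(JP2_SIGNATURE):
        return None
    # The File Type box follows directly: 4-byte length, 4-byte type,
    # then the 4-byte major brand. Only type and brand are needed here.
    (box_type,) = struct.unpack(">4s", data[16:20])
    if box_type != b"ftyp":
        return None
    return data[20:24].decode("ascii", errors="replace")
```

To inspect a file, pass its first 24 bytes, for instance the result of open(path, "rb").read(24). A result of “jp2 ” suggests the file is a JP2 file in the sense described above, while “jpx ” suggests JPX. Note that this only reads the declared brand; it does not validate the image data.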

When I add a JPX image to PDF, the image2pdf utility refuses to perform OCR

This happens on systems where the Qt library is built without support for JPG2000, most notably on Ubuntu systems. As a workaround, you might wish to use image2pdf with the “--no-ocr” option, and then use the command line utility ocrPDF to perform OCR afterwards. The resulting PDF/A file is nearly identical to the one you would have obtained with image2pdf. Alternatively, use a Fedora Linux system, where Qt has JPG2000 support.

I have been asked why I didn’t include JPG2000 support in scantools directly, rather than relying on Qt. Sadly, the answer is that I found all the existing JPG2000 implementations (most notably, the jasper and openjpeg libraries) so buggy and so poorly documented that I felt they should not be included as part of a quality software package.