  1. #1
    zweibieren is offline Senior Member
    Join Date
    Aug 2009
    Location
    Pittsburgh, PA
    Posts
    284
    Rep Power
    5

    PDFBox PDPage convertToImage produces all white image

    My program needs images from PDF files. It works for some files but produces all-white images for others. How can I fix this? I am using PDFBox 1.7.1 with NetBeans 7.2.1 and jdk1.7.0_21, all on Windows 8.

    Update: I have upgraded to PDFBox 1.8.1 and the failure persists. With both versions, some documents produce correct images.

    A simplified version of the program:

    Java Code:
    package com.physpics.test;

    import java.io.File;
    import java.util.List;
    import javax.imageio.ImageIO;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    /** Read a PDF and write a PNG for its first page. */
    public class PDFtoPNG {
        public static void main(String[] args) throws Exception {
            PDDocument document = PDDocument.load(new File("page4.pdf"));
            try {
                // getAllPages() returns an untyped List in PDFBox 1.x, hence the cast.
                List<?> allPages = document.getDocumentCatalog().getAllPages();
                PDPage pdfPage = (PDPage) allPages.get(0);
                ImageIO.write(pdfPage.convertToImage(), "PNG", new File("page4.png"));
            } finally {
                // Close the document even if rendering or writing fails.
                document.close();
            }
        }
    }
    A sample of a failing PDF is attached. It displays fine in Adobe Reader.
    The PNG output is entirely white. ImageMagick's identify reports:
    Code:
    identify -verbose page4.png
    Image: page4.png
      Format: PNG (Portable Network Graphics)
      Class: DirectClass
      Geometry: 1224x1584+0+0
    ...
      Colors: 1
      Histogram:
       1938816: (255,255,255) #FFFFFF white
    ...
      Page geometry: 1224x1584+0+0
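
For anyone who wants to catch this failure in a batch run instead of inspecting each PNG with identify, here is a minimal sketch that checks whether a rendered image is entirely white. It uses only the JDK; the class name `BlankImageCheck` and the method `isAllWhite` are made up for illustration and are not part of PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class BlankImageCheck {
    /** Returns true if every pixel is pure white (the alpha channel is ignored). */
    static boolean isAllWhite(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB packs the pixel as 0xAARRGGBB; mask off alpha before comparing.
                if ((image.getRGB(x, y) & 0xFFFFFF) != 0xFFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // ImageIO.read returns null when no registered reader understands the file.
        BufferedImage image = ImageIO.read(new File(args[0]));
        if (image == null) {
            System.err.println("Not a readable image: " + args[0]);
            return;
        }
        System.out.println(isAllWhite(image) ? "all white" : "has content");
    }
}
```

In the program above, the same method could be called on the result of `convertToImage()` before writing the PNG, so that failing pages are logged rather than silently written out.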
    Last edited by zweibieren; 05-31-2013 at 08:24 PM.

  2. #2
    zweibieren is offline Senior Member

    Re: PDFBox PDPage convertToImage produces all white image

    One of the failing documents uses PDF Type 3 fonts. It turns out that PDFBox does not really implement these. The other failing document has no Type 3 fonts, so I don't know why it fails, and I can't afford the time to investigate right now.
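
As a rough way to sort failing files into "has Type 3 fonts" and "something else", here is a sketch that scans the raw PDF bytes for a `/Subtype /Type3` entry. This is only a heuristic and an assumption on my part, not how PDFBox identifies fonts: it misses font dictionaries stored inside compressed object streams, so a negative result proves nothing. The class name `Type3Scan` and the method `mentionsType3` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class Type3Scan {
    // Matches a font dictionary entry such as "/Subtype /Type3" or "/Subtype/Type3".
    private static final Pattern TYPE3 = Pattern.compile("/Subtype\\s*/Type3\\b");

    /** Heuristic: true if the uncompressed part of the file mentions a Type 3 font. */
    static boolean mentionsType3(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is not mangled.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return TYPE3.matcher(text).find();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(mentionsType3(bytes)
                ? "Type 3 marker found"
                : "no Type 3 marker in uncompressed data");
    }
}
```

A more reliable check would walk each page's font resources through the PDF library itself; the byte scan is just quick enough to run over a whole directory of failing files.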
