  1. #1
    Hellsing is offline Member
    Join Date
    Nov 2012
    Posts
    2
    Rep Power
    0

    Reading a PDF file using HttpClient and iTextPDF

    Hi,

    I am trying to write a Java application that reads part of a PDF document online (or, for testing purposes, from my local WAMP server). For the most part it works, but PdfReader does not convert the content correctly and prints garbage symbols instead of text.

    Here is the code I was using:

    Java Code:
    import java.io.IOException;

    import javax.xml.ws.http.HTTPException;

    import org.apache.http.client.HttpClient;
    import org.apache.http.client.ResponseHandler;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.BasicResponseHandler;
    import org.apache.http.impl.client.DefaultHttpClient;

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public class HttpConn {

        private static String url = "http://127.0.0.1/showme/IJERTV1IS3191.pdf";

        public static void main(String[] args) {
            // Create an instance of HttpClient
            HttpClient client = new DefaultHttpClient();

            // Create a method instance
            HttpGet method = new HttpGet(url);

            try {
                // Execute the method
                System.out.println("executing request " + method.getURI());

                // Create a response handler that returns the body as a String
                ResponseHandler<String> responseHandler = new BasicResponseHandler();
                String responseBody = client.execute(method, responseHandler);

                PdfReader reader = new PdfReader(responseBody);

                // Extract the text content of page 2
                String str = PdfTextExtractor.getTextFromPage(reader, 2);
                System.out.println(str);
                reader.close();
            } catch (HTTPException e) {
                // Protocol issues
                System.err.println("Fatal protocol violation: " + e.getMessage());
                e.printStackTrace();
            } catch (IOException e) {
                // Transport issues
                System.err.println("Fatal transport error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                // Release the connection
                method.releaseConnection();
            }
        }
    }
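
A possible explanation for the symbols, offered as a guess rather than a confirmed diagnosis: the response body is fetched as a String, but a PDF is binary data, and the PdfReader(String) constructor expects a file name rather than PDF content. Decoding binary bytes as text can also be lossy, depending on the charset. The sketch below uses only the JDK to show that effect; the class name BinaryRoundTrip, the method survivesStringRoundTrip, and the sample bytes are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTrip {

    // Returns true if the bytes are unchanged after being decoded to a String
    // and encoded back with the same charset.
    static boolean survivesStringRoundTrip(byte[] data, Charset charset) {
        String asText = new String(data, charset);
        return Arrays.equals(data, asText.getBytes(charset));
    }

    public static void main(String[] args) {
        // "%PDF" followed by a few high-bit bytes (illustrative sample, not a real PDF).
        byte[] sample = {0x25, 0x50, 0x44, 0x46,
                         (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println("UTF-8 round trip intact: "
                + survivesStringRoundTrip(sample, StandardCharsets.UTF_8));
    }
}
```

If this is the cause, one option would be to keep the body as raw bytes, for example with EntityUtils.toByteArray(response.getEntity()), and pass that array to the PdfReader(byte[]) constructor instead of going through a String.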

  2. #2
    Hellsing is offline Member

    Re: Reading a PDF file using HttpClient and iTextPDF

    Hi all,

    I tried something different: converting the InputStream to a byte array that I could pass to PdfReader, but I get no output.

    Java Code:
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    import org.apache.commons.io.IOUtils;

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public class urlconn {
        public static void main(String[] args) throws Exception {
            URL oracle = new URL("http://127.0.0.1/showme/dss.pdf");
            URLConnection yc = oracle.openConnection();
            yc.setDoInput(true);

            InputStream is = yc.getInputStream();

            // Read the whole response into a byte array
            byte[] b = IOUtils.toByteArray(yc.getInputStream());

            int len;
            while ((len = is.read(b)) != -1) {
                PdfReader reader = new PdfReader(b);
                String str = PdfTextExtractor.getTextFromPage(reader, 1);
                System.out.println(str);
            }
        }
    }
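
A likely reason for the missing output, stated as an assumption: IOUtils.toByteArray already reads the connection's stream to its end, so the following is.read(b) appears to return -1 immediately and the loop body never runs. The sketch below uses only the JDK to show a stream being read fully once and then reporting end-of-stream; the class name StreamToBytes and the helper readAll are hypothetical, and a ByteArrayInputStream stands in for yc.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamToBytes {

    // Reads the stream to its end and returns everything as one byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int len;
        while ((len = in.read(chunk)) != -1) {
            out.write(chunk, 0, len);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for yc.getInputStream(); placeholder data, not a real PDF.
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] all = readAll(in);
        System.out.println("bytes read: " + all.length);

        // The stream is now exhausted, so a further read reports end-of-stream.
        System.out.println("next read: " + in.read());

        // With iText on the classpath, the array would be used once, outside any loop:
        // PdfReader reader = new PdfReader(all);
    }
}
```

Under that assumption, dropping the while loop and calling new PdfReader(b) once on the complete array should be enough for the extraction code to run.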
