  1. #1

    How to obtain the HTML source of this page

    Hello,
    This simple program correctly fetches every UTF-8 web page I have tried, except this one.
    When I enter its URL into View HTTP Request and Response Header, the reported charset is UTF-8, but my program doesn't show the correct characters.
    Can anybody help me?

    Java Code:
    package simpleapp;
    
    public class Main {
    
        public static void main(String[] args) {
            net myNET = new net();
            StringBuffer str = myNET.get_content("http://old.tsetmc.com/Loader.aspx");
        }
    }
    
    package simpleapp;
    
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.MalformedURLException;
    import java.io.InputStream;
    import java.io.IOException;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    
    public class net {
    
        public StringBuffer get_content(String url){
    
            StringBuffer str = new StringBuffer();
    
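            try{
                URL myURL = new URL(url);
                URLConnection myConnection = myURL.openConnection();
                InputStream in = myConnection.getInputStream();
                // Decode the response body as UTF-8 (hard-coded, not taken from the headers).
                BufferedReader myStream = new BufferedReader(new InputStreamReader(in,"utf-8"));
                try{
                    int ch;
                    // Append characters until read() signals end of stream with -1.
                    while((ch = myStream.read()) != -1){
                        str.append((char)ch);
                    }
                }
                finally{
                    // Closing the reader also closes the underlying InputStream.
                    myStream.close();
                }
                System.out.print(str);
            }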
            catch(MalformedURLException e){
                e.printStackTrace();
            }
            catch(IOException e){
                e.printStackTrace();
            }
            return str;
        }
    }
    Last edited by ali zi zeperto; 10-09-2012 at 10:31 AM.
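    The hard-coded "utf-8" in get_content() could instead come from the charset the server announces. Below is a minimal sketch of that idea, under stated assumptions: the class CharsetAwareFetch and its methods are hypothetical names, only the charset parameter of the Content-Type header is consulted (a page may declare its encoding in a <meta> tag instead, which this ignores), and UTF-8 is assumed as the fallback. It also reads into a char[] buffer and closes the stream with try-with-resources (Java 7+).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareFetch {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=utf-8". Returns the fallback when the header is
    // missing, has no charset parameter, or names an unknown charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and
                    // UnsupportedCharsetException, which both extend it.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream as text in the given charset, in chunks.
    static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Opens the URL, picks the charset from the response header, reads the body.
    static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        Charset cs = charsetFromContentType(conn.getContentType(), StandardCharsets.UTF_8);
        return readAll(conn.getInputStream(), cs);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java CharsetAwareFetch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

    Run it with the page URL as the first argument, e.g. java CharsetAwareFetch http://old.tsetmc.com/Loader.aspx. The text still goes through System.out, so the console's encoding matters just as much as the download.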

  2. #2
    Tolls (Moderator)

    Where are you printing out to?
    What character sets does that environment support?
    Please do not ask for code as refusal often offends.

    ** This space for rent **
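    One quick way to see what the Java side of that environment is using is to print the JVM's default charset and the file.encoding property, then a short Persian sample. A minimal sketch (the EncodingCheck class and describeDefaults method are hypothetical names): System.out has historically encoded text with the platform default charset, so a single-byte Western charset here would be a strong hint that Persian text cannot be printed faithfully. Newer JDKs may configure the console stream separately, so treat the result as a hint rather than proof.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Summarizes the JVM's default charset and the file.encoding property.
    static String describeDefaults() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
        // Persian sample text; if it shows up as '?' or garbage, the output
        // side is at fault rather than the HTTP download.
        System.out.println("\u0633\u0644\u0627\u0645");
    }
}
```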

  3. #3

    Quote (originally posted by Tolls):
        Where are you printing out to?
        What character sets does that environment support?
    I print the result in NetBeans.
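    Since the output goes to the NetBeans console, one thing worth trying is to encode the text as UTF-8 explicitly rather than relying on System.out's default. A minimal sketch, assuming the hypothetical names Utf8Console and utf8(); it uses the PrintStream(OutputStream, boolean, String) constructor so it also works on older JDKs. This only controls the bytes Java writes: the NetBeans output window still has to decode them as UTF-8, and configuring that is outside this sketch.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wraps a byte stream in an auto-flushing PrintStream that always
    // encodes as UTF-8, independent of the platform default charset.
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        out.println("\u0633\u0644\u0627\u0645"); // Persian sample text
    }
}
```

    In get_content(), the System.out.print(str) call would then become a print on such a stream.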

  4. #4
    Tolls (Moderator)

    Actually I just spotted the problem.
    Java Code:
                while((ch = myStream.read()) != -1){
                    str.append((char)ch);
                }
    That reads a single byte.
    UTF-8 characters can be up to 4 bytes long.
    So you're completely mucking up any non-ASCII characters there.
    Use the readLine() method of the BufferedReader instead.
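    A caveat on this diagnosis: in the posted code, read() is called on a BufferedReader that wraps an InputStreamReader, which has already decoded the UTF-8 bytes. Each call therefore returns one UTF-16 char rather than a raw byte, so the character loop should not by itself garble non-ASCII text. Switching to readLine() is still a common idiom, but readLine() strips line terminators. The sketch below (the ReadComparison class and both method names are hypothetical) runs the two loops on the same bytes; if both produce correct text, the corruption is more likely on the output side, as the earlier question about the printing environment suggested.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadComparison {

    // The original approach: one decoded char per read() call. Characters
    // outside the BMP arrive as two surrogate chars, appended in order.
    static String readChars(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    // The readLine() approach. readLine() drops "\n", "\r" or "\r\n", so a
    // "\n" is appended after every line, including a final unterminated one.
    static String readLines(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "\u0633\u0644\u0627\u0645\r\nok";
        byte[] bytes = sample.getBytes(StandardCharsets.UTF_8);
        // prints true: the char loop returns the text unchanged
        System.out.println(readChars(new ByteArrayInputStream(bytes)).equals(sample));
        // prints true: same text, but "\r\n" normalized and a trailing "\n" added
        System.out.println(readLines(new ByteArrayInputStream(bytes))
                .equals("\u0633\u0644\u0627\u0645\nok\n"));
    }
}
```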
