  1. #1
    couling is offline Member
    Join Date
    Nov 2010
    Posts
    54
    Rep Power
    0

    Readline from an input stream and nothing more.

    Hi All

    I'm trying to achieve something quite specific, and at the moment it looks like I'm going to have to code it myself from scratch, but I wanted to run it past people first to see if they know a way to do this with standard J2SE classes.

    I need a method which will:
    1. Read a line of text (as a String) from an InputStream
    2. Ideally use any supported charset (those obtainable as a Charset object), but at minimum UTF-8
    3. Return null or throw an exception if the input line is longer than a specified character length
    4. Read no more bytes from the input stream than are required for the line. This point is important for two reasons:
      1. The bytes following the bytes for the newline character may be raw binary and not conform to any charset.
      2. The InputStream may block indefinitely if all the bytes for a line have already been read and the readline method then tries to read more.


    Obviously I've considered a BufferedReader, but this doesn't come close to the requirement.

    Thanks for your time.
    ----Signature ----
    Please use [CODE] tags and indent correctly. It really helps when reading your code.

  2. #2
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,728
    Blog Entries
    7
    Rep Power
    21

    Default

    It takes a bit of programming: read single bytes from an InputStream and store them in a byte array. Stop reading when the end of line character(s) have been read or the array is full. Convert your byte array to a String using a certain decoding (e.g. UTF-8). An InputStreamReader comes to mind, wrapped around a ByteArrayInputStream for that purpose.

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass
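
    A minimal sketch of the byte-by-byte approach described in this reply. The class and method names (ByteLineReader, readLine) and the maxBytes parameter are made up for illustration. It assumes a charset in which a newline is the single byte 0x0A and that byte never occurs inside another character (true for ASCII, ISO-8859-x and UTF-8, not for UTF-16), and it limits the line by bytes rather than characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLineReader {
    /**
     * Hypothetical helper: reads single bytes up to and including the first
     * LF (0x0A), then decodes what was collected. Returns null at end of
     * stream and throws if the line needs more than maxBytes bytes.
     */
    static String readLine(InputStream in, Charset cs, int maxBytes) throws IOException {
        byte[] buf = new byte[maxBytes];
        int n = 0;
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (n == maxBytes) {
                throw new IOException("line longer than " + maxBytes + " bytes");
            }
            buf[n++] = (byte) b;
        }
        if (b == -1 && n == 0) {
            return null; // end of stream, nothing read
        }
        if (n > 0 && buf[n - 1] == '\r') {
            n--; // drop the CR of a CRLF terminator
        }
        // Note: new String(...) silently replaces malformed input.
        return new String(buf, 0, n, cs);
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9" in UTF-8, CRLF, then two bytes that are not valid UTF-8.
        byte[] data = {'h', (byte) 0xC3, (byte) 0xA9, '\r', '\n', (byte) 0xFF, (byte) 0xFE};
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(readLine(in, StandardCharsets.UTF_8, 64));
        System.out.println("next raw byte: 0x" + Integer.toHexString(in.read()));
    }
}
```

    Because only single-byte reads are issued, the first byte after the newline is still waiting in the stream when the method returns. Wrapping a slow source in a BufferedInputStream would defeat that, so any buffering has to sit below this method and be shared with whatever reads the binary part.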

  3. #3
    couling is offline Member

    Yeah, I thought I'd have to code it myself. :rolleyes:

    I'm not familiar with ByteArrayInputStream. What additional benefit does wrapping my input stream in one offer?

    One of the problems I'm facing is that decoding (bytes -> chars) using a Charset object looks to be something which is done on a buffer, not character by character. As mentioned in my OP, this has the danger of throwing an exception because of an attempt to decode data which is not in any character set (let alone the one I'm trying to decode). The marker between character data and raw binary data lies in the decoded character data, not directly in the binary data.

    Is there anything in the specification of InputStreamReader to suggest it will not fall foul of this?

    Thanks
    Last edited by couling; 05-26-2011 at 03:44 PM.

  4. #4
    JosAH is offline Moderator

    If you have read the bytes in a byte array, create a ByteArrayInputStream (using that array) and wrap it in an InputStreamReader with a certain en/decoding. If the decoding fails, what are you to do but let the InputStreamReader throw its Exception?

    edit: another approach would be to read your partial text file with a BufferedReader with a buffer size of one.

    kind regards,

    Jos
    Last edited by JosAH; 05-26-2011 at 07:25 PM.
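
    A small sketch of the decoding step suggested here, assuming the bytes of one line have already been isolated in an array. The names StrictLineDecoder and decode are hypothetical. One caveat: as far as I know, an InputStreamReader constructed from a Charset or charset name substitutes U+FFFD for malformed input instead of throwing, so the sketch passes an explicit CharsetDecoder configured with CodingErrorAction.REPORT to get an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictLineDecoder {
    /**
     * Hypothetical helper: decodes an already-isolated line and throws a
     * CharacterCodingException (an IOException) on malformed input.
     */
    static String decode(byte[] lineBytes, Charset cs) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder sb = new StringBuilder();
        // Over-reading is harmless here: the reader can only see lineBytes.
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(lineBytes), decoder)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] good = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(good, StandardCharsets.UTF_8));
        try {
            decode(new byte[] {'o', 'k', (byte) 0xFF}, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```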

  5. #5
    ra4king's Avatar
    ra4king is offline Senior Member
    Join Date
    Apr 2011
    Location
    Atlanta, Georgia, US
    Posts
    396
    Rep Power
    4

    Default

    You don't have to code anything by yourself. Just wrap your InputStream in a BufferedReader or a Scanner and if "line.length() > requiredLength" throw an Exception.

  6. #6
    couling is offline Member

    No, that won't work... The over-reads are a real problem....

    This isn't the protocol I'm working with, but take HTTP as an example. A response might be something along the lines of:
    Java Code:
    HTTP/1.1 200 OK<cr><lf>
    SomeHeader: SomeValue<cr><lf>
    <cr><lf>
    <raw data ...>
    In this context you need to interpret the header (everything up to and including the last <lf>) as plain text. The following data can be a raw file (e.g. JPEG data).

    In HTTP this isn't a problem because everything in the header is in ASCII (according to the standard). So characters and bytes map one to one and there is no translation between them. That is, you can read bytes as chars. The protocol I'm trying to work with doesn't have such a limitation.

    Another way to view the problem is to say I need to be able to reverse-map a char count to a byte count. When reading a line I need to read up to and including the newline character(s) and know precisely how many bytes that represents. (All I have to do then is buffer: mark, rewind and skip.)
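
    One way to get that char-to-byte mapping without mark/rewind is to feed a CharsetDecoder a single byte at a time, so the stream position is always exactly at the end of the last decoded character. The sketch below is an illustration under assumptions, not a tested library routine: DecodingLineReader, readLine and maxChars are made-up names, and it assumes the decoder emits each character as soon as its final byte arrives. That holds for the common JDK charsets I know of, but the CharsetDecoder contract does not promise it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecodingLineReader {
    /** Hypothetical helper: returns the next line without its terminator, or null at end of stream. */
    static String readLine(InputStream in, Charset cs, int maxChars) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bytes = ByteBuffer.allocate(16); // bytes of one not yet complete character
        CharBuffer chars = CharBuffer.allocate(2);  // room for one surrogate pair
        StringBuilder line = new StringBuilder();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            bytes.put((byte) b);
            bytes.flip();
            CoderResult result = decoder.decode(bytes, chars, false);
            if (result.isError()) {
                result.throwException(); // malformed or unmappable input
            }
            bytes.compact(); // keep any bytes of an incomplete character
            chars.flip();
            while (chars.hasRemaining()) {
                char c = chars.get();
                if (c == '\n') {
                    int len = line.length();
                    if (len > 0 && line.charAt(len - 1) == '\r') {
                        line.setLength(len - 1); // CRLF terminator
                    }
                    return line.toString();
                }
                if (line.length() == maxChars) {
                    throw new IOException("line longer than " + maxChars + " chars");
                }
                line.append(c);
            }
            chars.clear();
        }
        if (bytes.position() > 0) {
            throw new IOException("stream ended inside a character");
        }
        return sawAny ? line.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        // "a", pound sign, LF in UTF-16BE, followed by two raw bytes.
        byte[] data = {0x00, 0x61, 0x00, (byte) 0xA3, 0x00, 0x0A, (byte) 0xD8, 0x00};
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(readLine(in, StandardCharsets.UTF_16BE, 80));
        System.out.println("next raw byte: 0x" + Integer.toHexString(in.read()));
    }
}
```

    The length limit here counts chars (UTF-16 code units, including a trailing CR), which is closer to the original requirement than a byte limit. The cost is one read() call and one decode() call per byte, so the InputStream underneath should be cheap to call byte by byte.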

  7. #7
    couling is offline Member

    Love the idea of a BufferedReader with a buffer size of 1. Sadly the test failed... see code and output:
    Java Code:
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    public class Test {
    	public static void main(String[] args) throws IOException {
    		// The size argument (1) only limits the BufferedReader's char buffer,
    		// not what the InputStreamReader requests from the stream underneath.
    		// No charset is given, so the platform default is used.
    		try (BufferedReader r = new BufferedReader(
    				new InputStreamReader(new TestInputStream(new FileInputStream("c:\\pound.txt"))), 1)) {
    			System.out.println(r.readLine());
    		}
    	}
    }

    /** Logs every call made on the wrapped stream so over-reads become visible. */
    class TestInputStream extends InputStream {
    	private final InputStream input;

    	public TestInputStream(InputStream input) {
    		this.input = input;
    	}

    	@Override
    	public void close() throws IOException {
    		input.close();
    	}

    	@Override
    	public int read() throws IOException {
    		System.out.println("read()");
    		return input.read();
    	}

    	@Override
    	public int read(byte[] b) throws IOException {
    		int x = input.read(b);
    		System.out.println("reader(" + b.length + ") returned " + x);
    		return x;
    	}

    	@Override
    	public int read(byte[] b, int off, int len) throws IOException {
    		int x = input.read(b, off, len);
    		System.out.println("reader(" + b.length + "," + off + "," + len + ") returned " + x);
    		return x;
    	}

    	@Override
    	public void reset() throws IOException {
    		System.out.println("reset()");
    		input.reset();
    	}

    	@Override
    	public void mark(int readlimit) {
    		System.out.println("mark(" + readlimit + ")");
    		input.mark(readlimit);
    	}

    	// InputStream.skip takes and returns a long; the int version never overrode it.
    	@Override
    	public long skip(long n) throws IOException {
    		System.out.println("skip(" + n + ")");
    		return input.skip(n);
    	}
    }
    Output:
    reader(8192,0,8192) returned 13

    It read 8192 bytes into an array of 8192 bytes, doh!

  8. #8
    JosAH is offline Moderator

    Quote Originally Posted by couling View Post
    reader(8192,0,8192) returned 13

    It read 8192 bytes into an array of 8192 bytes doh!
    No, it didn't: your array is 8192 bytes long and the read method is allowed to read up to 8192 bytes (starting at offset 0), but it only read 13 bytes in total.

    kind regards,

    Jos

  9. #9
    couling is offline Member

    That's because the file is 13 bytes long. :p

    The reader tried to read 8192 bytes, and didn't try to rewind the over-read.

    The file contained a single byte '£' in some form of extended ASCII I can't remember, followed by a newline, followed by some other bytes. Those bytes are thus lost: they were not supposed to be decoded into Unicode, but they disappeared somewhere in the reader or buffer.

    The more I look at this, the more I convince myself I need to code it manually. For UTF-8 I can cheat a little and assume that newline characters can be mapped directly to and from bytes safely, so I can search the incoming bytes for the end of line before I invoke the charset decoding function.

    Thanks for the discussion.
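
    The UTF-8 shortcut works because every byte of a multi-byte UTF-8 sequence has its high bit set, so the byte 0x0A can only ever be a newline. A quick check of that property, with UTF-16BE as a counterexample. LfByteCheck and countLfBytes are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LfByteCheck {
    /** Counts how many encoded bytes equal 0x0A, whether or not they belong to a newline. */
    static int countLfBytes(String text, Charset cs) {
        int count = 0;
        for (byte b : text.getBytes(cs)) {
            if (b == 0x0A) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+010A (C with dot above), a pound sign, then one real newline.
        String text = "\u010a\u00a3\n";
        // UTF-8: C4 8A, C2 A3, 0A -> only the newline produces a 0x0A byte.
        System.out.println("UTF-8:    " + countLfBytes(text, StandardCharsets.UTF_8));
        // UTF-16BE: 01 0A, 00 A3, 00 0A -> U+010A also contains a 0x0A byte.
        System.out.println("UTF-16BE: " + countLfBytes(text, StandardCharsets.UTF_16BE));
    }
}
```

    So a byte-level search for 0x0A is safe for UTF-8 (and for ASCII-compatible single-byte charsets), but it would cut a UTF-16 stream in the wrong place.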

  10. #10
    ra4king is offline Senior Member

    Bytes read from a file get lost? Either this is something I have never heard of before or you're doing something wrong. Unless something is wrong with your hardware, you can't lose bytes reading from a file. ;)

  11. #11
    JosAH is offline Moderator

    Quote Originally Posted by ra4king View Post
    Reading bytes from a file get lost? Either this is something I have never heard before or you're doing something wrong. Unless something is wrong with your hardware, you can't lose bytes reading from a file. ;)
    No, those bytes aren't lost but they are in a buffer somewhere; given the scenario created by the OP:

    Java Code:
    BufferedReader r = new BufferedReader(new InputStreamReader(new TestInputStream(new FileInputStream("c:\\pound.txt"))),1);
    The TestInputStream reports that 13 bytes have been read, so either the InputStreamReader or the BufferedReader must've requested (at least) that many bytes; an InputStreamReader doesn't really buffer, so it must've been the BufferedReader that buffers those bytes (converted to chars). I don't understand that, because it has a buffer of one single char (see its ctor). Of course the FileInputStream can buffer all it wants; those bytes remain logically in the input stream. I have to think about this; of course it can be solved by doing everything yourself (as I suggested in my first reply), but I find that clumsy ...

    kind regards,

    Jos

  12. #12
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,173
    Rep Power
    20

    Default

    Quote Originally Posted by JosAH View Post
    No, those bytes aren't lost but they are in a buffer somewhere; given the scenario created by the OP:
    The InputStreamReader holds a buffer (heapbytebuffer or something like that).
    That's where the stuff has ended up.

  13. #13
    Tolls is offline Moderator

    Just to clarify, the buffer is in the StreamDecoder which is in the InputStreamReader.

  14. #14
    JosAH is offline Moderator

    Quote Originally Posted by Tolls View Post
    Just to clarify, the buffer is in the StreamDecoder which is in the InputStreamReader.
    Yes, but the API documentation says:

    Quote Originally Posted by API
    Each invocation of one of an InputStreamReader's read() methods may cause one or more bytes to be read from the underlying byte-input stream. To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
    and a simple pound sign doesn't make it buffer 13 chars ...

    kind regards,

    Jos

  15. #15
    Tolls is offline Moderator

    Using the above code, this:
    Java Code:
    System.out.println(r.readLine());
    read the entire two-line file I used into the sd (StreamDecoder) buffer.
    Debug it and watch.

  16. #16
    JosAH is offline Moderator

    Quote Originally Posted by Tolls View Post
    Using the above code this:
    Java Code:
    System.out.println(r.readLine());
    Read in the entire 2 line file I used into the sd buffer.
    Debug it and watch.
    Yep, I debugged it and saw the InputStreamReader slurping in as much as it could; that ruins my 1 char buffer idea for the BufferedReader ... Oh well, programming it yourself takes just a few lines of code (although I find it a bit clumsy).

    kind regards,

    Jos
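
    The over-read can be reproduced without a file or a debugger. This is a sketch with made-up names (OverReadDemo, bytesLeftAfterReadLine) that uses a 13-byte in-memory stream in place of pound.txt and asks the underlying stream how much is left after a single readLine().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class OverReadDemo {
    /** Hypothetical helper: bytes still unread in the underlying stream after one readLine(). */
    static int bytesLeftAfterReadLine(byte[] data) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        // The size argument (1) only limits the BufferedReader's own char buffer.
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        r.readLine();
        // The reader is deliberately left open; closing a ByteArrayInputStream has no effect anyway.
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        // A pound sign in UTF-8 (2 bytes), LF, then 10 bytes of "binary" payload: 13 bytes in total.
        byte[] data = {(byte) 0xC2, (byte) 0xA3, '\n', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println("bytes left for the binary part: " + bytesLeftAfterReadLine(data));
    }
}
```

    On the OpenJDK builds I am aware of, the decoder inside InputStreamReader asks for a whole buffer at once, so nothing is left in the stream for the code that wants the binary payload.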

  17. #17
    Tolls is offline Moderator

    It does seem odd that you can't control the underlying StreamDecoder, but there is no handle on that at all.

  18. #18
    JosAH is offline Moderator

    Quote Originally Posted by Tolls View Post
    It does seem odd that you can't control the underlying StreamDecoder, but there is no handle on that at all.
    Yep, I also tried it with a Scanner object (it can read lines also) but it uses a decoder internally just as the InputStreamReader does, so the results are the same ...

    kind regards,

    Jos

  19. #19
    ra4king is offline Senior Member

    What are you trying to do that can't be satisfied by the current classes?

  20. #20
    JosAH is offline Moderator

    Quote Originally Posted by ra4king View Post
    What are you trying to do that can't be satisfied by the current classes?
    Read the OP's first post; it can't be done with Readers (or, more exactly, anything that uses a CharsetDecoder internally the way they do) because the darn things always read ahead. I'm afraid it has to be done 'manually' by using a simple InputStream.

    kind regards,

    Jos
