Results 1 to 14 of 14
  1. #1
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Question How do I count empty spaces in a byte array?

    I'm trying to copy one byte array to another (input - temp). They are both the same length.

    If there are 2+ bytes in a row that are the same in the input byte[] then a specific byte is written to the temp[i] followed by the byte that is repeated and then however many bytes are the same are skipped. For example, a a a a a = 5 a _ _ _. (If that makes sense!)

    I'm not 100% sure about the loop that'll do this yet, but thinking ahead, will actually I be able to count the number of spaces in the array that haven't actually got anything in them?

    This is what I've got so far:

    Java Code:
    	public byte[] transform(byte[] in) {
    		input = in;
    		temp = new byte[in.length];
    		
    		for (int i = 0; i < in.length-1; i++) {
    			if (input[i] == input[i+1]) {
    				rep = input[i];
    				if (input[i+1] == input[i+2]) {
    					temp[i] = (byte) 0xE3;
    					temp[i+1] = input[i];
    					i += 2;
    				}
    			} else {
    				temp[i] = input[i];
    			}
    			for (byte b: temp) {
    				// check through and count how many spaces in the 
    				// array are taken/empty
    			}
    		}
    		System.out.println(rep);
    		return temp;

  2. #2
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    Something like this can do the job:

    Java Code:
    private int copy(char[] from, char[] to) {
       int j;
       for (int i= 0; i < from.length; i= j) {
          for (j= i+1; j < from.length && from[j] == from[i] && j-i < 128; j++);
          if (j-i >= 3) { // found repeating char from[i] at least 3 times
             to[i++]= <special character>;
             to[i++]= (byte)(j-i); // fails if j-i > 127, hence the silly test above
             to[i++]= from[j-1]; // copy character
          }
          else {
             to[i]= from[i];
             j= i+1;
          }
       }
       return j; // number of chars copied
    }
    Pay special attention to the gory details in the body of the loop. This version doesn't encode a character that repeats less than 3 times.

    kind regards,

    Jos (warning: untested code)
    Last edited by JosAH; 01-08-2010 at 07:56 PM.

  3. #3
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Smile Going on

    That's very helpful Jos, thank you very much!

    That would work perfectly if I was only going through looking for repetitions once.

    For if I'm reading a really looong array in, that has lots of potential different repetitions, I've put the loops into another loop (I know I haven't quite got it).

    Would it be better to use counters and keep track of how many times there is a repetition and how long each one is, and then use them to calculate the size of the new array? ie wholeRepeat and individualRepeat..

    Java Code:
    	private int count(byte[] from, byte[] to) {
    		int j = 0;
    		int wholeRepeat;
    		int individualRepeat;
    		byte repeatedByte;
    		for (byte b: from) {
    			for (int i = 0; i < from.length; i = j) {
    				for (j = i+1; j < from.length && from[j] == from[i] && j-1 < 128; j++);
    				if (j-1 >= 3) {
    					individualRepeat = j;
    					repeatedByte = getRepeatByte(individualRepeat);
    					to[i++] = repeatedByte;
    					to[i++] = (byte) (j-i);
    					to[i++] = from[j-1];
    				} else {
    					to[i] = from[i];
    					j = i+1;
    				}
    			}
    			return j;
    		}
    
    	}
    	
    	private byte getRepeatByte(int numberOfRepeats) {
    		byte instruction;
    		switch (numberOfRepeats) {
    		case 1: instruction = (byte) 0xE1; break;
    		case 2: instruction = (byte) 0xE2; break;
    		case 3: instruction = (byte) 0xE3; break;
    		case 4: instruction = (byte) 0xE4; break;
    		case 5: instruction = (byte) 0xE5; break;
    		case 6: instruction = (byte) 0xE6; break;
    		case 7: instruction = (byte) 0xE7; break;
    		case 8: instruction = (byte) 0xE8; break;
    		case 9: instruction = (byte) 0xE9; break;
    		case 10: instruction = (byte) 0xEA; break;
    		case 11: instruction = (byte) 0xEB; break;
    		case 12: instruction = (byte) 0xEC; break;
    		case 13: instruction = (byte) 0xED; break;
    		case 14: instruction = (byte) 0xEE; break;
    		case 15: instruction = (byte) 0xEF; break;
    		default: instruction = (byte) 0xE0; break;
    		}
    		return instruction;
    	}

  4. #4
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by nessa203 View Post
    For if I'm reading a really looong array in, that has lots of potential different repetitions, I've put the loops into another loop (I know I haven't quite got it).
    Sorry I don't understand your remark. btw the body of your if-statement doesn't make sense, also the if-condition in my version reads 'j-minus-aye', not 'j-minus-one'.

    kind regards,

    Jos

  5. #5
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Default Clarification

    What I mean is, that loop would work for "a a a a a" = "5a", what if i had "a a a a a b b b a b c c c c c"? I'd try to make it "5a---3b a b 5c".

  6. #6
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by nessa203 View Post
    What I mean is, that loop would work for "a a a a a" = "5a", what if i had "a a a a a b b b a b c c c c c"? I'd try to make it "5a---3b a b 5c".
    My version of the algorithm would produce X5aX3babX5c where X denotes that special character. The algorithm refuses to encode characters that repeat less than three times consecutively (too avoid overflow in the destination array). If this is not what you want please elaborate.

    kind regards,

    Jos

  7. #7
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Lightbulb

    Ah I see! I realise that what I wrote above doesn't exactly make clear what I'm getting at - and I think I was a bit confused myself! Thank you for the help, I really appreciate it.

    I was coming at it from the approach that I would have 15 different unique bytes each of which specifies how many times to repeat the next character (up to 15 times), for example if I had "0xE3" followed by "a" then a would be repeated 3 times, if I had "0xE4" followed by "a" then a would be repeated 4 times, etc. But the way you did it would make the next method I'm going to call easier to write! :)

    This is what I have now:

    Java Code:
    	private int count(byte[] from, byte[] to) {
    		int j = 0;
    		byte markerByte = (byte) 0xE1;
    		for (int i = 0; i < from.length; i = j) {
    			for (j = i+1; j < from.length && from[j] == from[i] && j-i < 128; j++);
    			if (j-i >= 3) {
    				to[i++] = markerByte;
    				to[i++] = (byte) (j-i);
    				to[i++] = from[j-1];
    			} else {
    				to[i] = from[i];
    				j = i+1;
    			}
    		}
    		return j;
    	}
    Then I can iterate through the to[] array and just write it to a smaller one of size j.

  8. #8
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by nessa203 View Post
    Then I can iterate through the to[] array and just write it to a smaller one of size j.
    There is a small mistake in my code: it should return something else, not j because i and j are used for other purposes. Also: be very careful with that 'marker byte' value; what do you do when that byte value occurs in your original byte values? (there are solutions for that little problem).

    Here's my corrected code:

    Java Code:
    private int copy(char[] from, char[] to) {
       int k= 0; // total bytes copied so far
       for (int i= 0, j= 0; i < from.length; i= j) {
          for (j= i+1; j < from.length && from[j] == from[i] && j-i < 128; j++);
          if (j-i >= 3) { // found repeating char from[i] at least 3 times
             to[k++]= <special character>;
             to[k++]= (byte)(j-i); // fails if j-i > 127, hence the silly test above
             to[k++]= from[j-1]; // copy character
          }
          else {
             to[k]= from[k];
             k= j= i+1;
          }
       }
       return k; // number of chars copied
    }
    kind regards,

    Jos

  9. #9
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Question

    When I run this with a simple file, I don't get any output saying "Instruction: " at any point - leading me to believe it's never actually called.

    Is this because in my count method the to[k++] is never actually written?? I'v confused myself rather!

    Java Code:
    public class Transformer {
    	
    	byte[] input;
    	byte[] temp;
    	byte[] output;
    
    	public Transformer() {
    	}
    	
    	public byte[] transform(byte[] in) {
    		input = in;
    		temp = new byte[in.length];
    		output = new byte[count(input, temp)];
    		output = compress(temp, output);
    		return output;
    	}	
    
    		private int count(byte[] from, byte[] to) {
    			int k = 0; // total bytes copied so far
    			for (int i = 0, j = 0; i < from.length; i = j) {
    				for (j = i+1; j < from.length && from[j] == from[i]; j++);
    				if (j-i >= 3 && j <= 15) {
    					to[k++] = getRepeatByte(j);
    					to[k++] = from[j-1];
    				} else {
    					to[i] = from[i];
    					k++;
    				}
    				System.out.println(k);
    			}
    			return k;
    		}
    		
    		private byte[] compress(byte[] from, byte[] to) {
    			for (int i = 0, j = 0; i < from.length; i++) {
    				System.out.println(from[i]);
    				if (from[i] == 0) {
    					System.out.println("empty");
    				} else {
    					to[j++] = from[i];
    				}
    			}
    			return to;
    		}
    	
    	private byte getRepeatByte(int numberOfRepeats) {
    		byte instruction;
    		switch (numberOfRepeats) {
    		case 1: instruction = (byte) 0xE1; break;
    		case 2: instruction = (byte) 0xE2; break;
    		case 3: instruction = (byte) 0xE3; break;
    		case 4: instruction = (byte) 0xE4; break;
    		case 5: instruction = (byte) 0xE5; break;
    		case 6: instruction = (byte) 0xE6; break;
    		case 7: instruction = (byte) 0xE7; break;
    		case 8: instruction = (byte) 0xE8; break;
    		case 9: instruction = (byte) 0xE9; break;
    		case 10: instruction = (byte) 0xEA; break;
    		case 11: instruction = (byte) 0xEB; break;
    		case 12: instruction = (byte) 0xEC; break;
    		case 13: instruction = (byte) 0xED; break;
    		case 14: instruction = (byte) 0xEE; break;
    		case 15: instruction = (byte) 0xEF; break;
    		default: instruction = (byte) 0xE0; break;
    		}
    		System.out.println("Instruction: " + instruction);
    		return instruction;
    	}
    }
    I just changed it a little. Now, for example, when I run a file containing "aaaabbbbbaaaabbabaaabbbbbbbaaaaa" I get the following output:
    Instruction: -28
    2
    Instruction: -23
    4
    Instruction: -19
    6
    7
    8
    9
    10
    11
    12
    13
    -28
    97
    -23
    98
    -19
    97
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    98
    0
    empty
    97
    98
    97
    0
    empty
    0
    empty
    98
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    97
    0
    empty
    0
    empty
    0
    empty
    0
    empty
    10
    Last edited by nessa203; 01-11-2010 at 01:10 PM. Reason: Further information added

  10. #10
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    You mutilated my method again and you don't need that other method, i.e. you can 'calculate' the value of the marker byte:

    Java Code:
    private int count(byte[] from, byte[] to) {
    	int k = 0; // total bytes copied so far
    	for (int i = 0, j = 0; i < from.length; i = j) {
    		for (j = i+1; j < from.length && from[j] == from[i] && j-i < 15; j++);
    		if (j-i >= 2) {
    			to[k++] = (byte)(0xE0+j-i);
    			to[k++] = from[j-1];
    		} else 
    			to[k++] = from[i];
    	}
    	return k;
    }
    Your version didn't write the encoded sequence if a character repeated more than 15 times. Also the else { ... } part was wrong. A repetition of two can be encoded in the same number of bytes.

    kind regards,

    Jos

  11. #11
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Default

    I'm sorry for mutilating your method. :( It wasn't my intention at all.
    Thank you very much for all of your help!

  12. #12
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by nessa203 View Post
    I'm sorry for mutilating your method. :( It wasn't my intention at all.
    Thank you very much for all of your help!
    Don't worry about it, do with that method whatever you want. I do have a couple of questions for you though:

    1) Why do you want to filter out 0 values?
    2) How do you want to take care of byte values 0xE0, 0xE1 ... 0xEF in your source array?
    3) What is the purpose of this entire exercise?

    kind regards,

    Jos

  13. #13
    nessa203 is offline Member
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Smile

    Answers to your q's:
    1) In order to make the file size smaller.
    2) I haven't really considered that yet! I though about it earlier when thinking about using the marker byte and came across reserved opcodes: numbers 254 (0xfe) and 255 (0xff), but i've not really researched them to see if they're actually suitable.
    3) I'm trying to make an encoder - to read in files, compress them and output them into a smaller file. After that a decompressor. The decompressor is quite simple in that it should have a dictionary of 0xFF phrases. The zeroth phrase is a byte with value 0x0, the first phrase is a byte with value 0x1 and so on, (the níth element of the array is a phrase consisting of a bytse with value n). The decompressor reads the input one byte at a time and treats each one as an instruction. I was told that doing it using hexadecimal notation made it easier to see bit patterns..

  14. #14
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,531
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by nessa203 View Post
    Answers to your q's:
    1) In order to make the file size smaller.
    2) I haven't really considered that yet! I though about it earlier when thinking about using the marker byte and came across reserved opcodes: numbers 254 (0xfe) and 255 (0xff), but i've not really researched them to see if they're actually suitable.
    3) I'm trying to make an encoder - to read in files, compress them and output them into a smaller file. After that a decompressor. The decompressor is quite simple in that it should have a dictionary of 0xFF phrases. The zeroth phrase is a byte with value 0x0, the first phrase is a byte with value 0x1 and so on, (the níth element of the array is a phrase consisting of a bytse with value n). The decompressor reads the input one byte at a time and treats each one as an instruction. I was told that doing it using hexadecimal notation made it easier to see bit patterns..
    About 1) you can't just leave out bytes with value 0; e.g. suppose this is the byte stream from a compressed file: 1, 2, 3. Was/were there zero valued bytes that were filtered out? If yes, how many and at what position should they be inserted in order to decompress the file?

    About 2) google for 'run length encoding' for solutions to that small problem.

    About 3) I don't understand what you're saying here.

    kind regards,

    Jos

Similar Threads

  1. i can do count, but having problem with array
    By judy318 in forum New To Java
    Replies: 4
    Last Post: 10-29-2009, 09:23 PM
  2. Delete Empty Spaces...
    By ohytheng in forum New To Java
    Replies: 1
    Last Post: 04-15-2009, 09:59 PM
  3. String array to byte array?!
    By Joe2003 in forum Advanced Java
    Replies: 5
    Last Post: 02-28-2009, 06:09 AM
  4. Printing Byte Array
    By suchismitasuchi in forum New To Java
    Replies: 3
    Last Post: 01-19-2009, 10:58 AM
  5. Byte Array
    By sandor in forum New To Java
    Replies: 12
    Last Post: 01-15-2009, 03:31 AM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •