Results 1 to 14 of 14
- 01-08-2010, 05:14 PM #1
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
How do I count empty spaces in a byte array?
I'm trying to copy one byte array to another (input - temp). They are both the same length.
If there are 2+ bytes in a row that are the same in the input byte[] then a specific byte is written to the temp[i] followed by the byte that is repeated and then however many bytes are the same are skipped. For example, a a a a a = 5 a _ _ _. (If that makes sense!)
I'm not 100% sure about the loop that'll do this yet, but thinking ahead, will actually I be able to count the number of spaces in the array that haven't actually got anything in them?
This is what I've got so far:
Java Code:public byte[] transform(byte[] in) { input = in; temp = new byte[in.length]; for (int i = 0; i < in.length-1; i++) { if (input[i] == input[i+1]) { rep = input[i]; if (input[i+1] == input[i+2]) { temp[i] = (byte) 0xE3; temp[i+1] = input[i]; i += 2; } } else { temp[i] = input[i]; } for (byte b: temp) { // check through and count how many spaces in the // array are taken/empty } } System.out.println(rep); return temp;
- 01-08-2010, 06:08 PM #2
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
Something like this can do the job:
Pay special attention to the gory details in the body of the loop. This version doesn't encode a character that repeats less than 3 times.Java Code:private int copy(char[] from, char[] to) { int j; for (int i= 0; i < from.length; i= j) { for (j= i+1; j < from.length && from[j] == from[i] && j-i < 128; j++); if (j-i >= 3) { // found repeating char from[i] at least 3 times to[i++]= <special character>; to[i++]= (byte)(j-i); // fails if j-i > 127, hence the silly test above to[i++]= from[j-1]; // copy character } else { to[i]= from[i]; j= i+1; } } return j; // number of chars copied }
kind regards,
Jos (warning: untested code)Last edited by JosAH; 01-08-2010 at 07:56 PM.
- 01-09-2010, 01:29 PM #3
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
Going on
That's very helpful Jos, thank you very much!
That would work perfectly if I was only going through looking for repetitions once.
For if I'm reading a really looong array in, that has lots of potential different repetitions, I've put the loops into another loop (I know I haven't quite got it).
Would it be better to use counters and keep track of how many times there is a repetition and how long each one is, and then use them to calculate the size of the new array? ie wholeRepeat and individualRepeat..
Java Code:private int count(byte[] from, byte[] to) { int j = 0; int wholeRepeat; int individualRepeat; byte repeatedByte; for (byte b: from) { for (int i = 0; i < from.length; i = j) { for (j = i+1; j < from.length && from[j] == from[i] && j-1 < 128; j++); if (j-1 >= 3) { individualRepeat = j; repeatedByte = getRepeatByte(individualRepeat); to[i++] = repeatedByte; to[i++] = (byte) (j-i); to[i++] = from[j-1]; } else { to[i] = from[i]; j = i+1; } } return j; } } private byte getRepeatByte(int numberOfRepeats) { byte instruction; switch (numberOfRepeats) { case 1: instruction = (byte) 0xE1; break; case 2: instruction = (byte) 0xE2; break; case 3: instruction = (byte) 0xE3; break; case 4: instruction = (byte) 0xE4; break; case 5: instruction = (byte) 0xE5; break; case 6: instruction = (byte) 0xE6; break; case 7: instruction = (byte) 0xE7; break; case 8: instruction = (byte) 0xE8; break; case 9: instruction = (byte) 0xE9; break; case 10: instruction = (byte) 0xEA; break; case 11: instruction = (byte) 0xEB; break; case 12: instruction = (byte) 0xEC; break; case 13: instruction = (byte) 0xED; break; case 14: instruction = (byte) 0xEE; break; case 15: instruction = (byte) 0xEF; break; default: instruction = (byte) 0xE0; break; } return instruction; }
- 01-09-2010, 02:14 PM #4
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
- 01-09-2010, 02:40 PM #5
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
Clarification
What I mean is, that loop would work for "a a a a a" = "5a", what if i had "a a a a a b b b a b c c c c c"? I'd try to make it "5a---3b a b 5c".
- 01-09-2010, 02:47 PM #6
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
My version of the algorithm would produce X5aX3babX5c where X denotes that special character. The algorithm refuses to encode characters that repeat less than three times consecutively (too avoid overflow in the destination array). If this is not what you want please elaborate.
kind regards,
Jos
- 01-09-2010, 03:34 PM #7
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
Ah I see! I realise that what I wrote above doesn't exactly make clear what I'm getting at - and I think I was a bit confused myself! Thank you for the help, I really appreciate it.
I was coming at it from the approach that I would have 15 different unique bytes each of which specifies how many times to repeat the next character (up to 15 times), for example if I had "0xE3" followed by "a" then a would be repeated 3 times, if I had "0xE4" followed by "a" then a would be repeated 4 times, etc. But the way you did it would make the next method I'm going to call easier to write! :)
This is what I have now:
Then I can iterate through the to[] array and just write it to a smaller one of size j.Java Code:private int count(byte[] from, byte[] to) { int j = 0; byte markerByte = (byte) 0xE1; for (int i = 0; i < from.length; i = j) { for (j = i+1; j < from.length && from[j] == from[i] && j-i < 128; j++); if (j-i >= 3) { to[i++] = markerByte; to[i++] = (byte) (j-i); to[i++] = from[j-1]; } else { to[i] = from[i]; j = i+1; } } return j; }
- 01-09-2010, 03:58 PM #8
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
There is a small mistake in my code: it should return something else, not j because i and j are used for other purposes. Also: be very careful with that 'marker byte' value; what do you do when that byte value occurs in your original byte values? (there are solutions for that little problem).
Here's my corrected code:
kind regards,Java Code:private int copy(char[] from, char[] to) { int k= 0; // total bytes copied so far for (int i= 0, j= 0; i < from.length; i= j) { for (j= i+1; j < from.length && from[j] == from[i] && j-i < 128; j++); if (j-i >= 3) { // found repeating char from[i] at least 3 times to[k++]= <special character>; to[k++]= (byte)(j-i); // fails if j-i > 127, hence the silly test above to[k++]= from[j-1]; // copy character } else { to[k]= from[k]; k= j= i+1; } } return k; // number of chars copied }
Jos
- 01-11-2010, 01:03 PM #9
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
When I run this with a simple file, I don't get any output saying "Instruction: " at any point - leading me to believe it's never actually called.
Is this because in my count method the to[k++] is never actually written?? I'v confused myself rather!
I just changed it a little. Now, for example, when I run a file containing "aaaabbbbbaaaabbabaaabbbbbbbaaaaa" I get the following output:Java Code:public class Transformer { byte[] input; byte[] temp; byte[] output; public Transformer() { } public byte[] transform(byte[] in) { input = in; temp = new byte[in.length]; output = new byte[count(input, temp)]; output = compress(temp, output); return output; } private int count(byte[] from, byte[] to) { int k = 0; // total bytes copied so far for (int i = 0, j = 0; i < from.length; i = j) { for (j = i+1; j < from.length && from[j] == from[i]; j++); if (j-i >= 3 && j <= 15) { to[k++] = getRepeatByte(j); to[k++] = from[j-1]; } else { to[i] = from[i]; k++; } System.out.println(k); } return k; } private byte[] compress(byte[] from, byte[] to) { for (int i = 0, j = 0; i < from.length; i++) { System.out.println(from[i]); if (from[i] == 0) { System.out.println("empty"); } else { to[j++] = from[i]; } } return to; } private byte getRepeatByte(int numberOfRepeats) { byte instruction; switch (numberOfRepeats) { case 1: instruction = (byte) 0xE1; break; case 2: instruction = (byte) 0xE2; break; case 3: instruction = (byte) 0xE3; break; case 4: instruction = (byte) 0xE4; break; case 5: instruction = (byte) 0xE5; break; case 6: instruction = (byte) 0xE6; break; case 7: instruction = (byte) 0xE7; break; case 8: instruction = (byte) 0xE8; break; case 9: instruction = (byte) 0xE9; break; case 10: instruction = (byte) 0xEA; break; case 11: instruction = (byte) 0xEB; break; case 12: instruction = (byte) 0xEC; break; case 13: instruction = (byte) 0xED; break; case 14: instruction = (byte) 0xEE; break; case 15: instruction = (byte) 0xEF; break; default: instruction = (byte) 0xE0; break; } System.out.println("Instruction: " + instruction); return instruction; } }
Instruction: -28
2
Instruction: -23
4
Instruction: -19
6
7
8
9
10
11
12
13
-28
97
-23
98
-19
97
0
empty
0
empty
0
empty
0
empty
0
empty
0
empty
0
empty
98
0
empty
97
98
97
0
empty
0
empty
98
0
empty
0
empty
0
empty
0
empty
0
empty
0
empty
97
0
empty
0
empty
0
empty
0
empty
10Last edited by nessa203; 01-11-2010 at 01:10 PM. Reason: Further information added
- 01-11-2010, 01:43 PM #10
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
You mutilated my method again and you don't need that other method, i.e. you can 'calculate' the value of the marker byte:
Your version didn't write the encoded sequence if a character repeated more than 15 times. Also the else { ... } part was wrong. A repetition of two can be encoded in the same number of bytes.Java Code:private int count(byte[] from, byte[] to) { int k = 0; // total bytes copied so far for (int i = 0, j = 0; i < from.length; i = j) { for (j = i+1; j < from.length && from[j] == from[i] && j-i < 15; j++); if (j-i >= 2) { to[k++] = (byte)(0xE0+j-i); to[k++] = from[j-1]; } else to[k++] = from[i]; } return k; }
kind regards,
Jos
- 01-11-2010, 02:22 PM #11
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
I'm sorry for mutilating your method. :( It wasn't my intention at all.
Thank you very much for all of your help!
- 01-11-2010, 04:48 PM #12
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
Don't worry about it, do with that method whatever you want. I do have a couple of questions for you though:
1) Why do you want to filter out 0 values?
2) How do you want to take care of byte values 0xE0, 0xE1 ... 0xEF in your source array?
3) What is the purpose of this entire exercise?
kind regards,
Jos
- 01-11-2010, 05:31 PM #13
Member
- Join Date
- Jan 2010
- Posts
- 13
- Rep Power
- 0
Answers to your q's:
1) In order to make the file size smaller.
2) I haven't really considered that yet! I though about it earlier when thinking about using the marker byte and came across reserved opcodes: numbers 254 (0xfe) and 255 (0xff), but i've not really researched them to see if they're actually suitable.
3) I'm trying to make an encoder - to read in files, compress them and output them into a smaller file. After that a decompressor. The decompressor is quite simple in that it should have a dictionary of 0xFF phrases. The zeroth phrase is a byte with value 0x0, the first phrase is a byte with value 0x1 and so on, (the n’th element of the array is a phrase consisting of a bytse with value n). The decompressor reads the input one byte at a time and treats each one as an instruction. I was told that doing it using hexadecimal notation made it easier to see bit patterns..
- 01-11-2010, 05:46 PM #14
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,372
- Blog Entries
- 7
- Rep Power
- 17
About 1) you can't just leave out bytes with value 0; e.g. suppose this is the byte stream from a compressed file: 1, 2, 3. Was/were there zero valued bytes that were filtered out? If yes, how many and at what position should they be inserted in order to decompress the file?
About 2) google for 'run length encoding' for solutions to that small problem.
About 3) I don't understand what you're saying here.
kind regards,
Jos
Similar Threads
-
i can do count, but having problem with array
By judy318 in forum New To JavaReplies: 4Last Post: 10-29-2009, 09:23 PM -
Delete Empty Spaces...
By ohytheng in forum New To JavaReplies: 1Last Post: 04-15-2009, 09:59 PM -
String array to byte array?!
By Joe2003 in forum Advanced JavaReplies: 5Last Post: 02-28-2009, 06:09 AM -
Printing Byte Array
By suchismitasuchi in forum New To JavaReplies: 3Last Post: 01-19-2009, 10:58 AM -
Byte Array
By sandor in forum New To JavaReplies: 12Last Post: 01-15-2009, 03:31 AM


LinkBack URL
About LinkBacks
Reply With Quote

Bookmarks