# Thread: How do I count empty spaces in a byte array?

1. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## How do I count empty spaces in a byte array?

I'm trying to copy one byte array to another (input - temp). They are both the same length.

If there are 2+ bytes in a row that are the same in the input byte[] then a specific byte is written to the temp[i] followed by the byte that is repeated and then however many bytes are the same are skipped. For example, a a a a a = 5 a _ _ _. (If that makes sense!)

I'm not 100% sure about the loop that'll do this yet, but thinking ahead, will actually I be able to count the number of spaces in the array that haven't actually got anything in them?

This is what I've got so far:

Java Code:
```	public byte[] transform(byte[] in) {
input = in;
temp = new byte[in.length];

for (int i = 0; i < in.length-1; i++) {
if (input[i] == input[i+1]) {
rep = input[i];
if (input[i+1] == input[i+2]) {
temp[i] = (byte) 0xE3;
temp[i+1] = input[i];
i += 2;
}
} else {
temp[i] = input[i];
}
for (byte b: temp) {
// check through and count how many spaces in the
// array are taken/empty
}
}
System.out.println(rep);
return temp;```  Reply With Quote

2. ## Something like this can do the job:

Java Code:
```private int copy(char[] from, char[] to) {
int j;
for (int i= 0; i < from.length; i= j) {
for (j= i+1; j < from.length && from[j] == from[i] && j-i < 128; j++);
if (j-i >= 3) { // found repeating char from[i] at least 3 times
to[i++]= <special character>;
to[i++]= (byte)(j-i); // fails if j-i > 127, hence the silly test above
to[i++]= from[j-1]; // copy character
}
else {
to[i]= from[i];
j= i+1;
}
}
return j; // number of chars copied
}```
Pay special attention to the gory details in the body of the loop. This version doesn't encode a character that repeats less than 3 times.

kind regards,

Jos (warning: untested code)
Last edited by JosAH; 01-08-2010 at 07:56 PM.  Reply With Quote

3. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## Going on

That's very helpful Jos, thank you very much!

That would work perfectly if I was only going through looking for repetitions once.

For if I'm reading a really looong array in, that has lots of potential different repetitions, I've put the loops into another loop (I know I haven't quite got it).

Would it be better to use counters and keep track of how many times there is a repetition and how long each one is, and then use them to calculate the size of the new array? ie wholeRepeat and individualRepeat..

Java Code:
```	private int count(byte[] from, byte[] to) {
int j = 0;
int wholeRepeat;
int individualRepeat;
byte repeatedByte;
for (byte b: from) {
for (int i = 0; i < from.length; i = j) {
for (j = i+1; j < from.length && from[j] == from[i] && j-1 < 128; j++);
if (j-1 >= 3) {
individualRepeat = j;
repeatedByte = getRepeatByte(individualRepeat);
to[i++] = repeatedByte;
to[i++] = (byte) (j-i);
to[i++] = from[j-1];
} else {
to[i] = from[i];
j = i+1;
}
}
return j;
}

}

private byte getRepeatByte(int numberOfRepeats) {
byte instruction;
switch (numberOfRepeats) {
case 1: instruction = (byte) 0xE1; break;
case 2: instruction = (byte) 0xE2; break;
case 3: instruction = (byte) 0xE3; break;
case 4: instruction = (byte) 0xE4; break;
case 5: instruction = (byte) 0xE5; break;
case 6: instruction = (byte) 0xE6; break;
case 7: instruction = (byte) 0xE7; break;
case 8: instruction = (byte) 0xE8; break;
case 9: instruction = (byte) 0xE9; break;
case 10: instruction = (byte) 0xEA; break;
case 11: instruction = (byte) 0xEB; break;
case 12: instruction = (byte) 0xEC; break;
case 13: instruction = (byte) 0xED; break;
case 14: instruction = (byte) 0xEE; break;
case 15: instruction = (byte) 0xEF; break;
default: instruction = (byte) 0xE0; break;
}
return instruction;
}```  Reply With Quote

4. ##  Originally Posted by nessa203 For if I'm reading a really looong array in, that has lots of potential different repetitions, I've put the loops into another loop (I know I haven't quite got it).
Sorry I don't understand your remark. btw the body of your if-statement doesn't make sense, also the if-condition in my version reads 'j-minus-aye', not 'j-minus-one'.

kind regards,

Jos  Reply With Quote

5. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## Clarification

What I mean is, that loop would work for "a a a a a" = "5a", what if i had "a a a a a b b b a b c c c c c"? I'd try to make it "5a---3b a b 5c".  Reply With Quote

6. ##  Originally Posted by nessa203 What I mean is, that loop would work for "a a a a a" = "5a", what if i had "a a a a a b b b a b c c c c c"? I'd try to make it "5a---3b a b 5c".
My version of the algorithm would produce X5aX3babX5c where X denotes that special character. The algorithm refuses to encode characters that repeat less than three times consecutively (too avoid overflow in the destination array). If this is not what you want please elaborate.

kind regards,

Jos  Reply With Quote

7. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## Ah I see! I realise that what I wrote above doesn't exactly make clear what I'm getting at - and I think I was a bit confused myself! Thank you for the help, I really appreciate it.

I was coming at it from the approach that I would have 15 different unique bytes each of which specifies how many times to repeat the next character (up to 15 times), for example if I had "0xE3" followed by "a" then a would be repeated 3 times, if I had "0xE4" followed by "a" then a would be repeated 4 times, etc. But the way you did it would make the next method I'm going to call easier to write! :)

This is what I have now:

Java Code:
```	private int count(byte[] from, byte[] to) {
int j = 0;
byte markerByte = (byte) 0xE1;
for (int i = 0; i < from.length; i = j) {
for (j = i+1; j < from.length && from[j] == from[i] && j-i < 128; j++);
if (j-i >= 3) {
to[i++] = markerByte;
to[i++] = (byte) (j-i);
to[i++] = from[j-1];
} else {
to[i] = from[i];
j = i+1;
}
}
return j;
}```
Then I can iterate through the to[] array and just write it to a smaller one of size j.  Reply With Quote

8. ##  Originally Posted by nessa203 Then I can iterate through the to[] array and just write it to a smaller one of size j.
There is a small mistake in my code: it should return something else, not j because i and j are used for other purposes. Also: be very careful with that 'marker byte' value; what do you do when that byte value occurs in your original byte values? (there are solutions for that little problem).

Here's my corrected code:

Java Code:
```private int copy(char[] from, char[] to) {
int k= 0; // total bytes copied so far
for (int i= 0, j= 0; i < from.length; i= j) {
for (j= i+1; j < from.length && from[j] == from[i] && j-i < 128; j++);
if (j-i >= 3) { // found repeating char from[i] at least 3 times
to[k++]= <special character>;
to[k++]= (byte)(j-i); // fails if j-i > 127, hence the silly test above
to[k++]= from[j-1]; // copy character
}
else {
to[k]= from[k];
k= j= i+1;
}
}
return k; // number of chars copied
}```
kind regards,

Jos  Reply With Quote

9. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## When I run this with a simple file, I don't get any output saying "Instruction: " at any point - leading me to believe it's never actually called.

Is this because in my count method the to[k++] is never actually written?? I'v confused myself rather!

Java Code:
```public class Transformer {

byte[] input;
byte[] temp;
byte[] output;

public Transformer() {
}

public byte[] transform(byte[] in) {
input = in;
temp = new byte[in.length];
output = new byte[count(input, temp)];
output = compress(temp, output);
return output;
}

private int count(byte[] from, byte[] to) {
int k = 0; // total bytes copied so far
for (int i = 0, j = 0; i < from.length; i = j) {
for (j = i+1; j < from.length && from[j] == from[i]; j++);
if (j-i >= 3 && j <= 15) {
to[k++] = getRepeatByte(j);
to[k++] = from[j-1];
} else {
to[i] = from[i];
k++;
}
System.out.println(k);
}
return k;
}

private byte[] compress(byte[] from, byte[] to) {
for (int i = 0, j = 0; i < from.length; i++) {
System.out.println(from[i]);
if (from[i] == 0) {
System.out.println("empty");
} else {
to[j++] = from[i];
}
}
}

private byte getRepeatByte(int numberOfRepeats) {
byte instruction;
switch (numberOfRepeats) {
case 1: instruction = (byte) 0xE1; break;
case 2: instruction = (byte) 0xE2; break;
case 3: instruction = (byte) 0xE3; break;
case 4: instruction = (byte) 0xE4; break;
case 5: instruction = (byte) 0xE5; break;
case 6: instruction = (byte) 0xE6; break;
case 7: instruction = (byte) 0xE7; break;
case 8: instruction = (byte) 0xE8; break;
case 9: instruction = (byte) 0xE9; break;
case 10: instruction = (byte) 0xEA; break;
case 11: instruction = (byte) 0xEB; break;
case 12: instruction = (byte) 0xEC; break;
case 13: instruction = (byte) 0xED; break;
case 14: instruction = (byte) 0xEE; break;
case 15: instruction = (byte) 0xEF; break;
default: instruction = (byte) 0xE0; break;
}
System.out.println("Instruction: " + instruction);
return instruction;
}
}```
I just changed it a little. Now, for example, when I run a file containing "aaaabbbbbaaaabbabaaabbbbbbbaaaaa" I get the following output:
Instruction: -28
2
Instruction: -23
4
Instruction: -19
6
7
8
9
10
11
12
13
-28
97
-23
98
-19
97
0
empty
0
empty
0
empty
0
empty
0
empty
0
empty
0
empty
98
0
empty
97
98
97
0
empty
0
empty
98
0
empty
0
empty
0
empty
0
empty
0
empty
0
empty
97
0
empty
0
empty
0
empty
0
empty
10
Last edited by nessa203; 01-11-2010 at 01:10 PM. Reason: Further information added  Reply With Quote

10. ## You mutilated my method again and you don't need that other method, i.e. you can 'calculate' the value of the marker byte:

Java Code:
```private int count(byte[] from, byte[] to) {
int k = 0; // total bytes copied so far
for (int i = 0, j = 0; i < from.length; i = j) {
for (j = i+1; j < from.length && from[j] == from[i] && j-i < 15; j++);
if (j-i >= 2) {
to[k++] = (byte)(0xE0+j-i);
to[k++] = from[j-1];
} else
to[k++] = from[i];
}
return k;
}```
Your version didn't write the encoded sequence if a character repeated more than 15 times. Also the else { ... } part was wrong. A repetition of two can be encoded in the same number of bytes.

kind regards,

Jos  Reply With Quote

11. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## I'm sorry for mutilating your method. :( It wasn't my intention at all.
Thank you very much for all of your help!  Reply With Quote

12. ##  Originally Posted by nessa203 I'm sorry for mutilating your method. :( It wasn't my intention at all.
Thank you very much for all of your help!
Don't worry about it, do with that method whatever you want. I do have a couple of questions for you though:

1) Why do you want to filter out 0 values?
2) How do you want to take care of byte values 0xE0, 0xE1 ... 0xEF in your source array?
3) What is the purpose of this entire exercise?

kind regards,

Jos  Reply With Quote

13. Member Join Date
Jan 2010
Posts
13
Rep Power
0

## 1) In order to make the file size smaller.
2) I haven't really considered that yet! I though about it earlier when thinking about using the marker byte and came across reserved opcodes: numbers 254 (0xfe) and 255 (0xff), but i've not really researched them to see if they're actually suitable.
3) I'm trying to make an encoder - to read in files, compress them and output them into a smaller file. After that a decompressor. The decompressor is quite simple in that it should have a dictionary of 0xFF phrases. The zeroth phrase is a byte with value 0x0, the first phrase is a byte with value 0x1 and so on, (the nth element of the array is a phrase consisting of a bytse with value n). The decompressor reads the input one byte at a time and treats each one as an instruction. I was told that doing it using hexadecimal notation made it easier to see bit patterns..  Reply With Quote

14. ##  Originally Posted by nessa203 1) In order to make the file size smaller.
2) I haven't really considered that yet! I though about it earlier when thinking about using the marker byte and came across reserved opcodes: numbers 254 (0xfe) and 255 (0xff), but i've not really researched them to see if they're actually suitable.
3) I'm trying to make an encoder - to read in files, compress them and output them into a smaller file. After that a decompressor. The decompressor is quite simple in that it should have a dictionary of 0xFF phrases. The zeroth phrase is a byte with value 0x0, the first phrase is a byte with value 0x1 and so on, (the nth element of the array is a phrase consisting of a bytse with value n). The decompressor reads the input one byte at a time and treats each one as an instruction. I was told that doing it using hexadecimal notation made it easier to see bit patterns..
About 1) you can't just leave out bytes with value 0; e.g. suppose this is the byte stream from a compressed file: 1, 2, 3. Was/were there zero valued bytes that were filtered out? If yes, how many and at what position should they be inserted in order to decompress the file?

About 2) google for 'run length encoding' for solutions to that small problem.

About 3) I don't understand what you're saying here.

kind regards,

Jos  Reply With Quote

array, compression, input 