View RSS Feed

JosAH's blog

Heap Sort I

Rate this Entry
by , 06-12-2011 at 09:52 AM (1680 Views)
Greetings,

A lot of topics are started here in this forum (and a lot of other forums too) mentioning a problem about sorting data.

Examples of two main problem scenarios are like this:

1) an array containing a first name and another array containing a last name and possibly a third array containing the age of a person; how to sort the arrays according to, say, the first name.

2) a List of objects where each object contains a first name, a last name and an age. The list must be sorted according to, say, the last name. The second scenario can easily be solved if you implement a Comparator; something like the following will do fine:

Java Code:
public class PersonComparator implements Comparator<Person> {
	public int compare(Person l, Person r) {
		return l.getLastName().compareTo(r.getLastName());
	}
}
Note that according to the contract of the Comparable<T> interface just a single method needs to be implemented. The equals() method can be left out, possibly at the cost of a small performance penalty.

You need to pass an instance of such a PersonComparator to either a:

a) Collections.sort()
b) Arrays.sort()

method, i.e. the first method sorts a List using a Comparator for the nitty gritty comparison work, or, the second method sorts an array using this Comparator.

But what about scenario 1 then? Of course scenario 1 was bad design to start with but also of course it wasn't your design, it was a bit of legacy code and you weren't around when the design's first implementation was deployed.
Even worse, the solution to scenario 2 only works for arrays or Lists. Scenario 2 is like scenario 1 for other datastructures that need to be sorted according to some sorting criteria. Those two methods (although useful) aren't of much help for anything else that needs to be sorted.

Again there are two possible scenarios here (I call them A and B): either

A) transmogrify your data to an array or List and use the methods mentioned above; after that, re-transmogrify the sorted data to your original data and go on.

or

B) come up with something else.

This little article is all about scenario B, i.e. we come up with something else. What exactly do we need to know about any type of data to be sortable? First we need to know how many data items are to be sorted. We also need to know whether or not one data item is larger, equal or smaller than another data item. Third we want to be able to interchange (or swap) two data items. Given the first notion we'd like to be able to think of indices, i.e. data item 0, data item 1 etc. etc.

The above notions simply beg for an interface definition. Here is a simple one:

Java Code:
public interface Sortable {
	int length();
	int compare(int i, int j);
	void swap(int i, int j);
}
Note that, no matter how trivial this interface may look, if a sorting method is able to sort a Sortable, it doesn't need to know anything at all about the type of data it sorts and it doesn't care less how two data items are swapped and it doesn't care less why one data item is considered to be less, equal or greater than an other data item.

This little article shows a 'heap sort' sorting method that is very well capable of doing the job. A heapsort method is no worse than the ever propagated quick sort method and it runs around any other naive sorting methods big times.

The big-Oh of a heap sort method is equal or better than a quick sort method and as an additional benefit it is stable w.r.t. that (in)famous big-Oh number. Before I forget, it doesn't need any additional memory proportional to the size of the data to be sorted.

This article presents the heap sort algoritm in terms of the Sortable interface. So when two data items need to be compared, the algorithm calls the compare() method, when they need to be swapped, the Sortable can do it etc. etc.

Here is a definition:

a heap is a binary tree such that for any node P, its children L and R (if present) are not less than P, i.e. compare(P, L) and compare(P, R) both return a value >= 0.

Note that a leaf is a heap by itself (a leaf doesn't have children)

For (almost) no particular reason, here's a binary tree with ten nodes:
Java Code:
                  0
               /     \
              1       2        
            /   \   /   \
           3    4   5    6
          / \  / 
          7 8  9
please forgive me my (lack of) ASCII art. This tree does show some interesting properties (without proof):

1: the left kid of a node i is numbered 2*i+1;
2: the right kid of a node i is numbered 2*i+2;
3: if a node i > 0 has k kids, a node i-1 has at least k kids;
4: every node has as many kids as possible.

Those four notions define a 'complete' binary tree. Math folks like those definitions. So a sequence 0, 1, 2 ... n-1 can be seen as a complete binary tree.

Back to that heapsort thing again: suppose that a node P is larger than both of its children L and R for every node in a tree (as sketched above). Then the root of the tree is the largest element of the them all.

All leaves of a tree are heaps by definition, but what about the rest of the tree nodes (nearer to the root of the tree)? The solution is simple: we can build a heap out of a binary tree, given the fact that both the left and right subtrees are heaps already; it works as follows:

Let P be a parent node and L and R the left and right children of P.

1: if P >= L and P >= R then we already have a heap, otherwise:
2: let M be the maximum of L and R;
3: swap P with that maximum M and continue from step 1 again with P at its new position.

For every complete binary tree with n nodes we know that the last n/2 nodes of that tree are leaves and so they are heaps already. (Check this by using my bad little ASCII art above).

So we can turn a (sub)tree into a heap using the three line algoritm above. Lets do it using the Sortable interface defined earlier; here goes:

Java Code:
// given a Sortable, build a heap starting at node p. There are n nodes in total
private static void heapify(Sortable s, int p, int n) {
		
	for (int r, l= (p<<1)+1; l < n; p= l, l= (p<<1)+1) {
		
		// l is the maximum of l and r, the two subnodes of p 
		if ((r= l+1) < n && s.compare(l, r) < 0) l= r;
			
		// check if parent p is less than maximum l
		if (s.compare(p, l) < 0) s.swap(p, l);
		else break;
	}
}
This is all fine, but how to turn a Sortable into a heap? We'll just use the small method above. Traditionally this part is called the 'phase 1' of the heap sort algorithm. Why break with good traditions?

Java Code:
private static void phase1(Sortable s) {

	// heapify all the non-leaf nodes 		
	for (int n= s.length(), p= n/2; p >= 0; p--)
		heapify(s, p, n);
}
The second phase, traditionally named 'phase 2' *ahem* uses a heap as follows to get everything sorted. The root of the tree is the largest element of all the nodes. If we swap the root with the last element of the Sortable we have this largest element in place (at the end). But now the tree is probably not a heap anymore. Using our first method we can turn the tree into a heap again but we ignore the last node (which was in place already).

We repeat the step above until the heap contains just one single node. This node must be the smallest element of all elements in the Sortable. Here goes:

Java Code:
// sort the Sortable
private static void phase2(Sortable s) {
		
	for (int n= s.length(); --n > 0; ) {
		s.swap(0, n); 		// put the root element in its place
		heapify(s, 0, n); 	// and restore the heap again
	}		
}
I don't think it comes as a surprise that the heap sort method itself simply looks like this:

Java Code:
// driver for the methods
public static Sortable sort(Sortable s) { 
		
	phase1(s); 	// build initial heap
	phase2(s); 	// sort the sortable given the heap
		
	return s; 	// return the Sortable for convenience
}
For completeness, so you can copy and paste this sorting utility straight in your own private bag of tricks, here is the entire class:

Java Code:
public final class HeapSort {

	private HeapSort() { } // no istances of this class possible

	// given a Sortable, build a heap starting at node i. There a n nodes in total
	private static void heapify(Sortable s, int p, int n) {
		
		for (int r, l= (p<<1)+1; l < n; p= l, l= (p<<1)+1) {
		
			// l is the maximum of l and r, the two subnodes of p 
			if ((r= l+1) < n && s.compare(l, r) < 0) l= r;
		
			// check if parent p is less than maximum l
			if (s.compare(p, l) < 0) s.swap(p, l);
			else break;
		}

	// build a heap out of the Sortable in place
	private static void phase1(Sortable s) {

		// heapify all the non-leaf nodes 		
		for (int n= s.length(), p= n/2; p >= 0; p--)
			heapify(s, p, n);
	}

	// sort the Sortable
	private static void phase2(Sortable s) {
		
		for (int n= s.length(); --n > 0; ) {
			s.swap(0, n); 		// put the root element in its place
			heapify(s, 0, n); 	// and restore the heap again
		}		
	}

	// driver for the worked methods
	public static Sortable sort(Sortable s) { 
		
		phase1(s); 	// build initial heap
		phase2(s); 	// sort the sortable given the heap
		
		return s; 	// return the Sortable for convenience
	}
}
This class takes about one page of text in my VI text editor. Let's step back a bit from this code and realize what we've just created: this sort algoritm is very efficient and stable w.r.t. it's runtime big-Oh wise. This implementation doesn't know anything about what exactly needs to be sorted, i.e. it only uses the Sortable interface.

The class is final and contains a private default constructor so there is no way for anyone else to create an instance of a HeapSort object; because no such object is needed: this class is a utility class with just functionality.

kind regards,

Jos

Submit "Heap Sort I" to Facebook Submit "Heap Sort I" to Digg Submit "Heap Sort I" to del.icio.us Submit "Heap Sort I" to StumbleUpon Submit "Heap Sort I" to Google

Updated 11-18-2013 at 08:46 AM by JosAH

Tags: None Add / Edit Tags
Categories
Uncategorized

Comments