1. ## Incremental Deletion

Hi everyone

I'm looking for any ideas on Incremental Deletion with the min() aggregate. This is not a homework question if you're wondering. ;) Anyways, here it goes:

Let's say that we have a set of numbers and a data structure is used to store the minimum value such it can be recalculated after incremental deletion. In other words, if I've got one billion values, I don't want to rescan all of them to find the minimum.

I thought of storing the values in a binary search tree as the set grows. When a value is removed the minimum is just the left most leaf. It will not be O(logn) if it's not balanced however. I'm wondering if anyone has a better idea. ;)

Thank you :p
Tim
Last edited by tim; 05-17-2010 at 03:27 PM.

2. Store them in a SortedMap (a TreeMap) where the key is the number itself and the value is the frequency the number occurrred in your sequence. I wonder if it can store a billion numbers though ...

kind regards,

Jos

3. Hi Jos

So the SortedMap implements a tree structure sorted on keys. And if an element is removed and the frequency is zero, what then? How would you find the new minimum? This kind of problem is in terms of data warehouses, so one billion should be reasonable for a parallel machine. ;)

Thank you. ;)
Tim
Last edited by tim; 05-17-2010 at 02:47 PM.

4. Okay, we can have a frequency value to reduce the tree size. Now there will be a node for each unique value. Nodes can be stored in an AVL tree.

Wikipedia
In computer science, an AVL tree is a self-balancing binary search tree
Looks like an improvement. Any other ideas? ;)

5. Originally Posted by tim
Okay, we can have a frequency value to reduce the tree size. Now there will be a node for each unique value. Nodes can be stored in an AVL tree.

Wikipedia

Looks like an improvement. Any other ideas? ;)
Not needed; those TreeMaps are implemented by red-black trees. Also the API allows an efficient way to get the leftmost node (the smallest element). Those AVL trees are old fashioned ;-)

kind regards,

Jos

6. Hi Jos

Ah, yes. Red-black trees have the balanced height property. A tree seems to be the way to go then. I was hoping for some simple method similar to binning or something, but for min/max problems, the leaves are handy. Keeping a sorted list is out of the question. :D But anyways.

Thank you Jos
Tim

7. Originally Posted by tim
Hi Jos

Ah, yes. Red-black trees have the balanced height property. A tree seems to be the way to go then. I was hoping for some simple method similar to binning or something, but for min/max problems, the leaves are handy. Keeping a sorted list is out of the question. :D But anyways.

Thank you Jos
Tim
You're welcome of course; if you don't mind: why do you need to keep a billion numbers?

kind regards,

Jos

8. Hi Jos

I just thought of some hypothetical scenario. A data cube constructed from 1 TB of data will have large aggregates. I'm not sure if a billion could be in a cell, but the main issue is scalability.

Thanks for the ideas. ;)
Tim

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•