Results 1 to 13 of 13
  1. #1
    fogus is offline Member
    Join Date
    Mar 2008
    Posts
    43
    Rep Power
    0

    Default I need a 32 or 64 bit hash function

    Hello,

    I have a need for a hash function that takes either a string(0...255 characters) or an integer(0...very_big) and transforms it into a 32 or 64 bit hash function. I would like to be able to take this value in one byte segments (values 0...255) and use them in a hash table structure. I would prefer a 'good' hash function; one that has a nice avalanche effect.

    Does anyone here have a function that he has experience with and would recommend to me?

    Thanks,
    ~fogus

  2. #2
    fogus is offline Member
    Join Date
    Mar 2008
    Posts
    43
    Rep Power
    0

    Default

    Because, like, if I had one, that would be awesome.

    Anyone got something better than mod?
    ~fogus

  3. #3
    emceenugget is offline Senior Member
    Join Date
    Sep 2008
    Posts
    564
    Rep Power
    7

  4. #4
    neilcoffey is offline Senior Member
    Join Date
    Nov 2008
    Posts
    286
    Rep Power
    6

    Default

    When I need a reasonable-quality non-secure hash, I often use a Java implementation of the Numerical Recipies 64 bit hash. Instead of just mixing in each consecutive byte value directly, you use the byte to index a table of 64-bit numbers. That table is calculated by taking successive output from a random number generator, in this case an XORShift generator. You'll see that during table generation, 30 of every 31 numbers generated is skipped -- I think this is to give successive numbers in the table a chance of having very different bits set (a slightly negative property of the XORShift generators is they can get into a "rut" where not many bits are set in the numbers generated).

    In principle, HSTART could be any non-zero value-- I'm guessing either the authors just liked the results with this value in some way, or else it was the value of the system clock/some other source of 64 random bits when they wrote the algorithm...

    I think any large prime or near-prime value of HMULT should work reasonably well too; again, I'm guessing the authors found this number gave good results.

    Java Code:
    public class DataHasher {
    
      private static final long[] byteTable;
      private static final long HSTART = 0xBB40E64DA205B064L;
      private static final long HMULT = 7664345821815920749L;
      
      static {
        byteTable = new long[256];
        long h = 0x544B2FBACAAF1684L;
        for (int i = 0; i < 256; i++) {
          for (int j = 0; j < 31; j++) {
            h = (h >>> 7) ^ h;
            h = (h << 11) ^ h;
            h = (h >>> 10) ^ h;
          }
          byteTable[i] = h;
        }
      }
      
      public static long hash(byte[] data) {
        long h = HSTART;
        final long hmult = HMULT;
        final long[] ht = byteTable;
        for (int len = data.length, i = 0; i < len; i++) {
          h = (h * hmult) ^ ht[data[i] & 0xff];
        }
        return h;
      }
    }

  5. #5
    neilcoffey is offline Senior Member
    Join Date
    Nov 2008
    Posts
    286
    Rep Power
    6

    Default

    P.S. The example code is obviously designed to be used as follows:

    Java Code:
    byte[] data = // some data...
    long hash = DataHasher.hash(data);
    Obviously in real life, to the class to be a static singleton, add a private constructor to the above class so outsiders can't instantiate it. Or maybe you have a reason for making it non-static, e.g. for multiple "hasher" instances with different starting parameters...

  6. #6
    fogus is offline Member
    Join Date
    Mar 2008
    Posts
    43
    Rep Power
    0

    Default

    OK, cool, so could I use it like this:

    Java Code:
    String fogus = "fogus";
    byte[] data = fogus;
    long hash = DataHasher.hash(data);
    ?

    I have to ask because Java doesn't work on my machine:
    The system cannot find the file specified.
    >java nlargest
    java.lang.NoClassDefFoundError: nlargest
    Caused by: java.lang.ClassNotFoundException: nlargest
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
    Could not find the main class: nlargest. Program will exit.
    Exception in thread "main" >Exit code: 1
    from:
    Java Code:
    /****
    
    import java.util.ArrayList;
    import java.util.Scanner;
    import java.util.Collections;
    import java.lang.*;
    import java.io.*;
    
    public class nlargest{
       
       public static void main (String[] args){
          try{     // All code dealing with the contents of files is inside this try{
             
             Integer n = 5;     // Get n largest elements from the list.
             
             String file_name_in = "input1.txt";
             String file_name_out = "output1.txt";
             String input_file_name = file_name_in.substring(0, file_name_in.length() - 4);
             String output_file_name = file_name_out.substring(0, file_name_out.length() - 4);
             String solution_string = "";
             
             Scanner contents = new Scanner(new File(file_name_in));
             
             ArrayList<Integer> n_largest_elements = new ArrayList<Integer>();
             ArrayList<Integer> contents_by_line = new ArrayList<Integer>();
             
             while (contents.hasNext()){    // Creates an ArrayList of the lines in the input file, called contents_by_line
                Integer line = contents.nextInt();
                contents_by_line.add(line);
                //~ System.out.println(line);    // Prints the entire file, line by line
             }
             
             n_largest_elements = n_largest_from_list(n, contents_by_line);
             String n_largest_string = array_stringer(n_largest_elements, ", ");
             
             System.out.println("n_largest_string = " + n_largest_string);
             
             write_answer(file_name_out, n_largest_string);
             
          }
          catch(Exception e){     // Catch any exception in the code dealing with the conents of the file
             System.out.println("\nproblem opening file");}
       }
       
       public static ArrayList n_largest_from_list(Integer n, ArrayList<Integer> list_1){     // Retruns the n largest elements from a list list_1
          ArrayList<Integer> result_list = new ArrayList<Integer>();
          Integer line = 0;
          int index = 0;
          for (int i = 0; i < list_1.size(); i++){     // Search (binary) for an element of list_1 in result_list, then either insert the element or discard it
             line = list_1.get(i);
             index = Collections.binarySearch(result_list, line);
             //~ System.out.println("index = " + index);
             if (index < 0){
                result_list.add(-(index+1), line);}
             if (index > 0){
                result_list.add(index, line);}
             if (result_list.size() > n){
                result_list.remove(0);}
             //~ System.out.println("result_list = " + result_list);
                }
          return result_list;
       }
       
       public static ArrayList string_arrayer(String s, String breaker){    // Returns an ArrayList with elements from a String broken broken by breaker
          ArrayList<String> string_array= new ArrayList<String>();
          Integer last_end = 0;
          Integer s_length = s.length();
          Integer breaker_length = breaker.length();
          for (int i = 1; i < s_length - breaker_length; i++){
             String possible_breaker = s.substring(i, i + breaker_length);
             if (possible_breaker.equals(breaker)){
                string_array.add(s.substring(last_end, i));
                last_end = i + breaker_length;}}
          string_array.add(s.substring(last_end));
          return string_array;
       }
       
       public static String array_stringer(ArrayList array_1, String breaker){     // Returns a String with all the elements of an ArrayList, broken by breaker
          String stringed_array = "";
          Integer array_size = array_1.size();
          for (int i = 0; i < array_size; i++){     // Concatonate all strings from array_1
             stringed_array = stringed_array + array_1.get(i);
             if (i != array_size){     // If this is not the last addition, add a breaker
                stringed_array = stringed_array + breaker;}}
          return stringed_array;
       }
       
       public static Integer boolean_array_to_integer(ArrayList<Boolean> bools){      // Returns the unsigned Integer value of an ArrayList<Boolean>
          Integer base_ten_int = 0;
          Integer num_bools = bools.size();
          Integer temp_1 = 0;
          String boolean_string = "";
          for (int i = 0; i < num_bools; i++){      // Collect the boolean values in bools to a string of 0's and 1's
             Boolean this_bool = bools.get(i);
             if (this_bool){
                boolean_string = boolean_string + "1";}
             else{
                boolean_string = boolean_string + "0";}}
          base_ten_int = Integer.parseInt(boolean_string, 2);    // Parse the string out in base 2 (binary)
          return base_ten_int;
       }
       
       public static ArrayList<Boolean> integer_to_boolean_array(Integer int_1){     // Returns the ArrayList<Boolean> of an unsigned Integer
          ArrayList<Boolean> base_two_array = new ArrayList<Boolean>();
          String binary_string = Integer.toBinaryString(int_1);
          ArrayList<String> bit_array = string_arrayer(binary_string, "");
          for (int i = 0; i < bit_array.size(); i++){
             if ( bit_array.get(i).toString().equals("1")){
                base_two_array.add(true);}
             else{
                base_two_array.add(false);}
          }
          return base_two_array;
       }
       
       public static String clause_string_creator(ArrayList<String> names, ArrayList<Boolean> bools){     // Creates a string in the format needed to make a clause from the names and bools
          Integer num_names = names.size();
          String s = "";
          ArrayList<String> clauses = new ArrayList<String>();
          for (int i = 0; i < num_names; i++){
             if (bools.get(i)){
                s = names.get(i);}
             else{
                s = "!" + names.get(i);}
             clauses.add(s);}
          return array_stringer(clauses, " ");
       }
       
       public static void write_answer(String file_name, String answer){
          try{
             PrintWriter out_file;
             out_file = new PrintWriter(new FileWriter(file_name));
             out_file.println(answer);
             out_file.close();}
          catch(Exception e){
             System.out.println("problem writing to file");}
       }
    }
    ...in SciTE. This is why I hate Java: it never compiles! Now I have to spend hours and hours screwing around with my machine to try to get "classpath" to work. Every Java update...

    on the command line:

    Java Code:
    ...\Code\Java Test>javac nlargest.java
    'javac' is not recognized as an internal or external command,
    operable program or batch file.
    I just thought that they would have fixed that by now!
    ~fogus

  7. #7
    neilcoffey is offline Senior Member
    Join Date
    Nov 2008
    Posts
    286
    Rep Power
    6

    Default

    The hash function takes a byte array. So you need to convert the string into a byte array.

    Java Code:
    String str = "....";
    byte[] byteData = str.getBytes("UTF-8");
    and now you can pass byteData to hash().

    As a side note, if compiling and running Java from the command line is causing you headaches, I'd really suggest using an IDE such as Eclipse. But either way, you're going to have to sit down for half an hour and just learn either how the command line works or how the IDE works-- there's kind of no way round that.

  8. #8
    fogus is offline Member
    Join Date
    Mar 2008
    Posts
    43
    Rep Power
    0

    Default

    Forgive my arrogence, but what if I wanted to convert an int (or something other than a string)?

    Java Code:
    int my_int = 7;
    byte[] byte_data = int.getBytes(???);
    //;;;;;;;;;; (extra semi-colons, for good measure)
    What do I put for the "???"

    Thanks a million,
    ~fogus

  9. #9
    emceenugget is offline Senior Member
    Join Date
    Sep 2008
    Posts
    564
    Rep Power
    7

    Default

    seriously, why not just use the MessageDigest class?

  10. #10
    neilcoffey is offline Senior Member
    Join Date
    Nov 2008
    Posts
    286
    Rep Power
    6

    Default

    OK, getBytes() is a method specific to String.

    For other things, how you turn it into bytes basically depends on the object in question.

    For primitives (ints, floats, longs etc), one simple option-- although not the most efficient-- is first to call String.valueOf() on the number to get a string representation of it, then convert that string into a byte array and finally call hash() on that byte array. This will work on SOME objects, including BigInteger, although with BigInteger there's a more efficient way (see later):

    Java Code:
    public long hash(int n) {
      String str = String.valueOf(n);
      byte[] bytes = str.getBytes("UTF-8");
      return hash(bytes);
    }
    You could also use a ByteBuffer to help you. N.B. This will give a different hash value to the previous method, because the byte array we're hashing is now the 4 actual bytes that make up the int, not a string representation of it:

    Java Code:
    public long hash(int n) {
      ByteBuffer bb = ByteBuffer.allocate(4);
      bb.putInt(n);
      return hash(bb.array());
    }
    If you're familiar with bitwise operations (shits and AND), then you can extract the 4 bytes from the int "directly".

    Note that if the specific problem you want is to hash ints efficiently, then really you just want to "spread the bits around" a bit. So you could feed the int through a couple of XORShift cycles, something like this:

    Java Code:
    public long hash(int n) {
      long x = n;
      for (int i = 0; i < 4; i++) {
        x ^= (x << 21);
        x ^= (x >>> 35);
        x ^= (x << 4);
      }
      return x;
    }
    or even just multiplying the int by a fairly big prime to get its hash ought to be fine.

    By the way, in computing terms, the maximum value of a Java int (2^31-1) isn't "very big". But if you have a BigInteger, then calling hash() on the array returned by toByteArray() will work fine:

    Java Code:
    public long hash(BigInteger bint) {
      return hash(bint.toByteArray());
    }
    Last edited by neilcoffey; 03-18-2009 at 02:10 AM. Reason: missed code tags

  11. #11
    fogus is offline Member
    Join Date
    Mar 2008
    Posts
    43
    Rep Power
    0

    Default

    Thanks again!

    How big of a prime should I use? Something on the order of 2^31-1? Do you have any lying around that I could borrow (or a place that sells them)?

    "If you're familiar with bitwise operations (shits and AND)..."

    I know the AND one; not so sure about 'shits' though :P I'm not so hot with Java so I don't know how to extract them (but I do understand how the operations work).

    Cheers,
    ~fogus

  12. #12
    neilcoffey is offline Senior Member
    Join Date
    Nov 2008
    Posts
    286
    Rep Power
    6

    Default

    Re MessageDigst-- the reason that this may not be the most appropriate choice is essentially performance. Secure hash functions, as implemented by MessageDigst, are great when you want a slow-to-generate, wide (e.g. 120 or 160 bits) hash code in return for practical impossibility of a hash collision.

    With a hash table, what you usually want is a moderately-sized hash code (e.g. 32-64 bits) and reasonably fast computation, which you get in exchange for some likelihood of collisions. But to put things in perspective, with a good-quality 64-bit hash function, after putting a million items into the table there'd be something in the order of 1/30 million chance of a hash collision. A million items is often enough and a one in several million chance of a collision is often good enough odds to not need to store the keys in the table. (Note I'm talking about HASH collisions, not bucket collisions, which depend on the number of buckets and involve a much smaller number of buckets, hence much greater chance of collisions, though the underlying principles are the same.)

  13. #13
    neilcoffey is offline Senior Member
    Join Date
    Nov 2008
    Posts
    286
    Rep Power
    6

    Default

    Quote Originally Posted by fogus View Post
    Thanks again!
    How big of a prime should I use? Something on the order of 2^31-1? Do you have any lying around that I could borrow (or a place that sells them)?
    You could try the value of HMULT from the code above :-)

Similar Threads

  1. Hash Values ???
    By MuslimCoder in forum New To Java
    Replies: 6
    Last Post: 01-16-2009, 05:26 AM
  2. Hash Table Help
    By michael_mke in forum New To Java
    Replies: 3
    Last Post: 11-27-2008, 06:12 PM
  3. Need to filter the hash set which is dtoring my results
    By appi_usa in forum Advanced Java
    Replies: 0
    Last Post: 08-14-2008, 05:34 PM
  4. Question about hash tables
    By behrk2 in forum New To Java
    Replies: 2
    Last Post: 07-08-2008, 05:40 PM
  5. Hash Table help
    By rhm54 in forum New To Java
    Replies: 0
    Last Post: 02-08-2008, 02:25 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •