This class implements the soundex algorithm as described by Donald Knuth in Volume 3 of The Art of Computer Programming. The
algorithm is intended to hash words (in particular surnames) into a small space using a simple model which approximates the sound of
the word when spoken by an English speaker. Each word is reduced to a four character string, the first character being an upper case
letter and the remaining three being digits. Double letters are collapsed to a single digit.


Knuth's examples of various names and the soundex codes they map to are:

Euler, Ellery -> E460
Gauss, Ghosh -> G200
Hilbert, Heilbronn -> H416
Knuth, Kant -> K530
Lloyd, Ladd -> L300
Lukasiewicz, Lissajous -> L222


As the soundex algorithm was originally used a long time ago in the United States of America, it uses only the English alphabet
and pronunciation.

As it is mapping a large space (arbitrary length strings) onto a small space (single letter plus 3 digits) no inference can be made
about the similarity of two strings which end up with the same soundex code. For example, both "Hilbert" and "Heilbronn" end up
with a soundex code of "H416".

The soundex() method is static, as it maintains no per-instance state; this means you never need to instantiate this class.

Java Code:
public class Soundex {

  /* Implements the mapping
   * to:   00000000111122222222334556
  public static final char[] MAP = {
    //A  B   C   D   E   F   G   H   I   J   K   L   M
    //N  O   P   W   R   S   T   U   V   W   X   Y   Z

  /** Convert the given String to its Soundex code.
   * @return null If the given string can't be mapped to Soundex.
  public static String soundex(String s) {

    // Algorithm works on uppercase (mainframe era).
    String t = s.toUpperCase();

    StringBuffer res = new StringBuffer();
    char c, prev = '?';

    // Main loop: find up to 4 chars that map.
    for (int i=0; i<t.length() && res.length() < 4 &&
      (c = t.charAt(i)) != ','; i++) {

      // Check to see if the given character is alphabetic.
      // Text is already converted to uppercase. Algorithm
      // only handles ASCII letters, do NOT use Character.isLetter()!
      // Also, skip double letters.
      if (c>='A' && c<='Z' && c != prev) {
        prev = c;

        // First char is installed unchanged, for sorting.
        if (i==0)
        else {
          char m = MAP[c-'A'];
          if (m != '0')
    if (res.length() == 0)
      return null;
    for (int i=res.length(); i<4; i++)
    return res.toString();
  /** main */
  public static void main(String[] args) { 
    String[] names = {
      "Darwin, Ian",
      "Davidson, Greg",
      "Darwent, William",
      "Derwin, Daemon"
    for (int i = 0; i< names.length; i++)
      System.out.println(Soundex.soundex(names[i]) + ' ' + names[i]);