# Thread: A beginner's question on String matching

1. Member
Join Date
May 2010
Posts
13
Rep Power
0

## A beginner's question on String matching

Hello everyone, as you might see, I'm new here, and in general pretty much a newbie when it comes to Java.

I'm working on a school assignment, and was wondering if anyone could help me find a simple solution to a problem that seems complicated at first sight. What I'm aiming to do is find the fastest way to match a random jumble of letters (or a subset of said jumble, but that can be dealt with easily once the general method is at hand) with about a zillion entries in a dictionary.
The aim of this is to find a quick solution for a certain game show, where one is presented with a bunch of letters in random order, and instructed to find the longest word possible with said letters.

The long, difficult, and most of all (to me) dark way to do this, is take all possible combinations of the letters and try to match those exactly with the entries in the dictionary. Not only do I not clearly see the method of how to do this, but it also seems to me like it would take an awfully long time.

My intuitive solution was then, just to match the content of the jumbled-up word to every word in a dictionary, in a way which would not take into account the actual order of the letters in the word. If there should be such a match, i would have saved myself quite a few hundred thousand tests, not to mention processing time. I'd then just have to return the word in the dictionary that matched the random sequence of letters, and voilą!

Any help on this topic would be highly appreciated, and if my magical solution is impossible, I'd appreciate help on the sluggish first method as well!

Cheers, Pete
Last edited by nassar; 05-22-2010 at 04:17 PM. Reason: typos

2. You have to change your dictionary to a map: the keys of the map are a bunch of letters in alphabetical order and the values of the map are the words that are anagrams of the key. If you want to look up a word, sort the letters in the word to make a key and look up the key in your dictionary. If the key is found the value contains the original word you wanted to look up.

e.g for the word "banana" you have a key "aaabnn" and the value list contains the word "banana" (and probably no other words because I don't know of any other words with the "banana" letters in it ;-)

kind regards,

Jos

3. Member
Join Date
May 2010
Posts
13
Rep Power
0
@JosAH:

You sir, are a life saver. Thank you ever so much, I think this is completely IDEAL for what I'm trying to work with here! Thanks so much again!

On to a theoretical (maximal) amount of 9! (that's a factorial :p) tests for all the substrings, should the initial test fail. Joy. Is there any way to cut that number down?

4. Member
Join Date
May 2010
Posts
13
Rep Power
0
Back with an update, I'm really having trouble with the substrings of the main string, contrary to anything I might have said earlier today.. This seems to be the hardest part of my algorithm and i can't seem to nail it properly..

I've been working hard for the past 2 hours, trying to get this thing to work, but to no avail.. What i'm trying to do is generate all possible permutations of the substrings of a string where order is non-important in general, and then applying a matching method to those recursively..

First, what i mean by "substrings of a string where order is non-important" is this: for example for "ABCD" we have "ABC", "ABD", "ACD", "AD", "AC", "AB" "BC", "BD", "CD", "A", "B", "C", "D" or at least so I think, but I might have missed a couple or put in a couple too many... the principle remains the same.

This is a snippet of the code I've got so far:

Java Code:
``` public String findWord(String letters){

letters=letters.toUpperCase();
char[] letArr= letters.toCharArray();
String resultt=null;
String alphaLet= alphabetize(letters);

if(isInDict(alphaLet)){

result= wordFoundInDict;

}else{

String tempStr=null;
int des= NBLETTERS;

for(int i=0;i<NBLETTERS-1;i++){

for(int j=0;j<des-1;j++){

while(j!=i){

tempStr+=letArr[j];
}

}

tempStr=null;
des--;
findWord(tempStr);

}

}

return result;
}```
I can sense there's something wrong with my 2 for loops, but I just can't put my finger on it :/
Does anyone have advice for a poor soul? :/

-Pete
Last edited by nassar; 05-22-2010 at 11:12 PM.

5. generate all possible permutations of the substrings of a string
What does your code output now? What does it generate and what does it miss?
Use println()s to show the input and all the output.

6. Senior Member
Join Date
Apr 2010
Location
Posts
278
Rep Power
4
Don't you think that when you work with these words it is better to use
regular expressions?
Or maybe try with Perl, or some lexical analyzer for example flex.

7. Member
Join Date
May 2010
Posts
13
Rep Power
0
@Norm:
Hmm, it's funny you should say that... I tried prinln()ing my tempStr since that's what's being matched, and I got nothing whatsoever, it seems to be standing in an infinite loop... :/
@cselic:
Where would you have preferred to use regExps? :/ I tried dabbling with them earlier, and they were yielding nothing that seemed to be of much use
Last edited by nassar; 05-22-2010 at 11:39 PM.

8. If you change your dictionary as I have decsribed there is no need to generate all permutations of a word. Just sort the letters of the words in alphabetical order and store those as the keys of the anagrams of the same words in the dictionary, e.g.

aaabnn -> banana
loop -> loop, polo, pool
etc.

kind regards,

Jos

9. Senior Member
Join Date
Mar 2010
Posts
953
Rep Power
5
Wouldn't it be nice if you had some sort of Lexicon object that had a boolean isWord(String testString) and a boolean isPrefix(String testString) method? The Lexicon object could store a HashMap<String, String> where the key is a letter combination, and the value is "P" for prefix, "W" for word, or "PW" for both. Of course if a string is neither a word nor a prefix, it would not be in the HashMap at all. The Lexicon's constructor should probably take a file name as a parameter, and should know how to build its HashMap from a list of words in a text file.

-Gary-

10. I got nothing whatsoever
That would say that your println() was not being executed. Add more println()s to show all variables as they're set and changed.

Put delimiters around Strings when printing them to show empty strings: (">" + str + "<")

tempStr=null;
des--;
findWord(tempStr)
Why pass null to findWord() ??

11. Member
Join Date
May 2010
Posts
13
Rep Power
0
@JosAH:
Well, I have actually taken your advice; my dictionary is now an instance of a Map class, complete with an alphabetized index key and Lists of strings as values, containing all anagrams. The problem is, the game starts with 9 provided letters. What i can do now is test those 9 letters against the dictionary, and it will give me an instantaneous (<1sec) response if the jumble of 9 letters is indexed in the dictionary. However, if and when the first test fails, I'm left trying to make as many possible different 8-letter words out of the 9 initial letters, and trying to match those against my dictionary; the same process has to be repeated if no match is found for any 8-letter word, and so on.

As you correctly stated, words in which the same content is repeated are now redundant, which is why I said "substrings where order is unimportant" in my follow-up post". This does not, however, exclude most of the other combinations which would only be part of the original word and therefore have a different index in the dictionary-map. If i were capable of generating these substrings, I'd have a cake of a time looking them up in the dictionary, but it's the actual generation process that's causing me trouble at the moment :/

@gcalvin:
Your idea seems sound to me, and well, it's more aimed at speed than other things, and as you will have read above^, the way i have my dictionary indexed is quick enough, especially since the player will have a 30 second timer before the computer has to spring its own, perfect, answer upon him.
I just caught myself thinking "why wouldn't I just do it for the hell of it", but seeing as I'm doing all this in French, the applications for a new object with its own Prefix/Word map are limited.. Add that to the fact that my deadline is tomorrow at midnight, and you've got yourself a "maybe i should first make the rest of my code work first" type of response from me :/ but the help is much appreciated!

Update: I am turning in an infinite loop... it seems my while loop hasn't been appreciated.. it stays stuck on one letter and doesn't budge, anyone care to throw me a bone? :/

&@Norm:
Well that's not really how i saw the algorithm, but the way you put it now makes it clear I'm doing something very wrong...
The way I wanted my algorithm to run was:

For all values of i,
For all values of j DIFFERENT FROM i,
add a character fom the charArray to the temporary string

and i wanted to test findWord on the temporary String whenever the inside (j) loop ended, and then reinitialize it for the same reason; that way I'd have a tempStr that's 8 characters long the first time, all the time.. and what I'd been trying to do is test the temp string against the dictionary at every inside loop, and once all the inside loops where done, i'd take one character away from the temporary string and then start the whole process running again.. Obviously, the results are different from reality :/

-Pete
Last edited by nassar; 05-23-2010 at 12:39 AM.

12. Senior Member
Join Date
Mar 2010
Posts
953
Rep Power
5
You're trying to make your findWord() method recursive. First off, you should probably alphabetize your String before findWord() gets it, or else you're wasting effort in the recursive calls by alphabetizing it again.

After that, your loop should be a simple one -- remove one letter at a time from your string, and send the remaining string to another findWord() call. You need to keep track of the longest word returned (it's not necessarily the first) and you need to return a null if you don't get any matches, or if you are passed an empty string. You don't need to remove more than one letter from your string in this top-level findWord() -- your recursive calls will take care of that.

-Gary-

13. Member
Join Date
May 2010
Posts
13
Rep Power
0
gcalvin:
gddamnit thanks man, while I was thinking this stupid thing up I kept telling myself it had a strong whiff of recursiveness about it, I guess I just needed someone to take a step back and see the general picture for me to work it out. I'm going to try to fix this straight away. Can't thank you enough!

14. Member
Join Date
May 2010
Posts
13
Rep Power
0
Little update:

After a complete rethinking of my findWord() method, here's what used to be the double for loop:
Java Code:
```        result="";
String tempSol="";
String tempStr="";
for(int i=NBLETTERS-1; i>0;i--){
tempStr=letters.substring(0,i);
//System.out.println(tempStr);
tempSol=findWord(tempStr);
if(tempSol.length()>motResultant.length()){
result=tempSol;
}

}```
It looks nice and simple, and most of all, it looks like something that should run. That is not the case, however. When I'm in this configuration, the test starts at letters with one character missing (which is normal since the end index of substring() is exclusive) and then it starts losing characters, one by one, until it gives me an ugly StringIndexOutOfBoundsException and dies on me :/

If on the other hand i put the condition to NBLETTERS and not NBLETTERS-1, the code loops infinitely around the original String letters. I have no idea why this is, and am completely stumped. Anyone see the obvious newb mistake?

thanks
-Pete
Last edited by nassar; 05-23-2010 at 02:38 AM.

15. Senior Member
Join Date
Mar 2010
Posts
953
Rep Power
5
Take a good look at my last post again. Let's say you're starting with "ABCDE". Your loop will do findWord("E"), then findWord("DE"), then findWord("CDE"), etc. This is not what you want. You should be doing findWord("BCDE"), then findWord("ACDE"), then findWord("ABDE"), then findWord("ABCD"). That's all. You should return the longest word you get, or null if you got no matches. If you start with a five-character string, you should only be calling findWord() with four-letter strings. Then each call you make to findWord() will do its own calls with three-letter strings (if it needs to), and so on.

-Gary-

16. Member
Join Date
May 2010
Posts
13
Rep Power
0
Is there a way to acheive this while not using a double for loop? If so, I don't see it, unless you want to use a clumsy mix of substring and concat statements :/

17. Senior Member
Join Date
Mar 2010
Posts
953
Rep Power
5
Originally Posted by nassar
Is there a way to acheive this while not using a double for loop? If so, I don't see it, unless you want to use a clumsy mix of substring and concat statements :/
Well, I don't know if it needs to be clumsy:
Java Code:
```        public String findWord(String letters) {
// TODO: return void if 0-length String
// TODO: return word if found in dict
String longestWord = "";
int longestLength = 0;
for (int i = 0; i < letters.length(); i++) {
String result = findWord(letters.substring(0, i) + letters.substring(i + 1));
if (result.length() > longestLength) {
longestLength = result.length();
longestWord = result;
}
}
if (longestLength > 0) {
return longestWord;
}
return null;
}```
Not compiled, bug-checked or tested.

-Gary-

18. Member
Join Date
May 2010
Posts
13
Rep Power
0
Thanks gcalvin!

I've got the same exact code running as you gave me, and as a method 'header' I've got this:

Java Code:
```    public String findWord(String letters){

// TODO: if the method is provided with a 0-length string, it returns null
if(letters.length()==0){
return null;
}
// rest of code...

}```
However i keep hitting on a NullPointerException in my code.. I got frustrated and printed out the parameter i keep passing to the method when it gets into the recursive bit of the code, and here's a test run:

Java Code:
```
//Test run
//Printing the parameter returned recursively to the method + its length

BCDEFG
6
CDEFG
5
DEFG
4
EFG
3
FG
2
G
1

0```
This is when it gives me a NullPointerException.. What i don't understand is I've defined a limit-condition, that the function should return null exactly when the String it's provided with is of length 0.. :/

19. Senior Member
Join Date
Mar 2010
Posts
953
Rep Power
5
Well, I warned you that I didn't bug-check it. One thing is I forgot to check for a null after the findWord() call. You need to do that.

-Gary-

20. Member
Join Date
May 2010
Posts
13
Rep Power
0
Hehe it was heeded :D at one point i tried this condition in my for loop:

Java Code:
`if (result.length() > longestLength) {}`
Java Code:
` if(result!=null && result.length > longestLength){}`
but for some reason that halved the number of queries to my method, which meant I had a much worse chance of actually locating a word in the dictionary with it :/

back to tests, thanks again for the help Gary!

Page 1 of 2 12 Last

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•